Atomic-level characterization of transport cycle thermodynamics in the glycerol-3-phosphate:phosphate antiporter

Membrane transporters actively translocate their substrate by undergoing large-scale structural transitions between inward- (IF) and outward-facing (OF) states (‘alternating-access' mechanism). Despite extensive structural studies, atomic-level mechanistic details of such structural transitions, and as importantly, their coupling to chemical events supplying the energy, remain amongst the most elusive aspects of the function of these proteins. Here we present a quantitative, atomic-level description of the functional thermodynamic cycle for the glycerol-3-phosphate:phosphate antiporter GlpT by using a novel approach in reconstructing the free energy landscape governing the IF↔OF transition along a cyclic transition pathway involving both apo and substrate-bound states. Our results provide a fully atomic description of the complete transport process, offering a structural model for the alternating-access mechanism and substantiating the close coupling between global structural transitions and local chemical events.

M embrane transporters are primary cellular gatekeepers that carry out active exchange of materials in and out of the cell using diverse sources of biochemical energy. Their central role in such essential physiological processes as neurotransmission and metabolism, and direct implication in a wide array of pathophysiological conditions have rendered them as a major class of drug targets 1 .
Major facilitator superfamily (MFS) transporters 2 constitute the largest superfamily of secondary transporters including several biomedically important proteins involved in pathophysiology of various diseases such as diabetes mellitus 3 and antibiotic resistance 4 . Highlighted as a structural and functional model for MFS transporters [5][6][7] , glycerol-3-phosphate (G3P) transporter (GlpT) 8 manifests the common MFS topology and consists of two bundles of transmembrane (TM) helices with pseudo-twofold symmetry (that is, the N-(NTD) and C-terminal (CTD) halves including TM helices H1-H6 and H7-H12, respectively) 5,8,9 . Inferred from the only available crystal structure of GlpT in the IF state 8 , a B10°rigid-body rotation of the two halves has been proposed to open the periplasmic side and close the cytoplasmic side, thereby generating the OF state 5,8,10,11 .
Under physiological conditions, GlpT facilitates uphill import of G3P using downhill export of inorganic phosphate (P i ) 12 ; Cytoplasmic binding of P i to the IF state presumably facilitates the IF-OF transition 5,8,10 , while replacement of P i by G3P on the periplasmic side induces the back transition of GlpT to the IF state through a reverse set of protein conformational changes 8,5,10 . In the absence of organic phosphates, GlpT mediates P i :P i exchange 13 , an observation that is exploited here to simplify the study of the transport mechanism. Kinetic studies have shown that the rate-limiting step in the transport cycle is the IF2OF conformational change, which is accelerated by substrate binding 14 . Based on the crystal structure 8 and previous biochemical and computational studies [14][15][16][17] , a putative binding site for the IF state has been established; however, the OF state and, more importantly, conformational changes of the protein and the binding site during the IF2OF transition remain elusive.
The scope of atomistic simulations, which are required for proper description of transport mechanism, is often limited to local conformational dynamics of individual states with known structures, unless enhanced sampling techniques are employed 18 . We have recently introduced an efficient computational framework for the study of large-scale conformational transitions, without compromising important atomic details, using a combination of several distinct enhanced sampling techniques 19,20 , further developed here into a practical approach for the study of membrane transporters.
A conformational transition pathway is reconstructed via iterative molecular dynamics (MD) simulations aimed at improving the sampling by using progressively more optimal reaction coordinates and more relaxed conformations. The main sampling tool used here is the bias-exchange umbrella sampling (BEUS) [19][20][21] . The success of the method when applied to complex structural transitions, however, relies on an extensive empirical search for mechanistically relevant collective variables using a large number of exploratory, short nonequilibrium simulations during the initial phase 19,20 . For relaxation of the pathways in high-dimensional collective variable spaces, we use a parallel implementation of string method with swarms of trajectories (SMwST) 22 . We use this method to further refine putative transition pathways that are generated by employing yet another path-finding algorithm, developed in this study, that is, post-hoc string method (PHSM), an analysis technique to extract an approximate minimum free energy path (or principal curve 23 ) from prior simulations. The PHSM algorithm plays a pivotal role in our iterative approach by allowing to extract, in an efficient manner, the most relevant set of conformations from prior BEUS simulation(s) to initiate a SMwST or another BEUS simulation.
Employing the novel approach discussed above, we have conducted an extensive computational study of the entire thermodynamic cycle of GlpT, which is aimed at describing not only the unknown OF state but also the entire conformational transition pathway between the IF and OF states both in the absence (apo) and in the presence (bound) of the transport substrate. Our results provide a detailed view of the large-scale conformational changes of GlpT that are closely coupled to chemical events within the lumen, specifically binding, translocation and unbinding of the substrate, suggesting that a complex, dynamic sequence of substrate-protein interactions is involved in the transport process. In particular, our results indicate state-specific substrate-binding modes for GlpT involving distinct binding residues in the IF and OF conformations. The binding site configuration thus undergoes a significant change during the IF2OF transition. We observe distinct structural transition pathways for substrate-bound and apo transporters indicating a central role for the substrate in the conformational transition mechanism of GlpT. These findings are in line with both alternating-access and rocker-switch mechanisms, although they suggest that the rocker-switch motion of the NTD and CTD bundles is accompanied by a relative twist of the two domains which represents a major mechanistic difference between the substrate-bound and apo IF2OF transitions.

Results
Reconstructing the transport cycle. The simulated thermodynamic cycle of GlpT involves four distinct functional states: OF-apo, OF-bound, IF-bound and IF-apo, denoted as OF a , OF b , IF b and IF a , respectively (Fig. 1a,b; also see Supplementary Movie 1). Starting from the IF a state, modelled based on the only available crystal structure of GlpT 8 , we set out to describe the transitions involved in the thermodynamic cycle shown in Fig. 1a. To design a sampling protocol that simultaneously takes into account the concerted motion of the protein and substrate binding/translocation/unbinding (which involves multiple binding residues as well as the substrate), we conducted an extensive set of simulations illustrated in Fig. 1c,d (and Supplementary Table 1) and discussed in detail in Methods section and Supplementary Note 1.
The task of finding the OF state was first done using apo GlpT simulations employing an empirical search for efficient biasing protocols 20 . The optimum protocol to generate the OF a state ( Supplementary Fig. 1a) involved imposing rotational changes on TM helices H1 and H7, followed by free energy calculations along the optimized IF a 2OF a transition pathway (Fig. 1c: simulation sets 1-3). In the second stage, two independent sets of free energy calculations were performed in the presence of P i , starting from the equilibrated IF a and OF a states to characterize the substrate binding/unbinding process, and to identify the IF b and OF b states (simulation sets 4/5 and 6/7, respectively). These two bound states were found to be distinct, not only in terms of the protein global conformation but, crucially, also in their local substrate-binding site configuration. The IF b 2OF b transition involves a B8-Å translocation of the substrate within the lumen along the z axis that requires significant local conformational changes within the biding site. Owing to its complexity, IF b 2OF b transition was thus studied using multiple sets of free energy calculations and path-finding algorithms in an iterative manner (simulation sets 8, 9, 10, 11 and 12).
Combining the optimum OF a 2OF b , OF b 2IF b , IF b 2IF a and IF a 2OF a transition pathways obtained from simulation sets 1-12, a closed curve in the ({Q}, Z P i ) space was built, representing a 'cyclic' pathway connecting OF a , OF b , IF b and IF a states of the GlpT:P i complex. Here Z P i represents the P i position along the membrane normal tracking the substrate binding, translocation and unbinding events, and {Q} is a multidimensional collective variable that describes the global conformation of the TM helices based on their orientation quaternions 19,20 . The cyclic transition pathway was discretized into 150 images (or windows) and sampled together in a single massive run (simulation set 13), to generate the results presented in Transport cycle thermodynamics.. The close substrate-protein cooperation towards facilitating the transport process is noticeable. While the transporter is a necessary element for catalysing the transport reaction, the substrate itself is needed to facilitate the conformational changes required for the process. The full-cycle free energy profile ( Fig. 1d; also see Supplementary Movie 1) is reminiscent of a catalytic reaction, in which the substrate facilitates the IF-OF interconversion of the transporter. This picture suggests an obligatory exchange mechanism in which the rates for apo transitions are considerably lower than the corresponding rates in the substrate-bound form (as qualitatively assessed from the free energy barriers). For instance, during the IF a -OF a transition, substrate binding lowers the barrier by 3.52 ± 0.92 kcal mol À 1 (mean ± s.d.; see Analysis techniques in Methods section for details), thereby facilitating the interconversion of the two states. The free energy of the OF state is 2.05±0.88 (2.88±0.59) kcal mol À 1 higher than that of the IF state in the apo (P i -bound) GlpT-which is consistent with the fact that GlpT crystallizes favourably in the IF state 8 (assuming crystal contacts have not altered the resting state conformation). The 0.82±1.06 kcal mol À 1 shift in the IF-OF free energy difference between the apo-and P i -bound GlpT is not as significant as the shift in the transition barrier. Similarly, substrate-binding free energies of the IF and OF states (DDG IF a !OF a ¼ À 5:43 AE 0:58 and DDG IFa!OF b ¼ À 4:61 AE 0:88 kcal mol À 1 , respectively) are not significantly different. We note that the free energy differences are reported based on the states determined at the corresponding free energy extrema (that is, images 5, 46, 55, 72, 102 and 122 for OF a , OF b , TS b , IF b , IF a and TS a , respectively),  ARTICLE which do not take into account the shape of the free energy landscape. In addition, certain terms such as the entropic gain of 'free' substrate in the bulk are ignored in our binding free energy calculations. The phosphate is constricted in all simulations to stay within a cylinder centred along the pore. All apo free energies thus must be shifted down by B0.1 kcal mol À 1 (as estimated based on the protocol discussed in ref. 24). A more rigorous treatment is however necessary to calculate the standard free energy of binding 25 .
Global protein conformational changes. With a putative transition pathway which, due to extensive sampling, is relaxed and reversible, we can characterize both global and local protein conformational changes along the pathway and substantiate the mechanism of their coupling. Figure 2, for instance, shows how the peri-and cytoplasmic conformational changes are coupled (also see Supplementary Fig. 2). TM helices H1 and H7 are directly involved in the opening/closing (gating) of the periplasmic side while TM helices H4, H5, H10 and H11 are more directly involved in gating of the cytoplasmic side. Relative rotational change of H1 and H7 helices along the cycle is plotted against that of H5 and H11 helices in Fig. 2c (also see Supplementary Fig. 3). In the apo simulations, these angles are correlated almost linearly indicating a coupling between the opening/closing of the peri-and cytoplasmic sides, a hallmark of the proposed rocker-switch mechanism in GlpT 8 . In the P i -bound simulations, the behaviour deviates from an ideal rocker-switch model. Note that the interhelical angles are defined based on the roll axes of the helices and may not fully capture their complex nonlinear behaviour (for example, H7 kinking as illustrated in To track the concerted movement of the TM helices, {Q} was projected onto its principal components. The first two components, QPC 1 and QPC 2 , collectively account for B78% of the variance in {Q} of all sampled conformations ( Supplementary  Fig. 5). Having a large ensemble of 'reweighted' samples, we can reconstruct the potential of mean force (PMF) in a lowdimensional space such as (QPC 1 ,QPC 2 ) as shown in Fig. 2d, which clearly illustrates how the apo and P i -bound GlpT states take different pathways during the IF2OF transition. QPC 1 and QPC 2 also provide information the common and different aspects of the concerted conformational changes along the two transition pathways, respectively. QPC 1 roughly describes the concerted motion of the NTD and CTD domains in the opposite directions perpendicular to their pseudosymmetry plane, while QPC 2 describes a motion parallel to this plane ( Supplementary Fig. 5). Change in QPC 1 results in a rocker-switch type motion of NTD and CTD bundles, which is accompanied by a twisting motion of the two domains if QPC 2 changes as well. While during the apo IF2OF transition, the change in the (QPC 1  relatively smooth, resembling a rocker-switch type movement combined with a slight twisting motion, the P i -mediated IF2OF transition is stage-wise involving distinct motions induced by P i binding/unbinding in the cyto-and periplasmic sides.  Table 3. Given the involvement of distinct IF2OF transition pathways for the substrate bound and apo (as clearly illustrated in Fig. 2d), one may postulate a substrate-specific mechanism for IF2OF transition, for example, for different organic and inorganic phosphates-in line with recent observations for another MFS transporter 29 .
Alternating-access mechanism. The global structural transitions of GlpT are coupled to mechanistically relevant localized events within the lumen, most importantly to gating motions that control binding site accessibility. As a proxy for substrate accessibility, we monitored the pore radius (and hydration) along the lumen to characterize the details of the alternating-access mechanism ( Fig. 3; also see Supplementary Figs 7 and 8 and Supplementary Movie 1). In addition to the peri-and cytoplasmic constriction regions where gating takes place, a central narrow region, termed 'central bottleneck', was also identified within the lumen, which, interestingly, is persistent even in the absence of the substrate. While monitoring the orientation of TM helices during the transition (Fig. 2c) indicates a rocker-switch mode 8 of conformational change for apo GlpT, examining the luminal radius profile (Fig. 3e) suggests an almost ideal sequential gating mode 30 , thereby revealing that both global and side-chain conformational changes are involved in the process, and the rocker-switch and sequential gating models are consistent.
During the P i -bound IF2OF transition, which is of greater functional relevance, the behaviour of the system deviates from that described by either ideal model (as also reported recently for a GlpT homolog 30 ). Substrate binding from either side of the membrane results in a conformation occluded from both sides, which remains occluded to substrate during the entire IF b 2OF b transition. The IF b and OF b states along with all the conformations visited during the IF b 2OF b transition, therefore, form a relatively large 'occluded region' (in contrast to a narrowly populated single state) in the conformational space, thereby ensuring an effective alternatingaccess mechanism (Fig. 3e). Note that some of the conformations occluded to substrate could be permeable to water; however, the leakiness of transporters such as GlpT to water does not interfere with the alternating-access mechanism required for their function 31 .  Figure 4 illustrates the coupling between substrate translocation and global conformational changes of the protein, along with the luminal residues that interact most strongly with the substrate. R45 is the main binding residue in both IF-and OF-bound states and retains its contact with the substrate during the IF b 2OF b transition. This observation is consistent with the experimental results that R45K mutation abrogates both substrate binding and transport 15 . Employing a nonequilibrium alchemical free energy calculation scheme 32,33 , we quantified the free energy changes of four major simulated states (OF a , OF b , IF b and IF a ) due to the R45K mutation. The change in the binding free energy of GlpT due to mutation was estimated to be 4.40±0.89 kcal mol À 1 (mean ± s.d.; see Nonequilibrium alchemical free energy calculations in Methods section for details), which suggests a significant decrease in binding affinity for P i ( Supplementary  Fig. 9). Our alchemical free energy calculations similarly indicate a substantial increase in conformational free energy difference of  IF and OF states (5.45±0.62 and 3.25±0.87 kcal mol À 1 for apo and bound states; Supplementary Fig. 9c,f) 15 . Interestingly, our simulations reveal that the direct involvement of these residues (except for R45) with the substrate occurs at different stages of the transport cycle, that is, at distinct functional states. Therefore, IF b and OF b states are characterized not only with distinct global structures but also with distinct local binding configurations. Such state-specific substrate-binding modes can be highlighted, for example, by a B8-Å displacement of the substrate within the lumen along the z axis during the IF b 2OF b transition (Fig. 1b). In addition to the charged residues, several tyrosine side chains make contact with P i ; Y270 and Y38 strongly interact with the substrate in the OF b and IF b states, respectively, while binding/unbinding of Y76 appear to coincide with crossing of the IF-OF transition state. H165 is another residue that interacts with the substrate, particularly around the transition state ( Supplementary Fig. 3h). H165 has been proposed to play an important functional role since its mutation to a proline severely affects both binding and transport 15 . Our analysis of H165 reveals that this residue undergoes a conformational change around the IF-OF transition state, which is clearly detectable by measuring its side-chain dihedral angle w 1 revealing that H165 side-chain flips around the IF-OF transition state in both apo and bound states ( Supplementary Fig. 3f,g). Although previous MD simulations, based on the IF conformation only, have indicated that H165 protonation can tighten substrate binding 15 Fig. 10).

Discussion
We have introduced a novel computational approach for characterizing complex structural transitions in membrane transporters, that allowed for atomic-level reconstruction of an entire thermodynamic cycle and its free energy profile in an MFS transporter. The wealth of detailed information provided by the performed simulations can be used to design several new experiments to test various aspects of the transport cycle and even trap and structurally characterize the putative outwardfacing and/or intermediate states of GlpT.
For instance, one may design site-directed cross-linking experiments to trap the structurally unknown OF state, based on our finding that helices H5 and H11 undergo significant conformational changes during the IF-OF transition and are among the primary elements contributing to the occlusion of the cytoplasmic gate. In particular, using a 8-Å C a -C a distance as an approximate measure for cross-link formation we have identified several residue pairs that are distant in the crystal structure (IF state) but form contacts in the OF state. Some of the specific residue pairs located on the cytoplasmic half of helices H5 and H11 that form contacts in the OF state include V158-G382, V158-A385 and I157-G382, which are too far from each other in the IF state (C a -C a distances of 14-16 Å in the crystal structure). In addition, G142-G386, V146-G382 and R143-G382, are among the residue pairs on helices H4 and H11 that also reduce their distances on transition to the OF state and can form cross-links in this state. In the OF conformation, helices H5 and H8 form new contacts as well, for example, between S159 and G310 whose C a atoms are B12 Å apart in the crystal structure (IF state). A similar behaviour is also detected for several other residue pairs; however, some of these pairs also form contacts on substrate binding while the transporter is still in the IF conformation (for example, V158 and G369 on helices H5 and H10, respectively).
We note that our approach can be generalized to consider the involvement of other ligands in the transport cycle. In GlpT, for instance, G3P binding and unbinding events as well as the G3P-bound IF2OF transition can be studied using a protocol similar to that used here for inorganic phosphate. This study and the powerful approach introduced open opportunities for the study of MFS and other membrane transporters in their full chemical detail using enhanced sampling techniques and state-of-the-art supercomputing.

Methods
Computational approach overview. Here we will provide an overview of our computational approach, define the collective variables used in the simulations and briefly describe the sampling protocols. A brief description of the alchemical free energy calculations, PHSM, orientation-based biasing and analysis techniques used in this study are also provided. In the end, the theoretical framework of our study is discussed. More details on the sampling protocols, alchemical free energy calculations, PHSM and data analysis are provided in Supplementary Notes 1-4, respectively.
Characterizing large-scale structural transitions of membrane transporters without compromising the full-atomic description of these systems and their environments poses a major challenge to computational techniques due to the prohibitively long timescales involved in the process. Recognizing this issue, we have recently developed an efficient computational protocol towards describing large-scale conformational transitions using a combination of several distinct enhanced sampling techniques, some of which are discussed in our recent work on a primary transporter 19,20 .
Ideally, MD simulations can be used to characterize thermodynamic and kinetic properties of biomolecular processes, given adequate sampling. However, conventional integration techniques and sampling tools such as regular MD are far from ideal for the study of complex biomolecular systems and processes due to the large number of d.f. involved. Biased MD simulations provide a more practical alternative to brute-force MD for thermodynamic characterization of such complex systems. Thus, a time-dependent or time-independent biasing potential is used along with a post-hoc reweighting scheme to recover the unbiased statistics. Designing such biasing protocols, however, is quite challenging, given the dimensionality and complexity of the phase space in biomolecular systems. One particular simplification which makes biased MD relevant for the study of structural transitions is the premise of the existence of a low-dimensional manifold on which lie most of the relevant conformations visited by a system during a transition 37,38 (see Theoretical framework section). Many enhanced sampling methods such as those used in the present study rely on this assumption that is mostly based on the high cooperativity of structural elements in biomolecules and their concerted motions. However, such a manifold or reaction coordinate is not known a priori, and poor choice of a reaction coordinate for biasing could result in irrelevant transition pathways and misleading free energy profiles. The novel sampling strategy presented in this work is designed specifically to address this problem.
The majority of the simulations performed here were based on the BEUS scheme 21,19,20 , a umbrella sampling 39 method combined with Hamiltonian replica exchange 21 . In this scheme, biases are occasionally exchanged between the neighbouring windows (umbrellas), thereby increasing the chance of continuous sampling in the orthogonal d.f. that is necessary for a reliable free energy estimate. Despite its powerful sampling capability, BEUS also relies on the relevance of reaction coordinate it uses. For complex conformational free energy landscapes, an iterative sampling approach seems necessary in which the results of each iteration are used to design better reaction coordinates, to generate better initial conformations, and to better sample the relevant regions of the configuration space 20,40 . String method with SMwST 22 or other path-finding algorithms can be used in parametrization of narrow pathways defined in high-dimensional spaces to be sampled robustly in following BEUS simulations. We have implemented a parallel version of the SMwST method in which each image consists of multiple parallel copies, restrained and released iteratively with periodic exchange of information between the replicas to update the image centres at the end of each iteration. A similar but independent implementation has been previously reported elsewhere 41 . While SMwST is ideally assumed to converge to the most probable pathway, it heavily relies on the relevance of its initial pathway which is provided by other methods (for example, targeted MD 42 ). An algorithm to extract the most relevant pathway from a prior data set is desired to provide an optimized starting point for computationally demanding SMwST simulations. To extract an approximate minimum free energy path from samples generated by prior BEUS simulations NATURE COMMUNICATIONS | DOI: 10.1038/ncomms9393 ARTICLE (or any other sampling protocol, given a reweighting scheme is available) we have developed a non-parametric analysis technique, namely PHSM which is reminiscent of the finite-temperature string method 23 . The main difference between PHSM and other string methods such as finite-temperature string and SMwST is that PHSM uses an existing set of samples to extract a transition pathway. The PHSM path extracted from BEUS simulations could be further relaxed using SMwST 22 . Once the SMwST string is converged one may perform an extensive one-dimensional (1D) BEUS simulation using the image centres as the window centres. We thus use terms image and window interchangeably in this paper. Also note that, for BEUS simulations, we use one copy/replica per window/image, while for SMwST we use multiple copies/replicas per image.
Collective variables. Our approach to design optimal reaction coordinates and biasing protocols is based on an empirical search, which relies on our knowledge of the system under study to limit the conformational sampling to the relevant regions of the phase space while keeping the calculations reliable and accurate. We design several mechanistically relevant, system-specific reaction coordinates whose usefulness and applicability to induce the transition of interest are examined using qualitative assessments along with nonequilibrium work measurements 20,43,44 . The approach provides an empirical framework for optimizing the biasing protocols in a series of short simulations. By using advanced system-specific biasing protocols, we can significantly improve the effectiveness of the search protocol in sampling complex transition pathways 19,20,45 .
To induce the IF a -OF a transition, we initially defined several collective variables based on RMSD or orientation quaternions of different helices, bundles of helices or their linear combinations. Eventually, our initial IF a -OF a pathway was generated using orientation quaternions Q 1 and Q 7 defined for TM helices H1 and H7, respectively. The reference structure was the initial IF a * conformation (modelled based on the crystal structure of GlpT and equilibrated in membrane; see Sampling protocol section). In most other simulations including the final set, results of which are provided in the main paper, we use a set of 20 orientation quaternions (denoted as {Q} for brevity) including Q i and Q i 0 ,i ¼ 1,y5, 7,y,11 in which Q i and Q i 0 are orientation quaternions of ith TM helix with respect to our IF b and OF b models, selected from simulation sets 4 and 6, respectively (see Fig. 1c,d and Supplementary Table 1). For an ideal rigid-body conformational change, using two different reference structures would be redundant; however, some helices undergo bending and kinking motions, thereby making a singlereference quaternion a poor choice (see Supplementary Fig. 3 for an example of a kinking motion observed in the simulations). To track the position of the substrate with respect to the protein along the lumen, we define Z Pi as z Pi A is the z position of atom A in residue B, and DZ ¼ 5:46 is a constant. Z Pi À DZ is the z component of the vector connecting the center of mass of a select number of C a atoms of protein (within the lumen) to the phosphorus atom of the P i . DZ does not play any role in the biasing potential since it is a constant; however, it was used to shift the results in all the plots such that Z Pi approximately represents the z position of the P i with respect to the center of mass of the protein. Defining DZ as ðz R45 C a þ z H165 C a þ z R269 C a Þ=3 À z com , in which z com is the z component of the center of mass of the C a atoms of the protein, DZ can be estimated a posteriori as the mean value of DZ from all the sampled conformations with a s.d. of B0.56 Å. ðz R45 C a þ z H165 C a þ z R269 C a Þ=3 is thus relatively stable with respect to z com and provides a numerically efficient alternative for the calculation of Z Pi % z Pi P À z com , that is, the relative position of the substrate with respect to the protein projected onto z. The collective variable set ({Q},Z Pi ) defines a coarse configuration space describing the conformation of the GlpT:P i complex. In addition, in simulation set 10, we used an RMSD-based collective variable DRMSD ¼ RMSD IMb À RMSD OFb in which RMSDs are defined based on the C a atoms of TM helical regions with respect to OF b and IM b conformations, selected from simulation sets 6 and 8, respectively (see Fig. 1c,d and Supplementary Table 1). Note that IM b is a P i -bound conformation, representing a local minimum in simulation set 8 (see Supplementary Figs 11 and 12).
Sampling protocol. In our previous studies 16,17 , we had prepared a membraneembedded model of the apo GlpT in the IF state from the crystal structure of GlpT (PDB: 1pw4) 8 and equilibrated the full-atomic model in an explicit POPE (1-palmytoil-2-oleoyl-sn-glycero-3-phosphatidylethanolamine) lipid bilayer and TIP3P (ref. 46) solvent environment. We used this equilibrated conformation of apo GlpT (denoted as IF Ã a ) as the initial conformation for the rest of the simulations which, for consistency, used similar simulation conditions to our previous simulations (see refs 16,17 for details). The protonation states of all titratable residues in the crystal structure were assigned based on pK a calculations using PROPKA 47 and the protonation sites were determined using MolProbity 36 . The simulations were performed with NAMD 2.8-10 (ref. 48). A brief description of the protocols used in simulations listed in Fig. 1c,d and Supplementary Table 1 is given here, followed by a more detailed description of the parameters involved in the protocols such as the window centres and force constants of the BEUS simulations and the definition of metric in the SMwST and PHSM algorithms.
First we aimed at generating a reliable representation of the OF state. In brief, after trying several different biasing protocols, we identified the orientation quaternions of TM helices H1 and H7 (Q 1 ,Q 7 ) as a set of collective variables capable of inducing the IF-OF transition in apo GlpT. Imposing a rotational change on these helices in a direction perpendicular to the pseudosymmetry plane induces a global conformational change in the protein resulting in a locally stable conformation, which is open to the periplasm and closed to the cytoplasm (OF state). The resulting trajectory was used for an iterative optimization process using: (i) 1D BEUS along the nonequilibrium pathway defined in the (Q 1 ,Q 7 ) space (simulation set 1); (ii) SMwST using collective variable {Q} (simulation set 2); and (iii) 1D BEUS along the converged SMwST path defined in the {Q} space (simulation set 3). In the second stage we performed several sets of BEUS simulations based on equilibrated IF a and OF a states by adding the substrate (an inorganic divalent phosphate P i À 2 , denoted as P i in this paper) to the system to identify IF b and OF b states. The BEUS simulations for both IF and OF states were performed using reaction coordinate Z Pi (simulation sets 4 and 6), followed by a second set of BEUS simulations in the ({Q},Z Pi ) space initiated from PHSMgenerated initial pathways (simulation sets 5 and 7). Finally, the most complex process (within the context of our computational approach) involved the IF b 2OF b transition. The complication was particularly due to the fact that the large-scale conformational change of GlpT is accompanied by a B8-Å translocation of the substrate and a significant local conformational change within the biding site. To find the optimum pathway coupling the global conformational changes to substrate translocation within the binding site, we performed 1D BEUS simulations along (Q 1 ,Q 7 ) and Z Pi spaces, followed by a two-dimensional BEUS in (DRMSD, Z Pi ) space (simulation sets 8, 9 and 10). A minimum free energy pathway was extracted from the generated data using PHSM which was further relaxed using SMwST simulations in the ({Q},Z Pi ) space (simulation set 11). The converged SMwST path was used to initiate a 1D BEUS simulation in the ({Q},Z Pi ) space (simulation set 12). The final set of simulations combined the most relaxed pathways discussed above (extracted using PHSM) to sample along the optimum cyclic pathway in the ({Q},Z Pi ) space connecting OF a , OF b , IF b and IF a states of the GlpT:P i complex (simulation set 13). Note that in the final set (unlike other simulation sets), the apo GlpT is modelled by restraining the substrate outside the protein at a particular Z Pi (that is, B À 35 Å). P i is also restrained in all simulations to stay within a cylinder of radius 15 Å (aligned with the membrane normal and centred in the lumen). The entropic gain of 'free' P i in the bulk is therefore ignored in our free energy calculations (Fig. 1e). In addition, the free energy differences are reported based on their values at extrema, which do not take into account the shape of the free energy landscape. A more detailed description of the parameters involved in these simulations is provided in Supplementary Notes 1.
Nonequilibrium alchemical free energy calculations. We have calculated relative binding and conformational free energies associated with GlpT R45K mutation using bidirectional fast-growth thermodynamic integration (FGTI) 32,33 on four major simulated states of GlpT:Pi complex including OF a , IF a , OF b and IF b (Supplementary Fig. 9). First we equilibrated the four representative systems associated with these minima generated from simulation set 13 and extracted using the PHSM algorithm, each for 100 ns (Supplementary Fig. 1f-i). The last 50 ns of these unbiased equilibrium simulations were used to initiate the forward wild type (WT) to R45K transformations. In all, 250 initial conformations (collected every 200 ps) were selected to build dual-topology models for R45K mutation using VMD (ref. 49) plugin Mutator (version 1.3). Each model was then minimized for 1,000 steps and equilibrated for 20 ps before a 100-ps-long FGTI simulation performed using FEP module in NAMD 2.10 (ref. 48). A scaled-shifted soft-core potential 50 was used for van der Waals interactions, electrostatic interactions were scaled for outgoing and incoming atoms in two non-overlapping stages, and only nonbonded interactions of perturbed atoms were scaled with their environment.
Since the coupling parameter l was varied at every step, all forces were also calculated at every step; however, the rest of the simulation parameters were the same as those used for other MD simulations in this study (see Sampling protocol section for details). For the reverse R45K to WT (or K45R, for brevity) transformations, the snapshots at t ¼ 50 ns from unbiased equilibrium simulations shown in Supplementary Fig. 1f-i were used to build a R45K mutant, which was then minimized for 1,000 steps and equilibrated for 5 ns before a 50-ns production run to generate 250 conformations which were then minimized, equilibrated and transformed using FGTI scheme similar to the forward simulations. For each FGTI simulation, the nonequilibrium work, W was calculated as a sum over dH dl dl values reported at every step. The free energy difference due to R45K mutation was estimated for each state using forward and reverse work distributions and employing both Crooks-Gaussian intersection 51 and Bennett acceptance ratio 52,53 methods. A replacement-based bootstrapping technique was used for the error analysis in Crooks-Gaussian intersection method while an analytic expression was employed for estimating the error in Bennett acceptance ratio method 53 . See Supplementary Note 2 for a discussion on how relative binding and conformational free energies were estimated based on relative free energies of individual states between WT protein and R45K mutants.
Post-hoc string method. Here we introduce PHSM, an analysis method to extract, from an existing sample set, a principal curve defined in a multidimensional collective variable space f, whose intersection with any perpendicular hyperplane coincides with the Boltzmann-averaged center of that hyperplane. We assume that a set of samples {f t } are generated from a single or multiple BEUS simulations, representing a continuous region of the f space. The weights {w t } associated with ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms9393 samples are assumed to approximately represent the Boltzmann distribution in the sampled region. A metric is defined in the f space to measure the distance/closeness between conformations. The choice of f space (and its metric) is based on our analysis of the sampled data to include all slow d.f. including (i) those used in prior BEUS simulations and (ii) those detected as undersampled, slow d.f. not included in those simulations. The metric provides a way of prioritizing different d.f. in ensuring their continuity in the converged pathway. We use an algorithm similar to 'k-means' clustering to find the optimum centroidal Voronoi tessellation 54 along a pathway connecting two given points in the f space in a non-parametric manner. We start with an initial pathway that is represented by a string of N images with centres {f i } (iteration 0). At each iteration, these centres are updated according to the following algorithm: (i) Voronoi cell B i is built for each image which is a set of all samples being closer to f i than any other image center: in which E is a parameter which determines the thickness (or localization) of the transition tube; (ii) the Boltzmann-averaged center of each cell f i is determined by averaging over f t of all B i samples with their appropriate weights; and (iii) the averaged centres are used to parametrize a Bézier curve fðsÞ ¼ P and N equidistant points are specified along the curve between f(0) and f(1) to be used as new image centres {f i } for the next iteration. Steps (i-iii) will be repeated until the image centres are converged to a string representing the principal curve that is an approximate minimum free energy path. The closest sampled conformation to each image center may be used to reconstruct a trajectory to be used as an initial pathway for follow-up BEUS or SMwST simulations (see Supplementary Fig. 13 for examples of the application of the PHSM algorithm). Particular parameters used in our PHSM analysis are provided in Supplementary Note 1 while a more detailed description of PHSM algorithm is provided in Supplementary Note 3.
Orientation-based biasing. One particular feature that seems to best describe a variety of large-scale conformational changes is semi-rigid-body orientation change of protein structural elements or domains. The orientation quaternion technique [55][56][57] has proven successful as a practical method for defining well-behaved collective variables, specifically aimed at inducing inter-domain orientation changes 19,20 . Quaternion Q ¼ (q 0 , q 1 , q 2 , q 3 ) can be considered as a composite of a scalar q 0 and an ordinary vector q ¼ (q 1 , q 2 , q 3 ), or as a complex number Q ¼ q 0 þ q 1 i þ q 2 j þ q 3 k with three different imaginary parts. The optimal rotation to superimpose one set of coordinates on another can be described by a unit quaternion, that is, orientation quaternion Q ¼ ðcos y 2 ; sin y 2û Þ in which y andû (a unit vector) are the optimum angle and axis of rotation, respectively.
As a collective variable, an orientation quaternion can be used to induce a rotational change on a given domain (for example, a helix or a bundle of helices) or simply restrain its orientation by using the harmonic potential: in which Q(x) is the orientation quaternion of a set of N atoms (x ¼ {x i : i ¼ 1,y,N}) with respect to a reference set, and Q c , the center of the harmonic potential, could be time-independent (for restraining) or time-dependent (for inducing a transition). O(P,Q) is the length of the geodesic between two points on the unit sphere, transformed by orientation quaternions P and Q from an arbitrary point on the same sphere. An approximate estimate for O(P,Q) (which is used in NAMD/LAMPS implementation 57 ) is arccos(|P Á Q|) in which P Á Q is the inner product of P and Q.
Note that for small O, that is, small deviation from the harmonic center-which is relevant in the stiff-spring approximation-one can show O(P,Q) 2 E|P À Q| 2 thus relation (2) reduces to UðQÞ % k 2 Q À Q c j j 2 .
Analysis techniques. The configurations were collected every 5 ps from MD trajectories and analysed using VMD 49 Fig. 1d was estimated to be smaller than 0.3 kcal mol À 1 for all points, which is also smaller than the statistical error estimated from the bootstrapping algorithm (see Theoretical framework). In our analysis of the final simulation set, an unweighted estimator was used to track the average behaviour of the transition pathway projected onto different spaces with reported errors taking into account the statistical inefficiency of the correlated data points. Statistical inefficiencies were estimated based on the autocorrelation times such as those reported in Supplementary Table 2. Note that the autocorrelation times are calculated based on the continuous trajectories associated with replicas rather than reconstructed trajectories based on images, thereby providing an upper bound for the actual autocorrelation times of images.
Theoretical framework. Suppose that the dynamics of a high-dimensional atomic system {x} can be simplified as an effective dynamics in a coarse variable space f. The effective dynamics can be described by a Brownian motion in the f space with an effective potential energy G(f) and diffusion tensor D(f). The former is the PMF of the atomic system in the f space and the latter is generally position-dependent and anisotropic 60 . One may sample the regions around a given point g in the f space by adding a biasing term to the potential of the atomic system such as U g ðf t Þ ¼ k 2 ðf t À gÞ 2 in which f t is the instantaneous value of collective variable f at time t and k is the force constant. The free energy of the biased system (or the perturbed free energy) F(g) is: Generalizing the formula in ref. 61, one can show that the perturbed free energy at F(f) and the PMF G(f) are related via: For large k, that is, in the stiff-spring approximation 62 , one may expand the above relation to extract the first two terms in 1/k (ref. 61): Thus, for large force constants, the PMF can be approximated using the perturbed free energy F(f). The validity of this approximation can be tested by a posteriori comparison of the two terms, assuming the gradient and Laplacian of the perturbed free energy are estimated as well-which is numerically challenging in a highdimensional space. Ideally, one may use a 1D collective variable for defining the effective dynamics as well as the biasing protocol. In practice, however, this may only be possible for extremely simple systems. A practical solution to this problem is to keep the collective variable space multidimensional, while sampling only around a particular pathway, represented by a 1D curve f(s), parametrized by s. The choice of the pathway is obviously crucial here and determines the relevance of the free energy results to the transition of interest. Several path-finding algorithms have been proposed which iteratively/adaptively refine an initial pathway to converge to a final pathway satisfying a given criterion, for example, by minimizing the free energy or maximizing the flux 22,23,63-65 . Among them is SMwST 22 that is used in this study as well. We have also developed a novel algorithm, that is, PHSM, for extracting pathways from the results of prior simulations to initiate other simulations, a well-suited algorithm for iterative simulations in which a good set of reaction coordinates are not known a priori.
Assuming f(s) approximately represents the minimum free energy path, and s is its arc length, relation (5) can be simplified to: in which F(s) is, up to an additive constant, the free energy associated with the system perturbed by biasing potential k 2 ðf t À fðsÞÞ 2 , and d 2 ds 2 FðsÞ s¼s 0 À d 2 ds 2 FðsÞ s¼s 00 j is assumed to dominate r 2 f FðfÞ f¼fðs 0 Þ À r 2 f FðfÞ f¼fðs 00 Þ . Under this assumption, the validation of stiff-spring approximation requires the evaluation of F(s) and its first and second derivatives with respect to the arc length s. To numerically estimate F(s), one may use umbrella sampling 39 to discretize s and define N umbrella windows/images with biasing potentials U i ðf t Þ ¼ k 2 ðf t À fðs i ÞÞ 2 for i ¼ 0,y,N À 1. This scheme can be thought of as a 1D umbrella sampling along the reaction coordinate s with an additional restraint on the (shortest) distance from the f(s) curve. Perturbed free energies F i ¼ F(f(s i )) can be estimated (up to an additive constant) by self-consistently solving the equations 66,67 : in which P t sums over all collected samples (irrespective of which replica or image they belong to) and T j is the number of samples collected for image j.
With appropriate reweighting, PMF can be reconstructed in any arbitrary collective variable space, given sufficient sampling in that space. w t , the unnormalized weight of configuration x t can be estimated via ref. 66: in which {F i } are estimated via equation (7). Alternatively 66 , one may estimate {w t } and {F i } by iteratively solving equation (8) and: The PMF in terms of n(x), an arbitrary collective variable, is estimated (up to an additive constant) as: in which K is a kernel function. The above estimator is not accurate if the sampling in n(x) is not converged which is the case if n(x) has a slow dynamics and is not strongly correlated with f. For the special case of n ¼ f, the perturbed free energies {F i } can be used directly to estimate the PMF in the stiff-spring approximation. The reweighting schemes discussed so far are based on maximum-likelihood (or maximum a posteriori) estimation, while in this study we have used a Bayesian estimation method to avoid overfitting 68 . One may use a Gibbs sampler to draw {w t } and {f i } (f i : ¼ T i exp(bF i )) from: ; P i f i e À bUiðf t Þ Þ; f i $ GðT i ; P t w t e À bUiðf t Þ Þ: in which {w t } is normalized after each iteration and xBG(a,b) indicates that x is drawn from a Gamma distribution of shape a and rate b (that is, p(x;a,b)px a À 1 exp( À bx)). On convergence, any ({w t },{F i ¼ log(f i /T i )/b}) drawn is considered equally likely and can be used for free energy or PMF reconstruction. The mean and its standard error can be used to estimate any quantity of interest and its associated error. However, the error will be underestimated if the samples are correlated. A Bayesian block bootstrapping technique can be used to efficiently estimate the error associated with such correlated data points as already discussed elsewhere 20,69 . The reweighting scheme described above is general for any arbitrary set of biasing potentials; however, to approximate G(f(s i )) by {F i } and to examine the stiff-spring approximation by evaluation of the second term of the expansion in relation (6), and more importantly to relate G(f(s i )) to the kinetics even qualitatively, one needs to make an assumption that f(s i ) is an approximation of the minimum free energy path. Assuming the above (maximum-likelihood or Bayesian) estimators result in a smooth function for F(s), the first and second derivatives can be numerically estimated via finite difference methods from {F i } to estimate the second stiff-spring approximation terms.
Finally, for averaging an arbitrary quantity A(x) along the pathway f(s), one may use the weighted average AðsÞ ¼ P t w t Aðx t Þdðf t À fðsÞÞ. However, in the stiffspring approximation, unweighted estimator A i ¼ Aðx t Þ h i i is often more efficient.
i i =g provides an estimate for the variance, given g ¼ 1 þ 2t A ac =t lag is the statistical inefficiency in which t A ac is the autocorrelation time associated with quantity A and t lag is the lag time between the data points used in the analysis 70 .