The pathway of ligand entry from the membrane bilayer to a lipid G protein-coupled receptor

The binding process through the membrane bilayer of lipid-like ligands to a protein target is an important but poorly explored recognition process at the atomic level. In this work we succeeded in resolving the binding of the lipid inhibitor ML056 to the sphingosine-1-phosphate receptor 1 (S1P1R) using unbiased molecular dynamics simulations with an aggregate sampling of over 800 μs. The binding pathway is a multi-stage process consisting of the ligand diffusing in the bilayer leaflet to contact a “membrane vestibule” at the top of TM 7, subsequently moving from this lipid-facing vestibule to the orthosteric binding cavity through a channel formed by TMs 1 and 7 and the N-terminal of the receptor. Unfolding of the N-terminal alpha-helix increases the volume of the channel upon ligand entry, helping to reach the crystallographic pose that also corresponds to the predicted favorable pose. The relaxation timescales of the binding process show that the binding of the ligand to the “membrane vestibule” is the rate-limiting step in the multi microseconds timescale. We comment on the significance and parallels of the binding process in the context of other binding studies.

Over 60% of currently marketed drugs target membrane proteins 1 . These proteins are targeted so frequently because they mediate a large variety of cellular responses to extracellular signals, and are often linked to disease. The initial event of these cellular responses is the binding of ligands or drugs to a target membrane protein. Despite its importance in the process, measuring the binding interactions in membrane proteins is substantially more difficult than for water-soluble proteins and remains a challenge 2 . Unbiased molecular dynamics (MD) simulations in the microsecond timescale have recently been used to yield novel insights into the processes of ligand binding [3][4][5] . However, in these cases, the ligand progressed from the extracellular aqueous environment to the apolar binding site crevice at the transmembrane (TM) domain of the membrane protein.
The situation is substantially complicated when the ligand is itself a lipid. In the case of phospholipid transport across cellular membranes, such as via P4-ATPases or ATP binding cassette (ABC) transporters, ligand entry into the binding site occurs from the membrane bilayer 6 . In addition, membrane phospholipids can be metabolized upon cell activation into potent lysophospholipid mediators, such as lysophosphatidic acid (LPA) and sphingosine 1-phosphate (S1P). These bioactive lipid mediators regulate diverse processes such as angiogenesis, cardiac development, neuronal survival, and immunity via the interaction with the G protein-coupled receptors (GPCRs) LPA 1 -LPA 6 Rs and S1P 1 -S1P 5 Rs, respectively 7 . The recent crystal structure of S1P 1 R in complex with the ML056 inhibitor (a.k.a. W146) (Fig. 1a) 8 has shown that, despite S1P 1 R having a molecular architecture similar to the other members of the GPCR family 9 , the ligand access to the binding site likely occurs from the lipid bilayer via an opening between TM helices, in marked contrast to other GPCRs.
In this work, we try to understand the binding process of a lipid mediator to the receptor using large scale unbiased all-atom MD simulations on the GPUGRID 10 distributed computing infrastructure. Other studies have previously used simulations to investigate the binding of ligands to GPCRs 4,11,12 from the solvent environment. However, the quantitative binding reconstruction of the endogenous ligand via the lipid bilayer was never possible due to the reduced sampling available 13 . Here, thanks to large-scale molecular simulations of over 800 microseconds, we have been able to characterize the binding of the lipid ligand to its crystal-bound pose, as well as to Scientific RepoRts | 6:22639 | DOI: 10.1038/srep22639 produce a quantitative description of major steps along the binding pathway. Finally, we highlight the role of a flexible N-terminal helix and water in the binding process. All data is available upon request to the authors.

Results
Computational binding experiment. We inserted 19 ML056 inhibitors spread roughly evenly between both leaflets of a lipid membrane containing palmitoyl-oleoyl phosphatidylcholine lipids (POPC), cholesterol, and S1P 1 R (see methods and Fig. 1b). Three different rounds of MD simulations, with an aggregate sampling over 800 microseconds, were performed (Table 1).
In the first series of simulations (I1 in Table 1), none of the ML056 inhibitors spontaneously bound the S1P 1 R binding site, but in seven cases the ligand remains bound to the surface of the receptor, mainly to amino acids in TMs 1 and 7 (six simulations, see below) and TMs 1 and 2 (one simulation). Next, fifty simulations were spawned from each of these seven surface-bound structures in order to progressively sample the binding event (R1 in Table 1). Notably, in 60% of these subsequent simulations there was a clear progression into the S1P 1 R orthosteric binding pocket located within the TM helical bundle (RMSD < 12 Å relative to the crystal bound pose), in 28% of the simulations ML056 remained surface-bound, and in 12% of the simulations the ligand moved back into the bulk membrane. For consistency, we have only respawned simulations if several similar events have previously occurred to ensure that the dominant pathway is correctly sampled as done in automated adaptive sampling schemes 14,15 (see also the Methods section for more detail).
Thus, a final series of 500 simulations (R2 in Table 1) were spawned from 28 trajectories that reach the TM binding pocket with ligand RMSD < 5 Å relative to the crystal bound pose 8 . In 82% of these simulations, ML056 remained below RMSD values of 5 Å, resulting in an average RMSD-to-bound of 3.6 Å. The simulation with the lowest RMSD value (< 2 Å) relative to the crystal bound pose is shown in Fig. 1c, which is slightly above to the average fluctuation seen in a simulation started from the crystal bound pose (see Supplementary Fig. 2). This computationally-derived binding pose reproduced the main contacts between ML056 and the receptor observed in the crystal structure: the ionic interactions between the zwitterionic head group of the ligand and R120 3.28 and    3.29 , and the hydrophobic interactions between the acyl chain of the ligand and an hydrophobic/aromatic pocket in TMs 3 and 6 mainly formed by M124 3.32 , F125 3.33 , L128 3.36 , L272 6.51 and F273 6.52 .
The pathway of ligand entry. In contrast to β -adrenergic and muscarinic receptors (and most other GPCRs), in which the binding site is fully accessible from the extracellular environment, the N-terminus (in red in Fig. 1) and ECL2 (in orange) of S1P 1 R buries the binding site from the extracellular environment in a similar manner as rhodopsin. The entrance/exit channels for retinal in rhodopsin have been proposed to occur through the lipid bilayer, via two openings located between TMs 1 and 7, and TMs 5 and 6 16,17 . Thus, we aim to identify for S1P 1 R the pathway that connects the buried orthosteric binding pocket to the membrane bilayer. We explored pockets and channels from conformations obtained in the MD trajectories that account for the intrinsic flexibility of S1P 1 R 18 . The cavity located between TMs 1 and 7 is shown in Fig. 2a and the N-terminal domain that defines an entrance channel of the ligand is shown in Fig. 2b. While the step-by-step interactions between ML056 and S1P 1 R were heterogeneous among trajectories, the binding pathway can be subdivided into four different stages (Fig. 2). An example of trajectory of binding is shown in supplementary Movie 1. ML056 moves slowly from bulk membrane to contact a lipid-facing cavity of S1P 1 R mainly formed by amino acids in TMs 1 and 7. In these conformations the phosphonate group of ML056 is attracted and captured by R292 7.34 and S38, while the protonated primary amine interacts with E294 7.36 (Fig. 2b). The end-terminal hydrophobic tail of the ligand accommodates in a hydrophobic pocket formed by V51 1.38 , F52 1.39 , I55 1.42 , V298 7.40 and L302 7.44 . The ligand spent about 250 ns (Fig. 2g), interacting with these residues. The presence of a small cavity at the entrance of the orthosteric binding site has also been described in similar studies on β -adrenergic receptors 4,19 . In these cases, this cavity was named "extracellular vestibule" 4 or "secondary binding site" 19 . Importantly, it was shown that ligands binding to this cavity in muscarinic receptors might act as allosteric modulators since they control the extent of receptor movement to govern a hierarchical order of G-protein coupling 20 .
In a third stage the ligand inserts the hydrophobic tail into this entrance channel to interact with L102 2.61 and L297 7.39 . Eventually the zwitterionic head group of the ligand is moved from the "membrane vestibule" to the channel by a diverse set of polar interactions involving, in addition to R292 7.34 and E294 7.36 , the side chains of D40, K41, or E42 in the N-terminal α -helix. However, it is important to note that these interactions are transient. Figure 2c shows a conformation in which the polar head group interacts with R120 3.28 and N101 2.60 , whereas the hydrophobic tail interacts with F52 1.39 , M124 3.32 and L297 7.39 .
In the final step of the binding process, ML056 enters the orthosteric binding cavity from the channel. The hydrophobic tail of the ligand enters the binding cavity before the polar head group. However, in a few cases the head went in first. In such cases we never saw a full binding event, and it is even possible that the ligand could leave the binding cavity and reentered tail-first. Once inside, the zwitterionic head group transiently interacts with R120 3.28 and E121 3.29 , which are important to the bound pose (see above). Reaching the final stage requires the hydrophobic tail to pass between TMs 3 and 6 (sterically hindered by the bulky and centrally located M124 3.32 ) and relocating the water molecules of the binding cavity. At this point the dominant motions are rearrangements of the head and tail such that all the favorable interactions can be formed (Fig. 2d).
Ligand hydration and the role of water. Water molecules have a critical role in ligand binding processes 21,22 . We calculated the number of water molecules within 5 Å of the ligand in all the steps of the binding process (Fig. 2g). As ML056 moves progressively from the bulk membrane to the "membrane vestibule", the number of waters molecules decreases from roughly 40 to 30. The entrance of the ligand into the channel and the orthosteric binding cavity further drops the number of water molecules to 15-20. While the number of bound water molecules does not change substantially inside the binding cavity, their position appears important. Because water molecules show a complex network of interactions with the key R120 3.28 and E121 3.29 residues and the incoming ligand, the competition between these transiently stable interactions relative to the native contacts seems to be a final barrier that must be overcome.
Characterization of the binding process. The fact that we have 19 undistinguishable ligand molecules in the membrane complicates a simple application of a Markov state model (MSM) analysis such as in 3,15,23,24 for single ligands. Therefore, we use an alternative approach by computing the energetics of the ligand in a similar manner of refs 3,25 by reconstructing the energetic volumetric map of the ligand around the protein. We therefore built the equilibrium three dimensional occupancy map of the position of the phosphate group of the ligand ( Fig. 3 and Methods).
The free energy isosurface shows the bound pose of the ligand as well as its binding pathway (Fig. 3a). The free energy minimum corresponds to the position of the phosphate as in the crystal bound pose. The pathway of entry is clearly shown to be between TMs 1 and 7. The population difference between the ligand being in the binding cavity and in the membrane bilayer is approximately 2 kcal/mol, i.e. at least an order of magnitude higher occupancy in the orthosteric binding volume. Note that the concentration of the ligand in the membrane is already high (10:5:1, POPC:cholesterol:ML056). The relaxation timescales of the slowest transitions in the system (Supplementary Fig. 1) show three separate processes and a well converged MSM. By analyzing the sign of eigenvectors of the MSM transition matrix these relaxations can be assigned to specific processes (see Methods). The slowest relaxation process can be identified with the flipping of the ligand between membrane leaflets (~100 μs) (Fig. 3b). The second slowest process is given by the transition from the upper leaflet to the "membrane vestibule" (Fig. 3c) between TMs 1 and 7 (1-10 μs). The third slowest process (Fig. 3d) is the transition from the "membrane vestibule" to the bound pose (~500 ns). As the leaflets flipping process is not important for binding, being both leaflets populated with the same amount of ligands, the second process of formation of the surface-bound structures ("membrane vestibule") at the top of TMs 1 and 7 becomes the rate limiting step. The polar head group of ML056 (orange sticks) is attracted to a "membrane vestibule" by the interaction of S38, T48 1.35 , R292 7.34 and E294 7.36 (green sticks). The hydrophobic tail of the ligand is accommodated in a hydrophobic pocket between TMs 1 and 7. The entrance channel is shown as a transparent gray surface and the crystal structure of S1P 1 R is shown in white for comparison. The simulation time for this step is shown as an orange bar in panel (g). (c) ML056 (orange sticks) moves from the "membrane vestibule" (shown in light blue sticks for comparison purposes) to the entrance channel. The crystal structure of S1P 1 R and an earlier snapshot, in which the N-terminal helix is unfolded due to ligand entering the channel, are shown in white and gray, respectively. The simulation time for this step is shown as a dark orange bar in panel (g). (d) In the final step, ML056 enters the orthosteric binding cavity from the channel. The computationally-derived binding pose (orange and green sticks for ML056 and S1P 1 R, respectively) reproduced the main contacts observed in the crystal structure (white helices and sticks). The simulation time for this step is shown as a red bar in panel (g). The role of the N-terminal α-helix in ligand entry. An interesting feature that was apparent in the various binding trajectories was the role of the α -helix of the N-terminal domain in defining the entrance channel of the ligand. The observed binding simulations suggest that ligand entry into this channel perturbs this N-terminal α -helix, causing partial unfolding (Fig. 2c) and the corresponding increase of the volume of the channel (Fig. 2f), which facilitates the entrance of the ligand into the orthosteric binding cavity. To probe this behavior, we show the RMSD of this α -helix relative to the conformation observed in the crystal structure against ligand RMSD relative to the crystal bound pose of all the 28 binding trajectories mentioned above (Fig. 4b, Supp. Movie 2). Clearly, as the ligand enters into the binding cavity (low values of RMSD) the conformation of the α -helix is more distant to the crystal structure (high values of RMSD). Importantly, in some trajectories this N-terminal domain refolds into the α -helix conformation seen in the crystal structure (Fig. 2c) once the ligand is located in the binding site apart from the entrance channel. In order to quantify the flexibility of this α -helix and assess its role in ligand entrance, we compared the average RMSF of the entire protein between the 28 respawned trajectories and the initial simulations that did not cause spontaneous ligand binding. We found a clear increase in average RMSF of residues 34 to 48 forming this α -helix (Fig. 4a). Furthermore, the beta factors of the crystal structure (Fig. 4c) show that this part of the N-terminus is inherently less stable than the rest of the protein, supporting the observation that the helix could undergo a helix-coil transition as shown in the simulations. Remarkably, this N-terminal helix of S1P 1 R is a loop in the recent crystal structure of LPA 1 R 26 . As the authors noted, this different conformation of this domain suggests that LPA 1 R preferentially receives ligands from the extracellular milieu, in contrast to membrane access for S1P 1 R ligands.

Discussion
In this work we have shown the binding of a lipid-like ligand from the membrane bilayer directly to the orthosteric binding site of a GPCR using unbiased MD simulations. The ML056 inhibitor, studied here, binds S1P 1 R through a multi-step process that finally leads to the crystallographic binding pose. Our computer simulations have permitted to elucidate the binding pathway along with critical atomic interactions and conformational changes.
Similar to other studies of ligand binding to GPCRs 4 , there is a barrier to binding that occurs far away from the final binding pose. This barrier is formed by the interaction between the zwitterionic head group of the ligand and R292 7.34 and E294 7.36 in the "membrane vestibule" and the desolvation necessary for the ligand to enter the channel. Others have shown that for S1P binding, which has a similar zwitterionic head group to ML056, mutation of R292 7.34 results in a drastic reduction in the EC50 for the receptor in whole cell experiments 27 , although the specific mechanism for this is unclear. This site could be a target for allosteric modulators, similar to locations recently identified on the muscarinic receptor 11 . In the final steps, the lipid tail fits snugly between hydrophobic and aromatic side chains in TMs 3 and 6, and the zwitterionic head group binds R120 3.28 and E121 3.29 . A final barrier is due to rearrangements of the water molecules in the binding cavity and of dihedral angles in the head and tail of the ligand such that all these favorable interactions can be formed. Remarkably, in membrane-embedded ligands the favorable entropic desolvation of hydrophobic parts has already occurred, in contrast with lipid ligands that enter the binding cavity from the extracellular aqueous environment.
Along the binding process, the widening of the cavity by the flexible N-terminus and the desolvation of the ligand head group appear as general trends. However, the functional importance of the rearrangement of the N-terminal helix in binding also remains unclear. While it certainly needs to move somewhat in order for the ligand to be able to enter, its interactions with the ligand may be important to productive binding. Finally, it is unclear whether and what role residues along the binding pathway could play in allostery or biased signaling 11,28,29 . Residues such as those in the N-terminal helix or at the top of TM7 outlined here, could be targets for the design of such modulators. Methods System Preparation. The crystal structure of S1P 1 R was downloaded from the Protein Data bank (PDB: 3V2Y). Residues 1002 to 1161 were removed and uncharged terminal caps were placed at residues 231 and 245. The glucosamine attached to residue 30 was removed. The ML056 ligand was moved to its own file to aid in parameterization. The phosphate and amine of the ML056 ligand were assumed protonated, resulting in a zwitterionic, neutrally charged ligand. The receptor was then fed into the CHARMM-GUI 30 online membrane building tool, where the receptor was automatically positioned in a 90 Å × 90 Å square membrane. A lipid membrane composed of 10:5:1 of POPC, Cholesterol, and a dummy lipid was used, with the dummy lipid later being mutated into ML056. The membrane was solvated with TIP3P water and 0.15 M NaCl. The system was rebuilt in VMD 31 in order to generate a correct topology file, protonation and angles and dihedrals. CHARMM27 force field 32 with CMAP terms 33 was used for the protein. Lipids were parameterized with the CHARMM36 parameters, while the ParamChem 34,35 web server was used to assign CGenFF 36,37 parameters for the ML056 ligand. The ML056 molecules were situated > 20 Å away from the receptor to minimize biased binding due to initial coordinates.
Simulation protocol. All simulations were performed with ACEMD 38 molecular dynamics software. The system was minimized for 500 steps, followed by equilibration in the NPT ensemble for 20 ns to allow the membrane to equilibrate. Frames were saved every 100 ps for all simulations. During the equilibration, restraints were placed on the heavy atoms of the protein and slowly relaxed over the first 5 ns, and the protein was allowed to move freely thereafter. The system was then simulated for 100 ns under the NVT ensemble. Structural snapshots were taken every 20 ns from this trajectory and used to spawn 200 simulations each on GPUGRID 10 , resulting in 1,000 simulations total. These simulations were ran for up to 1 microsecond each, resulting in an average length of 580 nanoseconds or 580 microseconds aggregate simulation. As these simulations ran, the ligand's RMSD to the crystal bound pose was continuously monitored manually, and those with the lowest RMSD were inspected visually to discern if ML056 had begun interact with the receptor. In seven simulations, we saw that ML056 was below 15 Å RMSD to the crystal structure and appeared to interact with the receptor between either TM1 & TM7 (6 cases) or TM1 & TM2 (one case). We respawned 50 new simulations from snapshots from these seven trajectories, resulting in 350 new simulations. These simulations were run for an average length of 630 nanoseconds, giving additional aggregate sampling of 203 microseconds. A final structure was taken from the closest bound pose of these last runs in order to ensure the bound pose was stable and adequately sampled. A respawning round of 500 simulations of 100 ns each was sent, resulting in an additional 48 microseconds of aggregate simulation. For all cases of respawning, initial velocities for each simulation were reassigned based on velocities drawn from a Maxwell-Boltzmann distribution. The Langevin thermostat with a damping coefficient of 0.1/ps was used for all production simulations to maintain constant temperature for the simulations thereafter.

Three-dimensional Markov state model. The master equation of a Markov process is built as
where P i (t) is the probability of state i at time t, and k ij are the transition rates from j to i, and K = (K ij ) is the rate matrix with elements K ij = k ij for i ≠ j and = −∑ ≠ K k ii j i ji . The equation dP/dt = KP has solution with initial condition P(0) given by P(t) = T(t)P(0), where we defined the transition probability matrix T ij (t) = (exp[Kt]) ij i.e. the probability of being in state i at time t, given that the system was in state j at time 0. T ij (Δt) is estimated from the simulation trajectories for a given lag time Δt using a maximum likelihood estimator compatible with detailed balance 39 . The eigenvector π with eigenvalue 1 of the matrix T(Δt) corresponds to the stationary, equilibrium probability. Higher eigenvectors correspond to exponentially decaying relaxation modes 40 for which the relaxation timescale is computed as τ = λ ∆t s log( ) s , where λ s is to the largest eigenvalue above 1. For long enough lag times Δt the model will be Markovian. A Markov state model of the system was built based on the three dimensional position of the ligand phosphate similarly to refs 3,40 using the HTMD software environment 41 . The phosphate coordinates of each lipid-ligand is treated independently and the data is discretized in space by using a simple regular space clustering algorithm and the discrete trajectories used to construct the MSM transition matrix T ij (t), giving the probabilities of jumps between clusters i and j at different lag times t. A maximum likelihood estimator was used to build the transition matrix which guarantees detailed balance. The relaxation timescales of the slowest processes were then computed ( Supplementary Fig. 1). A lag time of 30 ns was finally used to construct the Markov model at which the relaxations are already into exponential behavior (flat in log-scale, see Supplementary Fig. 1).
Hydration analysis. The hydration of the ligand and other residues was determined using VMD. The number of waters with an atom < 5 Å from the residue of interest were counted and plotted over time.
N-terminal helix flexibility analysis. To better understand the prevalence and significance of the flexibility in the N-terminal helix covering the binding pocket, we compared the mean RMSF of all residues between 28 trajectories in which the ligand bound < 5 angstrom and all trajectories from the initial round of simulations (Fig. 4a). To further strengthen confidence in the trend seen, we plotted the RMSD of the loop to its crystal structure against the RMSD of the closest bound ML056 ligand (Fig. 4b).