Internal water channel formation in CXCR4 is crucial for Gi-protein coupling upon activation by CXCL12

Chemokine receptor CXCR4 is a major drug target for numerous diseases because of its involvement in the regulation of cell migration and the developmental process. In this study, atomic-level molecular dynamics simulations were used to determine the activation mechanism and internal water formation of CXCR4 in complex with chemokine CXCL12 and Gi-protein. The results indicated that CXCL12-bound CXCR4 underwent transmembrane 6 (TM6) outward movement and a decrease in tyrosine toggle switch by eliciting the breakage of hydrophobic layer to form a continuous internal water channel. In the GDP-bound Gαi-protein state, the rotation and translation of the α5-helix of Gαi-protein toward the cytoplasmic pocket of CXCR4 induced an increase in interdomain distance for GDP leaving. Finally, an internal water channel formation model was proposed based on our simulations for CXCL12-bound CXCR4 in complex with Gαi-protein upon activation for downstream signaling. This model could be useful in anticancer drug development.

C XCL12 is a pleiotropic chemokine commonly present in numerous tissues and acts as a chemoattractant, playing a crucial role in inflammation and immune surveillance of tissues 1 . CXCL12 is the only known endogenous ligand for CXC chemokine receptor 4 (CXCR4), also known as fusin or cluster of differentiation 184 (CD184). CXCR4 is composed of 352 amino acids and has a molecular weight of 48 kDa. Cells expressing CXCR4 migrate along the CXCL12 concentration gradient and are involved in diverse physiological functions and organ development 2 . The CXCL12/CXCR4 axis regulates various cellular behaviors, such as cell migration, adhesion, and invasion 3 .
The crystal structure of CXCR4 lacking the N-terminus was solved in 2010 with antagonists bound 4 . Furthermore, the NMR structure of CXCL12 bound to the N-terminus of CXCR4 was released to assess their critical interactions 5 . Studies have also reported that CXCL12 binding to CXCR4 follows a two-site binding mechanism, which suggests that binding at Site 1 occurs when the N-loop (RFFESH) of CXCL12 interacts with the Nterminus of CXCR4 for the initial binding, whereas binding at Site 2 occurs when the N-terminus of CXCL12 interacts with the top groove of the transmembrane (TM) helices of CXCR4 [5][6][7] . Moreover, CXCR4 binding to its chemokine induces the activation of various intracellular (IC) signaling transduction pathways and downstream effectors that mediate cell survival, proliferation, chemotaxis, migration, and adhesion through the transmembrane helices 8 . Substantial conformational changes in G-proteincoupled receptor (GPCR) TM helices have been demonstrated to mediate signaling upon activation 9,10 . Conformational changes occur in the transmembrane and IC regions of CXCR4 following binding of chemokine CXCL12 to CXCR4, which act as signals for the heterotrimeric inhibitory G-protein (G i ) binding. Trimeric G i -protein couples with the activated receptor, which causes the G αi subunit to undergo a conformational change that promotes the domain separation in G αi -protein and the exchange of GDP to GTP for downstream signaling 11 .
The structures reported 4,12 to date have been in an inactive state, and thus information regarding the complete conformational change process of CXCR4 from the inactive to active state is still lacking. Experimental data provided extensive information regarding the interactions between CXCL12 and CXCR4 but did not address the interactions between the activated CXCR4 and Gprotein. Although studies have displayed the complex structures of nucleotide-free G-protein bound to GPCR [13][14][15] , the detailed dynamic process of how activated GPCR induces GDP/GTP exchange inside the G-protein remains unclear. The complete atomic-level signaling network from agonist CXCL12 binding to G-protein through CXCR4 is still not available. To address these questions, we used atomic-level molecular dynamics (MD) simulations to clarify the activation mechanism and internal water channel formation of receptor CXCR4 in complex with chemokine CXCL12 and G-protein. Other simulation systems, such as a small-molecule antagonist isothiourea derivative (IT1t) bound to CXCR4 and CXCL12 bound to mutant CXCR4 (mCXCR4), were also used to perform microsecond-scale MD simulations for comparison. The G i -protein was then coupled to the cytoplasmic region of CXCL12-bound CXCR4 to investigate the dynamics of conformational changes of the G-protein. Our simulations suggested that electrostatic interactions may dominate the binding of CXCL12 to receptor CXCR4. The CXCL12bound CXCR4 undergoes conformational changes in the TM region, which in turn enables the internal waters to flow through the TM region by eliciting the breakage of hydrophobic layer (HL). When GDP-bound G αi -protein couples with the CXCL12bound CXCR4, the increase in interdomain distance and the rotation and translation of the α5-helix of G αi -protein toward the cytoplasmic pocket of CXCR4 may cause the unbinding of GDP from the nucleotide-binding site of G αi -protein. Therefore, on the basis of our simulations, we proposed an internal water channel formation model for CXCL12-bound CXCR4 in complex with G αi -protein upon activation for downstream signaling. CXCR4 has been demonstrated to be a prognostic marker associated with numerous cancers, such as breast, prostate, lung, and colon cancers, where it promotes metastasis, angiogenesis, and tumor growth or survival; therefore, our results provide valuable information for the understanding the downstream signal transduction process of the CXCL12-CXCR4-G αi tricomplex, which could prove useful in the development of anticancer and antimetastatic drugs.

Results and discussion
Electrostatic interactions dominate the binding of chemokine CXCL12 to receptor CXCR4. To explore the ligand binding to receptor CXCR4, various ligands docked to receptor CXCR4 were studied under three circumstances, namely CXCL12 docked into the TM region of CXCR4, small-molecule antagonist IT1t redocked to receptor CXCR4, and CXCL12 docked to mCXCR4 (L244 6.40 P and L246 6.42 P). It was found in previous mutagenesis experiments that the mutations at these two positions did not affect the CXCL12 binding, but noticeably reduced calcium flux 9,16 , causing the inactivation of CXCR4. Thus, the proline mutation in this region can eliminate the downstream signaling. The docking results are presented in Supplementary Tables S1 and S2 and Fig. 1, Supplementary Figs. S1a and S2b. Although the difference of RDOCK energy between different docking poses was not particularly large, the binding conformations of CXCL12 docked to CXCR4 system were noticeably different, and only the docking pose1 displayed the conformation of CXCL12 embedded into CXCR4 (Supplementary Fig. S1b). We selected the pose1 as the preferable pose for CXCL12 binding to CXCR4, on the basis of findings from experimental 6,9,17 and simulation 7,18 studies. The preferred pose determined was used for further MD simulations. The predicted binding interface for CXCL12-CXCR4 revealed substantial overlap with the binding of the N-terminus of CXCR4 to the CXCL12 structure (PDB: 2N55; Supplementary  Fig. S1c) 17 . The superposition RMSD of the two structures was 3.2 Å. The docked CXCL12−CXCR4 complex structure was superposed to the model proposed by Ziarek et al. 17 , which had an RMSD of 4.5 Å, indicating that these two models were comparable ( Supplementary Fig. S1d). Surface charge distribution maps demonstrated that the N-terminus and N-loop of CXCL12 with more positively charged residues (K1, R8, and R12) were embedded into the extracellular (EC) region of CXCR4 with stronger negative electric fields (D10 N-term , E14 N-term , E15 N-term , E31 1.25 , E32 1.26 , E179 ECL2 , D181 ECL2 , D182 ECL2 , D187 ECL2 , D193 5.32 , D262 6.58 , and E288 7.39 ), which suggested that electrostatic interactions may play a major role in the binding of CXCL12 to CXCR4 (Fig. 1). This result was consistent with previous findings 7, 9 and also resembled other chemokinereceptor bindings (CXCL8-CXCR1) 19,20 . CXCL12 bound to mCXCR4 also exhibited similar results of electrostatic interactions dominating the binding ( Supplementary Fig. S1a).
Previous studies have suggested a two-site binding model for CXCL12 binding to CXCR4 6,7,9,18,21 in which the N-loop, β-sheet, and 40s-loop of CXCL12 interact with the extracellular region of CXCR4 (Site 1) and the N-terminus of CXCL12 buries into the TM region of CXCR4 with electrostatic interactions to trigger the activation and signaling of CXCR4 (Site 2). For the CXCL12bound CXCR4 system, after 1.8-μs MD simulations, the Nterminus of CXCL12 was still buried deep in CXCR4 and occupied the entire EC region and most of the TM region. The interactions of the RFFESH loop of CXCL12 with the N-terminal domain of CXCR4 and C9 of CXCL12 with E277 7.28 and H281 7.32 of CXCR4 were stable during the MD simulations, which was consistent with studies that indicated that Site 1 is the recognition site for CXCL12 binding to CXCR4 (Fig. 2). Moreover, the interactions between the K1 of CXCL12 and the E288 7.39 of CXCR4 and between the S4, S6, and R8 of CXCL12 and the D187 ECL2 , D262 6.58 , and E277 7.28 of CXCR4 were predominant, which supports the hypothesis that Site 2 binding triggers G-protein signaling. The heatmap of these interactions over time also indicated that most of these interactions were maintained under 4.0 Å (probability >50%) during the simulation time ( Supplementary Fig. S2a). Simulations were also performed for the replicate CXCL12−CXCR4 system and another CXCL12−CXCR4 model proposed by Floudas et al. 18 . The results showed that tyrosine toggle switch and RMSD profiles over time were comparable to our model ( Supplementary  Fig. S1e, f). The tyrosine toggle switch was determined by measuring the distance between Y219 5.58 and Y302 7.53 . Previous studies have indicated that the two residues are closer during GPCR activation 9,22 . The preferable docking pose of antagonist IT1t to CXCR4 was similar to the solved crystal structure of CXCR4 with IT1t bound and the RMSD of superposition of the two structures was 0.59 Å ( Supplementary Fig. S2b) 4 . The IT1t-bound CXCR4 structure embedded into a hydrated POPC lipid bilayer displayed stable RMSD values in fluctuations around 0.45 nm after 1.5-μs MD simulations. The IT1t−CXCR4 system may maintain the inactive state during the simulation time compared with the initial crystal structure system (Supplementary Figs. S1e and S2c). Mutagenesis experiments have suggested that mutant CXCR4 (L244 6.40 P and L246 6.42 P) does not affect the CXCL12 binding, but does eliminate the downstream signaling 4,9,21 . The simulations indicated that CXCL12-bound mCXCR4 exhibited similar binding interactions to the CXCL12bound CXCR4 system during the 1.5-μs MD simulations. However, the down half-helix of TM6 with two mutated residues moved dynamically, and the superposition of the structures at different time frames (0, 500, 1000, and 1500 ns) showed the inward tilt of 5.1 Å without affecting the ligand binding ( Supplementary Fig. S2d). The inward movement of the lower half of TM6 may diminish the cytoplasmic region of CXCR4. This movement is suboptimal for G i -protein binding and inhibits the downstream signaling, which is consistent with the previous experiments 4, 16 . The CXCL12−mCXCR4 complex may also be in an inactive state.
Conformational changes of receptor CXCR4 as agonist CXCL12 binding. Studies have reported that GPCRs undergo substantial structural changes upon activation, to associate with Gprotein for downstream signaling, especially at TM5 and TM6, but also at TM3 and TM7 in certain cases 9,23,24 . The conformational change of CXCR4 induced by CXCL12 binding during long-term MD simulations was illustrated by B-factor (Fig. 3a). The EC region of CXCR4 was mobile because of its interactions with CXCL12, and the lower halves of TM5, 6, and 7 exhibited more fluctuations when coupled with G-protein. However, the B-factor represented as a ribbon with agonist CXCL12 colored cyan and receptor CXCR4 colored gray. Blue color corresponds to positive and red color to negative electrostatic potential. Residues around the binding interface are labeled and depicted as sticks; blue is for CXCL12, and green is for receptor. a Surface charge distribution around the binding interface for CXCL12 binding to CXCR4. b Surface charge distribution of CXCR4 inside the receptor region presented for clarity are more negative charge potential. Positively charged residues of the N-terminus of CXCL12 are labeled to show interactions with CXCR4. c Surface charge distribution of the N-terminus of CXCL12 showing more positive charge potential. Negative charged residues of the extracellular region of CXCR4 are labeled to show interactions with CXCL12.
of other systems exhibited less mobility in the TM helices compared with the CXCL12-CXCR4 system and other regions might have more mobility than TM regions, which indicated that these systems might have less conformational changes in TM helices ( Supplementary Fig. S3a-c). For the CXCL12-bound CXCR4 system, the superposition of CXCR4 frames at various simulation time points demonstrated that the IC part of TM5 moved to TM6, whereas the IC region of TM6 vibrated and moved outward. To clearly represent the outward movement of TM6, we superposed the CXCR4 structure frames at different simulation time (0, 500, 1000, 1500, and 1800 ns) and measured the C α atom movement of the intracellular end residue of TM6 (K234 6.30 ). The outward movement of the intracellular end residue of TM6 between the initial and final frames was measured as 5.5 Å (Fig. 3b). This system may report slightly less outward movement compared with other GPCR systems (motion ranges from 6 to 14 Å) 10 . However, our simulation of the CXCL12−CXCR4 system indicated that the outward movement increased activation, which contrasted with other GPCR−G αi systems (motion ranges about 6 to 8 Å) 14,15 . The movement and tilt of TM6 were relatively similar in the apo CXCR4 system; however, interestingly, TM6 moved inward in the mutant CXCR4 system, indicating that the inward movement of TM6 may reduce the cytoplasmic region and inhibit G i -protein binding ( Supplementary Fig. S3d, e). Three C α atoms of residues (I245 6.41 , P254 6.50 , and G258 6.54 ) were selected to measure the kink angle of TM6 (the angle between up half and down half of TM6). The kink angle of TM6 with time for various simulation systems also indicated that larger kink angles (150°~160°) were observed in apo and IT1t-bound CXCR4 systems similar to the inactive CXCR4 crystal structure (150°, PDB: 3ODU), while smaller kink angles (135°~145°) were found in CXCL12-bound CXCR4 and CXCL12−CXCR4−empty G αi systems close to the active β 2 AR−G s complex structure (130°, PDB: 3SN6). The negative kink angle for CXCL12-bound mCXCR4 system meant inward tilt of down half of TM6. Larger down half tilt movement  induced by CXCL12 binding during MD simulation is indicated by B-factor of C α atoms. In the color spectrum, red indicates more flexibility, and blue represents more rigidity. b Superposition of CXCR4 frames at various simulation time points shown as intracellular perspective view. CXCR4 frames are represented as gray ribbons; gray is for TM6 at the initial time, blue is for TM6 at 500 ns, cyan is for TM6 at 1000 ns, green is for TM6 at 1500 ns, and red is for TM6 at 1800 ns. c Tyrosine toggle switch. The distance between the oxygen atoms of the side chains of Y219 5.58 and Y302 7.53 was measured throughout MD simulation. Apo CXCR4, CXCL12bound mCXCR4, CXCL12, and antagonist IT1t-bound CXCR4 are represented using black, red, blue, and green lines, respectively. of TM6 means the smaller kink angle of TM6 ( Supplementary  Fig. S3f).
Although GPCR activation is based on the action of several molecular switches 25,26 , only the tyrosine toggle switch was observed in our long-term MD simulations for CXCR4 activation because of the difference of GPCR sequences. Two conserved tyrosine residues (Y219 5.58 and Y302 7.53 ) in TM5 and TM7 appeared to act as a hydrophobic switch. Upon activation, the two residues moved closer together, causing the breakage of the hydrophobic barrier. The distance measured between the oxygen atoms of the side chains of Y219 5.58 and Y302 7.53 was~1.1 nm after 1.8 μs of simulation time, which was less than that at the initial time (~1.5 nm) for the CXCL12-bound CXCR4 system, whereas the distances measured in other CXCR4 systems after the simulation time ranged from 1.9 nm to 2.4 nm longer than the CXCL12-bound CXCR4 system (Fig. 3c). This finding indicates that the hydrophobic barrier is maintained to inhibit the internal water flowing. The conformation of CXCL12-bound CXCR4 system showed that Y302 7.53 flipped inward and TM7 tilted inward, whereas the conformation of CXCR4 in the inactive state for IT1t-bound CXCR4 system revealed that the two residues Y219 5.58 and Y302 7.53 did not flip inward, and TM7 tilted outward, which increased the distance ( Supplementary  Fig. S3g).
Formation of internal water flow in receptor CXCR4 during activation. Closer observation of the internal water distribution from the final frame of each CXCR4 system revealed the presence of two HLs inside the TM region of CXCR4 (Fig. 4), which is similar to other GPCR systems 26,27 . For the CXCL12-bound CXCR4 system, water entered the receptor from both the EC and IC regions of the receptor, which is consistent with the observations of a previous computational study 27 . Internal water flow was concentrated near the EC region and hindered by the HL1 (F87 2.53 , Y116 3.32 , L120 3.36 , and F292 7.43 ), which enabled the internal water molecules to bypass HL1 and become trapped between Y256 6.52 and W252 6.48 . The TM7 moved inward and Y302 7.53 swung into the receptor interior, closer to Y219 5.58 . The water entered from the IC region and broke the HL2 (L80 2.46 , L127 3.43 , I130 3.46 , and L244 6.40 ), forming hydrogen bonding networks among the surrounding waters and residues through Y302 7.53 to reach the middle of CXCR4, but remained blocked by the HL1. The interaction of H294 7.45 with W252 6.48 can link the internal water pathway from both EC and IC regions during activation (Fig. 4a). The water entrance pathway computed by HOLE program 28 was also depicted in Supplementary Fig. S4a. In CXCL12-bound CXCR4 system, water molecules could easily enter from both EC and IC regions of CXCR4, mediated by W252 6.48 and H294 7.45 . However, two HLs appeared to hinder the internal water pathway in the apo and antagonist IT1t-bound CXCR4 systems, and several water molecules were trapped between HL1 and HL2 (Fig. 4b, c). In the IT1t-bound CXCR4 system, Y302 7.53 flipped down and did not swing to disrupt the HL2, which blocked the water entrance. A similar phenomenon to the apo CXCR4 system was observed in the CXCL12-bound mCXCR4 system (Fig. 4d), whereby water molecules were trapped between HL1 and HL2, Y302 7.53 flipped up, and the mutant residue P244 6.40 moved inward to maintain the HL2, without forming a continuous water pathway.
The water density maps observed in the transmembrane region of respective simulation systems, analyzed from the simulation trajectories, revealed that HL1 was present in all systems, whereas HL2 was broken in the CXCL12-bound CXCR4 system to form an almost continuous water channel. Barely scattered waters in the inactive CXCR4 (apo, mCXCR4, IT1t-bound) systems were trapped between HL1 and HL2, and thus a continuous water pathway was not formed. (Supplementary Fig. S4b). The observed HL2 was also similar to previous studies in which a hydrophobic barrier was mentioned in the rhodopsin structure 25 . Sequence alignment of these residues from HL1 demonstrated that they were conserved among the Class A chemokine family of GPCRs (Tables 1 and 2). The HL1 residues Y116 3.32 and F292 7.43 flipped inside, adopting a closed conformation and acting as a barrier that caused the internal water to employ an alternative pathway in all simulation systems (Fig. 4). Other GPCR systems reported that HL1 and HL2 were broken to form a continuous water pathway when the receptor was activated 26,27 , which differed from the CXCL12−CXCR4 system that water molecules bypass HL1, instead of breaking HL1. Nevertheless, these results indicated that HL1 may be present in all systems. The CXCL12-bound CXCR4 structure was more activated than the other three systems because of the breaking of HL2, which may form an approximately continuous water pathway and hydrogen bonding networks with surrounding residues for activation and downstream signal transmission. Moreover, as compared to previous comprehensive library mutations on CXCR4 receptor 9 , some residues found in HL2 or regulating internal water flow were verified by mutagenesis experiments (Y219 5.58 F, L244 6.40 P, Y302 7.53 H, W252 6.48 R) to decline Ca 2+ mobilization as they were mutated. The formation of continuous internal water flow of GPCR may be crucial during GPCR activation.
Interaction of the α5-helix of G αi -protein with CXCL12-bound CXCR4 receptor. Studies have suggested that the G βγ subunit of the G i -protein facilitates the coupling of the G αi subunit to the receptor, and the GDP/GTP exchange in the G αi subunit engenders the dissociation and interaction with downstream effectors, such as G αi inhibition of adenylyl cyclase and G βγ activation of ion channels 29,30 . Furthermore, recently solved structure of rhodopsin-G i complex clearly revealed that the interactions between rhodopsin and G i -protein are mainly mediated by the G αi subunit and somewhat by G βγ subunit 14,15,31 . Therefore, we only focused on a variation in the interaction pattern of G αi -protein with CXCR4 receptor to reduce the computing consumption in our systems. To reveal the downstream signaling of G αi -protein coupling with the activated CXCR4, we selected the last frame of 1.8-μs MD simulations of the CXCL12bound CXCR4 system to couple with G αi -protein by performing additional microsecond-scale MD simulations for the CXCL12-CXCR4-G αi tricomplex structure at various G αi -protein states. The complex structure of nucleotide-bound G-protein binding to GPCR warrants clarification. Only less complex structures of nucleotide-free G-protein bound to GPCRs have been reported 14,15 . Therefore, three states of G αi -protein coupling with the CXCL12-bound CXCR4 were simulated to clarify the interactions between G αi -protein and CXCR4. The three states were GDP-bound G αi -protein docked to CXCL12-bound CXCR4, nucleotide-free G αi -protein coupled with CXCL12-bound CXCR4 through homology modeling using the μ-opioid receptor-G αi complex as the template 15 , and GTP-bound G αi -protein coupled with CXCL12-bound CXCR4 through homology modeling. The docking results revealed that the α5-helix of GDP-bound G αi -protein initially binds near the loop ICL3 of CXCR4 and the C-terminus of α5-helix facing the loop ICL2 of CXCR4. The α5-helix occupies the cytoplasmic space of CXCR4 among TM6, TM5, and TM3 ( Supplementary Fig. S5). These findings all accord to those from previous GPCR-G αi complex structure studies [13][14][15] .
MD simulations demonstrated that G αi -protein undergoes various conformational changes when bound to GDP, GTP, and in the absence of the nucleotide. In the GDP-bound state, the α5-helix residues were attracted to the ICL2, TM3, and TM7 of the CXCL12-bound CXCR4 receptor. The surface charge distribution maps indicated that the binding interface of the Ras domain of G αi -protein (more negatively charged) is attracted to the cytoplasmic region of CXCR4 (more positively charged) through electrostatic interactions ( Supplementary  Fig. S6a, b), which may trigger a counterclockwise rotation change from −8°to approximately −35°and an upper translation of approximately 4.0 Å of the α5-helix during MD simulations (Fig. 5a). In the nucleotide-free state, the α5-helix interaction changed to TM6, TM7, ICL1, and ICL3 of the CXCL12-bound CXCR4, causing a dissimilar clockwise rotation of~68°, which is similar to the compared simulation of the solved rhodopsin (RHO)-G αi -protein complex structure 14,15 . In the GTP bound to G αi -protein state, the rotation of the α5-helix  changed to approximately −28° (Fig. 5b). Our simulations suggested that the rotation and translation of the α5-helix of G αi -protein is crucial in the nucleotide-exchange mechanism of G αi -protein bound to CXCR4.
Interdomain distance of G αi -protein increases with GDP dissociation from the binding pocket. During the simulation of CXCL12-bound CXCR4 in complex with GDP-bound G αi -protein, the Ras and the helical domains, which are initially tightly bound to the nucleotide, may gradually separate. The initial interdomain distance between A238 of the helical domain and E276 of the Ras domain was 15.6 Å. Over the course of the simulation, the distance rapidly increased and fluctuated around 29.5 Å before 750 ns and then decreased to 16.0 Å at 900 ns. The distance gradually increased again to 25.0 Å at the end of the simulation period (Fig. 5c), indicating that the separation of the helical domain from the Ras domain may exhibit spontaneous fluctuations similar to that of GDP-bound G-protein with receptor-free system 11 . In the nucleotide-free state, the distance between the two domains initially decreased before fluctuating around 40.0 Å, to maintain the opening of the two domains during the whole simulation. The compared simulation for the solved complex structure of empty (nucleotide-free) G αi -bound RHO receptor also maintained a similar opening profile, in which the interdomain distance fluctuated and increased during the simulation time. In the GTP bound to the empty G αi -protein state, the interdomain distance gradually decreased from 56.0 to 45.5 Å to stabilize GTP at the nucleotide-binding pocket (Fig. 5c). Moreover, during the simulation of GDP-bound G αi system, the GDP was measured to be gradually released from the nucleotidebinding pocket ( Supplementary Fig. S6c), which may be associated with the interdomain distance increase and fluctuation, and the rotation and uptranslation of the α5-helix upon G αiprotein activation. Although our current simulations could not present the GDP/GTP exchange during G i -protein activation, we determined that the α5-helix rotation changed at different G αi states, the interdomain distances slowly increased at GDP-bound G αi state, and gradually decreased at GTP-bound G αi state, which were crucial for G i -protein activation 11,13-15 .
G αi -protein binding redirects the internal water flow in the CXCL12-CXCR4-G αi tricomplex structure. We finally examined the internal water flow in the CXCL12-CXCR4-G αi tricomplex and further investigated the internal water distribution from the final frame of each CXCL12-CXCR4-G αi tricomplex system. During the binding of GDP-bound G αi -protein, the nearly continuous water pathway was disturbed, with less water in the TM region of CXCR4 where HL2 reformed, thereby disrupting the water pathway (Fig. 6a). In the nucleotide-free G αiprotein state, Y302 7.53 swung into HL2 to break the hydrophobic gate and reformed an almost continuous internal water pathway (Fig. 6b). In the GTP-bound G αi -protein state, the continuous water molecules gradually decreased; the number of molecules was lower than that in the nucleotide-free state but higher than that in the GDP-bound G αi -protein state (Fig. 6c). The water density maps indicated that two HLs reformed to break the continuous water channel when GDP-bound G αi -protein coupled with CXCR4, whereas the continuous water flow reformed in the nucleotide-free G αi -protein state ( Supplementary Fig. S4). This difference in the internal water pathway among G αi -protein coupling states may be associated with the G i -protein activation.
Molecular switches and HLs mediate internal water flow upon CXCR4 activation. Internal water is essential to the function of numerous membrane proteins, such as channel and transport proteins, and continuous water flow has also been reported in activated GPCRs 27,[32][33][34][35] . The presence of two HLs was proposed for the A 2A R receptor, and during the active state of A 2A R, the NPxxY motif as a hydrophobic gate broke to enable water to flow through the TM of the receptor 27 . A similar mechanism was observed in our previous study, in which the HL broke during the active state of D3R receptor 26 . In the current study, we observed a relatively distinct water channel pathway, given that CXCR4 lacks ionic lock (distance between R3.50 and E6.30) and 3-7 lock (distance between E3.28 and K7.43) molecular switches which were proposed in previous experiments 25 . In the inactive states (apo, IT1t-bound CXCR4, and CXCL12-bound mCXCR4), the continuous water channel was hindered by the presence of two HLs. In the CXCL12-bound CXCR4 state, the HL2 broke because of the rearrangement of TM5 and TM7, and the distance between the two tyrosine residues Y219 5.58 and Y302 7.53 (tyrosine toggle switch) decreased (Figs. 4 and 3c). This observation is consistent with a previous study which reported that Y219 5.58 and Y302 7.53 underwent particularly large conformational changes from the inactive to active state 9 . HL1 was retained throughout the simulation time because of the presence of hydrophobic residue clusters (F87 2.53 , Y116 3.32 , L120 3.36 , and F292 7.43 ) near the EC region to divert the internal water flow. Sequence alignment revealed that the presence of HL1 residues may be limited to only the chemokine family receptors (Tables 1 and 2); therefore, CXCR4 receptor had a highly divergent water diversion pathway when compared with A 2A R and D3R receptors. The hydrogen bonding network with time for various systems between TM residues and internal water molecules were calculated to reveal that more hydrogen bonding numbers were observed in the nucleotide-free G αi state than in GDP-bound G αi state, apo CXCR4 and IT1t-bound CXCR4 systems ( Supplementary  Fig. S7).
Conformational changes of G αi -protein cause GDP leaving induced by binding to CXCL12-bound CXCR4. The α5-helix initially buried itself deep into the cytoplasmic cervices of CXCL12-bound CXCR4, and the α5-helix residues (I344, K345, and N347) in the GDP-bound state interacted with R148 ICL2 , A303 7.54 , A137 3.53 , and K236 6.32 of the CXCR4 receptor. During the MD simulations, the α5-helix of G αi -protein rotated counterclockwise and translocated closer to the cytoplasmic region of CXCR4 through electrostatic interactions ( Supplementary  Fig. S6a, b). In the meantime, the separation of the helical domain away from the Ras domain gradually increased the interdomain distance in the GDP-bound G αi -protein state. Furthermore, the distances between switch I (C α atom of T182) and switch II (C α atom of G202) in the empty (nucleotide-free) G αi state appeared larger than in the GDP-bound G αi state (Fig. 7a). A free energy landscape as a function of the α5-helix and interdomain distance of G αi -protein was performed to assess the stability of conformational states of tricomplex because these features are crucial in G-protein activation. The energy landscape for the α5-helix rotation and G αi -protein interdomain distances clearly revealed three conformational states of CXCL12-CXCR4-G αi tricomplex, GDP-bound G αi , nucleotide-free G αi , and GTP-bound G αi states. The representative conformation of each state also demonstrated that more internal water was present in nucleotide-free G αi state than in other states. (Fig. 7b). Moreover, the tyrosine toggle switch profiles with time for three different G i -protein states indicated that the switch distance between Y219 5.58 and Y302 7.53 after G i -protein binding was lower than in the inactive CXCR4 state (Supplementary Fig. S8). In summary, these results did not reveal the large domain opening of G αi -protein for the GDP/GTP exchange, but did highlight trends in the rotation and uptranslation of the α5-helix and the increase in distance between switch I and switch II when the helical domain underwent a conformational change of G αi -protein to increase the interdomain distance for GDP leaving from the binding pocket. These findings were supported by previous computational results 11,13 that the α5-helix interacts with the cytoplasmic pocket of the activated RHO receptor, causing displacement of the helical domain and GDP release. Recently solved structure of β 2 AR in complex with 14 a.a. from C-terminal G s with GDP-bound revealed that α5helix rotation and tilt may differ depending on the G αs -bound β 2 AR states, which also corresponds to our simulation results 36 .
Internal water channel reformation for CXCL12-bound CXCR4 in complex with G αi -protein. Based the findings in the different simulation systems, an internal water channel formation model for CXCL12-bound CXCR4 in complex with G αi -protein was proposed (Fig. 8). Because CXCL12 initially binds to CXCR4 through electrostatic interactions, K1 of the embedded Nterminus of CXCL12 interacted with E288 7.39 of CXCR4 to activate the receptor and water flowing from the EC region was hindered by HL1, diverting the pathway, whereas the water from the IC region flowed through the broken HL2. During the activation, H294 7.45 interacted with W252 6.48 to link the water molecules from both the EC and IC regions, forming a nearly continuous water pathway. These two critical residues (W252 6.48 and H294 7.45 ) have also been associated with CXCR4 activation in previous mutagenesis experiments 9,37-39 . TM6 moved outward for G-protein coupling, and TM7 moved inward to bring Y302 7.53 closer to Y219 5.58 , disrupting the HL2. During the GDP-bound G i -protein association with the cytoplasmic region of CXCR4, the continuous water pathway was disturbed and the α5-helix of G αiprotein was rotated and uptranslated to gradually enlarge the interdomain distance for GDP leaving. Then, Y302 7.53 swung to break HL2 and reform a continuous water pathway in the nucleotide-free G αi -protein state (Fig. 8). During GTP binding to nucleotide-free G αi -protein, α5-helix rotation decreased; this induced an interdomain distance decrease caused by G i -protein conformational changes, which might have disrupted the continuous water pathway again (Fig. 7b).
Conclusions. In this study, the microsecond-scale MD simulations could not simulate the entire CXCR4 activation process. However, the atomic-level dynamic information allowed us to observe certain key intermolecular changes during CXCR4 activation, such as tyrosine toggle switch, HL breaks that form the continuous internal water pathway, and conformational differences of α5-helix of G αi -protein. The proposed internal water channel formation model for CXCL12-bound CXCR4 in complex with G αi -protein based on our MD results is valuable, and it might be useful for further understanding the activation mechanism of CXCR4 and in anticancer drug development.

Methods
Construction of receptor CXCR4 with N-terminus. Because the solved crystal structure of CXCR4 (PDB: 3ODU) lacks the N-terminus 4 , which is crucial in the initial binding of CXCL12 to CXCR4 6 , a full-length CXCR4 was constructed by attaching a missing N-terminus (a.  Fig. 7 Molecular switches of G αi -protein and energy landscape at various G αi -proteins bound to pre-activated CXCR4 states. a Distance between the C α atoms of T182 of switch I and G202 of switch II of G αi protein in diverse states. GDP-bound G αi , empty G αi -bound RHO, empty G αi -bound CXCR4, and GTP-bound G αi , are marked using black, green, red, and blue lines, respectively. The blue dash line represents the crystal structure of G αi protein. In these simulations, the switch I and switch II distance increased in the empty state (11.5 Å) compared with the GDP-bound G αi state (8.8 Å). The empty G αi state is also called nucleotide-free G αi state. b Free energy landscape of CXCL12-bound CXCR4 in complex with various G αi states as a function of α5-helix rotation and interdomain distance. The energy unit used is kcal/mol. involves the interactions: between the N-loop of CXCL12 and the N-terminus of CXCR4 (site I), which was suggested as the interaction of the CXCL12 RFFESH loop with the N-terminal region of CXCR4, and between the N-terminal of CXCL12 and the extracellular region of CXCR4 (site II), which was reported as the N-terminus of CXCL12 embedded into the TM region of CXCR4 [5][6][7] . Therefore, based on these experiments, we performed the docking of CXCL12 to receptor CXCR4. Initially, the chemokine CXCL12 was docked into receptor CXCR4, where the most part of TM region of the receptor and all the intracellular residues were blocked to allow the N-terminal domain, extracellular loops and several transmembrane residues for ligand binding. The ligand binding residues from S6-A21 were specified to interact with N-terminus of CXCR4 to enhance the docking accuracy 6,7,21 . ZDOCK searches conformational space by rotating the ligand around its geometric center with the receptor maintained fixed in space. The rotational search sampling grid was used as a 6°grid which sampled a total of 54,000 docked poses. The ZRANK function, part of the ZDOCK protocol, was used to rerank the top 2000 docked poses. Higher scores obtained from the ZDOCK program suggested that the complex structures were of higher quality. The poses generated from ZDOCK were clustered into a maximum of 50 groups. The RDOCK protocol was used for further refinement of the poses with higher ZDOCK scores, using a CHARMm-based energy minimization scheme for the optimization of intermolecular interactions. For more detailed settings of the ZDOCK module, please refer to our previous studies 20, 40 . To determine the preferable docking poses, the lower RDOCK scores with lower binding energies, and the binding information from previous experiments were both evaluated. The structure with the lowest RDOCK scores was selected for further MD simulations.
Molecular docking of antagonist IT1t to receptor CXCR4. Antagonist IT1t was manually built using the MOE software package. The topology and parameter files of IT1t, not supported in the GROMACS program, were obtained from the Gly-coBioChem PRODRG2 web server (http://davapc1.bioch.dundee.ac.uk/cgi-bin/ prodrg) provided by Prof. Daan van Aalten 41 under the GROMOS 53A6 force field, which is also suitable for biomolecules. It was also confirmed using the Automated Topology Builder and Repository web server (ATB ver. 2.2) 42 . For more details on the docking of small compounds, please refer to our previous study 26 .
Molecular docking of GDP-bound G αi -protein to CXCL12-bound CXCR4. To date, the complex structure of nucleotide-bound G protein binding to GPCR is still not available. To traverse the complete downstream signaling process of CXCL12bound CXCR4 in complex with G αi -protein, the GDP-bound G αi protein was docked to the cytoplasmic region of the pre-activated CXCR4. After carrying out 1.8-µs MD simulations for CXCL12-bound CXCR4, the final structure was selected to dock with GDP-bound G αi using the ZDOCK module of Discovery Studio 3.5 (BIOVIA, http://accelrys.com), which the docking protocol is the same as CXCL12 docked to CXCR4. To increase the accuracy of docking, the TM and the extracellular regions of the receptor were blocked, and only the cytoplasmic region of CXCR4 was filtered.
Homology modeling of CXCL12-bound CXCR4 in complex with G αi -protein.
The RHO-G αi complex (PDB code: 6CMO) 14 and μ-OR-G αi complex (PDB code: 6DDE) 15 were solved in 2018. To our knowledge, only the complex structures of GPCR-nucleotide-free G protein were solved, to obtain the preferable binding pose of CXCL12−CXCR4 bound to G i -protein in the nucleotide-free and GTP-bound states, we followed the homology modeling method 13 by using the solved complex structure of the RHO-G αi complex as a template to construct a comparative model for the CXCL12-G αi -bound CXCR4 tricomplex, such as CXCL12-nucleotide-free G αi -bound CXCR4 and CXCL12-G αi -GTP-bound CXCR4.
Molecular dynamics (MD) simulations. All MD simulation protocols were carried out according to our previous studies 20,26,40 using GROMOS 53A6 force field with the GROMACS 4.6.7 software package and an integration step size of 2 fs. All systems were embedded in the POPC lipid bilayer systems (2 × 144 lipids), and the overlapping lipid was removed. The systems were hydrated using SPC216 water molecule. They were subsequently neutralized by adding ions (Na + and Cl − ) to generate 0.15 mol/L NaCl solution. The simulations were conducted in the NPT ensemble, employing a velocity-rescaling thermostat at the constant temperature of 310 K and 1 bar. Semi-isotropic pressure coupling was applied with a coupling time of 0.1 ps and a compressibility of 4.5 × 10 −5 bar −1 for the xy-plane as well as for the z-direction. Long-range electrostatic interactions were calculated using the particle-mesh Ewald (PME) summation algorithm with grid dimensions of 0.12 nm and an interpolation order of 4. Lennard-Jones and short-range Coulomb interaction cut off values were 1.4 and 1.0 nm, respectively. The equilibration protocol was based on our previous studies 20,26,43 shown in the following, (i) the temperature was gradually increased from 100 K to 200 K and 310 K. The system was run for 500 ps for each temperature. During these simulations the complex structure remained fully restraint (k = 1000 kJ mol −1 nm −2 ). (ii) At 310 K the restraints kept on the complex structure via the force constant k, were released in 3 steps (k = 500, 250, 100 kJ mol −1 nm −2 ). Each step was run for 2.0 ns. After equilibration, production runs were carried out without any constraint in all structures. Details for all simulations were listed in Table 3. Two replicates were performed for each system with different initial random numbers to obtain similar results.
Energy landscape calculations. Previous studies have used free energy landscape (FEL) or potential of mean force (PMF) to assess the stability of protein conformations in local minimum energy state from numerous conformation changes 44,45 . Previous studies have suggested that the Ras domain α5-helix interacts with the cytoplasmic pocket of GPCR to trigger the displacement of the helical domain and GDP release. Upon activation of G protein coupling with GPCR, the interdomain distance may increase for GDP/GTP exchange 11,13,36 . In this study, we analyzed the distribution of the conformational states in terms of FEL or PMF as a function of α5-helix rotation and interdomain distance. The details were described in the following, G(α5, interdomain) = −k B TlnP(α5, interdomain) Where P, T, and k B , are the probability distribution function, the absolute temperature, and the Boltzmann constant, respectively. The α5-helix rotation and interdomain distance with time were used to obtain the probability distribution P (α5, interdomain), which was computed based on the trajectory for each system at 310 K. GROMACS code g_sham was used to calculate the free energy landscape.
MD simulation analysis. VMD 46 with in-house scripts, GROMACS, and MOE software were used for visualization and analyses. The calculations of residue distance were performed by using GROMACS code g_bond. For tyrosine toggle switch, we measured the distance between the oxygen atoms of the side chains of Y219 5.58 and Y302 7.53 throughout MD simulation. For interdomain distance, the distance between C α atoms of residues A238 and E276 of G αi -protein were measured. The hydrogen bonding network was calculated by using "Hydrogen Bonds" implemented in VMD for the MD trajectories of various systems. The water density maps were created by using GROMACS code g_densmap which analyzed the MD trajectories of various systems. The kink angle of TM6 with time for various simulation systems was calculated by using GROMACS code g_angle which three C α atoms of residues (I245 6.41 , P254 6.50 , and G258 6.54 ) were selected to measure the angle of TM6. The rotational angle of α5-helix of G αi -protein was calculated by using GROMACS code g_helixorient which calculated the coordinates and direction of the average axis inside the helix. Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request. The initial structure files and MD trajectories used in the study are deposited in persistent repository, figshare (https://figshare.com/s/ 8bd1a3615833d5b0e58f; https://figshare.com/s/b17197c734685c50dc01) (ref. 47 ).