Treatment of flexibility of protein backbone in simulations of protein–ligand interactions using steered molecular dynamics

To ensure that an external force can break the interaction between a protein and a ligand, the steered molecular dynamics simulation requires a harmonic restrained potential applied to the protein backbone. A usual practice is that all or a certain number of protein’s heavy atoms or Cα atoms are fixed, being restrained by a small force. This present study reveals that while fixing both either all heavy atoms and or all Cα atoms is not a good approach, while fixing a too small number of few atoms sometimes cannot prevent the protein from rotating under the influence of the bulk water layer, and the pulled molecule may smack into the wall of the active site. We found that restraining the Cα atoms under certain conditions is more relevant. Thus, we would propose an alternative solution in which only the Cα atoms of the protein at a distance larger than 1.2 nm from the ligand are restrained. A more flexible, but not too flexible, protein will be expected to lead to a more natural release of the ligand.

www.nature.com/scientificreports/ In a SMD study, Zhang 58 changed the k-constant of the restrained harmonic potential to explore the influence of protein flexibility.This author demonstrated that a lower harmonic force applied to all C-atoms of the protein results in a larger variety of protein-ligand conformations.Except for Zhang's study, to our best knowledge, no additional information on this aspect is available.Such an absence has indeed led to a deeper inconsistency when various restrained methods are still used in SMD simulations, as mentioned in the paragraph above (cf.Table1).Evidence from Zhang's study has clearly warned that a rigid fixing of all heavy atoms or all Cα atoms could neglect the contribution of protein motion to the unbinding process.In contrast, from the opposite viewpoint, we are concerned that a too weak restrained force or a too flexible protein is not be able to stop the drifting of the whole structure 59,60 .A legitimate question is what could happen when the external force focuses on stretching the protein rather than on rupturing the ligand-protein interaction.
Another issue is as to whether the entire ligand-protein complex drifts under the influence of the water bulk layer.In this context, the questions of interest that motivate the present study is how the protein motion could be restrained in SMD simulations, and what differences could be expected by these different methods.
To investigate the differences in relevant approaches, we are developing various restricted ways before applying them to the same ligand-receptor complex.To take these ways into account, we propose six parallel SMD simulation systems in which six different groups of Cα atoms of a protein are held.Unlike the previous study of Zhang 58 where the author reduced the k-constant of the harmonic potential from 1000 to 5 kcal/mol.nm 2 , we relax in the present study the protein by narrowing the portion of the protein backbone where a restrained harmonic potential could be applied.Each of the six independent preparations is described in detail in the following section.
Under the influence of six different approaches, changes in the protein's geometric structure are expected to be observed.The ligand-protein complex chosen from the PDB Bank needs to satisfy the following conditions, namely, (a) the protein has a wide enough tunnel to prevent the ligand from collapsing the protein gorge; (b) no metallic atoms are present in the protein binding pocket [61][62][63] and (c) only one protein chain has the ability to carry out the protein's function.During the output data collection, the intermolecular interaction between protein and ligand is calculated, including the number of contacts and the number of hydrogen bonds.The activity of every major residue located near the active site is carefully explored.The force-time dependence and the displacement-time dependence are monitored, in a similar way as in previous traditional SMD simulations.A non-equilibrium process with a smaller unbinding barrier would represent a state closer to an equilibrium process.The most sufficient way of using a restrained harmonic potential could be introduced for further investigation aiming to demonstrate the ligand-protein unbinding pathway by SMD simulation.

System preparations
The PDB Bank provides atomistic structures of ligand-protein binding complexes.Ligand-contained proteins are sourced from the PDB Bank, the relevant literature, and previous reviews, etc. Six proteins including the ones noted as PDB-IDs 4JNJ, 2JFZ, 1PYE, 1TSL, 2YDV, 1EVE are selected.Before adding hydrogen atoms using the Gromacs software (version 2020 64 ), missing residues are repaired with Pymol Software 65,66 .The Amber ff99SB-ILDN force field is implemented.The Gaussian 16 67 package is used to optimize geometry structure of the ligands and subsequently to determine the charge distribution.The ligand's conformations are optimized and its electrostatic potential maps are calculated at the B3LYP/6-31 + G(d,p) level.Atomic net charges of the ligands are derived using the RESP 68 method.The Antechamber module of AMBER Tools is applied to calculate additional parameters for the compounds using the General Amber Force Field (GAFF) 69,70 .Simulated cubic boxes are set to ensure a distance greater than 0.6 nm between the protein surface and boundaries.After solvating in water, Na or Cl ions are added to neutralize the system.Particle mesh Ewald (PME) 71 method is used for longrange electrostatic interactions, and periodic boundary conditions are set.The SHAKE 72 algorithm is applied to covalent bonds of hydrogen atoms.The non-bonded contact between two atoms is avoided when the pair distance is larger than 1.0 nm.The system preparation is similar to our previous setup. 24The tables mentioned above, (Table 2 and Tables S1 and S2 in the Supplementary Information file (SI)) list the PDB-IDs of the complexes utilized.Pulling directions are identified based on the support of the Carver web server version 1.0 and aligned to the Z-axis of coordinate systems 73 .

Restraining method
The present study aims to determine the influence of different restrained methods on the results obtained by conventional steered molecular dynamic simulations.Six ways of fixing protein backbone are prepared: (1) fixing all heavy atoms of protein (C, N, S, O); (2) fixing all Cα atoms of protein; (3) fixing all Cα atoms with a distance to ligand greater than 1.2 nm; (4) fixing all Cα atoms with the perpendicular distance in the pulling direction to ligand greater than 1.2 nm; (5) fixing all Cα atoms with the perpendicular distance in the pulling direction to ligand greater than 1.2 nm and all Cα atoms with the perpendicular distance in the x-y direction to ligand smaller than 1.2 nm, and (6) fixing all Cα atoms with the perpendicular distance in the pulling direction to ligand greater than 1.8 nm and all Cα atoms with the perpendicular distance in the x-y direction to ligand smaller than 1.2 nm.Six groups of protein atoms are presented in Fig. 1.In the discussion hereafter, each way of fixing is named as mode, from mode 1 to mode 6 corresponding to the six fixing conditions defined above.

Steered molecular dynamics simulations
Some of the complexes selected here have already been examined in previous studies 39,74 .However, creating uniformity within existing results is not possible due to large variations in parameters including force field, pulling velocity, and the pulling direction-defined method… implemented in those protocols.Therefore, simulations Vol:.( 1234567890 www.nature.com/scientificreports/with a consistent set of parameters are carried out.In our protocol, a hundred independent trajectories will be performed in each mode of the restrained method.A constant pulling velocity (v = 1.0 nm/ns) and a spring constant (k = 600 kJ/mol/nm 2 ) are chosen.Snapshots are saved every 5000 steps (2 fs for each step).The duration time of pulling is set to 3 ns to ensure that all ligands are pulled far away from the protein.In each trajectory, the time-dependent force/displacement is recorded every 10 fs.External pulling work and unbinding barrier free energy are computed using the protocol defined in our previous studies 24 .

Hydrogen bond and contact
When a protein's heavy atom has the smallest distance to one atom of the ligand, being less than 0.6 nm, an intermolecular contact is formed.If the acceptor-donor distance is less than 0.35 nm and the acceptor-hydrogendonor angle is greater than 135 0 , a hydrogen bond is considered available.Tasks are performed using the Gromacs packages including gmx hbond and gmx mindist.All hydrogen bonds created between inhibitors and the receptors are taken into account.In each snapshot of an independent SMD trajectory, the number of hydrogen bonds and the ligand's center of mass (COM) are analyzed.Following this way, we can average the number of hydrogen bonds depending on the position of the COM.

Protein root mean square deviation
The root mean square deviation (RMSD) of the protein is calculated as the dissimilarity of all atom coordinates to its initial structure in three dimensions.These analytical procedures are carried out using the gmx_rms tool supported in the Gromacs package, and the appropriate formula is as follows: where N is the number of atoms of the protein, and x represents the three-dimensional coordinate.

Convergence of numerical data and distorted Gaussian-type distribution of values
Every physical quantity considered in this study is derived from a non-equilibrium process.Due to our limited computing power, we could only apply a fast growth evolution when the pulling velocity is chosen at 1 nm/ns.In replicating the results of the AFM experiment and analyzing the SMD data, we monitor the time-dependent force and record the maximum value, known as F max , right after a rupture event occurred.To achieve the outcome convergence, previous studies have indicated that steered molecular dynamics (SMD) simulations require a suitable number of independent trajectories, neither too large nor too small.In the context of using a relatively small velocity (v = 1 m/s), although there are minor differences in the number of trajectories each system required to reach result convergence, we recognize that about 100 orbits are sufficiently large for all of them.Figure 2 serves as an example, including the rupture force and external work in dependence on the number of pulling trajectories.Accordingly (Fig. 2), data convergence is found in every restraint mode, from mode 1 to mode 6, and in every ligand-protein complex (data not shown).

Force-time and work-time profile obtained from different restrained modes of the ligand release
All values of F max and W pull are listed in Table 3. Overall, mode 1 and mode 2 of every system are consistently held the first and second positions, significantly larger than the others.The mean values of rupture force and  Over the past two decades, there has been much discussion about how a pulling rate 14,75,76 influences the rupture force and external work of a non-equilibrium process.There is a consensus that a decrease of the pulling rate does not only decrease the rupture force and the external work but also push the non-equilibrium process to evolve into an equilibrium one.In other words, the smaller the value we obtain, the faster the system reaches an equilibrium.According to this knowledge, neither mode 1 nor mode 2 is expected to attain a sufficient way to www.nature.com/scientificreports/generate a natural process of ligand-protein dissociation.This information could normally be predicted, but our evidence serves as a useful warning for an eventual selection of mode 1 or mode 2 in a SMD simulation.Figure 3 shows the time-dependent pulling force, displacement, RMSD, and external work in a representative trajectory from pulling the 4JNJ system.The averaged lines from 100 trajectories are shown in Figs.S1-S6 of the SI file.In the paragraph above, we mention that the modes in class A have stronger rupture forces and larger pulling works, that are outstanding as compared to the outcomes attained by the approaches in class B.Here we are going into more detail about the results of simulations generated using lighter restraining approaches, from mode 3 to mode 6. Collected data in almost all cases, listed in Table 3, seem to interpret two common understandings, namely, i) the more flexible the protein is, the smaller the rupture force will be, and ii) the more flexible the protein is, the lower the pulling work the external force will perform.
When examining the 1EVE system, the mean rupture forces in modes 4, 5 and 6 are measured at values: 608 ± 7 pN, 597 ± 7 pN, 605 ± 7 pN.Similarities are also detected in the 2YDV system and 4JNJ system, modes 3, 4 and 5; in 1TSL system, modes 4, 5and 6.A question of concern is as to whether there is any misinformation.It is reasonable to see that these equivalent values of F max are caused due to the geometric feature of protein structure where some distinct modes have restrained some quite similar groups of Cα atoms.Protein 1EVE and 1TSL are formed from globulin structures while 2YDV and 4JNJ are formed via beta-barrel structures.Data collected from our simulations truly confirm the first statements given above.To prove the second statement, we now calculate the pulling work.
Looking now at mode 6 in all systems considered, although six systems create the smallest value of averaging rupture forces F max , the mean value of pulling work W pull is not always the lowest one.There are 4/6 protein-ligand complexes like that.The systems 1EVE and 4JNJ, under mode 6 conditions, have received the mean values W pull of 54.8 kcal/mol and 124 kcal/mol, respectively.These results are higher than the mean value of pulling work in mode 5, which are 51.9 kcal/mol and 116 kcal/mol.
To investigate these issues, we plot in Fig. 3 four curves including the time-dependent force, the timedependent displacement, the time-dependent work, and the time-dependent root-mean-square deviation (RMSD) from one representative trajectory in the 4JNJ system.According to the RMSD black line (in Fig. 3), it is rather easy to recognize that the ligand leaves its initial position immediately when the external force starts increasing.The averaged value of RMSD of the protein backbone is collected in Table 3.Since mode 5 and mode 6 usually own a higher amount of averaged RMSD, this raises a suspect: instead of breaking the protein-ligand www.nature.com/scientificreports/dissociation, the external force aims to stretch the protein and induce an unnecessary value of pulling work.More seriously, we also find in some cases of mode 6: the bulk water layer induces the protein to spin perpendicularly to the pulling direction.The ligand thereby collapses to the protein wall as captured in Fig. S25 (SI file).Although waste trajectories are not appeared frequently and are manually ejected in this examination, this leads to more concerns when using mode 6 or even mode 5 in the steered molecular dynamic simulations.In the biophysical literature, an unbinding process is usually conceptualized as a barrier crossing in which the system is transformed from a higher free energy conformation to a lower.Thus, we now construct the free energy profile of a relevant non-equilibrium kinetic process 22 .The unbinding free energy barriers of six 4JNJ representative trajectories are plotted in Fig. 4. Collected results show that if all heavy atoms of the protein are constrained, ligands must overcome the highest energy barrier to successfully escape far away from the protein.
Every complex clearly displays the remarkable values of mode 1 and mode 2 unbinding barriers which are listed in part (c) of Table 3.In the case of 1EVE, a globulin protein complex, two highly fixing strategies, all heavy atoms mode 1 and all Cα atoms mode 2, give the mean values of 60.2 kcal/mol and 27.3 kcal/mol, respectively, whereas the mode 6 restrained method only produces an unbinding barrier of 18.1 kcal/mol.All other protein-ligand complexes likewise exhibit a notable variation between the unbinding barrier's lowest value and its two greatest values.The ligand in the case of the least tightly fixing approach needs to cross small barriers of 12.1 kcal/mol for 1PYE, 12.5 kcal/mol for 2JFZ, 19.5 kcal/mol for 2YDV, 24.6 kcal/mol for 4JNJ and 8.2 kcal/ mol for 1TSL.In contrast, all remarkable values of mode 1 and mode 2 are recorded at 99. 3

Flexibly restraining mode allows more residues to form contact with ligand
In order to better comprehend about how changes in protein structure could affect the ligand escape, we explore in this paragraph the interactions between proteins and ligands.Looking over 100 trajectories of a mode and 300 frames in each trajectory, a few steps are carried out: (1) we count the number of residues that make at least one contact with the ligand; full data are collected in Table S2(SI file), (2) we figure out the averaging number of hydrogen bonds in dependence on displacement, and (3) we plot the curve of the displacement-dependent interaction energy (IE).
Interestingly, the flexibility of the protein is found to be diversified with respect to the protein-ligand interactive picture, because more residues are found to be coming into contact with the ligand during the dissociation process.Table S2 (SI file) shows the number of residues that have a time being in contact with the ligand.With 1EVE system, mode 1 results in only 55 residues that form contact with the ligand during the ligand exit.For mode 2 through mode 6, this quantity rises from 59, 66, 69, 70 to 72 residues.The proportional increase of this quantity of 1TSL and 2YDV systems is illustrated in Fig. 5.
In all complexes considered, numerical data indicates that the use of mode 1 and mode 2 tends to prevent the ability of the protein and narrow the space of configuration sampling.Notably, our displacement-dependent number of hydrogen bond in Fig. 6 and Figs.S20-S24 (SI file) demonstrates that this increase is mostly derived in the period of time after the rupture event occurs.That leads to a conclusion that when the protein is relaxed, it increases the fluctuation of residues in the nearby region.A simple fixing of the atoms allows as much as  possible the residues to participate in the interaction with ligand.From the point of view of a natural event, a participation of more residues seems to encourage the ligand escaping.This observation needs to be explored with more evidence in the subsequent investigations.

Flexibly restraining mode lengthens the ligand-protein harmonic potential
For a deeper analysis of the SMD simulations, we compute the interaction energy (IE) between both inhibitors and receptor, as a summation of polar (Coulomb) and non-polar (Van der Waals) interaction: Each snapshot of saving data is submitted into one MD-step run for computing the ligand-protein interaction.Then the profile of non-bound interaction energy plotted within the dependence on ligand's displacement is shown in Fig. 6.The full image of the interaction energies, the Coulomb potentials, and the Van der Waals potentials is shown in the SI file, from Figs. S7-S12.In particular, the length (L0) of the ligand-protein interaction potential is found to be changed as a result of the protein flexibility, which has ever been assumed to be constant in a previous theoretical study 60 .Notably, the interaction energy between the protein and ligand in the 4JNJ system is decreased into zero at the displacement of 0.5 nm (mode 2) and 2.0 nm (mode 6).This illustrates a significant difference from the smallest value of a narrower potential to the largest one of a wider potential.Since the length (L0) is an important parameter to construct the analytical expression of the protein-ligand harmonic potential, our collected data will make a reference to the building of the dependence of work on the pulling velocity in SMD study 76 .

Concluding remarks
In summary, we have presented in the theoretical study six approaches for restraining the protein movement during a steered molecular dynamics simulation.Some mean values of rupture force and pulling work from neighboring modes, with the standard error added, are indistinguishable from each other.This is caused by the geometric features of the protein structure, where two neighboring modes are found to restrain quite similarly the groups of Cα atoms.The most important fact recognized in this scheme is that the 1st and 2nd standing positions, according to the largest values of F max and W pull obtained from modes 1 and 2, are clear-cut and invariant.This observation confirms that restrain of all heavy atoms or all Cα atoms cannot be considered as a good choice in applying steered molecular dynamics simulations.
Because significantly larger values have been generated in modes 1 and 2, it will create more difficulty when trying to come closer to an equilibrium dissociation.In contrast, if the protein is too flexible, the force will occasionally be applied primarily to stretching the protein structure rather than breaking the bonds between the ligand and amino acids.Unnecessary work could be additionally created.This leads to incorrect information and much challenge for the use of the pulling work method to determine the binding affinities between small molecules and macro proteins.More seriously, bulk water layer may push the protein rotating.Although it is rather hard to give a quantitative suggestion for every kind of protein, we would recommend a sufficiently suitable approach for applying in SMD simulations, that is fixing all Cα atoms at a distance larger than 1.2 nm from the ligand, as we have applied in mode 3 and mode 4.
Determination of a physical pathway for ligand release or entering is always a principle in molecular dynamic simulations.The evidence obtained in this study has raised a common status, that is, let protein move flexibly in such a way that the ligand can move out of the protein binding pocket.Lower unbinding energy barriers, lower pulling work, lower rupture force constitute a strong set of foundation for an easier escape of the ligand.In addition, a certain suspicion emerges, as to whether a protein, in the natural process when external force is absent, intentionally transform to release the ligand.We have reason to believe in the entrance or release process of the ligand, many intermediate structures of the protein impact more importantly than a functional conformation when protein has successfully folded.This issue is still not clear because of the limitation of protein-mediated conformations.Relatively little is actually known on the observed effect to reveal the role of mediated conformations. https://doi.org/10.1038/s41598-024-59899-3

Figure 1 .Figure 2 .
Figure 1.Cartoon mapping of five restrained modes from 2 to 6. Representative group of atoms of CDK-2 protein, PDB-ID 1PYE (lower).Y-Z directions are shown in (2) (restraining all heavy atoms not shown).Small images are arranged from 2 to 6, respectively, with increasing protein flexibility.

Figure 3 .
Figure 3. Time-dependent force (upper-left) and time-dependent ligand's displacement (upper-right), timedependent work (lower-left), and time-dependent RMSD (root mean square deviation) (lower-right).Results are randomly plotted from six representative trajectories under six different restrained methods of the 4JNJ system.Averaged values of rupture force and pulling work, obtained over 100 trajectories in six systems, are shown in the SI file.

Figure 4 .Figure 5 .
Figure 4. Free energy profile of six representative trajectories when a ligand was pulled out of the streptavidin protein, PDB ID 4JNJ.Results are obtained under six restrained methods.

Table 1 .
A summary of different restrain methods employed in the steered molecular dynamics simulation during the last three decades.

Table 2 .
General information of simulation.

Table 3 .
The averaged value of rupture force (a), pulling work (b), unbinding barrier (c) and root-meansquare deviation (d) obtained from 100 independent trajectories.
Vol.:(0123456789) Scientific Reports | (2024) 14:10475 | https://doi.org/10.1038/s41598-024-59899-3 and 37.2 kcal/mol for 1PYE, 147.2 and 35.4 kcal/mol for 2JFZ, 168.1 and 49.8 kcal/mol for 2YDV, 290.3 and 179.6 kcal/mol for 4JNJ, 45.4 and 15.7 kcal/mol for 1TSL.The mean curves of unbinding free energy from six systems are shown in Figs.S13-S18 of the SI file.When all free energy profiles are established it is easy to find that the unbinding barrier tends to decrease in the context of relaxing the protein.