A molecular dynamics simulation study decodes the Zika virus NS5 methyltransferase bound to SAH and RNA analogue

Since 2015, widespread Zika virus outbreaks in Central and South America have caused increases in microcephaly cases, and this acute problem requires urgent attention. We employed molecular dynamics and Gaussian accelerated molecular dynamics techniques to investigate the structure of Zika NS5 protein with S-adenosyl-L-homocysteine (SAH) and an RNA analogue, namely 7-methylguanosine 5′-triphosphate (m7GTP). For the binding motif of Zika virus NS5 protein and SAH, we suggest that the four Zika NS5 substructures (residue orders: 101–112, 54–86, 127–136 and 146–161) and the residues (Ser56, Gly81, Arg84, Trp87, Thr104, Gly106, Gly107, His110, Asp146, Ile147, and Gly148) might be responsible for the selectivity of the new Zika virus drugs. For the binding motif of Zika NS5 protein and m7GTP, we suggest that the three Zika NS5 substructures (residue orders: 11–31, 146–161 and 207–218) and the residues (Asn17, Phe24, Lys28, Lys29, Ser150, Arg213, and Ser215) might be responsible for the selectivity of the new Zika virus drugs.

Currently, two major issues occur due to (1) the conformational states of protein receptors between binding and unbinding states and (2) the pathway of the inhibitor-protein interaction process.
All-atoms molecular dynamics (MD) simulations remain limited to the conformational ensembles obtained from a single long-time-scale conventional molecular dynamics (cMD) simulation due to the possible energy barriers between various intermediate states. Therefore, a multiscale simulation method that combines an enhanced sampling technique, which can take samples at various intermediate states, with an all-atoms simulation is required. Enhanced sampling techniques have been successfully applied to calculate the free-energy profiles 9 and to perform conformational sampling through accelerated molecular dynamics (aMD) 10 . These enhanced sampling methods can provide key insights into the free-energy profiles and intermediate protein structures. In general, the disadvantage of enhanced sampling techniques is the requirement to predefine protein structures, potential energies, additional energies, and reaction coordinates. Simulations using aMD or Gaussian aMD (GaMD) are enhanced sampling methods that can avoid such requirements. Through aMD, a boost of potential energy is added to the potential energy surface system resulting in a decreased energy barrier, which enables the acceleration of the transitions between low-energy states [10][11][12] . This method has been successfully applied to simulations of biological systems, and hundreds-of-nanoseconds aMD simulations can yield the same results as millisecond cMD simulations [13][14][15][16] .
In recent studies, the aMD method has produced substantial energetic noise during reweighting 17 . In aMD simulations, the applied boost potential is typically of the order of tens-to-hundreds of kilocalories per mole, which is usually greater than those of other enhanced sampling methods. Accurately reweighting aMD simulations has been problematic, especially for large protein molecules 18 . By introducing GaMD, Miao et al. provided an approach to improving the aMD method. The boost potential of the GaMD method follows a near-Gaussian distribution, for which the cumulated second-order expansion improves the reweighting of aMD simulations 19 . The reweighted free-energy profiles yielded by GaMD are in close agreement with long-time-scale cMD simulations 20 .
In the present study, the experimental biological binding affinities 21 (IC50) of SAH and 7-methylguanosine 5′-triphosphate (m7GTP) were compared using GaMD simulations; these compounds are listed in Table 1. The transfer function (ΔGbind = −RT ln(IC50)) is used to convert the IC50 values into experimental ΔGbind values, which are listed in Table 1. We applied GaMD to simulate the Zika virus NS5 protein with SAH and m7GTP. Starting from the full-length of the Zika NS5 x-ray structure, the interaction of the two molecules with the Zika virus NS5 protein was studied. We demonstrated that GaMD simulation enables a detailed analysis of the interaction of these two molecules with the Zika NS5 protein.

Theoretical Calculations Methods
Gaussian accelerated molecular dynamics. GaMD is an enhanced conformational sampling method for biomolecules that adds a harmonic boost potential to smooth the system potential energy surface 19 . When the system potential (V) is lower than a referenced energy (E), a harmonic boost potential (ΔV) is added as follows: 2 where K is a harmonic force constant. The modified system potential (V*) is given by If the system potential (V) is greater than a referenced energy (E), the harmonic boost potential (ΔV) is equal to zero. By smoothing the potential energy surface for overcoming intermedia energy barriers, the boost potential satisfies the following step. For two potential energy values V1 and V2, assume that V1 < V2 and the biased V1* < V2*. By replacing V* with equation (2), the relationship is expressed as follows: Step (1) If V1 < V2, the potential difference on the smoothed energy surface should be smaller than that of the original energy surface. By replacing V* with equation (2), the relationship is expressed as Step ( where Vmin and Vmax are the minimum and maximum potential energies. Step (3) By equation (5), we can obtain where K constant is defined as and K0 is the magnitude of the applied boost potential.
Step (4) The standard deviation (SD) of ΔV must be sufficiently small to ensure accurate reweighting 22 .
where the Vave and σ V are the average and SD of the potential energies, and σ ΔV is the SD of ΔV with σ0 as a user-specified upper limit for accurate reweighting of potential energies. In our simulations, the SDs of the total potential and dihedral potential boosts are 10 kcal/mol.
Step (5) To extend step (2), if E = Vmax, we can use equation (5) to obtain According to equations (21) and (19), K0 can be defined as: Step (6) To extend step (2), if E = Vmin + 1/k, we can use equation (8) to obtain Step (7) GaMD provides the total potential boost, dihedral potential boost, and dual potential boost in order to accelerate the molecular simulations. The boost potential (ΔV) is given as follows: where K0 is the magnitude of the applied boost potential, and Vmin and Vmax are the minimum and maximum potential energies of the system. Initially, K0 is equal to 1.0, and Vmax and Vmin are obtained through cMD simulations. The distribution and anharmonicity of the GaMD method were applied to the alanine dipeptide, chignolin, and lysozyme simulations to characterize the extent to which ΔV follows a Gaussian distribution 19 .
Reweighted free-energy calculations for GaMD simulations. The probability distribution of the selected reaction 22 coordinate A(r) is defined as P*(A), where r can be distance, angle, root-mean-square deviation, or other factors. Using the GaMD boost energies of each reaction coordinate, P*(A) can be reweighted and defined as where M is the number of bins, β is equal to the KBT, and β∆ e V r j ( ) is the ensemble-average factor of the jth bin.
For reducing the energetic noise, the ensemble-average factor can be defined as follows: According to equation (14), the first three cumulants can be defined as follows: The reweighted free energies can then be calculated by

Free-energy calculations (WHAM) for umbrella sampling simulations. A harmonic potential was
applied to the stretching 23 constraints, i.e. the distance constraints between the center of mass of the ligands and the binding pockets (Figs S1 and S7) with a force constant of 10.0 kcal/mol. The RC1 reaction coordinates follow the distance from the centre of the SAH mass to the centre of mass defined by the residues that shape the binding pocket (Gly86, Trp87, Thr104, Lys105, Asp131, Val132, Asp146, and Ile147; Fig. S1). The RC2 reaction coordinates follow the distance from the centre of the m7GTP mass to the centre of mass defined by the residues that shape the binding pocket (Lys13, Leu16, Asn17, Met19, Ser150, Ser151, and Ser215; Fig. S7). The RC1 value varied from 3.0 to 24.0 Å in 1 Å increments. The RC2 value varied from 6.0 to 22.0 Å in 1 Å increments. The MD simulations for PMF determination were performed with an initial 5-ns equilibration followed by 10-ns sampling at a given reaction coordinate value. Moreover, the umbrella sampling simulations were performed with GENESIS 1.2.0 software 24 . Then the free energy profiles (PMF) were analysed with the WHAM software 25 .
GaMD simulation of the Zika virus NS5 protein system. First, we modified a partial length of the Zika NS5 protein structure (PDB ID: 5kqs), and we used PyMOL software to modify the ligand (m7GDP to m7GTP). Second, we aligned the partial length of the Zika NS5 protein structure and the full length of the Zika NS5 protein structure (PDB ID: 5u0b). We then inserted the ligand (m7GTP) into the full-length Zika NS5 protein structure. The initial complex structures are Zika NS5 with SAH and Zika NS5 with m7GTP. These complex structures were generated and then inserted into TIP3P solvent molecules. The size of the complex structures was approximately 10.00 × 11.00 × 11.00 nm 3 . These initial complexes were then simulated using the AMBER 16 package with the AMBER FF12SB all-hydrogen amino acid and amber GAFF parameters. The geometries of the SAH and m7GTP were fully optimized, and their electrostatic potentials were obtained using a single-point calculation at the Hatree-Fock level with the 6-31 G(d,p) basis set using the GAUSSIAN 09 program 26 . Subsequently, partial charges were obtained by employing the restrained electrostatic potential procedure using the Antechamber package. All cMD simulations were performed in the isothermal-isobaric (NPT) assembly with a simulation temperature of 310 K, unless stated otherwise, by using the Verlet integrator with an integration time step of 0.002 ps and SHAKE constraints 27 for all covalent bonds involving hydrogen atoms. In the electrostatic interactions, atom-based truncation was performed using the PME 28 method, and the switch van der Waals function was used with a 2.00 nm cut-off for atom-pair lists. These complex structures were minimized for 100,000 conjugate gradient steps and then subjected to a 100-ns, isothermal, constant-volume MD simulation. Moreover, the final structures from these simulations were used in five dependent 5000-ns GaMD simulation calculations, and these structures were used in the umbrella sampling simulations.
Reweighted Free-energy calculation through potential of mean force and preresidue displacement. For the Zika NS5 protein with SAH (RC1 = 3-8 Å) and m7GTP (RC2 = 6-11 Å), all residues were obtained using preresidue displacement calculations. The preresidue displacement calculations were applied to the major barriers: RC1 = 3-8 Å and RC2 = 6-11 Å). The reaction coordinate profiles and preresidue displacement calculations were analysed using the AmberTools 16 and VMD software packages. The reaction coordinates profiles were calculated for the reaction coordinates used for the potential of mean force (PMF) calculations. The PyReweighting toolkit 22 was used to reweight the GaMD simulations for the PMF profile calculations and to examine the boost potential distributions. One-dimensional PMF profiles were also constructed using the reaction coordinates for the Zika NS5 protein with a bin size of 1.0 Å. When the number of simulation frames within a bin was lower than a certain limit, the bin was insufficiently sampled and thus was excluded for reweighting.

Results and Discussion
Potential of mean force calculation for Zika virus NS5 proteins: SAH-and m7GTP-complex GaMD simulations. Following the original paper "Reweighted free-energy calculations (PyReweighting-1D. py), " 22 we used five individual and independent 5000-ns GaMD simulations to estimate the error bar and average of the PMF, a measure of free energy, using the second-order cumulant expansion method 22 . Our results are listed in Table 1 and Fig. 1  structure-based approach was applied to identify the functionally important residues of the major energy barriers at RC1 = 3-8 and RC2 = 6-11 Å. From the complex structures at RC1 = 3-8 and RC2 = 6-11 Å, the important residues and pharmacophore regions were analysed using the Ligandscout program. Because there were many snapshots, residues with probability of >0.5 were selected for the binding mode analysis. Our results are listed in Tables 2 and 3 (Figs S1-S12 present the representative snapshots at RC1  Table 4 and Figs S13-S17. For the preresidue displacement analysis of m7GTP, the three Zika NS5 substructures (residue orders: 11-31, 146-161, and 207-218) had clear fluctuations; the results are displayed in Table 5 and Figs S18-S22.
Comparing the m7GTP binding residues of the Zika virus NS5 proteins with those of other Flavivirus NS5 proteins. The Asn17, Phe24, Lys28, Lys29, Ser150, Arg213, and Ser215 residues (PDB ID: 5U0B) were selected for comparison with other Zika NS5 proteins (PDB IDs: 5GOZ, 5GP1, 5KQR, 5KQS, 5M5B, 5TFR, 5TMH, 5ULP, 5VIM, 5WXB, 5WZ1, and 5WZ2). Our results indicated that these residues were conserved among the Zika NS5 proteins, and the results are shown in Fig. S23. The Asn17, Phe24, Lys28, Lys29, Ser150, Arg213, and Ser215 residues (PDB ID: 5U0B) were selected for comparison with the other Flavivirus NS5 proteins (PDB IDs: 3EVF (Yellow fever), 4V0R (Dengue fever), 2HKS (West Nile fever) and 4K6M (Japanese encephalitis)). The results are shown in Fig. S24. Except for two residues (Lys28 and Lys29), the residues were conserved among the NS5 proteins. The two residues of the West Nile fever, Japanese encephalitis, and Yellow fever viruses were Arg-Lys, Arg-Arg, and Lys-Arg, respectively. The Arg and Lys residues have similar structures and isoelectric points. Thus, we think that these residue differences might have a reduced impact on the binding affinity of NS65 proteins with m7GTP.

Conclusions
In this article, we used full-length Zika NS5 proteins (PDB ID: 5u0b; NS5 bound to SAH and m7GTP) as our initial structures. We employed 100-ns cMD simulations to optimize the two Zika virus NS5 protein complex structures. The RC1 reaction coordinates were defined as the distance between the centre of mass of SAH and the centre of mass of the binding pocket (Gly86, Trp87, Thr104, Lys105, Asp131, Val132, Asp146, and Ile147; Fig. S1). The RC2 reaction coordinates were defined as the distance between the centre of mass of m7GTP and the centre of mass of the binding pocket (Lys13, Leu16, Asn17, Met19, Ser150, Ser151, and Ser215). Then, we performed GaMD, preresidue displacements, and PMF calculations to predict the binding mechanisms of these molecules with Zika virus NS5 proteins. For the Zika virus NS5 protein and SAH complex, there is a major energy barrier of 8.42 ± 1.98 kcal/mol RC1 = 3-8 Å. For the Zika virus NS5 protein and m7GTP complex, there is a major energy barrier of 5.21 ± 1.21 kcal/mol at RC2 = 6-11 Å. Our energy barrier predictions were similar to the experimental data (Table 1). Moreover, we used the WHAM/umbrella sampling methods to check our reweighted free energy calculations. The energy barrier predictions were relatively close to our reweighted free-energy calculations. For the Zika NS5 protein and SAH complex, our results indicated that the four Zika NS5 substructures (residue orders: 101-112, 54-86, 127-136, and 146-161) and the Ser56, Gly81, Arg84, Trp87, Thr104, Gly106, Gly107, His110, Asp146, Ile147, and Gly148 residues affect the ability SAH to bind with the full-length Zika NS5 protein.