Serine 477 plays a crucial role in the interaction of the SARS-CoV-2 spike protein with the human receptor ACE2

Since the worldwide outbreak of the infectious disease COVID-19, several studies have been published to understand the structural mechanism of the novel coronavirus SARS-CoV-2. During the infection process, the SARS-CoV-2 spike (S) protein plays a crucial role in the receptor recognition and cell membrane fusion process by interacting with the human angiotensin-converting enzyme 2 (hACE2) receptor. However, new variants of these spike proteins emerge as the virus passes through the disease reservoir. This poses a major challenge for designing a potent antigen for an effective immune response against the spike protein. Through a normal mode analysis (NMA) we identified the highly flexible region in the receptor binding domain (RBD) of SARS-CoV-2, starting from residue 475 up to residue 485. Structurally, the position S477 shows the highest flexibility among them. At the same time, S477 is hitherto the most frequently exchanged amino acid residue in the RBDs of SARS-CoV-2 mutants. Therefore, using MD simulations, we have investigated the role of S477 and its two frequent mutations (S477G and S477N) at the RBD during the binding to hACE2. We found that the amino acid exchanges S477G and S477N strengthen the binding of the SARS-COV-2 spike with the hACE2 receptor.

In the recent global outbreak of COVID-19, more than 80 million individuals have as yet been infected with this severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 is responsible for around 1.8 million deaths as of December 2020. This novel virus is a member of structurally "crowned" viruses, first appreciated and defined as coronavirus in the 1960's when Tyrrell and Bynoe discovered the first human coronaviruses (HCoVs) 1 . These single-stranded RNA viruses have round enveloped virions and are covered by trimeric aggregates of spike (S) proteins. The S protein is a type I fusion protein and is involved in receptor binding, eventually leading to virus fusion with the host cell. Specifically, the SARS-CoV-2 spike receptor binding domain (RBD) is known to interact with the human angiotensin-converting enzyme 2 (hACE2) receptor 2,3 . A conformational change in the trimeric S protein exposes one of its RBD domains in an active "up" state and triggers the binding of the receptor binding motif (RBM, residues 437 to 508) to hACE2 4,5 . Subsequently, the S protein is also recognized by the immune cells as a primary antigenic target to neutralize the virus. However, favourable mutations for the virus could be acquired through natural selection, following this as of now hundreds of new mutations were found on different residue positions of the S protein. While preparing potential targets of the SARS-CoV-2 proteome for our recently published large-scale virtual screening 6,7 aiming to identify potential binders and inhibitors, we analysed the diversity of the SARS-CoV-2 genome landscape: Till 30th August 2020, among 73,042 reported SARS-CoV-2 spike sequences on GISAID we have found that a total of 185 amino acid substitutions are reported on the 223 residues long RBD and 68 exchanges are reported in the 72 residues long RBM (Fig. 1a) 8 . Within the SARS-CoV-2 spike protein, residue D614 is a highly variable site, whereas residue S477 has the highest number of mutation events within the RBD (Fig. 1b). Interestingly, S477N occurs very frequently alongside the D614G variant which has been reported to be associated with a very high viral load 9 , structurally these two sites are located distinctly in the spike protein, but the very high frequency of co-occurrence among these two sites needs to be investigated. www.nature.com/scientificreports/ Further, the rate of mutation in the RBD is posing a greater threat since the interaction of this domain to hACE2 is the key to entering a host cell and recent studies are highlighting that several mutations in the RBD indeed strengthened the SARS-CoV-2 infectivity [10][11][12][13][14] . Each mutation is said to have an impact on the protein-protein interaction interface in the heterodimeric complex of hACE2 and the RBD. However, the site in the RBD, which looks very promising to investigate, is the residue S477. As of 30th August 2020, position 477 has the highest frequency of mutation in the RBD (Fig. 1b) with reported substitutions to G(2) I(7) N(3400) R(2) T(2) X (20), where X denotes any amino acid. Deep mutational scanning of the RBD has shown that a mutation at S477 has a potential to affect the stability of the RBD as well as its binding to hACE2 15 , whereas the structural effect of residue S477 in context of intra-RBD interactions is lesser understood while it is reported that the residues S477 and T478 in SARS-CoV-2 are responsible for specific and efficient interactions with hACE2 and provide an edge over SARS-CoV 16,17 . Therefore, we investigate the structural contribution of residue S477 in the RBD in order to understand its role in hACE2 binding.
Unbound proteins are predisposed to undergo conformational fluctuations which are relevant for protein-protein interactions and intra-protein interactions and have a profound role in presenting surface exposed residues in inter-protein interactions [18][19][20] . A structural analysis of the unbound RBD is therefore a very critical step for designing a potent inhibitor or an antibody to potentially block hACE2 and RBD interactions. In this study, we are investigating the intrinsic motion of the RBD in an unbound state using the normal mode analysis (NMA) technique. NMA is a time dependent harmonic approximation of internal motion in proteins around an energy minimum [21][22][23] and proved to be a very handy tool to study the harmonic motions relating to protein dynamics and conformational fluctuations 21,22,24 . NMA has been used previously to resolve the domain motions and dynamical correlation among protein residues 24,25 .

Results
Flexibility analysis highlighted a highly correlated domain motion in the RBD. The RBD is very efficient in finding its attachment surface on hACE2 and is therefore SARS-CoV-2's first line tool to explore its potential binding target on a human cell. It has been shown that the flexibility of surface exposed residues plays a key role in the recognition of potential binding sites during protein-protein interactions [26][27][28] . Therefore, we performed a detailed flexibility analysis to identify "flexible" and "rigid" regions of the SARS-CoV-2 RBD. We found significant differences among the dynamics of those residues, especially in the receptor binding motif (RBM, residues 437 to 508) of the RBD (Fig. 2). The key flexible region within the RBM covers the residues from 475 to 487, which might have a key role for an induced fit binding with hACE2. It has also been reported that the region, which corresponds to residues 475-483 in RBD, is a site of interest due to a higher frequency of mutations in this region 9,15 .
Normal mode flexibility analyses have shown an array of highly dynamic residues from residue 475 to 487 in the RBM. Surprisingly, these residues have a very high dynamical pair correlation among them with a correlation coefficient of ≥ 0.9 (Fig. 2c). Apart from RBM, RBD has no other highly correlated dynamical pair of residues in it.
Specifically, we have noticed an interesting site among these highly flexible residues in the RBM, the residue S477, preceded by G476. A mutation corresponding to S477G results in two consecutive Gly residues at position 476 and 477, with the resulting arrangement A475-G476-S477G (AGG), This AGG motif might favour easier structural arrangements of side chains which are otherwise restricted 29,30 . Our detailed NMA suggested that there is a flexible domain of highly correlated dynamical residues on RBM, and local amino acid exchanges S477G and S477N have a very high potential to affect local conformation significantly upon binding. Further using MD simulations, we are investigating a critical loop on RBM, which has a prominent role in binding with hACE2.    53 Volmap toolkit that generated a map of the weighted atomic density of every ACE2 atom within 5 Å of RBD at each gridpoint. This is done by replacing each atom in the selection with a normalized gaussian distribution of width (standard deviation) equal to its atomic radius. www.nature.com/scientificreports/ tion hotspot on hACE2 opposite to residue S477 of the RBD in the native system as well as in the S477N variant, while it is missing in the S477G variant. This difference is most likely due to steric reasons because of the lack of side chain functional groups in glycine. Qualitatively this analysis indicated a larger "footprint" of the variants compared to native RBD (Fig. 3). In line with that, the average total numbers of interface hydrogen bonds per trajectory frame of the 100 ns simulations are 5.7, 6.0, 6.7, for the native, S477G and S477N variants, respectively. The binding interface of 100 ns equilibrated RBD:hACE2 complex annotated with the residues involved in interchain interactions (Fig. S1) also shows the divergent interactions at the interface. Therefore, further analyses were carried out to understand the structural impact of the amino acid exchange at position 477.
S477 variants are showing contrasting change in flexibility upon binding to hACE2. In MD analyses, the root mean square fluctuation (RMSF) is a broadly used tool to identify residue-specific-flexibility 31,32 . Plotting the RMSF of each residue in the RBD, in its unbound form as well as in complex with hACE2, we noticed an overall decrease in flexibility of the RBD upon binding with hACE2 compared to the unbound RBD (Fig. 4). This decrease was particularly pronounced in the region around residue 477, suggesting an active participation of this residue at the interaction interface. Although we expected an increased local flexibility of the loop in the S477G variant, this hypothesis surprisingly turned out not to be true.

S477G delayed the detachment of RBD from hACE2. Steered molecular dynamics (SMD) is an
advanced version of MD simulations, widely used to study the binding and unbinding events among biomolecules in structural biology [33][34][35][36] , where the dissociation process is being accelerated using an external forced pulling via a harmonic force constant. SMD simulations have proved their worthiness in resolving the effect of single point mutation on protein-protein interaction interfaces 37,38 .
To study the impact of the amino acid exchanges at S477 in the detachment and association process of the spike protein to hACE2, an SMD simulation was carried out. The forced dissociation of the RBD from hACE2 depends on the spring constant and tension exerted by the surroundings (Fig. 5). Interestingly, there is a significant difference in the rupture force between the RBD and its variants S477G and S477N throughout different force constants. The chosen spring constant of 250 kJ/mol/nm 2 allows for a consistent and smooth transition between bound and unbound complex. A final center-of-mass (COM) distance between hACE2 and the RBD of approximately 14 nm was achieved.
Energy and conformational changes show that the rupture force varied greatly between the RBD and its S477 variants. Van der Waals energies show lesser fluctuations in the event of forceful dissociation of RBD from hACE2, whereas in the beginning the electrostatic forces dominated (Fig. S2). The complex of native RBD and hACE2 started dissociating after 400 ps (Fig. 6), S477N and hACE2 started dissociating at 390 ps, whereas the S477G variant delayed the disassociation process with hACE2 for around 100 ps. To further confirm the time www.nature.com/scientificreports/ point of disassociation of the RBD:hACE2 complex, we plotted the distance between RBD and hACE2 vs. time, indicating that the S477G RBD variant is significantly delaying the disengagement process (Fig. 7). Potential of mean force calculations (Fig. 8) show that the S477N variant features the highest binding affinity with hACE2 compared to its native RBD and S477G (Table 1). Our results are in line with experimentally reported changes in the binding affinity of S477G and S477N variants 15 .
To visualize the changes in the dissociation process we took structural snapshots at the breaking point for both, the RBD and its S477 variants. The inset structural annotation as shown in Fig. 8 at the breaking point indicates that the interaction with the highly flexible loop residues of the RBD will remain until the end of the unbinding event.

Discussion
Our flexibility and correlational analyses reveal a highly dynamic loop in the SARS-CoV-2 spike protein RBM from residue 475 to 485. It is predicted by induced fit theory 39 that the binding event induces a conformational change at the interaction site, and biased dynamics of these highly correlated residues 14 at the RBM are emphasizing their relevance for hACE2 binding. In fact, loops are the most common structural features found at protein-protein interaction interfaces 40 . Forced pulling dynamics revealed that this highly flexible loop remained interacting with hACE2 till the end of the rupturing event, suggesting that it provides the required mobility to mediate the hACE2:RBD interaction. Following this, it is worth considering that any alteration in the flexibility of this loop could have a strong impact on the binding of the RBD with hACE2. The comparative analysis of the RBD and its S477 variants highlights the potential effects of single amino acid substitutions on protein-protein interactions. Both single mutations of S477 have shown an increased binding affinity for hACE2. Based on our flexibility and SMD analyses, we are emphasizing that any mutation subject to a structural change in this region could potentially be critical for the interaction between RBD and hACE2. Missing heavy atoms, residues and hydrogen were then modelled by using the web based CHARMM package 42,43 . The amino acid exchanges S477N and S477G were modelled using the standalone CHARMM package.

Simulation system details. System 1-unbound RBD. The coordinates of the RBD and its S477 variants
were extracted from the above mentioned fully prepared structure. All structures were subjected to an energy minimization step followed by 100 ps of NVT. A 100 ns equilibration run was performed to relax the interface side-chain conformations constrained by the hACE2. This relaxed unbound RBD structure was used for the NMA on the DynaMut server 44 . For a comparative flexibility analysis based on the root-mean-square fluctuation (RMSF), all three structures were further subjected to three independent 10 ns production runs each.
System 2-complex of hACE2:RBD. The coordinates of the hACE2:RBD complexes (native and S477 variants) were taken from the above-mentioned template crystal structure. All complexes were further subjected to an energy minimization step followed by 100 ps of NVT and a 100 ns equilibration run. These equilibrated structures were used as input for the SMD simulations.
As above, the thus equilibrated complex structures were further subjected to three independent 10 ns production runs each for RMSF analyses. www.nature.com/scientificreports/ Molecular dynamics simulation details. Each system was placed in the center of a box with a minimum of 10 Å distance from the edge of the box, to avoid imaginary interactions, followed by solvation and system neutralization by adding ions. Potential energy terms for the protein and ions were derived from CHARMM36 45 all atom additive force field, and TIP3P 46 potential was used to represent water molecules. All classical simulations were performed using the Gromacs 2018 47,48 package with enhanced GPU support. A step size of 2 fs was chosen with a 10 Å cutoff for non-bonded interactions and Particle Mesh Ewald (PME) 49 summation was used to deal with long range interactions. All covalent bonds involving hydrogen were constrained using LINCS 50 . The temperature was maintained at 300 K with a Velocity-rescaling algorithm 51 using two indexed groups: protein and non-protein, for effective temperature coupling. Pressure coupling was applied isotropically at 1 bar with a weak coupling constant of 2 ps and an isothermal compressibility value of 4.5 × 10 -5 bar −1 . www.nature.com/scientificreports/ Steered MD simulation details. For SMD, the structures were placed in a rectangular box with the dimensions (100 Å × 100 Å × 350 Å)-sufficient to avoid imaginary interactions and providing space for pulling simulations to take place along the Z-axis-followed by solvation and neutralization by adding ions. Subsequently, bad contacts were removed by energy minimization, followed by a 100 ps NVT and a 500 ps NPT equilibration. After equilibration, position restraints were removed from the RBD whereas hACE2 remained as an immobile reference for the pulling simulations with position restraint of 1000 kJ/mol/nm 2 on heavy atoms in xyz direction. In each of the 10 different pulling simulations, the RBD was pulled away from the hACE2 along the Z-axis over 1000 ps with a fixed spring constant value, starting from 50 kJ/mol/nm 2 for the first simulation to 500 kJ/mol/nm 2 for the 10th simulation, with an increment of 50 kJ/mol/nm 2 in each step between the first and last simulations. A pull rate of 0.1 Å/ps has been used throughout all 10 pulling simulations. All pulling simulations were performed with the GROMACS package.
To generate the starting configurations for umbrella sampling we have chosen trajectories resulting from the pulling simulations with the spring constant of 250 kJ/mol/nm 2 . Symmetrical distribution of sampling windows was used with a window spacing of 0.1 nm, up to a 14 nm center-of-mass (COM) separation. Such finer spacing resulted in 90 windows at different COM distances. In each window, 5 ns of MD was performed for a total simulation time of 450 ns utilized for umbrella sampling. Finally, analysis of the umbrella sampling was performed with the weighted histogram method (WHAM) 52 with 300 bootstrapping steps in order to estimate standard deviations of the binding free energy.
Normal mode analysis (NMA). Normal modes are represented by eigenvectors of the Hessian matrix and eigenvalues are the squares of the associated frequencies. Each eigenvector describes a state of the protein where all Cα atoms are oscillating with the same characteristic frequency.
Using the characteristic frequency of each mode, atomic fluctuations have been calculated, given by: where x 2 i is the time averaged square displacement of atom i, ω j is the frequency of mode j, a ij is the displacement of atom i under mode j, and N is the number of residues.
The DynaMut online server takes advantage of the graph-theory and normal mode analysis (NMA) and has been employed to analyse and visualize the RBD dynamics, to generate a consensus prediction of protein motion and flexibility 44 . Root mean square fluctuations (RMSF). RMSF analysis described the average deviation of a residue over a time from a reference position. Using RMSF the portions of the structure that are flexible or rigid could be Figure 8. Potential of mean force calculation from SMD simulation of RBD and its S477 variants. The standard deviation is indicated by a semi-transparent band in the same colour. www.nature.com/scientificreports/ identified. After removing the periodic boundary conditions using Gromacs trjconv tool, the gmx rmsf module has been used to calculate RMSF using the following equation: where T is the total time of trajectory, t j is the time at jth frame of trajectory, r i is the position of the atom i and r ref i is the initial reference position of the atom i.
Root mean square deviations (RMSD). The RMSD of a protein is taken as the root of the average square displacement over all N Cα atoms, very often the measure of how much the protein conformation has changed. After removing the periodic boundary conditions using the Gromacs trjconv tool, the gmx rms module has been used to calculate the RMSD using the following equation: where M is the total mass of the protein, N is the total number of amino acids, m i is the mass of amino acid i, r i (t) is the position of the atom i at time t and r ref i is the initial reference position of the atom i.

Data availability
The molecular dynamics trajectories, input files and parameter files generated within this work are available for download at https ://doi.org/10.6084/m9.figsh are.13234 529.v1. www.nature.com/scientificreports/