S494 O-glycosylation site on the SARS-CoV-2 RBD affects the virus affinity to ACE2 and its infectivity; a molecular dynamics study

SARS-CoV-2 is a strain of Coronavirus family that caused the ongoing pandemic of COVID-19. Several studies showed that the glycosylation of virus spike (S) protein and the Angiotensin-Converting Enzyme 2 (ACE2) receptor on the host cell is critical for the virus infectivity. Molecular Dynamics (MD) simulations were used to explore the role of a novel mutated O-glycosylation site (D494S) on the Receptor Binding Domain (RBD) of S protein. This site was suggested as a key mediator of virus-host interaction. By exploring the dynamics of three O-glycosylated models and the control systems of unglcosylated S4944 and S494D complexes, it was shown that the decoration of S494 with elongated O-glycans results in stabilized interactions on the direct RBD-ACE2. Calculation of the distances between RBD and two major H1, H2 helices of ACE2 and the interacting pairs of amino acids in the interface showed that the elongated O-glycan maintains these interactions by forming several polar contacts with the neighbouring residues while it would not interfere in the direct binding interface. Relative binding free energy of RBD-ACE2 is also more favorable in the O-glycosylated models with longer glycans. The increase of RBD binding affinity to ACE2 depends on the size of attached O-glycan. By increasing the size of O-glycan, the RBD-ACE2 binding affinity will increase. Hence, this crucial factor must be taken into account for any further inhibitory approaches towards RBD-ACE2 interaction.

compared to SARS-CoV 2 . While, the other suggests a similar binding affinity for SARS-CoV-2 and SARS-CoV to ACE2 3,10 . Either way, the mutations in SARS-CoV-2 RBD certainly results in attenuated binding affinity of all designed inhibitors which successfully work on SARS-CoV 2 . This was a challenging issue for anti-SARS-CoV-2 drug design attempts since the pandemic has started.
In addition to the 3D structures of RBD-ACE2 complex, several recent Molecular Dynamics (MD) studies addressed the complex dynamics to design peptide inhibitors. That block the ACE2-RBD interaction and evaluate the currently targeted binding epitopes' accessibility for further improvements 4,11,12 . Earlier MD studies explored the effect of pH 13 and temperature 14 on the dynamics of SARS-CoV-ACE2 complex.To the best of our knowledge, there is currently no approved therapeutic inhibitor for SARS-CoV-2.
A pioneering study showed that the observed mutations at the junction of S1 and S2 subdomains of the SARS-CoV-2 S protein result in the emergence of a polybasic cleavage site adjacent to three O-glycosylation sites which are novel to SARS-CoV-2 and were not observed in any other related virion 1 . It was proposed that O-glycosylation (addition of glycan building blocks to hydroxyl oxygen of Ser/Thr residues which is an enzymatic post transnational modification 15 ) could lead to recognition of polybasic cleavage site by Furin enzyme and thus resulted in higher infectivity and broadens the host range of the virus 1 .
A Recent comprehensive experimental mass-spectrometry study on both N-and O-glycans attached to the spike protein did not report O-glycosylation of the S494. However, two O-glycosylated sites (S325/T323) were reported on the RBD 16 . Another recent computational study that was validated by experimental data reported that the S325/T323 O-glycosylation sites that were observed by Shajahan et al. is present with 11% frequency along side other O-glycosylation site that were decorated with lower frequencies 17 .
Six mutations have been reported on the RBD of SARS-CoV-2 compared to the SARS-CoV. One of these six occurring mutations is D494S on the RBD of S protein. Based on the proposed mechanism for the O-glycosylation of Furin cleavage site of SARS-CoV-2 1 , and the experimental strucutral data that was mentioned above 16,17 , decoration of Serine494 of RBD with the O-glycan is also plausible even though it is not experimentally confirmed. Our comprehensive review of literature on pathology and glycosylation pattern of the envelope proteins of all other coronaviruses and their close relatives showed strong evidence for the critical role of O-glycosylation in regulation and immune evasion of the viruses 1,18-20 . Also, former studies showed that the O-glycosylation of the membrane (M) protein of Mouse Hepatitis Virus (MHV) triggers a lower interferon level than the N-glycosylated M protein 18 . 3A protein of the MHV, an accessory protein to the M protein is also known to be O-glycosylated. Thus, it could be shielded from Peptide-N Glycosidase F (PNGase F) which cleaves most monosaccharide attached to protein. On the other hand, a complex nontemplate mechanism of the O-glycosylation could occur via several Polypeptide N-acetyl Galactoseamine Transferases (PPGalNAcTs) specific to various tissues was shown to be a common mechanism for the pathology of several high-evasion viruses such as Influenza 21 . In fact, the experimental findings suggest that the mucin proteins of human respiratory airways are heavily O-glycosylated with carbohydrate chains [22][23][24] . These dense O-glycan chains are known to trap invading microorganisms such as various strains of bacteria 25 .
Hence the lack of O-glycosylation in the currently available structures of RBD-ACE2 complex could be related to the expression host of the proteins and various glycosylation patterns in different cell lines.
Putting all the information mentioned above together, it is highly plausible that D494S mutation has an evolutionary origin and leads to O-glycosylation of RBD in the SARS-CoV-2. Hence, the O-glycosylation could be the overlooked factor for the SARS-CoV-2 increased binding affinity to ACE2, explaining the virus's high infectivity in the human host and exposing people with a higher level of blood sugar at higher risk 26 . Our observations may explain the reason for the higher vulnerability of diabetic patients to infection.
Herein, we explored this possibility by modeling the O-glycosylated RBD structure and its dynamics in interaction with ACE2 (Fig. 2). We showed that O-glycosylation does indeed increase SARS-CoV-2 RBD-ACE2 binding affinity. And attachment of the elongated O-glycans will increase the binding affinity. Furthermore, we tested the validity of these observations by considering the structural role of S494 substitution with D in the SARS-CoV-2 RBD to entirely eliminate the structural role of S494 in the dynamics.
In summary, this work provides strong evidence that O-glycosylation of SARS-CoV-2 RBD in the respiratory airways leads to stronger interactions between the virion and the host cell receptor. www.nature.com/scientificreports/ Hence, further inhibitor design attempts must take the detailed atomistic glycan-protein interaction reported here into account as a critical factor for future investigations.

Methods
Model building of RBD-ACE2 complexes. The crystal structure of SARS-CoV-2 RBD (residue 333 to 526) in complex with the extracellular domain of ACE2 (residue 19 to 615) (PDB ID: 6M0J) was used as the starting structure. Zn 2+ and Cl − ions which were resolved in the crystal structure for their crucial role in stabilizing ACE2 structure were also preserved at S1 and S2 subunits of ACE2. All Aspargines located within N-glycosylation motifs from RBD (Asn343) and ACE2 (Asn53, Asn90, Asn103, Asn322, Asn432, and Asn546) were glycosylated. The GLYCAM online builder 27 was used to attach the oligosaccharide N-glycans to each site. This oligosaccharide model was selected as a primary model of glycosylation which occurs in normal human cells 15 . Supporting Table S1 provides details on the structure of attached N-glycans.
O-glycosylation of Serine494 of RBD was carried out by attaching three models of core and branched typical human O-glycans using GLYCAM online builder 27 . Models I, II, and III are consist of two, five, and six monosaccharide units, respectively (Fig. 2).Details of all the glycosidic linkages and the bonding atoms can be found in Fig. S.2 and Table S.2. These O-glycan models were attached to RBD, while RBD-ACE2 complex was kept fully N-glycosylated (Fig. 1). Control system of RBD-ACE2 model without the attached O-glycan was also considered and will be referred to as model S. The model with the Serine 494 substitution with Aspartic acid in the RBD was prepared to completely eliminate the structural role of D494S mutation of SARS-CoV-2 in the RBD-ACE2 interactions.This model will be referred to as model D.

Molecular dynamics simulations.
All systems were solvated using TIP3P water model 28 and were neutralized by attaining a buffered environment in 150 mM NaCl. The resulting water box approximately has a dimension of 121 × 111 × 149 Å 3 and are typically consist of 190000 atoms each. All systems were parameterized with the CHARMM36m force field 29,30 , and NAMD2.12 31 was used to perform the MD simulations.All of the production runs were carried out under periodic boundary conditions under NPT ensemble, while employing Langevin thermostat and Nose-Hoover Langevin piston method to keep systems' temperature and pressure at 310 K and 1 bar, respectively. A cutoff of 12 Å was assigned for computation of short range nonbonded van der Waals interactions. The particle mesh Ewald method 32 was used for long-range electrostatic interactions. R-RESPA multiple time-step schemes were used for the integration of motion equations 31 . Using this integrator, the Lennard-Jones interactions and bonded ones were updated every step and electrostatic interactions every two steps. Along with the restraining of all covalent hydrogen bonds by SHAKE, the time steps for integration were set to 2fs. Final models of all complexes were minimized for 5000 steps to remove any steric clashes. The systems were then equilibrated for 0.5 ns under the NVT ensemble followed by another 0.5 ns relaxation under the NPT ensemble. The position of atoms in the complex (proteins, Zn +2 and Cl −1 ions) was restrained using a harmonic potential with spring constant k = 1 kcal/(mol Å 2 ). Production runs were then performed in three replicates of 100 ns for models S, I, II and III. For the control model D, two simulations of 100 ns was carried out.
(with the exception one simulation for the control model D). By selecting the PDB ID 6M0J crystal structure of the complex, performing several minimization steps and extensive equilibration, the structure of the ACE2bound RBD is very well relaxed. Thus, the Orientation of the O-glycosylated RBD-ACE2 is reliable.
During the production runs, all restraints on the protein were turned off, while dummy bonds between Zn +2 and Cl −1 ions and their adjunct atoms within radius of 3 Å were kept intact.
Binding free energy calculations. The binding free energy between RBD and ACE2 was calculated with Molecular Mechanics Poison Boltzmann Surface Area (MMPBSA) method. Here we used CaFE pipeline tool 33 on 250 frames from the last 10 ns of MD simulations and an ensemble-averaged over all replicates of each model. For electrostatic energy calculations, APBS method 34 was used. The dielectric constants for solvent and protein were set to 80.0 and 1.2 respectively (More details could be found in Supplementary method).Probe radius for calculation of SASA was set to 1.4 Å.

MD simulations analyses. Root Mean Squared Deviation (RMSDs), Root Mean Squared Fluctuation
(RMSFs), Solvent Accessible Surface Area (SASA) and the distances between the interacting pairs of residues, RBD and ACE2, H1 and H2 were calculated by tcl scripts taking advantage of VMD 35 . RMSD of each system was calculated by considering the backbone atoms (C α , C, and N). RMSF and the distance among the geometric centers of domains were calculated for C α atoms. SASA was calculated bey selecting the interface residues between RBD (405-505) and ACE2 . All trajectories were fitted to the starting conformation. All the plots were generated by Matplotlib 36 library of Python 37 and the figures by VMD 35 .

Results and discussion
Dynamics of the O-and N-glycosylated RBD-ACE2. It is worth mentioning again that as all the systems are fully N-glycosylated here, the N-glycans' global effect is explicit in the dynamics. However, the interpretation of results is mainly focused on O-glycosylation. All systems are simulated in three replicates except control model D with two replicates. In each RMSD plot, the probability distribution function (PDF) shows the the fluctuations of RMSD data which is interpreted as flexibility in this text. One should note that the RMSD plots are not presenting the stability of the complex in terms of the evolutionary dynamics and they only repre- www.nature.com/scientificreports/ after. (Fig. 3 A,B). Model D (σ = 0.203 ) shows dramatically high fluctuations in the RMSD values (Fig. 3). This observation suggest that the D494S mutation stabilizes the RBD-ACE2 interaction in SARS-COV-2 regardless of the O-glycosylation. The histograms of the PDF for RMSD values show a clear decrease in the peaks values of models III, II compared to models I, S and D (Fig. 3). This observation is in agreement with several other experimental and comuputational studies that reported the increased stability of the glubular and transmembrane protein complexes upon glycosylation with elongated glycans [38][39][40] .
Overall RMSF values of the RBD show that different models present high peaks in RMSF at different locations (Figs. 4, 5). However, calculation of the average RMSF for the direct interface residues 41 showed the lowest values for models III an II, while model S comes after (Table 1). Model D shows the hightest RMSF values (Table 1). This is supportive of the RMSD results that showed model D as the most flexible complex. The most dramatic increase of fluctuations occurs in model I residues 430 to 450 of RBD (Fig. 4). Two residues (GLY446 and Tyr449) within this region are known to interact with the ACE2 directly 42 . Attachment of a two-units O-glycan to model I, destabilizes these interactions as the glycan is not long enough to form contacts with the neighbouring residues. These results are interesting since the experimental findings has shown that the O-glycans of the respiratory airways are often elongated oligosaccharides 25 .      (Fig. 6). But the effect of the attached O-glycans is be visible in ACE2 RMSD plots as the attached O-glycans interact with ACE2 α-Helices. leading to a less flexible dynamics at the ACE2 interface ( Fig. 7A-C). This observation also shows that the D494S mutation plays a critical role in stabilizing the RBD-ACE2 interaction. ACE2 extracellular domain is a sizable receptor with about 600 amino acids; hence, it is expected that the overall RMSD of the receptor would not change much by the O-glycosylation of the RBD (Fig. 7A). As we aimed to study the alterations in ACE2-RBD interface, we calculated the RMSD of two α-Helices (Figs. 7B,C) which are known as RBD interacting sites within ACE2 3 .
Average RMSD plots of H1 and H2 show that the O-glycosylated systems show different flexibility behaviour at these regions. Model I and III is less flexible and model II is more flexible in this region.((σ III−H2 = 0.006) > (σ III−H1 = 0.004) and (σ I−H2 = 0.013) > (σ I−H1 = 0.006) while (σ II−H2 = 0.01) < (σ II−H1 = 0.02) ). Model III with the longest glycan is the least flexible (Fig. 7B). The two sharp peaks in RMSD plot of H1 for model D is resulted from high fluctuations of the dangling residues 19-22 at the N termini segment of H1 (Fig. S3) that is quite distant form the interface. On the other hand, H2 is more flexible in models I ( σ I−H2 = 0.013 ) and II ( σ II−H2 = 0.01 ) while model III ( σ III−H2 = 0.006 ) is more persistent (Fig. 7C) (Fig. 5A). To highlight the differences, we have shown the relative differences compared to model S in Fig. 5B. The relative difference is presented as D RMSF = RMSF 0 −RMSF i RMSF 0 +RMSF i where RMSF i is per residue RMSF of each model. According to the definition more positive D RMSF means lower relative RMSF.
Visualization of the dynamics shows that the O-glycan forms the most numerous polar contacts with H2 (five sites in H2 (N61,K68,A71,F72,E75) versus three sites in H1 (N38, L39, N49) Fig. 8.) In contrast, according to the crystal structures, H1 forms more contacts with RBD than H2 3 . RMSD plots of the attached O-glycans also show that the elongated models II and III are more flexible Fig. S2. That is due to the several polar interactions that they form with H2. One should note that the RBD is not O-glycosylated in the X-ray crystal structures. In fact that we see more interactions between O-glycan and H2 in our simulations suggests that the O-glycan keeps the RBD-ACE2 together by not interfering in the main ligand-receptor interactions that take place at H1.
Biding affinity of O-glycosylated RBD to ACE2. The decreased fluctuations of O-glycosylated models II and III, does not necessarily prove stronger RBD-ACE2 interaction. However, calculating the distance between RBD-ACE2, RBD-H1, RBD-H2 and all the interacting pairs at the interface could provide hints on this mechanism. Distribution of the RBD-ACE2 distance values shows a similar pattern (peaks at 49 Å) in models III and S simulations (Fig. 9A). While, in models II, I and D simulations, RBD and ACE2 are by 1, 1.5 and 2Å more distant respectively (Fig. 9A). The distance fluctuation between RBD and ACE2 in model D is more altering. In addition to the ACE2-RBD, distances between RBD, H1, and H2 of ACE2 were also measured (Fig. 9B,C). Plots show that O-glycosylation always leads to a decrease between the RBD and the two helices (Fig. 9B,C). This decrease is more significant in H1-RBD distance for all O-glycosylated models compare to Model S (Fig. 9B). While, H2-RBD distance distribution also presents a decrease in the O-glycosylated models. Model D shows the largest distance values in both H1-RBD and H2-RBD plots (Fig. 9B,C). These observations suggest that RBD-ACE2 binding should be stronger at the direct interface upon O-glycosylation of S494 with elongated oligosaccharides. The SASA plots for the RBD-ACE2 interface show that model I is the most exposed in the binding interface (Figs. 9D and 9E )with an increase of about 1000 Å E 2 compared to other models. This is supportive of the distance distribution and RMSD/RMSF plots that showed a destabilized interaction in model I due to the small two-units glycan that increased the flexibility in the interface leading to form weak interactions with the neighbouring residues in model I.
The distance distribution plots of all the interacting pairs of residues on the RBD-ACE2 interface were also calculated (Fig. 10). The overall trend of the plots is very similar in models S, III and II. Showing that the main interactions in the interface are either not altered or strengthened by the O-glycans. This is due to the polar contacts between the glycans and the neighbouring residues that lead to stronger RBD-ACE2 interaction (Fig. 8). The most fluctuating distance distributions occur in model D. Where in 7 out of 15 interacting pairs (ACE38-RBD498, there is a noticeable increase in the peak location compared to other systems (Fig. 10). In most of the interacting pairs, the O-glycosylated models with elongated oligosaccharides show the smallest peak values (Fig. 10). In two of the interacting pairs (ACE353-RBD498, ACE31-RBD493 and ACE355-RBD500) model D shows a decrease in the distribution peaks (Fig. 10). The noticeable increases and decreases in model D plots for different interacting pairs show the system's flexibility and weakened RBD-ACE2 interactions. Among the O-glycosylated models, I is the only system that present increased peak values for ACE38-RBD449 and ACE42-RBD446 interacting pairs. Model I was shown to be the most flexible O-glycosylated model (Fig. 3). Due to the flexibility that is added to the system by the short two-units glycan that is unable to form contacts with neighbouring residues in the interface in a manner that occurs for models II and III simulations (Fig. 8).
The binding free energy between the RBD and ACE2 also shows that the interaction is most favorable upon O-glycosylation in models II and III with longer attached glycans (Fig. 11). The binding free energies between Figure 9. (A) PDF of RBD to ACE2 averaged distance distribution with a histogram and maximum likelihood gaussian distribution fit: calculated from all replicate of each system and are represented in black, lime, orange and blue for models S and I-III respectively. PDF of model D is shown in red. (B) PDF of RBD to H1 averaged distance distribution with a histogram and maximum likelihood gaussian distribution fit: calculated from all simulations of each system and represented in black, lime, orange and blue for models S and I-III respectively. PDF of model D is shown in red. (C) PDF of RBD to H2 averaged distance distribution with a histogram and maximum likelihood gaussian distribution fit: calculated from all simulations of each system and represented in black, lime, orange and blue for models S and I-III respectively. PDF of model D is shown in red. Sampling were done from the last 40 ns of simulations. (D) SASA plots of RBD-ACE2 interface that calculated from all simulations of each system and represented in black, lime, orange and blue for models S and I-III respectively. The SASA plot of model D is shown in red.  (Fig. 11). Control models S and D present less favourable binding energies. Model D shows the smallest G value − 10.00±5.193 kcal/mol among all systems. This is due to the dramatic destability of the complex upon S494D substitution that shows the appearance of S494 in the RBD of Figure 10. PDF of RBD-ACE2 interface interacting pairs averaged distance distribution with a histogram and maximum likelihood gaussian distribution fit: calculated from all replicate of each system and are represented in black, lime, orange and blue for models S and I-III respectively. PDFs of model D is shown in red. This is due to the increased flexibility at the ACE2-RBD binding interface and the decrease of polar interactions made by the core glycan due to its small length. RMSD and RMSF plots of the RBD and overall ACE2-RBD (Figs. 3, 4 and 6) are supportive of this observations and show an increase in the flexibility of the complex for model I. Chosen value of solute dielectric constant has shown to have a dramatic effect on the prediction of binding free energy, while using the MMPBSA method [43][44][45][46] (see method section of Supporting information). Regardless of the given value to the dielectric constant, the decrease of binding free energy in the O-glycosylated models II and III with longer glycans is always persistent and statistically significant (Fig. 11). The increase of the binding free energy in the glycosylated protein-ligand complexes are also reported in other studies 39,47 .
The stabilized dynamics and more favorable binding energy between ACE2 and RBD with elongated O-glycans attached, could increase the uptake of SARS-CoV-2 and the possibility of its high evasion. However, performing further simulations in parallel with experiments to test the effect of glycosylation on each residue in the binding interface could be a useful validation of the observations reported here.

Conclusions
S494 that only occurs in SARS-CoV-2 and not in SARS-CoV is located in direct RBD-ACE2 binding interface has the potential to be decorated by O-glycans. Previous experimental and computational studies emphasized on the role of O-glycosylation in SARS-CoV-2 high infectivity. The results of atomistic molecular dynamics simulations of SARS-CoV-2 RBD in complex with ACE2 suggest that the O-glycosylation of S494 leads to the stronger interaction between RBD-ACE2 which can increase the virus infectivity. Three models of core and elongated O-glycans attached to RBD were tested and the results were compared with the unglcosylated S494 and S494D systems here. We observed that attachment of the elongated O-glycans lead to less flexible ACE2-RBD dynamics and decreased distances between RBD and two major H1, H2 helices of ACE2 that maintains the interacting pairs of amino acids in the direct binding interface. Relative binding free energy of RBD-ACE2 is also more favorable in the O-glycosylated models with longer glycans. The increase of RBD binding affinity to ACE2 depends on the size of attached O-glycan. By increasing the size of O-glycan, the RBD-ACE2 binding affinity will increase. These observations were confirmed by atomistic simulations of the unglycosylated S494 and the substitution of S494 to D that occurs in SARS-COV. Both of these system are more flexible with weakened interactions at the binding interface and less favourable binding affinity to ACE2.
The findings of this study add insightful information to the current status of SARS-CoV-2-ACE2 glycosylation and its role in the virus's high evasion rate. This hypothesis is a suitable target for experimental validation, and if proven to be critical, must be considered in further therapeutic designs. Figure 11. Relative binding free energy components for RBD-ACE2 complex. Overall binding free energy of models D, S, I-III are shown in red, black, lime, orange and blue.