Structural insights into SARS-CoV-2 spike protein and its natural mutants found in Mexican population

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a newly emerged coronavirus responsible for coronavirus disease 2019 (COVID-19); it become a pandemic since March 2020. To date, there have been described three lineages of SARS-CoV-2 circulating worldwide, two of them are found among Mexican population, within these, we observed three mutations of spike (S) protein located at amino acids H49Y, D614G, and T573I. To understand if these mutations could affect the structural behavior of S protein of SARS-CoV-2, as well as the binding with S protein inhibitors (cepharanthine, nelfinavir, and hydroxychloroquine), molecular dynamic simulations and molecular docking were employed. It was found that these punctual mutations affect considerably the structural behavior of the S protein compared to wild type, which also affect the binding of its inhibitors into their respective binding site. Thus, further experimental studies are needed to explore if these affectations have an impact on drug-S protein binding and its possible clinical effect.


Results and discussion
Multiple sequence alignments of reported SARS-CoV-2 S proteins sequence show three mutants present in Mexican population (H49Y, T573I, and D614G) (Fig. 1, Table S1). These mutations mentioned above are far from the S protein's RBD domain, a crucial domain for the virus interaction with the host cell. However, it is known that outside binding site mutations affect the ligand recognition as occurs with oseltamivir on neuraminidase from H1N1 34 . D614G change is caused by an A/G nucleotide mutation at position 23,403 in the WT strain 35 , this mutant is being associated with higher viral loads and enhances viral infection effectivity in patients 8,35,36 . Despite the high viral loads, mutant G614 is neutralized by polyclonal antibody similarly to WT D614 35 . To date, this mutant has become the dominant circulating protein displacing the WT, according to the levels of mutations worldwide, presented on the nextstrain database (www.nexts train .org). H49Y mutant is produced by C/T change at position 21,707 in the WT from S protein sequence 10 , producing a residue change from a positive histidine to an aromatic and polar tyrosine 9 (H/Y). On the other hand, D614G mutant substitutions change from negative charge and high hindrance effects to none-radical group (H), which showed a stabilizing structure, suggesting a prevalent role in S protein evolution 13 . Finally, T573I mutant has no information available until now, but the change of T/I implies a modification from polar residue to non-polar hydrophobic residue changing the chemical environment to a more hydrophobic site. Even though these are punctual mutations due to the substitution's chemical nature, perturbations at the structural and energetical level are expected.
The principal peak of all structure fluctuations is located on the Receptor Binding Domain (RBD) from F329 to P521. However, the highest values on this zone belong to H49Y and followed by T573I.
Clustering analysis of apo-proteins. Wild type and mutants. Representative ensembles were calculated from the last 80 ns of the MD simulation. 70% of the most populated conformations were grouped into the first 17, 15, 24, and 17 clusters for WT, D614G, H49Y, and T573I, respectively (Table S2); this clustering dispersion indicates that proteins possess a complex structural behavior. WT S protein showed similar clustering dispersion with D614G and T573I, while H49Y mutant showed higher clustering dispersion, which suggests it has complex structural behavior. Only the most populated cluster conformation was retrieved from each one of the systems, represented in Fig. 3. Regarding the WT structure, the most populated cluster conformation showed a higher difference regarding the initial conformation with an RMSD of 15.934 Å (Fig. 3A), followed by D614G (RMSD = 9.391 Å, Fig. 4B), H49Y (RMSD = 9.347 Å, Fig. 3C), and T573I (RMSD = 9.368 Å, Fig. 3D). Specifically, WT structural differences were observed in most of the structure, it is RBD region, HR1 (T912-L984) region, central helix (CH, D985-G1035) region, and small differences were observed from amino acids D287-F318 that are part of the N-terminal domain (NTD), and G550-T696 that include fusion peptide domain (FPD) belonging to the S1 subunit, in previous reports higher fluctuations on RBD region in monomeric and trimeric form were found 37,38 ; contrastingly, D614G mutant showed differences in regions spanning from D287-Q321 to T530-T696 which indicates that a single residue mutation affected the stability of the central portion of the S protein structure compared to the WT S protein. WT and D614G proteins exhibited differences in their secondary structure and displacements in loops, α-helices, and β-sheets structures, leading to a more compacted structure than WT S in agreement with our Rg findings. Contrastingly, the RBD region is more exposed, which is in line with previous observations where an increment between S1 and S2 distance is suggested to facilitate the viral attachment to the host cell, increasing its transmission mutant 39 . While in the H49Y mutant, structural changes were observed in RBD and NTD domain compared to WT; H49Y induces fewer structural changes on the HR1 region, which correspond to what we observed in the above RMSD and RMSF. These results are consistent with the in silico predictions of energetic stability induced mainly by H49Y mutation and, to a lesser extent, by D614G mutant 13 . In T573I mutant, reduced fluctuations from amino acids I587-I720 were observed, which correspond to the central portion of the protein, this lead   4B). By the projections of the essential subspace of PC1 vs. PC2, it was observed that the WT system ( Fig. 4C) showed different mobility behavior compared to the mutant systems. Sampling different regions than the S protein mutants, which point out that the mutation of a single residue affects its interactions with their corresponding receptor and ligands 37 . On the other hand, www.nature.com/scientificreports/ D614G showed a more compact cluster distribution than the other systems, suggesting a reduction in conformational mobility due to single residue mutation; additionally, it showed a dissimilar conformer distribution along the subspace in comparison to the others systems, this behavior suggests that the trajectory sampled different regions of the phase space with different minima and small energy barrier, in this sense, D614G mutant affects the structural behavior of the protein, which is also reflected in the conformations observed during MD simulation (Fig. 4D). Similar cluster distribution was observed for H49Y and T573I mutants. H49Y showed slightly higher conformational mobility in comparison to T573I system (Fig. 4E,F). These results can be sustained by analyzing the graphical representation of the full mobility along PC1 and PC2 (Fig. 5), which allows us to study the direction and magnitude of the motions contributing to the total system's mobility. According to the projections, the motions of the WT system in the opposite direction, which provokes an expansion of the structure (Fig. 5A), which is in agreement with Rg results; i.e., RBD, NTD, HR1, CH and CD regions went in the opposite direction with higher magnitude. However, for D614G the direction of the movement changes in comparison to WT, the magnitude of the movement was higher for RBD region and S2 subunit, contrastingly for the NTD segment, a significantly minor detriment in the magnitude of the movements was observed (Fig. 5B), which corresponds to the cluster distribution of PC1 vs PC2 projection (Fig. 5D), this mutation not only increases the flexibility of the RBD region, but also changes its direction, which could contribute to an increase in virulence and could alter the binding affinity to the ACE receptor 13 . Previously, it was proposed that D614 forms a hydrogen bond with T859 and a salt bridge with K854 located in the S 2 subunit of the other protomer; thus, the change of D by G could provide a flexible space between the monomers, which increases the protein's conformational flexibility as suggested by our MD simulations, this could lead to improved access to ACE2, and these events could explain the increase viral entry to the host cell 8,40 . This motion perturbation is explained because D614 is located in an exposed loop at a side protein chain; it also has a negative charge, which www.nature.com/scientificreports/ will be a loss on the variant G614, which causes loss of interactions with other molecules or residues. The G614 has a neutral charge, is smaller, and introduces a more hydrophobic residue at this position of the protein, and this can result in loss of hydrogen bonds and/or disturb correct folding. Glycines are very flexible and can disturb the required rigidity of the protein at this position 13 . In the case of H49Y mutant, the whole mobility decreased, which was reflected in the magnitude of the porcupine representation in Fig. 5C. Moreover, the direction of the motions in the whole mutant protein was different in comparison to WT. Additionally, it was observed that the motions for NTD and S2 (816-1106) subunit were lesser in comparison to WT, suggesting that this mutant not only confers more structural stability as we have proven but also energetic stability as it was suggested by other authors 13 . Aside, it was found that the H49Y mutant virus has increased cell entry compared to WT S protein 8 . Finally, the T573I mutation increases the magnitude of the protein's movements at the S1 subunit; regarding the S2 subunit, it was observed that the direction of the movements is opposite to the WT (Fig. 5D). So, this mutation altered the magnitude and the direction of the protein's mobility. For this mutant, there is not much more evidence about the virus behavior until now, so it should be explored to correlate with our findings. www.nature.com/scientificreports/ Molecular docking with known ligands. We performed molecular dockings with cepharanthine, hydroxychloroquine, and nelfinavir (known anti-SARS-CoV-2) on the experimental reported S protein to explore how these punctual mutations affect the protein-ligand binding inhibitors 20,28,32 . Cepharanthine showed potent antiviral activity against SARS-CoV-2 in vitro, and by molecular docking, S protein was suggested as its target. These experiments indicate that cepharanthine binds to RBD, interfering with the interaction of human ACE2 and the virus, avoiding the anchorage of the virus to the host cell 20,41 . In WT complex, cepharanthine forms a hydrogen bond with E484 and S494 and hydrophobic interactions with L455, F456, Y489, F490, L492, and Q493 that are residues involved in the interactions between viral spike and ACE2 (Fig. 6A,D) 42 . Cepharanthine bound to WT spike protein with the highest binding free energy (− 6.57 kcal/ mol), followed by H49Y mutant (6.42 kcal/mol) that also interacts with residues of the RBD (Y449, N450, Y451, L452, F490, L492, S494, and Q493) only by hydrophobic interactions (Fig. 6A,F). Cepharanthine from H49Ycepharanthine complex was a little bit displaced compared to the WT-cepharanthine complex, which might be responsible for the binding energy reduction. While, D614G and T573I showed a decrease in binding free energy (− 5.95 and − 5.39, respectively). Also, hydrophobic interactions with Y449, N450, Y451, L452, E484, F490, L492, Q493 and S496 residues of the RBD domain of the D614G mutant were observed with cepharanthine (Fig. 6E). While within the T573I mutant, cepharanthine moved away from the typical interaction site and formed hydrogen bond with R509 and hydrophobic interactions with F342, N343, A344, W436, N437, S438, N439 and N440 none of these residues belong to ACE2-RBD interacting residues (Fig. 6G). It seems that T573I mutation might affect the binding affinity of this protein with cepharanthine and maybe the reason for the biological effect of cepharanthine as anti-SARS-CoV-2 agent 19 .
Hydroxychloroquine is a compound with effective in vitro inhibition activity against SARS-CoV-2 infection, it also elevates the endosomal pH and affects the ACE-2 terminal glycosylation avoiding the virus binding 25 . Two independent computational works suggested that hydroxychloroquine might interact with the RBD domain of spike protein, through it can disrupt the viral-host recognition and avoid viral infection 20,28 . Therefore, in this study, we selected hydroxychloroquine to perform docking with WT spike protein and its mutants. Similar www.nature.com/scientificreports/ behavior as with cepharanthine was observed, hydroxychloroquine reaches its binding site on the RBD of the WT spike protein as well as on the D614G and H49Y mutants, while in T573I mutant hydroxychloroquine was moved away from this site, and it was reflected on the detriment of the binding free energy (Table S3, Fig. 6B). Hydroxychloroquine and WT S protein interact by forming hydrogen bonds between the compound and the amino acids Y489 and L492, and by hydrophobic interactions with Y449, L452, F456, E484, G485, C488, F490, E493 and S494 (Fig. 6H). The interaction of this compound with D614G is mediated by the formation of hydrogens bonds between the compound and amino acids E484, F490, L492 and Q493 and hydrophobic interactions with S494, Y449, L452, L455 and P491 (Fig. 6I). Interaction of the chemical compound hydroxychloroquine and H49Y mutant is mediated by the formation of hydrogen bonds with amino acids S349, Y449, N450, and S494 and by hydrophobic interactions with R346, F347, A348, Y351, A352, Y451 and L452 (Fig. 6J). In WT S protein and both D614 and H49Y mutants, hydroxychloroquine interacted with residues of the RBD that participate in ACE-RBD recognition 19,42 . On the other hand, hydroxychloroquine and T573I mutant interacting residues www.nature.com/scientificreports/ were F342, W436, and R509, with which the compound formed hydrogen bonds and hydrophobic interactions were observed with amino acids S438, L441, N343, A344, S373, F374, N437, S438, which are not involved in ACE2-RBD interaction 42 (Fig. 6K). Nelfinavir is currently marketed as an anti-HIV drug, and recently, through molecular docking, it was shown to inhibit S protein-mediated fusion of SARS-CoV-2 with host cells 32,41 . Nelfinavir interacts with S protein between the fusion peptide and HR1 helix near the S1/S2 cleavage site 32 . Therefore, nelfinavir was assayed by molecular docking against WT and mutants forms of the S protein. Nelfinavir was bound to WT S protein with the highest binding free energy (− 7.88 kcal/mol), followed by H49Y (− 7.52 kcal/mol), D614G (− 6.10 kcal/ mol), and finally, T573I mutant (− 5.38 kcal/mol). Regarding the binding site, all ligands were allocated around HR1, FP region, and S1/S2 cleavage site, but nelfinavir reaches different binding sites in every S protein studied (Fig. 6C). Nelfinavir interacted with WT spike protein through the HR1 region and formed by hydrogens bond with amino acids, I312, and Q314, and hydrophobic interactions with amino acids L303, E309, G311, Y313, I664, P665 and I666 as previously reported (Fig. 6L) 32 . Furthermore, nelfinavir interaction with D614 mutant was also through the HR1 region, but the interacting amino acids were different than in the wild type docking, nelfinavir bound by the formation of hydrogen bonds with amino acids N317, S596, K947 and Q1010 and by hydrophobic interactions with amino acids Q314, F592, G593, I1013 (Fig. 6M). Whereas nelfinavir docking with H49Y mutant was through a couple of residues of the HR1 region and by hydrophobic interactions with amino acids (A942 and S943), in addition, nelfinavir also interacted with residues of the N-terminal domain of the S protein (S13-L303) and with amino acids (T302, L303, K304) by hydrogens bonds, and with residues (V47, L48, Y49, S305) by hydrophobic interactions, in this case, nelfinavir was displaced out from the HR domain more evidently than with the D614G mutant. Still, these interactions were energetically favorable (Table S3, Fig. 6N). In the case of nelfinavir-T573I mutant complex, interacted mainly with residues of the HR1 region by the formation of hydrogen bonds with amino acids S943, D950 and D954, and by hydrophobic interactions with K947, V951, R1014, it also formed a hydrogen bond with K310 and hydrophobic interaction with P665 (Fig. 6O), even though, this binding mode was energetically less favorable than those observed with other mutants and WT S protein.
Therefore, the ligand-protein interaction depicted a similar energy behavior than the observed with cepharanthine and hydroxychloroquine, where WT protein-ligand complex has the highest binding energy, while T573Iligand complex has the lowest binding energy, which could indicate that the mutation on T573I position could be relevant for drug-spike protein interaction affecting not only the binding mode but also the binding free energy.

MD simulation of the complex protein-ligand.
To test the stability of the ligand-S protein complexes (WT, H49Y, D614G and T573I) obtained by molecular docking, 50 ns of MD simulation studies were carried out.
The average deviations in the atomic positions and stability through the trajectory of 50 ns of the MD simulations, the RMSD values of the protein-ligand complexes were calculated (Supplementary Information). The trajectories with cepharanthine show that the most stable trajectory at 50 ns corresponds to H49Y with values at 7 Å (± 1), while the trajectory of the WT is still unstable (Fig. S1A). The nelfinavir trajectories show the most stables trajectories with D614G; however, it was dissociated at the last five ns of the trajectory. The T573I complexes reach equilibrium at 25 ns with values at 9 Å (+ 1) (Fig. S1B). The WT complexes reach the equilibrium at 20 ns with values at 15 Å and remain without changes for the rest of the trajectory (Fig. S1B). From the trajectories of hydroxychloroquine with WT and the mutated proteins, it was observed that D614G complex form the most stable complex with values around 5 Å (± 1) (Fig. S1C), but this complex at the last 10 ns of the trajectory loses the equilibrium, whereas the trajectory with H49Y reach the equilibrium at 20 ns and remain for the rest of the trajectory with values at 8 Å (± 1) (Fig. S1C). The WT reaches the equilibrium at 20 ns with values at 10 Å (± 1) (Fig. S1C). The RMSF values of the trajectories with cepharanthine show that D614G exhibited the highest fluctuations on the RBD zone, follows by the WT, while T573I possesses the lowest fluctuations at this region (Fig. S1D). RMSF values with nelfinavir ligand present similar values for all the protein-ligand trajectories at the RBD, but shows an increase in fluctuation on the residues located at S1 for H49Y, WT, and D614G. In RMSF values for the trajectory with hydroxychloroquine, the T573I shows the higher values at RBD domain followed by the D614G, while the WT and H49Y have the lowest values.

Binding free energy calculations of complex protein-ligand.
Binding free energies (ΔG mmgbsa ) of the protein-ligand complexes were calculated from the last 20 ns of the MD simulation once the system reached equilibrium (Table 1) using the MMGBSA approach. Regarding WT ligand complexes, Nelfinavir complexes were energetically more favorable than cepharanthine and hydroxychloroquine, which is in line with the predictions obtained by above molecular docking.
Comparing among mutants, it can be seen that nelfinavir-T573I complex was energetically more favorable than the corresponding with WT-nelfinavir complex (− 39.34 vs − 19.52 kcal/mol), followed by nelfinavir-H49Y complex, in whose case the binding free energy becomes more positive, indicating a lesser favored complex (− 13.96 vs − 19.52 kcal/mol). While the nelfinavir-D614G complex was not energetically favorable (Table 1).
In the case of cepharanthine complexes, the WT-cepharanthine complex (− 15.53 kcal/mol) was the least favorable in comparison to the corresponding complex with mutant proteins, where the most favorable complex was formed with H49Y (− 19.43 kcal/mol), followed by T573I (− 18.83 kcal/mol) and D614G (− 16.36 kcal/mol).
For Hydroxychloroquine complexes, those formed with WT and D614G were diffused, indicating that a favorable complex was not formed. Only energetically favorable complexes were formed with T573I (− 15.53 kcal/ mol) and H49Y (− 14.05 kcal/mol), in which case is found among the least favorable complexes, this points out that hydroxychloroquine did not form stable complexes with WT spike protein and D614G-Pandey et al., found by molecular docking that hydroxychloroquine has lower affinity by WT S protein, which could be due to the www.nature.com/scientificreports/ labile and bulkier carbon chain, which disturb hydroxychloroquine binding reducing in this way the binding site 29 . On the other hand, Fantini et al., found by MD simulation studies that hydroxychloroquine rather than interact with Spike protein, it is bound to gangliosides site near to ACE-2, saturating the spike protein binding sites 43 . However, it is interesting how T573I and H49Y mutant produce more energetically favorable complex, so further experimental studies regarding these other spike inhibitors deserve to be experimentally studied.
More detailed analysis showed that non-polar interactions (ΔE nonpolar = ΔE vwd + ΔG npol,sol ) mainly contributed to the free energy of binding, where van der Waals interactions were the component with the most important contribution in comparison to non-polar solvation term (ΔG npol,sol ). Contrastingly, the contribution of polar solvation term was stronger unfavorable that screen the small favorable contribution of the electrostatic term (ΔE ele ). These findings suggest that the studied compounds mainly interacted by hydrophobic interactions.

Non-covalent interaction analysis of the protein-ligand complexes.
Most populated cluster conformations of each of the protein-ligand complexes were obtained through clustering analysis to gain the map of interactions established between the compounds and S protein WT and mutants after being submitted to MD simulation. To this end, the Non-Covalent Interaction index (NCI) 44 was employed. NCI has become a useful tool for the qualitative and quantitative analysis of intermolecular interactions in supramolecular and biological systems [45][46][47] . The NCI framework belongs to quantum chemical topology 48 , and has the advantage of going far beyond the simple determination of non-covalent interactions by geometrical examination. NCI provides an intuitive way to visualize the presence of non-covalent interactions by the analysis of 3D isosurfaces 49 , such as those observed in Fig. 7. Localized interactions, i.e., those that can be attributed to some atomic pairs (e.g., hydrogen bonds), are depicted as small disk surfaces that appear in a middle way between the interacting atoms. On the other hand, delocalized interactions (for example, van der Waals forces) emerge as flat extended surfaces that cover large interacting regions. Additionally, a color scale is used to characterize the interaction strength. Bluish, greenish and reddish zones of the NCI isosurfaces represent strong attractive, weak and repulsive interactions, respectively. Furthermore, a quantitative analysis is obtained by the integration of the electron density in the NCI isosurfaces and its partition in different ranges of interactions (strong attractive, weak and repulsive). These quantities have been shown to correlate with binding energies in protein-ligand interactions 50 . More details about NCI theory and interpretation can be found in the Supplementary Information.
The NCI surfaces of the most populated cluster conformations of the protein-ligand complexes are depicted in Figs. 7 and S2. The classification of the interactions between the ligands and the closest residues of the S protein, according to NCI, is given in Table NCI. In addition, the atom-atom intermolecular distances of the localized interactions are shown as a complementary guide for its strength classification. The residues forming localized and delocalized interactions with the ligand are depicted in yellow and magenta, respectively. The surfaces corresponding to localized interactions are marked by a red square. The regions where π-interactions are present are indicated as well. As a general trend, it is observed that the protein-ligand complexes are stabilized mainly by dispersive interactions, i.e., the green surfaces predominate. This result is supported by the NCI integrals, which show that the contributions from weak interactions are larger than those of the strong attractive ones (Fig. S2). This outcome is also in agreement with the previous conclusion drawn from the above MMGBSA approach, that non-polar interactions have a major contribution to the free energy of binding. Moreover, it is interesting to note that the total NCI integrals show, in the main, the same tendency that the free energies of binding (Fig. S2). Table 1. Binding free energy components of complexes between ligands and SARS-CoV-2 spike protein (kcal/ mol). All energies are averaged over 200 snapshots at time intervals of 100 ps from the last 20 ns-long MD simulations, and they are in kcal/mol (± standard deviation). ND due to ligand diffuses of the binding pose at the first ns of simulations.  Cepharanthine remains bound to the RBD in WT via a strong N-H⋯O hydrogen bond and a weak nonconventional C-H⋯O hydrogen bond 52 to E484. Besides, one of the methyl groups of cepharanthine is forming a C-H⋯π interaction with the aromatic ring of F490. As well, it forms van der Waals contacts with four close neighbors ( Table 2). A diversity of other interactions is found for this ligand with the other mutant-S proteins. With D614G, it is slightly displaced with respect to the initial site (Fig. S3). Two new local interactions, the socalled hydrogen-hydrogen bonds (H⋯H) 53 are formed instead, while delocalized interactions are established with eight neighbors. In its complex with H49Y, cepharanthine is still bound to the same place as at the beginning of the MD simulation. There were found a strong N-H⋯O hydrogen bond with Y449 and two H⋯H bonds with L542 and T470, respectively. In addition, another C-H⋯π interaction is formed between a methyl group of the ligand and the aromatic ring of Y449. It also forms a C-H⋯π interaction with the same residue than in WT, but the ligand acts as an acceptor in this case. The number of van der Waals contacts is also similar as in its complex with WT (Table 2). Finally, in the Cepharanthine-T573I complex, cepharanthine was slightly moved away from the initial binding site. A weak C-H⋯O hydrogen bond with F342 is formed. Additionally, two C-H⋯π www.nature.com/scientificreports/ interactions are established between a methyl group of the ligand and W436, and between an aromatic group of the ligand and F374, respectively. For this complex, nine van der Waals interactions were found. The hydroxychloroquine complexes were only stable with the H49Y and T573I mutants, where they migrate from different pockets of the RBD regions. In the case of WT and the D614G mutant, the ligand diffuses from the proteins. The NCI isosurfaces reveal that for these systems, localized interactions are scarce. This fact explains why these complexes show the lowest free energy of bindings according to the MMGBSA approach. The π-interactions govern the complexation of hydroxychloroquine with H49Y. It forms a distant π⋯π stacking with the aromatic ring of F338, while the two aromatic rings of the ligand act as acceptors in C-H⋯π interactions formed with the A372 and F374 residues. In addition, one of the C-H bonds from the aromatic ring of the ligand act as a donor in another C-H⋯π contact established with the aromatic ring of F342. Two more van der Waals contacts were found for this system ( Table 2). The hydroxychloroquine-T573I complex shows one H⋯H weak hydrogen bond with A522 and act as an acceptor for a C-H⋯π interaction formed with I332. With the remaining close residues only van der Waals forces are established ( Table 2).
The nelfinavir complex with WT shows large free energy of binding, which can be partially attributed to a strong N-H⋯O hydrogen bond formed with I666. It further has van der Waals interactions with other nine residues, which contribute to form a stable complex since the ligand did not move from the original binding site. Contrarily, its complex with D614G was not energetically favorable (Table 1). Interestingly, the Nelfinavir-H49Y complex also shows nine van der Waals contacts but no localized interaction. This outcome explains why it shows one of the lowest free energies of binding among all the analyzed complexes. The residues that interact with nelfinavir were localized in the HR1 region. The nelfinavir-T573I has a greater number of localized interactions. It forms one strong N-H⋯O hydrogen bond with K310, one N-H⋯O and one O-H⋯O hydrogen bond with I312, another N-H⋯O hydrogen bond with Q954, and a strong O-H⋯O hydrogen bond with D663, plus a weak C-H⋯O hydrogen bond with Y313. Also, it acts as an acceptor in a C-H⋯π interaction established with www.nature.com/scientificreports/ P665. Finally, van der Waals forces are also abundant for this complex, which provides extra stabilization. This analysis explains why the Nelfinavir-T573I system has such great free energy of binding compared to the rest of the complexes, which is also predicted by the NCI integrals.

Multiple alignments of spike (S) protein.
The sequences corresponding to S protein from SARS-CoV-2 were downloaded (April 27th) from the Global Initiative for Sharing All Influenza Data (GISAID) database 54 (www.gisai d.org). The sequence from Wuhan was used as WT, and it was aligned against 49 S sequences found in Mexican population (Table S1). Multiple alignments were carried out in ClustalX using default parameters, further, it was edited on Jalview 55 . The whole sequence of the S protein has 1282 residues, we showed only the residues where the protein exhibited mutations found in the Mexican population, the sequences without changes were hidden. To obtained the 3D mutants of S protein, mutations were done on PyMol 56 , using as a template the crystal structure of SARS-CoV-2 S glycoprotein (PDB: 6VSB). Visualization of wild type and the H49Y, T573I, and D614G S mutants were performed on VMD (Fig. 2). Zoom of the structural 3D alignments were done to visualize the region which contains the mutations (Fig. 2).
MD simulation and analysis. MD simulations were carried out with AMBER 16 software package 57 using ff14SB forcefield 58 . Force field ligand parameters were assigned using the semi-empirical AM1-BCC and the general Amber force field (GAFF) 59 . The systems were solvated using an explicit TIP3P water model and centered into a rectangular box of 12.0 Å; after that, all the systems were neutralized by adding 6 Na + counter ions. Each one of the systems was minimized through 2500 steps of steepest descent and 2500 steps of conjugate gradients. Then, they were equilibrated through 500 picoseconds (ps) of heating and 500 ps of density equilibration with weak restraints followed by 2 ns (ns) of constant pressure equilibration at 310 K. MD simulations of Apo proteins were carried out for 100 ns while the complex ligand-protein complexes MD simulations of Apo proteins were carried out for 100 ns while the complex ligand-protein complexes for 50 ns, under periodic boundary conditions and using an NPT ensemble at 310 K. The electrostatic term was described through the particle mesh Ewald method 60 ; using a 10.0 Å cut-off, similar radio was chosen for van der Waals interactions. The SHAKE algorithm 61 was used to constrain bond lengths at their equilibrium values, and a time step was set to 2.0 fs. Temperature and pressure were maintained with the weak coupling algorithm 62 using coupling constants τT and τp of 1.0 and 0.2 ps, respectively (310 K, 1 atm). Trajectories were analyzed using cpptraj module for root mean squared deviation (RMSD), root mean squared fluctuations (RMSF), the radius of gyration (R g ). Based on the knowledge of the equilibration time, clustering analysis (using a 3.0 Å cut-off) was done using a hierarchical agglomerative (bottom-up) approach employing AmberTools 16, where the most populated cluster conformation of each MD simulation was obtained, and further structural examination was performed, in order to compare the structural differences between the initial conformation and the one obtained after MD simulation. Clustering analysis is a statistical method data mining tool that allows partitioning a data set based on similar features; in MD simulations it allows to reduce complex ensembles getting smaller subsets of data and obtain representative conformation from individual clusters 63,64 , in this case, we pick up a conformation from the most populated cluster Principal component analysis (PCA), also known as essential dynamic (ED) was performed. PCA analysis is a statistical technique that allows obtaining the large scale collective motions of the atoms on the simulations, which are frequently correlated to the biological function and biophysical properties 65 . The performed method is described in detail elsewhere 66 . Structural analysis of the systems and figures were performed using PyMOL v0.99 56 .
Molecular graphics were performed in SigmaPlot 12.0, and protein visualization was performed by VMD and Chimera 67, 68 . Binding free energies calculation. The MMGBSA [69][70][71] approach was employed to calculate the binding free energy (ΔG bind ). Twenty hundred snapshots at time intervals of 100 ps were selected through the equilibrated simulation (last 20 ns), and all counterions and water molecules with a salt concentration of 0.10 M were deleted using the implicit solvent model 72 . ΔG bind analysis was performed as previously described 73 . Molecular docking. Autodock 4.2 software was used for docking calculations 74 . Focused docking was performed, for cepharanthine and hydroxychloroquine grid box was center on RBD domain with a grid box size of 80 Å × 70 Å × 70 Å, as reported elsewhere 20,28 . For nelfinavir, grid box was focused on residues L303, Y313, Q314, between HR1 and FP region. With a grid box size of 80 Å × 82 Å × 80 Å 32 and a grid spacing of 0.375 Å. Compounds and S proteins were prepared using AutoDock Tools 1.5.6 74 . Polar hydrogen atoms and Kollman charges were encompassed for the S protein. Lamarckian genetic algorithm was used as a scoring sample with a randomized population of 100 individuals, and energy evaluations of 1 × 10 7 , 100 runs were performed. As a selection criterion, the conformation with the lowest free energy values was chosen. Docking results were analyzed using Autodock Tools 1.5.6 74 , and whereas figures were further processed with Pymol v.099 56 . Ligand structures are provided in Supplementary Material as in Fig. S4.

Non-covalent interaction index.
The NCI isosurfaces were generated using promolecular electron densities, ρ(r), taking as input the geometry obtained from the most populated cluster conformation retrieved from MD simulations. The keyword LIGAND was used to analyze exclusively the intermolecular interactions between the drugs and the surrounding amino acid residues, where a 5 Å radio cut-off was applied. The integrals of ρ(r) in the NCI isosurfaces were split in different ranges of sign(λ)ρ(r), where sign(λ) is the sign of the second eigenvalue of the electron density Hessian. These are: from − 0.1 to − 0.02 for strong attractive regions, from − 0.02 to www.nature.com/scientificreports/ the health care workers involved in the sampling collection, without them the GISAID database would not be possible.

Author contributions
Y.S.L. carried out theoretical studies, results analysis and wrote the manuscript; J.C.B. carried out theoretical studies, results, analyzed the data and wrote the manuscript. B.L.R., carried out the non-covalent interaction index in the protein-ligand complexes and wrote the paper, M.B. and J.A.G.T. analyzed the data and wrote the manuscript and SM, carried out theoretical studies, project design, results analysis and wrote the manuscript.

Funding
This work was supported by project PROFAPI 2015/182 (SM).