Spike protein mutations and structural insights of pangolin lineage B.1.1.25 with implications for viral pathogenicity and ACE2 binding affinity

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of COVID -19, is constantly evolving, requiring continuous genomic surveillance. In this study, we used whole-genome sequencing to investigate the genetic epidemiology of SARS-CoV-2 in Bangladesh, with particular emphasis on identifying dominant variants and associated mutations. We used high-throughput next-generation sequencing (NGS) to obtain DNA sequences from COVID-19 patient samples and compared these sequences to the Wuhan SARS-CoV-2 reference genome using the Global Initiative for Sharing All Influenza Data (GISAID). Our phylogenetic and mutational analyzes revealed that the majority (88%) of the samples belonged to the pangolin lineage B.1.1.25, whereas the remaining 11% were assigned to the parental lineage B.1.1. Two main mutations, D614G and P681R, were identified in the spike protein sequences of the samples. The D614G mutation, which is the most common, decreases S1 domain flexibility, whereas the P681R mutation may increase the severity of viral infections by increasing the binding affinity between the spike protein and the ACE2 receptor. We employed molecular modeling techniques, including protein modeling, molecular docking, and quantum mechanics/molecular mechanics (QM/MM) geometry optimization, to build and validate three-dimensional models of the S_D614G-ACE2 and S_P681R-ACE2 complexes from the predominant strains. The description of the binding mode and intermolecular contacts of the referenced systems suggests that the P681R mutation may be associated with increased viral pathogenicity in Bangladeshi patients due to enhanced electrostatic interactions between the mutant spike protein and the human ACE2 receptor, underscoring the importance of continuous genomic surveillance in the fight against COVID -19. Finally, the binding profile of the S_D614G-ACE2 and S_P681R-ACE2 complexes offer valuable insights to deeply understand the binding site characteristics that could help to develop antiviral therapeutics that inhibit protein–protein interactions between SARS-CoV-2 spike protein and human ACE2 receptor.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus responsible for the global pandemic known as coronavirus disease 2019 (COVID- 19), which emerged in December 2019.The World Health Organization (WHO) reports that COVID-19 has caused significant morbidity and mortality worldwide, with 651 million reported cases and 6.6 million deaths as of December 2022 (WHO Coronavirus (COVID-19) Dashboard|WHO Coronavirus (COVID-19) Dashboard with Vaccination Data).
In Bangladesh, the first case of COVID-19 was reported in March 2020, and since then, the country has experienced a steady increase in infections 1 .As of December 30, 2022, there have been 2,037,024 confirmed cases and 29,439 deaths (https:// corona.gov.bd/? gclid).Throughout the pandemic, samples from COVID-19 positive patients were collected from the National Institute of Laboratory Medicine and Referral Center (NILMRC) and sent to the Bangladesh Council of Scientific and Industrial Research (BCSIR) for sequencing SARS-CoV-2 isolates.The generated data were subsequently submitted to the Global Initiative for Sharing All Influenza Data (GISAID) and the National Center for Biotechnology Information/EUA (NCBI).Whole-genome sequencing was performed on these isolates to analyze the mutations and understand their relationship to the original SARS-CoV-2 strain in Wuhan, China 2,3 .
The spike (S) protein of SARS-CoV-2 plays a crucial role in viral entry into host cells.This glycoprotein forms trimers on the virion surface and binds to the host-cell receptor angiotensin-converting enzyme 2 (ACE2) through its receptor-binding domain (RBD).Subsequent structural rearrangements facilitate the fusion of the viral membrane with the host-cell membrane 4 .The S protein has undergone significant evolutionary changes, leading to the emergence of various variants of concern.These variants exhibit multiple mutations in the spike protein and can impact viral replication efficiency, neutralizing antibody sensitivity, and binding affinity to the ACE2 receptor 5 .
The Delta variant, in particular, has demonstrated higher replication efficiency and reduced sensitivity to neutralizing antibodies compared to the original strain 6 .It exhibits increased binding affinity to the ACE2 receptor and enhanced spike-mediated entry efficiency, making it highly concerning 7,8 .
In this study, we aim to identify the prevalent variants and characteristic mutations of SARS-CoV-2 in Bangladesh throughout 2021.We will investigate the impact of these mutations on the spike protein by analyzing the interaction patterns between mutant spike proteins and the ACE2 receptor.Additionally, we will assess changes in protein dynamics and stability resulting from vibrational entropy alterations.To achieve these objectives, we will employ various molecular modeling techniques, including protein modeling, molecular docking, and geometry optimization using quantum mechanics/molecular mechanics (QM/MM), and protein stability analysis.
By comprehensively analyzing the genetic variations and their effects on the spike protein, we hope to contribute to the understanding of SARS-CoV-2 evolution in Bangladesh and its implications for the ongoing COVID-19 pandemic.

Methods
Whole-genome sequencing.SARS-CoV-2 detection via RT-PCR assay (Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic Kit, Sansure Biotech) from nasopharyngeal specimens of 17 COVID-19-positive patients collected from the NILMRC, Bangladesh.Library preparation and next generation sequencing of the samples were conducted by the Genomic Research Laboratory of BCSIR.This study was approved by the human research ethics committee of NILMRC (project code: 224125200).All methods were performed in accordance with the relevant guidelines and regulations.Informed consent was obtained from all subjects and/ or their legal guardian(s).
Viral RNA was extracted using PureLink™ Viral RNA/DNA Mini Kit (Cat.no: 12280050, Thermofisher Scientific, USA) 1 .The cDNA of all samples was used to prepare paired-end libraries with the Nextera ™ DNA Flex Library Preparation kit according to the manufacturer's instructions (Illumina Inc., San Diego, CA).The Library pool of the samples was sequenced using the S4 flow cell of Illumina NextSeq 550 instrument in a paired-end fashion (read length 151 bp; Illumina Inc.).FASTQ files were generated in the Illumina BaseSpace Sequence Hub (https:// bases pace.illum ina.com).The generated data for BCSIR-NILMRC-422 and BCSIR-NILMRC-424 after NGS sequencing were submitted to the NCBI as SRA submissions and got the SRA numbers SRP336906 and SRP336782 (PRJNA762998 and PRJNA762670) respectively.Genomes of SARS-CoV-2 viruses were assembled using Basespace DRAGEN RNA Pathogen Detection V3.5.14 software with default parameters.Consensus FASTA files of the SARS-CoV-2 genome were generated using Basespace Dragon RNA Pathogen Detection software version 3.5.1 (https:// bases pace.illum ina.com) with default settings.The sequences of SARS-CoV-2 isolates from Bangladeshi patients were compared to the reference SARS-CoV-2 sequence (NC_045512.1),through nucleotide substitutions.The positions of the nucleotides and amino acids were further confirmed from Gen-eBank reference sequences (NC_045512.1) 9 .
Design, refinement, and validation of the tertiary structure of the mutated spike proteins.Genome assembly of the raw data was performed using the assembly toolkit in the EzCOVID19 cloud service, provided on the EzBioCloud website (https:// eztax on-e.ezbio cloud.net).Genome sequences within the assembly were all 29,903 nucleotides in length and covered 99.84% of the SARS-CoV-2 Wuhan reference genome (NC_045512.2), with an average GC content of 38%.The FASTA file containing the sequences of all spike proteins was aligned and visualized using CLUSTAL Omega (https:// www.ebi.ac.uk/ Tools/ msa/ clust alo/) and Mview (https:// www.ebi.ac.uk/ Tools/ msa/ mview/).CFSSP (http:// www.biogem.org/ tool/ chou-fasman/), Swiss-Model (https:// swiss model.expasy.org/), and Rap-torX (http:// rapto rx.uchic ago.edu/) servers were used to predict secondary and 3D structure as well as solvent accessibility and disordered regions of spike proteins with D614G (S_D614G) and P681R (S_P681R) mutations.The latter one uses an input sequence in FASTA format and predicts its tertiary structure, based on three strategies: (a) single and (b) multiple template threading and (c) alignment quality prediction.To verify the quality of the predicted 3D structure, the server assigns confidence scores, including the P-value for relative global quality, GDT (Global Distance Test) and uGDT (un-normalized GDT) for absolute global quality.The RaptorX-Contact was officially ranked 1st in contact prediction in terms of F1 score in the worldwide protein structure prediction (CASP) competition round XII.
The 3D models obtained for S-D614G and S-P681R were refined using the GalaxyRefine server (http:// galaxy.seokl ab.org/).This web server is based on a method that refines 3D structures by implementing short Molecular Dynamics (MD) after repeated side-chain repacking perturbations at global and local levels, enabling larger movement.This server offers one of the best solutions for improving both local and global quality of the given structure by using the most up to date protein structure predictions available 10 .
The Swiss Model's Structure Assessment (https:// swiss model.expasy.org/ assess), MolProbity (http:// molpr obity.bioch em.duke.edu/ index.php), ProSA-web (https:// prosa.servi ces.came.sbg.ac.at/ prosa.php), SAVES-ERRAT, Verify3D and PROCHECK (https:// saves.mbi.ucla.edu/) tools checked the quality and possible errors of the 3D models using a newly validated structural validation protocol 11,12 .Swiss Model's Structure Assessment Tool was used to obtain information at global and local levels of structure through its own methods (QMEAN and Ramachandran plot) or by running additional software tools (MolProbity).The ProSA-web tool calculates an overall quality score for each input protein structure and highlights any problematic regions using a 3D molecule viewer.If the calculated score falls outside the range characteristic of native proteins, the structure likely contains errors 13 .The SAVES server v6.0 generated a Ramachandran plot to visualize the energetically allowed and disallowed dihedral angles psi (ψ) and phi (φ) of the amino acids 14 , as well as the overall quality score of the modeled protein by analyzing the unbound atom-atom interactions, and compared it with reliable high-resolution crystal structures using ERRAT, Verify3D, Prove, PROCHECK, and WHATCHECK.
Protein motion and flexibility analysis.Proteins are dynamic macromolecules whose function is closely linked to their biological movements 15 .It is known that both drug-resistant and genetic disease mutations can act through changes in the conformational balance and dynamics of proteins 16 .To predict the effects of single-point mutations on S protein stability, we used the DynaMut server (http:// biosig.unime lb.edu.au/ dynam ut/) to analyze the protein movement and flexibility of S_D614G and S_P681R.In this tool, ΔΔG ≥ 0 is considered stabilizing, and ΔΔG < 0 is considered destabilizing.In the entire blinded set of 702 mutations, including both forward and hypothetical reverse mutations, DynaMut achieved performance compatible with (or better than) that of established methods 17 .
Molecular docking and QM:MM simulations.Using the individual structures of two molecules of interest, a molecular docking tool identifies the binding modes through a search algorithm and evaluates them with an energy scoring function, thus being able to predict protein-protein interactions.Here, initially, the molecular structures of protein_1/ligand and protein_2/receptor ((S_D614G and S_P681R) ACE2-PDB ID: 7W9I) were adjusted by adding charges (protonation or deprotonation) in atoms and correcting bonds, using the PROPKA 3.1 package (https:// github.com/ jense ngroup/ propka/).The pH parameter used was within the physiological range (7.2-7.4)due to the presence of MAYV in the bloodstream.To make the calculation more accurate, a force field called CHARMm (Chemistry at Harvard Molecular Mechanics) version 36, specifically designed for organic molecules, was used to perform molecular dynamics simulations and optimize the geometry of hydrogen atoms [18][19][20][21] .
The docking between human ACE2 and spike proteins S_D614G and S_P681R was implemented using Cluspro (https:// clusp ro.bu.edu).The server performs three computational steps as follows: (1) rigid-body docking by sampling billions of conformations; (2) root-mean-square deviation (RMSD)-based clustering of the 1000 lowest-energy structures generated, to find the largest clusters that will represent the most likely models of the complex; and (3) refinement of selected structures using energy minimization 22 .The best Cluspro model was subjected to an additional refinement step in the HADDOCK interface (https:// bianca.scien ce.uu.nl/ haddo ck2.4/ refin ement/1).This server provides a list of clusters in order of score and presents detailed statistics showing the average score of the top four structures for each cluster 23 .
We used the combined quantum mechanics/molecular mechanics technique (QM/MM) to optimize the geometry of the docking models and to select the most relevant protein-protein complexes from the docking calculations, namely S_D614G-ACE2 and S_P681R-ACE2.The QM /MM methods have now established themselves as the most advanced computational methods for biomolecular systems, justifying the rapidly growing number of publications 24,25 .The approach combines the reaction center/active site as the QM region/layer (ACE2), whereas the MM region/layer accommodates the remaining part of the system (major amino acid residues of the S protein).As most biomolecular events concern the reaction center and not the full biomolecular system, QM/MM is ideal for obtaining relevant insights 26,27 .Here, the popular B3LYP hybrid functional (Becke, three parameters, Lee-Yang-Parr) of exchange-correlation and basis set 6-311G (d, p) were used to expand the electronic orbitals for the QM layer, and all amino acid residues within a radius of 6.0 Å from the centroid of the S protein were allowed to move during geometry optimization.
The best Cluspro and QM/MM models were run using the PRODIGY tool (https:// bianca.scien ce.uu.nl// prodi gy/ lig) for a comparative analysis of binding energies (van der Walls, electrostatic, and desolvation energies).PRODIGY predicts protein-protein binding affinity (or binding free energy) based on the functional and structural features of the biological system, i.e., the interfacial contact network 28 .The RMSD calculation of these same structures compared to their respective original PatchDock complexes was performed by Discovery Studio (https:// disco ver.3ds.com/ disco very-studio-visua lizer-downl oad).Finally, binding poses and the presence of intermolecular interactions were evaluated using LigPlot+ (https:// www.ebi.ac.uk/ thorn ton-srv/ softw are/ LigPl us/), PoseView (https:// prote ins.plus/), and Discovery Studio Visualizer, specifically intermolecular hydrogen bonds (carbon, conventional, and pi-donor H-bonds), electrostatics (salt bridge, attractive charges, pi-cation, pi-anion).
Ethics approval.This study includes experiments with humans and the human research ethics committee of the National Institute of Laboratory Medicine and Referral Center (NILMRC) approved the whole-genome sequencing of SARS-CoV-2 in this study (project code-224125200).
The age range of the patients was of 22-74 years (Supplementary Fig. 1a), and the sex distribution was 88.23% male and 11.76% female (Supplementary Fig. 1b).To identify clusters within the diversity of SARS-CoV-2, the coronavirus typing tool applies phylogenetic analysis (Supplementary Fig. 2).This typing tool has prudently selected reference sequences that denote the diversity of each well-defined phylogenetic cluster (detailed in nextstrain).Additionally, this tool executed extensive testing to be sure that their reference strains accurately classified other sequences 30,31 .If the sequence is identified as being part of the SARS-CoV-2 cluster, an additional phylogenetic analysis is performed to identify one of the following lineages: B. Assembly was performed by aligning reads to a pre-defined Wuhan reference genome (NC_045512.2).Genome assembly of the raw data was done by the Assembly Toolkit in the EzCOVID19 and EDGE COVID-19 cloud services provided on the EzBioCloud website and the EDGE bioinformatics website, respectively 33,34 .To retain the exceptional variations of raw data using the EDGE COVID-19 cloud service, a consensus genome was generated 33 .After that, the consensus genome was compared to the same reference genome in order to calculate single nucleotide variations (SNVs) with positions.The SNVs were compared against GISAID clade variation markers.The genome presented here belongs to the B.1.1.25lineage of the GR clade.This result was again confirmed by analyzing the raw data using the default parameters of the coronavirus typing tool offered by Genome Detective 35 .
Table 2 shows the single nucleotide variations, non-synonymous and synonymous mutations, between the input assembly and the SARS-CoV-2 reference NC_045512.2(https:// www.ncbi.nlm.nih.gov/ nucco re/ NC_ 045512.two) according to analyzes in the EZBioCloud database (https:// www.ezbio cloud.net/ tools/ sc2/?id= 4024a 5b2-40bb-40b5-acd6-c4dd6 5b38d 76).The queried dataset covers 99.84% of the SARS-CoV-2 genomic sequence.The nucleotide sequences were indicated starting from the 5′ UTR, while the corresponding amino acid changes were mentioned separately for each protein-coding region.In more than 30% of the viral isolates, nucleotide substitutions of C > T were observed, whereas, G > T and G > A showed 20% and 15% respectively.Among these mostly seen substitutions, C > T, which are present in the ORF1ab, S gene, and ORF3a gene (Table 2).G > T is found in ORF1ab, ORF3a, and the M gene, whereas, G > A present in the S gene and maximum cases in the N gene.
The emergence of mutations in SARS-CoV-2 evolution, in the context of 'variants of concern' , has been characterized by factors that influence virus characteristics, including transmissibility and antigenicity.These mutations are most likely a result of the human population's changing immune profile 36 .The amino acid change D614G in spike protein was noted to be increasing in frequency in April 2020 and to have emerged several times in the global SARS-CoV-2 population 36 .The high dN/dS ratio indicates positive selection at codon position 614, which could confer a moderate advantage for infectivity and transmissibility 35,37 .
Most of the viral isolates were found to have different amino acid substitution combinations (Table 3).However, the same amino acid substitution combinations (Spike_D614G and Spike_P681R) were observed for the samples BCSIR-NILMRC_393, BCSIR-NILMRC_396, BCSIR-NILMRC_421, BCSIR-NILMRC_422, and BCSIR-NILMRC_424.In this study, we have emphasized these two amino acid mutations, and hence we will discuss these two amino acid mutations in detail later on.In every isolate, Spike_D614G is the common mutation found in the spike protein (Table 3).The analyzed viral isolates were found to harbor 11 to 19 mutations per isolate, compared to the reference sequence.
For B.1.1.25lineage reports with S:P681R study, we have used the outbreak.infotool (https:// outbr eak.info/ situa tion-repor ts).The prevalence of the B.1.1.25lineage with S:P681R in the world, how it is changing over time, and how its prevalence varies across different locations, especially in Bangladesh, have been outlined in this report.To describe the prevalence of sets of mutations in Mutation Situation Reports, we rely on shared virus sequences from the GISAID.We also rely on the accuracy of the sequences and sample metadata deposited in GISAID, while we apply filters to eliminate some low-quality sequences and unreasonable metadata.The mentioned sequences are a sample of the entire number of samples, a biased sample that may not represent the true prevalence of the mutations in the population very frequently.Overall, it is important to note that case numbers for any given lineage/mutation can be significantly affected by total case numbers and rates of genomic sequencing at any particular location (https:// outbr eak.info/ situa tion-repor ts) 38 .), the United States of America (11.0%), the United Kingdom (5.0%), and Australia (2.0%) are the most common countries of this lineage.All the spike protein sequences, along with the reference sequence from Wuhan, were aligned using the multiple sequence alignment (MSA) platform of CLUSTAL Omega for sequence alignments and structure.After finishing the alignment, the generated file was viewed using MView, and variations in the sequence or amino acid changes were noted.For the prediction of secondary structures of the SARS-CoV2 spike protein, CFSSP (Chou and Fasman secondary structure prediction) server was used in this experiment.To check for the presence of similarities or differences, all these spike protein sequences were first aligned in CLUSTAL Omega.We found that at position 614, all the isolates from Bangladesh have a common mutation, Table 2. Common nucleotide substitutions in 17 SARS-CoV-2 viral genomes from Bangladesh (submitted to GISAID in October 2020) compared to the SARS-CoV-2 NCBI reference genome NC 045,512.1.The viral genes were identified according to the reference sequence information from GeneBank.The nucleotide sequences were indicated starting from the 5' UTR, while the corresponding amino acid changes were mentioned separately.(https:// www.ezbio cloud.net/ tools/ sc2/?id= 4024a 5b2-40bb-40b5-acd6-c4dd6 5b38d 76).www.nature.com/scientificreports/D614G whereas, in the reference genome (Wuhan), it is 'D' instead of 'G' .So, in this position, all the isolates of amino acid D (aspartic acid) changed into G (Glycine).Also, at position 681, the isolates BCSIR-NILMRC_393, BCSIR-NILMRC_396, BCSIR-NILMRC_422, and BCSIR-NILMRC_424 show the mutation P681R (Fig. 1).
Figure 1 shows the secondary structures in and around the site of the mutations, highlighting the loss of turn in positions 614 and 681.These conformational changes may affect the function of the receptor-binding subunit S1 of the S protein, particularly in the recognition of the host-cell receptor ACE2.Cryo-EM Delta spikes and S-ACE2 complex structures show that RBD destabilizations cause a population shift toward a more RBD-up and S1-destabilized fusion-prone state, which is advantageous for ACE2 engagement and S1 shedding 39 .
The amino acid sequences of our isolates were submitted to the Swiss-Model/Raptor X server to construct the tertiary structure of the S proteins with mutations D614G (S_D614G) and P681R (S_P681R).The best model in each case was refined using short molecular dynamics relaxations after repeated side chain repacking perturbations using GalaxyRefine2.Of the 10 models provided to S_D614G and S_P681R, the one with the best stereochemical and structural parameters was selected, considering the analyses of bond length and angle geometry, Ramachandran, rotamer, C β deviation analysis, cis-peptide, CaBLAM analysis and after run clashscore to find bad clashes and clashscore, a saber: RMSD (0.814 and 0.42), MolProbity (1.18 and 0.81), Clash score (1.31 and 0.99), poor rotamers (0 and 0), Ramachandran favored (95.36% and 97.95%), C β deviation (1.65% and 0.55%) and CA geometry outliers (0.52% and 1.04%) (Fig. 2), respectively.The MolProbily score represents the most important protein quality statistic because it combines the Clashscore, Rotamer, and Ramachandran scores into a single normalized score.The RMSD indicates the average deviation between the atoms of the refined and unrefined structures, and this value should be as low as possible.
To verify the structural and stereochemical qualities of the refined models of protein S_D614G (S_P681R), all-atom structure validation analyses were performed using MolProbity, Ramachandran plot, Z-score, ERRAT, Verify3D, and PROCHECK tools (Figs. 3 and 4).According to Ramachandran's plot, 95.36 percent (97.95 percent) of the residues were in the most favorable ranges, while only 4.12 percent (1.53 percent) were in the allowable ranges.Disallowed ranges make up 0.52 percent (0.52%).The Z-score of the model was estimated to be − 5.92 (− 5.93), which is within the range of scores normally found for native proteins of similar size.According to ERRAT, the overall quality factor of the protein model compared with highly refined structures was 80.88% (79.14%), and in Verify3D, 100.00% (96.45%) of the amino acid residues have an average 3D-1D score > = 0.2, showing high compatibility with the databases used for comparison.Validation was completed, with no critical errors found in the tertiary structure model.
A point mutation can significantly affect the structure, stability, and flexibility of some proteins by disrupting conformational constraints through overpacking or physicochemical effects.These changes can lead to perturbations in protein function, affect the difference in free energy between the folded and unfolded states of the protein, and cause changes in the interaction energy between residues 40 .To determine if changes in secondary structure are also reflected in the dynamics of the protein in its tertiary structure, we performed normal mode analyses and studied S protein stability and flexibility.The change in vibrational entropy energy (ΔΔSVib ENCoM) in case of mutation D614G between the wild-type Wuhan isolate and the BCSIR-NILMRC-422 isolate was 0.856 kcal/ mol K.It was 0.018 kcal/mol K in case of mutation P681R (Fig. 5).The ΔΔG was − 0.511(0.224)kcal/mol and the ΔΔG ENCoM was − 0.684 (− 0.014) kcal/mol for mutation D614G (P681R).These findings confirmed a stabilizing mutation in these types of spikes.
The interatomic interactions of mutations S_D614G and S_P681R have been shown in Fig. 6.We observed significant changes in the positions of these isolates (Figs. 7 and 8).Atomic fluctuations such as these indicate the degree of atomic motion deformation, while energy measurements detect protein flexibility.
Having identified the predominant strains in Bangladesh in 2021 and designed 3D models of the spike proteins of these viruses, we address here the description of the binding modes of their respective RBD with ACE2 by docking molecular and QM/MM simulations.Binding of the S_D614G and S_P681R proteins to ACE2 performed by the Cluspro server resulted in a set of possible models ordered by cluster size.In HADDOCK refinement, a solvent shell was built around the best complexes and, subsequently, a series of short MD simulations were performed according to the parameters below, all atoms except the side-chain atoms at the interface are restrained to their original position.Next, 1250 Molecular Dynamics steps were performed at 300 K with position restraints for heavy atoms which are not part of the PPI (residues not involved in intermolecular contacts within 5 Å).Finally, the systems were cooled down (1000 MD steps at 300, 200 and 100 K) with position restraints on the backbone atoms of the protein complex, excluding the interface atoms.
In the S1 domain, the most significant mutation was D614G.This mutation, located near the receptor-binding domain at a downstream position, emerged in the spring of 2020 and quickly became predominant 45 .The D614Gbearing variant has quickly swept out the original strain because the D614G mutation increases viral infectivity, fitness, and inter-individual transmissibility 37 .
Another mutation, P681R, is harbored further downstream in the S protein in the same receptor-binding domain.The P681R mutation in the furin cleavage site is known to enhance the basicity of polybasic stretching and the likely facilitation of additional contacts with furin for S1-S2 cleavage 46 .This could help with an increased rate of membrane fusion, internalization and thus better transmissibility.In fact, Mlcochova et al. 47 showed that B.1.617.1 and B.1.617.2 spike proteins mediated higher fusion activity and syncytium formation than WT, probably mediated by P681R.The P681R mutation apparently caused a small increase in proteolytic processing that could affect infectivity 48 .Reversal of the P681R mutation to wild-type P681 significantly reduces replication of the delta variant to a level below that of the alpha variant 49 .
Unfortunately, few papers evaluating PPI between mutant spike protein and ACE2 have examined the viral genomic variants prevalent in the Bangladeshi population.Through mutagenesis studies on spike proteins, Yi et al., reported that mutations in residues N501, Q498, E484, T470, K452, and R439 dramatically reduced binding affinity to ACE2 [https:// doi.org/ 10. 1038/ s41423-020-0458-z].Zou et al., performed molecular dynamics simulation along with alanine scanning analysis and found that the relative free energy of binding of the ACE2/ SARS complex changed significantly when a mutation was present in spike residues R426, L443, Y484, and T487 [https:// doi.org/ 10. 1021/ acs.jcim.0c006 79].Rawat et al., examined the conserved residues in the spike protein by in silico analysis and showed that the conserved glycine and tyrosine residues in SARS-2 (G502 and Tyr449) are important for both stabilization of the spike protein and its interaction with ACE2 [https:// doi.org/ 10. 1002/ prot.26024].Our study offers significant insights into the genetic epidemiology of SARS-CoV-2 in Bangladesh, focusing on the identification of dominant variants and associated mutations, with implications for surveillance, diagnostic strategies, and therapeutic interventions.Continuous genomic surveillance, underscored by our identification of the D614G and P681R mutations in the spike protein, is crucial for monitoring viral evolution and managing the severity of infections 51 .Future diagnostic strategies could leverage these specific mutations to develop novel tools for accurate tracking and early intervention.In terms of therapeutic interventions, these mutations, which potentially enhance the binding affinity between the spike protein and the ACE2 receptor, could guide the design of treatments that target these mutations and inform vaccine development to ensure effectiveness against prevalent variants.Future research should explore the impact of these mutations on the virus's transmissibility, disease severity, and their potential effect on the effectiveness of current vaccines and treatments, which will be pivotal in developing strategies to combat COVID-19.

Conclusion
The presence of the highly conserved mutation P681R in four out of seventeen Bangladeshi samples of SARS-CoV-2 indicates an enhancement in viral fusion efficiency and an accelerated action speed.In this study, we utilized the Dynamut web server, an integrated computational method, for stability prediction.Dynamut enables the simulation of protein dynamics and the prediction of protein stability, providing a consensus approach for accurate and reliable prediction of both stabilizing and destabilizing mutations at positions 614 and 681.Although the exact impact of the P681R mutation and its relationship with COVID-19 severity and unusual symptoms caused by SARS-CoV-2 infection are yet to be fully understood, detailed investigations on this mutation are planned.Further research will contribute to improved therapeutic targeting and the development of in silico vaccine designs.Due to the potential association of the P681R mutation in the spike protein with viral infection severity or unusual outcomes, it is crucial to conduct intensive analysis of SARS-CoV-2 variants harboring this mutation.
This study presents the first documented cases of the P681R mutation in Bangladesh, specifically in the isolates BCSIR-NILMRC-393, BCSIR-NILMRC-393, BCSIR-NILMRC-422, and BCSIR-NILMRC-424.Further sequencing efforts, accompanied by comprehensive annotation, will provide deeper insights into the spike protein mutation of SARS-CoV-2 in humans.These variations may contribute to the diversification of the virus and the subsequent emergence of variants, strains, serotypes, and antibody escape mutants 52 .Moreover, these mutations could aid the virus in adapting better to the host environment and expanding its tissue tropism.The increased affinity between the spike protein of SARS-CoV-2 carrying the P681R mutation and the human ACE2 receptor is attributed to strong electrostatic forces involving the ARG488 residue with Glu35 and Asp38, as well as interactions between His500 and GLU37.
The submitting research article "Spike Protein Mutations and Structural Insights of Pangolin Lineage B.1.1.25:Implications for Viral Pathogenicity and ACE2 Binding Affinity" for publication in your journal of repute, is a unique article and nobody did it earlier.Consents were taken from all the participates for this publication.Preprint.A preprint of the manuscript has previously been published in Research Square, https:// doi.org/ 10. 21203/ rs.3.rs-14800 75/ v1.

Figure 1 .
Figure 1.(i) Mutation analysis of 17 isolates from Bangladeshi patients.MSA of Spike protein sequence of Bangladeshi isolates with sequences obtained from Wuhan.Sites of mutation are shown in Red.(ii) Effect of mutation at positions 614 and 681 on the secondary structure of Spike protein in an isolate from Bangladeshi patient, highlighting the area around the residue 614 and 681.

Figure 2 .
Figure 2. (A) The best models generated by the tertiary structure refinement step of the mutant spike proteins.(B) 3D conformation, stereochemical, and structural parameters of S_D614G and S_P681R proteins after molecular modeling and refinement.

Figure 3 .
Figure 3. Validation of the final S_D614G protein model.(A) ProSA-web showing the Z-score.(B) Quality factor and quality score by ERRAT and Verify3D, respectively.(C) Information about the global and residual qualities of the protein generated by the Swiss Model's Structure Assessment tool.

Figure 4 .Figure 5 .
Figure 4. Validation of the final S_P681R protein model.(A) ProSA-web demonstrating the Z-score; (B) quality factor and quality score by ERRAT and Verify3D, respectively; (C) information about the global and residual qualities of the protein generated by the Swiss Model's Structure Assessment tool.

Figure 6 .
Figure 6.(a) Interatomic interaction of D614G in isolate BCSIR-NILMRC-422.Wild-type and mutant residues are colored in light-green and are also represented as sticks alongside the surrounding residues which are involved in any type of interaction.(b) Interatomic interaction of mutation P681R in isolates BCSIR-NILMRC-424.Wild-type and mutant residues are colored in light-green and are also represented as sticks alongside the surrounding residues which are involved in any type of interaction.

Figure 7 .
Figure 7. Visual analysis of fluctuations and deformation energies of mutation D614G in BCSIR-NILMRC-422.The magnitude of (a) atomic fluctuation and (b) deformation has been shown using thin to thick lines colored blue (low), white (moderate), and red (high).

Figure 8 .
Figure 8. Visual analysis of fluctuations and deformation energies of mutation P681R in BCSIR-NILMRC-424.The magnitude of (a) atomic fluctuation and (b) deformation has been shown using thin to thick lines colored blue (low), white (moderate), and red (high).

Figure 9 .
Figure 9.The best 3D docking model and energetic aspects of the S_D614G-ACE2 and S_P681R-ACE2 complexes (superimposed) according to the QM/MM refinement protocol.

Table 3 .
Amino acid substitutions in all 17 isolates of SARS-CoV2 viruses isolated in Bangladesh.