The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has posed a tremendous threat to global health, resulting in over 510 million infections and at least 6.23 million deaths worldwide as of early May 2022. Extensive efforts to fight the SARS-CoV-2 pandemic have accelerated the research and clinical development of various vaccines and neutralizing antibodies (NAbs) [1, 2]. However, SARS-CoV-2 has an exceptional ability to mutate and evade the human immune system, which highlights the need for surveillance of SARS-CoV-2 evolution to anticipate, and respond in a timely manner to, new variants that could affect viral transmissibility and the efficacy of vaccines and antibody therapies [3].

The mature SARS-CoV-2 spike trimer comprises two functional subunits, S1 and S2. The S1 subunit mediates attachment using its receptor-binding domain (RBD) to interact with the host receptor angiotensin-converting enzyme 2 (ACE2), whereas the S2 subunit is responsible for the virus–host cell membrane fusion process [4]. The RBD is the immunodominant spike domain and thus is the primary target of NAbs and vaccines, including several promising RBD-based subunit vaccines [2]. The receptor-binding motif (RBM) in the RBD, which is highly variable, forms the contact surface between the spike protein and ACE2. Mutations in the RBD, particularly in the main functional RBM, have been associated with immune escape or partial vaccine escape of circulating SARS-CoV-2 variants. In addition, RBD variants can also promote viral entry and infectivity through different mechanisms, such as increasing the binding affinity to the ACE2 receptor, increasing the number of spike proteins on the viral surface, or altering the overall spike conformation to favor fusion-competent states [3]. Therefore, it is vital to understand the complex rules that govern RBD evolution.

At the early stage of the COVID-19 outbreak, we conducted an in-depth study on the variable parts of the RBMs from SARS-CoV and SARS-CoV-2 that affect receptor recognition, RBD immunogenicity, and antibody neutralization [4]. SARS-CoV-2 is phylogenetically closely related to SARS-CoV, with up to 76% amino acid identity in the RBD. Despite only 47.8% identity in the RBM region, both viruses bind ACE2 with high affinity. Compared with SARS-CoV, the ACE2-binding pattern of SARS-CoV-2 changed substantially through a combination of affinity-enhancing and affinity-decreasing substitutions within the RBM, providing a good example of functional compensatory evolution. Furthermore, we observed that the variability at key residues in the RBM also resulted in significant antigenic changes in the RBD, which caused the low cross-neutralization capacity of immune serum and NAbs against SARS-CoV-2 and SARS-CoV. Thus, our study suggests that the high flexibility and plasticity of the RBM region might help SARS-CoV-2 evade immune surveillance.
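The identity percentages quoted above are ratios of matching residues to compared alignment columns. The following is a minimal sketch of that arithmetic, assuming the two sequences have already been aligned and that gapped columns are excluded from the denominator; the source does not state which denominator convention was used, and the two fragments below are toy strings, not the real RBM sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are skipped (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments for illustration only (NOT real RBM sequences).
toy_a = "NGVEGF-CYF"
toy_b = "NGIEGFNCYW"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when the same convention is applied to both regions.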

Remarkably, our data identified six SARS-CoV-2 RBD positions (N439, L452, T470, E484, Q498, and N501) at which substitution to the corresponding amino acid of the SARS-CoV RBD resulted in enhanced binding affinity to hACE2. This finding provides clues for monitoring the increased infectivity of SARS-CoV-2 variants [4]. In addition, some of the six residues are also key epitopes recognized by NAbs isolated from COVID-19 convalescents [5, 6]. As expected, RBD mutations at these positions have already appeared among SARS-CoV-2 isolates (Fig. 1A).

Fig. 1

Landscape of amino acid mutations within the RBD that characterize variants of SARS-CoV-2. A Timeline of the SARS-CoV-2 pandemic and the emergence of the variants. B Alignment of RBM residues among sarbecoviruses and SARS-CoV-2 variants. Key residues critical for binding by antibodies and ACE2 are shown. SARS-CoV-2 genome sequences (n = 10,533,330) retrieved from GISAID and GenBank on April 24, 2022, were used to annotate variants of the spike glycoprotein. A total of 3272 mutations have been identified in the S gene of SARS-CoV-2 isolated from humans (CNCB-NGDC; GISAID). The substitutions of the residues within the RBM were analyzed using the 2019 Novel Coronavirus Resource (2019nCoVR) released by the China National Center for Bioinformation (https://ngdc.cncb.ac.cn/ncov/variation/spike). Dots (▪) indicate identity to SARS-CoV-2 consensus residues, while dashes (–) indicate gaps in the alignment. The amino acid changes relative to the SARS-CoV-2 Wuhan-Hu-1 reference strain are shown in red font. The six key amino acid residues (N439, L452, T470, E484, Q498, N501) at which substitution to the corresponding amino acid of the SARS-CoV RBD enhances ACE2-binding affinity are highlighted with a red frame [4]. The key residues within the RBMs of SARS-CoV and SARS-CoV-2 that are involved in ACE2 interaction, as determined by structural analysis and functional studies, are highlighted in yellow [4, 20]. The key residues for recognition by SARS-CoV-2 NAbs are highlighted in green, while the key residues for recognition by SARS-CoV NAbs are highlighted in blue [1, 4, 5].

The RBM mutation N439K was noted to increase in frequency in Scotland early in March 2020. It enhances the binding affinity for ACE2 by introducing a new salt bridge, mimicking SARS-CoV RBD R426. N439K also confers resistance to some NAbs and some polyclonal antisera [7]. Another RBM amino acid substitution, N501Y, which is associated with increased ACE2-binding affinity [5, 8], has received considerable attention following its identification in variants of concern (VOCs), including B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and the recently emerged B.1.1.529 (Omicron) (Fig. 1). The N501T substitution, which changes the residue to the corresponding residue in SARS-CoV, was detected in Italy in August 2020, highlighting that this critical residue has been evolving since then [9]. N501Y is likely to have stronger hydrophobic interactions with ACE2 than N501T, which may explain its higher circulating frequency. However, residue N501 is less involved in antibody recognition; thus, the effect of N501Y alone on antibody neutralization is relatively limited [5, 8].
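Substitution labels such as N439K or N501Y encode the reference residue, the position, and the replacement residue. As a minimal sketch of how such labels might be handled when tallying variants by site, the snippet below parses them and groups the observed replacement residues by position. The names `Substitution`, `parse_substitution`, and `group_by_position` are hypothetical helpers introduced here for illustration; they are not part of any tool cited in this article.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


class Substitution(NamedTuple):
    ref: str       # residue in the reference strain
    position: int  # spike residue number
    alt: str       # residue in the variant


# Pattern for labels like "N501Y": one residue, digits, one residue.
_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'N501Y' into (ref, position, alt)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def group_by_position(labels: Iterable[str]) -> Dict[int, List[str]]:
    """Map each position to the replacement residues seen there, in input order."""
    grouped: Dict[int, List[str]] = {}
    for label in labels:
        sub = parse_substitution(label)
        grouped.setdefault(sub.position, []).append(sub.alt)
    return grouped


# Labels taken from the text of this article.
labels = ["N439K", "N501Y", "N501T", "L452R", "L452Q"]
by_position = group_by_position(labels)
print(by_position)
```

Deletions and insertions use other notations and would be rejected by this pattern, which is deliberate: the sketch covers single-residue substitutions only.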

Mutations at site E484 should also be emphasized, as they are widely present in multiple lineages. E484 has been identified as an immunodominant spike protein residue, with various substitutions [3], including E484K, E484Q, and E484A. The E484K mutation is present in the emerging B.1.351, P.1, and B.1.526 lineages, while E484Q is present in B.1.617.1 and E484A is found in the B.1.1.529 lineage (Fig. 1A, B). E484K and E484A are associated with substantial antibody neutralization resistance, while E484Q has a milder impact [5]. Structural analysis suggested that the E484K mutation may enhance ACE2-binding activity by increasing electrostatic complementarity between ACE2 and the RBD [10].

Circulating SARS-CoV-2 variants with the substitutions L452R and L452Q in the RBM have received attention because of their altered antigenic features and enhanced infectivity. The B.1.427 and B.1.429 lineages carrying L452R were predominant, particularly in California, USA, between December 2020 and February 2021. In addition, the L452R mutation is a hallmark of B.1.617.1 (Kappa) and B.1.617.2 (Delta). Our research and studies by others have demonstrated that L452R, L452Q, and L452K (a reversion to the SARS-CoV residue) can improve ACE2-binding affinity through electrostatic complementarity [5, 11, 12]. Furthermore, two studies suggest that the L452R and L452Q substitutions not only reduce neutralization by several mAbs and convalescent plasma but also evade HLA-A24-restricted cellular immunity [11, 12]. Notably, the L452Q mutation first identified in C.37 (Lambda) has appeared in a more transmissible Omicron subvariant, the BA.2.12.1 sublineage, whereas the L452R mutation is found in Omicron’s two new subvariants, BA.4 and BA.5, which emerged in South Africa. A recent study suggested that Omicron-L452R may enhance viral fusogenicity and infectivity, raising the concern that Omicron variants carrying L452 substitutions should be closely monitored [13].

The substitution Q498R emerged in late 2021 and is present in the heavily mutated Omicron variant. Q498R, together with other mutations, enhances ACE2-binding affinity [14]. In addition, the Q498R mutation has little impact on the binding activity of NAbs against the SARS-CoV-2 RBD, although the corresponding position Y487 of the SARS-CoV RBD is the key epitope of some NAbs, such as 80R and m396 [4, 14]. Substitutions at position T470 circulate at relatively low levels according to the sequencing data and should be monitored further (Fig. 1B). Importantly, the experimental characterization of the key residues for ACE2 and NAb binding will continue to provide useful information for tracking mutations that could facilitate immune escape or increased virulence.

To bypass the established immune barrier, a mutant virus must acquire a “new characteristic” that differs from the previous one while preserving transmissibility and evolutionary fitness. The biological rules are akin to a jigsaw puzzle: although several pieces change their appearance, the puzzle preserves its overall shape and function. To maintain the overall structural conformation and stability of the RBD, there are numerous evolutionarily conserved positions in the RBD among sarbecoviruses, the viral subgenus containing SARS-CoV, SARS-CoV-2, and some zoonotic-origin viruses (Fig. 1B). Most of these positions are in the non-RBM region, with a small portion in the RBM region. In our previous study, we identified at least 38 out of 190 RBD positions that are evolutionarily conserved and critical for the proper folding of the RBD [5]. Therefore, these conserved positions may have low mutational tolerance and are analogous to a constant framework within the “shape” or the “structure” of the RBD. In addition, multiple studies have investigated the antigenicity of the SARS-CoV-2 spike protein by epitope mapping of different RBD-specific NAbs. Structural analysis and global alanine scanning have defined the binding determinants of RBD-specific NAbs and categorized antigenic sites into four classes [5, 6]. Immunodominant site 1 and site 2 within the RBM overlap with the ACE2 contact surface and are targeted by highly potent ACE2-blocking NAbs. Sites 3 and 4 are outside the conserved RBM region and are less immunodominant.

Based on the sequence alignment of RBDs from different sarbecoviruses and the epidemiological data of SARS-CoV-2 variants (Fig. 1B), the following can be concluded. (i) Within the RBM, the highly variable positions among SARS-CoV-2 variants tend to occur at the evolutionarily less-conserved positions among sarbecoviruses, indicating that these positions are structurally flexible and malleable. (ii) Despite the emergence of a tremendous number of mutants, only a small minority of amino acid mutation sites are present in expanding VOCs in a way that confers a fitness advantage. These fitness-enhancing mutations are either associated with major immune resistance, such as substitutions of K417, E484, F490, L452, R343, G446, and N440, or associated with increased ACE2-binding affinity, such as S477N, N501Y, and Q498R [3, 5, 14]. (iii) Resistance to a NAb is conferred by changes in the biophysical properties (hydrophobicity, polarity, charge, and volume) of epitope residues within the RBM. However, the amino acid usage at a particular mutation site is restricted to a limited set of residue types, reflecting a delicate balance between the capacity for immune escape and the maintenance of a functional RBD. This hypothesis could explain the observation that the same RBM substitution at a given site is present in different lineages. (iv) SARS-CoV-2 VOCs adopt a co-mutation strategy to balance antigenic escape and viral infectivity. Regardless of how SARS-CoV-2 mutates to escape human immunity, most of the VOCs have maintained or enhanced their affinity for ACE2 to ensure transmissibility.
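Conclusion (iii) can be made concrete with a small calculation. The sketch below estimates how two of the four listed properties, side-chain charge and hydrophobicity, shift for a given substitution. It assumes the Kyte-Doolittle hydropathy scale and a simplified integer charge at neutral pH (histidine treated as neutral); both choices are assumptions made for illustration and are not taken from the studies cited here. Polarity and volume are omitted, and `property_shift` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH; residues not listed count as 0.
# Histidine is treated as neutral, which is an approximation.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def property_shift(ref: str, alt: str) -> dict:
    """Change in charge and hydropathy when residue `ref` is replaced by `alt`."""
    return {
        "charge_change": CHARGE.get(alt, 0) - CHARGE.get(ref, 0),
        "hydropathy_change": round(KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref], 1),
    }


# Substitutions discussed in this article.
for label in ("E484K", "L452R", "N501Y"):
    ref, alt = label[0], label[-1]
    print(label, property_shift(ref, alt))
```

Under these assumptions, E484K reverses the side-chain charge while barely changing hydropathy, whereas N501Y leaves the charge unchanged, which is in line with the different roles the text assigns to these two sites. Such per-residue scales ignore structural context, so they are a rough screen rather than a predictor of escape.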

A range of SARS-CoV-2 variants has emerged across the world since the onset of the COVID-19 pandemic. In particular, a new SARS-CoV-2 variant named Omicron, whose pathogenicity, antigenicity, and host range clearly distinguish it from prior strains, has emerged and become dominant around the world. Among the 37 amino acid changes in the spike of the initial Omicron variant (BA.1), 15 are in the RBD, while previous VOCs had only 1–3 changes in the RBD. Omicron shares several mutations with prior VOCs and carries unique mutations in both the RBM and non-RBM regions, indicating that complicated selective pressures have shaped its evolution [14]. In particular, the substitutions at S371-T376 in the non-RBM region also result in resistance to multiple NAbs through allosteric structural effects or alteration of the epitope [14]. Along with B-cell immunity, recent studies highlight the importance of cross-reactive T cells in holding up against Omicron [15]. However, Omicron will continue to evolve to overcome preexisting immunity while preserving viral fitness. Alarmingly, Omicron binds ACE2 from a wider host range than prior variants, including mice and domestic poultry, which may accelerate viral evolution [16]. Nevertheless, there are still a small number of broadly cross-variant NAbs that target evolutionarily conserved non-RBM RBD epitopes or recognize conserved receptor-binding residues within the RBD [14, 17]. Future work is warranted to probe the antigenic features of the RBD by epitope mapping of Omicron-elicited antibodies induced by natural infection or vaccination, which may provide deep insight into Omicron evolution. Furthermore, we and others have also identified pan-coronavirus or pan-β-coronavirus NAbs directed toward the highly conserved S2 subunit, which raises the possibility of developing antibody therapies and universal vaccines against coronaviruses [18, 19].

Altogether, working out the “jigsaw puzzle” of SARS-CoV-2 evolution and immune escape will shed light on the development of effective broad-spectrum antibody drugs, entry inhibitors, and vaccines to fight the highly variable SARS-CoV-2.