Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), had led to more than 274 million infections and over 5.35 million deaths by December 20, 2021 [1,2,3,4,5]. SARS-CoV-2, an enveloped, single-stranded positive-sense RNA virus, belongs to the genus Betacoronavirus and is related to two highly pathogenic coronaviruses, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), which have caused over 8000 and 2500 confirmed cases, with fatality rates of ~10% and ~35%, respectively [5,6,7]. Compared with SARS-CoV and MERS-CoV, SARS-CoV-2 spreads more rapidly and has caused a much higher number of deaths. This emerging virus has led to massive hospitalizations, lockdowns, financial losses, unemployment, and school closures in nearly all countries [8].

The SARS-CoV-2 genome is a large, non-segmented, positive-sense single-stranded RNA of about 30 kb. It contains a 5′-cap structure and a 3′-poly-A tail [7, 9]. The viral genome encodes 29 proteins, including 25 putative non-structural and accessory proteins and four structural proteins [10]. Non-structural proteins (NSPs) play crucial roles in viral RNA replication and immune evasion, while accessory proteins carry out multiple functions that aid viral infection, survival, and transmission in host cells [8, 11, 12]. Structural proteins are responsible for viral assembly and make up the mature viral particles. Elucidating the structure and function of these SARS-CoV-2-encoded proteins will deepen our understanding of the viral infection cycle and offer new opportunities to develop effective vaccines and drugs to combat this global pandemic.

Not long after the first SARS-CoV-2 genome sequence was published [9], Rao et al. deposited the first structure of a SARS-CoV-2 protein, the main protease (Mpro), into the Protein Data Bank [13]. Thanks to the tireless efforts of scientists worldwide, the number of SARS-CoV-2 protein structures has grown explosively and had reached 1250 at the time of writing. These structures cover more than 90% of the amino acids encoded by SARS-CoV-2. The rapidly expanding repertoire of SARS-CoV-2 structures has provided new insights into the viral life cycle and facilitates drug and vaccine development.

In this review, we provide an overview of the SARS-CoV-2 genome and its coded proteins. We then describe the structures of SARS-CoV-2 proteins and mainly focus on RNA-dependent RNA polymerase and S protein. We provide insights into in situ structures of SARS-CoV-2 that have been solved by cryo-electron tomography (cryo-ET). Finally, we discuss the remaining challenges and future perspectives on the structural biology of SARS-CoV-2 and highlight the recent success of the development of anti-COVID-19 drugs.

The genome of the SARS-CoV-2 and its coded proteins

The 5′-proximal two-thirds of the coronavirus genome contains the replicase gene, which comprises two open reading frames, ORF1a and ORF1b (Fig. 1a) [10]. The remaining one-third of the genome, at the 3′-end, encodes several ORFs. Four of these encode the coronavirus structural proteins: the spike glycoprotein (S), membrane (M), envelope (E), and nucleocapsid (N) proteins. S, E, and M are present on the virion membrane surface, whereas the N protein is involved in binding and packaging the RNA genome. The other ORFs encode several accessory proteins [14, 15] (Fig. 1a), whose functions are less well defined.

Fig. 1: The genome of SARS-CoV-2 and its coded proteins.

a The organization of the SARS-CoV-2 genome. b Schematic illustration of the secondary structure of the frameshift stimulation element for −1 programmed ribosomal frameshifting, with different functional regions labeled and colored accordingly. c The rectangle depicts the nsps derived from processing of the pp1a and pp1ab polyproteins. Labels indicate protein names. Blue arrows indicate cleavage sites of PLpro, and red arrows indicate cleavage sites of Mpro. d Domain architectures of the proteins encoded by the SARS-CoV-2 genome and summary of the structural characterization of individual proteins. Bars above the domain architectures indicate regions of the proteins for which high-resolution structures are available. Ubl ubiquitin-like domain, MD macrodomain, SUD SARS unique domain, PLpro papain-like protease, NAB nucleic acid-binding domain, TM transmembrane domain, Y Y region, NTD N-terminal domain, CTD C-terminal domain.

Translation of ORF1a yields the replicase polyprotein 1a (pp1a), and a −1 ribosomal frameshift at the 3′-end of ORF1a allows translation to continue into ORF1b, producing the replicase polyprotein 1ab (pp1ab) [16,17,18]. The −1 programmed ribosomal frameshifting is stimulated by a three-stem pseudoknot, named the frameshift stimulation element (FSE), located 3′ of the slippery sequence -UUUAAAC- (Fig. 1b) [19]. When the frameshift does not occur, the stop codon in stem 1 terminates translation (Fig. 1b). Conversely, when the −1 ribosomal frameshift occurs, this termination is bypassed, which results in a protein with an additional ~2700 amino acids. Ribosomal frameshifting occurs at a frequency of 0.25–0.75 at this site [19]. The structures of the FSE alone and of the FSE in complex with ribosomes were recently reported [19, 20].
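The effect of the −1 frameshift on the reading frame can be illustrated with a short Python sketch. The mRNA below is a toy sequence invented for illustration (only the slippery sequence UUUAAAC is taken from the text), the codon table is deliberately partial, and the function name `translate` is hypothetical. The sketch assumes that the slippery sequence ends on a codon boundary of the original frame, so that slipping back one nucleotide after decoding it re-reads its last base.

```python
# Toy model of -1 programmed ribosomal frameshifting.
# ASSUMPTION: TOY_MRNA is an invented sequence, not the real ORF1a/ORF1b junction.
SLIPPERY = "UUUAAAC"
TOY_MRNA = "AUGGC" + SLIPPERY + "GGGUAAGUGCAUGA"

# Partial codon table: only the codons that occur in TOY_MRNA ("*" marks a stop codon).
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "UUA": "L", "AAC": "N", "GGG": "G",
    "CGG": "R", "GUA": "V", "AGU": "S", "GCA": "A",
    "UAA": "*", "UGA": "*",
}


def translate(mrna, frameshift):
    """Translate from the first nucleotide; optionally slip back 1 nt after the slippery site."""
    start = mrna.find(SLIPPERY)
    if start < 0:
        raise ValueError("slippery sequence not found")
    slip_end = start + len(SLIPPERY)  # index just past the slippery sequence
    protein = []
    i = 0
    shifted = False
    while i + 3 <= len(mrna):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: translation terminates
        protein.append(amino_acid)
        i += 3
        if frameshift and not shifted and i == slip_end:
            i -= 1  # the ribosome slips back by one nucleotide (-1 frame)
            shifted = True
    return "".join(protein)


print("no frameshift:", translate(TOY_MRNA, frameshift=False))
print("-1 frameshift:", translate(TOY_MRNA, frameshift=True))
```

Without the frameshift, the toy ribosome meets the in-frame stop codon shortly after the slippery sequence; with the frameshift, that stop codon is read out of frame and translation continues, which mirrors how pp1ab is extended beyond the end of pp1a.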

In SARS-CoV-2, the polyprotein pp1a is proteolytically cleaved into 11 functional NSPs, whereas pp1ab is cleaved into 15 NSPs (Fig. 1c). Each NSP plays specific or multifaceted roles in the viral life cycle. Nsp1 interacts with the host ribosome to inhibit the synthesis of host proteins. Nsp2 may be involved in coupling viral transcription with translation by interacting with both the ribosome and the replication-transcription complex (RTC) [21]. The papain-like protease (PLpro) domain of nsp3 cleaves at three sites to release mature nsps 1–3, while nsp5, also named main protease (Mpro), cleaves pp1a and pp1ab at 11 sites, releasing the mature nsps 4–16 [22, 23] (Fig. 1c). Given their vital roles in NSP maturation, Mpro and PLpro are considered attractive therapeutic targets for COVID-19 [24, 25]. In addition to the PLpro domain, nsp3 contains several other domains: a ubiquitin-like domain, a macrodomain (also named “X domain”), a nucleic acid-binding domain (NAB), a SARS coronavirus-unique domain, a transmembrane domain and the Y1-3 domain [26]. The transmembrane domain of nsp3, together with nsp4 and nsp6, plays a key role in rearranging endoplasmic reticulum (ER) membranes, inducing the membrane curvature needed to form double-membrane vesicles (DMVs), which are essential for virus replication [27]. Nsp7 and nsp8 assemble with nsp12 to enhance its RNA polymerase activity.
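The cleavage-site counts quoted above can be checked with a little bookkeeping. The sketch below assumes the standard annotation, which the text does not spell out, that pp1a comprises nsp1–nsp11 and pp1ab comprises nsp1–nsp10 followed by nsp12–nsp16. It also assumes that the three PLpro sites are the junctions that follow nsp1, nsp2 and nsp3. The function names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: nsp composition of each polyprotein (standard annotation, not stated in the text).
PP1A = list(range(1, 12))                          # nsp1 .. nsp11
PP1AB = list(range(1, 11)) + list(range(12, 17))   # nsp1 .. nsp10, nsp12 .. nsp16


def junctions(nsps):
    """Return the (left, right) pairs of adjacent nsps, one per cleavage site."""
    return list(zip(nsps, nsps[1:]))


def protease_for(junction):
    """ASSUMPTION: PLpro cuts after nsp1, nsp2 and nsp3; Mpro cuts every later junction."""
    left, _right = junction
    return "PLpro" if left <= 3 else "Mpro"


def site_counts(nsps):
    """Count cleavage sites per protease for one polyprotein."""
    return Counter(protease_for(j) for j in junctions(nsps))


print("pp1a :", len(PP1A), "nsps,", dict(site_counts(PP1A)))
print("pp1ab:", len(PP1AB), "nsps,", dict(site_counts(PP1AB)))
```

Under these assumptions pp1ab contains 15 nsps separated by 14 junctions, 3 assigned to PLpro and 11 to Mpro, which matches the numbers given in the text.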

Nsp7–nsp16 are involved in the formation of a large RTC that regulates the replication and transcription of SARS-CoV-2. Nsp9 inhibits the nucleotidyltransferase activity of nsp12 [28]; nsp10 is a cofactor of nsp14 and nsp16 [29]; nsp12 functions as an RNA-dependent RNA polymerase and nucleotidyltransferase [30]; nsp13 acts as a helicase; nsp14 is an exoribonuclease and N7-guanine methyltransferase [31]; nsp15 has uridine-specific endoribonuclease activity [32]; and nsp16 has 2′-O-methyltransferase activity that mediates mRNA capping [33].

Among four structural proteins, S, M and E are located on the viral membrane. S plays an essential role in the host receptor-binding and membrane fusion [34, 35]. M protein is associated with N protein and other viral structural proteins to facilitate the viral assembly and is involved in the pathogenesis process [36]. E protein forms an ion channel, which promotes virus assembly and pathogenesis [37, 38]. N protein participates in viral genome RNA packaging and promotes virion assembly [39, 40].

Among the nine accessory proteins, ORF3a, ORF7a and ORF7b are transmembrane proteins. ORF3a forms a homodimer, with its transmembrane domain forming an ion channel in the host cell membrane. It participates in virus release, pathogenesis and virus-induced host cell apoptosis [41]. As a type I transmembrane protein, ORF7a is an immunomodulatory factor that binds immune cells and is involved in dramatic inflammatory responses. ORF7b is a small integral membrane protein found in the host cell Golgi, which may increase the virulence of SARS-CoV-2 [42]. It is encoded by the same mRNA as ORF7a and is expressed through leaky ribosomal scanning [42]. ORF8 is one of the newest genes and shares low homology with its SARS-CoV counterpart owing to a deletion [43]. This protein interacts with major histocompatibility complex class I (MHC-I) molecules and modulates their degradation in cell culture, thereby contributing to immune evasion [44]. ORF9b suppresses the interferon response by inhibiting the interaction between Hsp90 and TOM70 [45]. ORF10 is predicted to encode a small protein. Pancer et al. suggest that this protein is not essential for the viral life cycle in humans [46]. To date, only the structures of ORF3b, ORF6, ORF9c and ORF10 remain unsolved (Fig. 1d).

The structures of RNA-dependent RNA polymerase and replication-transcription complex

Nsp12 functions as an RNA-dependent RNA polymerase and is a core component of the replication-transcription machinery [47]. The auxiliary factors nsp7 and nsp8 are essential for enhancing both the binding of nsp12 to the RNA template and its enzymatic activity [48]. Together, nsp7, nsp8, nsp12 and the template-primer RNA constitute the minimal components of a functional holo-RdRp, which is the main component of the SARS-CoV-2 RTC [49]. Several structures of the holo-RdRp, alone or in complex with inhibitors, have been determined by cryo-electron microscopy (cryo-EM) [30, 47, 50,51,52,53]. These structures provide insights into the molecular architecture of the holo-RdRp components. The nsp12 subunit contains several domains that are highly conserved among coronaviruses, including a nidovirus-specific extension domain (NiRAN, nidovirus RdRp-associated nucleotidyltransferase), a C-terminal polymerase domain and an interface domain in the middle (Fig. 2a). The β-hairpin (residues V31 to K50), a unique conserved element located upstream of the NiRAN domain, is sandwiched between the NiRAN domain and the palm subdomain of the polymerase domain and stabilizes the overall structure (Fig. 2a). Yan et al. showed that NiRAN catalyzes the formation of the cap core structure (GpppA) [28]. The C-terminal polymerase core domain of nsp12 adopts a classic cupped right-handed conformation, consisting of the finger, palm, and thumb subdomains and several conserved structural motifs. The SARS-CoV-2 nsp7-nsp8 heterodimer shares significant structural similarity with that of SARS-CoV. The nsp7-nsp8 heterodimer binds above the thumb subdomain and sandwiches the extended finger loops to stabilize nsp12 (Fig. 2a). Nsp7 contributes most of the contacts between the heterodimer and nsp12, while nsp8 contacts nsp12 only sparsely. The second nsp8 (nsp8-2) attaches to the top of the finger subdomain and forms additional interactions with the interface domain. The active sites of SARS-CoV-2 RdRp and its RNA recognition have been summarized and discussed in detail in a previous review [29].

Fig. 2: Structures of the SARS-CoV-2 holo-RdRp and RTC.

a The schematic diagram for the domain organization of the holo-RdRp, containing nsp12, nsp7, nsp8-1 and nsp8-2 (PDB: 7C2K). Two views of the cartoon model of the cryo-EM structure of holo-RdRp. The subdomains and components of the holo-RdRp are colored as follows: β-hairpin, chocolate; NiRAN, light orange; Interface, light magenta; Fingers, bright orange; Palm, lime; Thumb, pale cyan; nsp7, marine; nsp8-1, pink; nsp8-2, fire brick; template RNA, red; product RNA, deep teal. b Zoomed-in views of holo-RdRp bound to inhibitors: Favipiravir at +1 (PDB: 7AAP, 7CTT), Remdesivir at +1 (PDB: 7BV2), Remdesivir at −1 (PDB: 7C2K), Remdesivir at −3 (PDB: 7B3B), Remdesivir at +1, −1, −2, −3 (PDB: 7L1F), Suramin (PDB: 7D4F), and Molnupiravir in the template RNA. The four inhibitors are indicated in different colors. Proteins and RNA are shown in cartoon representation, and inhibitors are shown as sticks. Nsp12 is shown in transparent gray. Ligand atom color code: O atoms, red; N atoms, blue; P atoms, salmon; Mg2+ ion, green. 2D structures of five drugs are presented. c The schematic diagram for the domain organization of the RTC, containing nsp12, nsp7, nsp8-1, nsp8-2, nsp9, nsp13-1 and nsp13-2 (PDB: 7CYQ). Two views of the cartoon model of the cryo-EM structure of RTC. The subdomains and components of the RTC are colored as follows: β-hairpin, chocolate; NiRAN, light orange; Interface, light magenta; Fingers, bright orange; Palm, lime; Thumb, pale cyan; nsp7, marine; nsp8-1, pink; nsp8-2, fire brick; nsp9, purple blue; nsp13-1, wheat; nsp13-2, dark salmon; template RNA, red; product RNA, deep teal.

To date, several antiviral drugs that target the viral RdRp, such as Remdesivir and Favipiravir, have been repurposed for COVID-19 therapy [54,55,56,57,58,59] (Fig. 2b). In April 2020, the first cryo-EM structure of the holo-RdRp in complex with Remdesivir was reported. Biochemical studies showed that 1 mM Remdesivir triphosphate (RTP) completely inhibited RdRp polymerization activity in the presence of 10 mM ATP, which mimicked the physiological concentration of ATP, whereas 100 μM RTP produced delayed chain termination [50]. Kinetic studies on RdRp demonstrated that the polymerase can still add 2 or 3 nt after incorporating Remdesivir monophosphate (RDV-MP) [60, 61]. This delayed chain termination model was further supported by recently solved structures of the SARS-CoV-2 polymerase-RNA complex in catalytic states with incorporated RDV-MP [62, 63]. After incorporation, movement of RDV-MP to position −4 of the RNA product strand results in a steric clash between the 1′-cyano group of RDV-MP and the side chain of nsp12 Ser861 (Fig. 2b), which may explain the delayed chain termination caused by Remdesivir. Because Remdesivir is an intravenously administered nucleotide prodrug, which is inconvenient for patients, VV116, an orally available Remdesivir derivative, was developed and showed excellent efficacy in inhibiting SARS-CoV-2 replication in cell-based and animal models [64] (Fig. 2b).
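The delayed chain termination model can be made concrete with a deliberately simple primer-extension sketch. It is a toy model, not a kinetic simulation: it assumes that synthesis stalls exactly when one more translocation would move RDV-MP to position −4, so that three nucleotides are added after the analog (the upper end of the "2 or 3 nt" reported above). The function name and the letters used for nucleotides are hypothetical.

```python
def simulate_extension(template_length, rdv_site=None, clash_position=4):
    """Toy primer extension.

    'N' is a natural nucleotide and 'R' is RDV-MP, incorporated at the 0-based
    product index rdv_site. ASSUMPTION: synthesis stalls when the next
    translocation would place RDV-MP at position -clash_position.
    """
    product = []
    for i in range(template_length):
        product.append("R" if i == rdv_site else "N")
        if rdv_site is not None and i >= rdv_site:
            added_after_rdv = i - rdv_site
            # After the next translocation RDV-MP would sit at -(added_after_rdv + 1).
            if added_after_rdv + 1 == clash_position:
                break  # steric clash blocks translocation; synthesis stalls
    return "".join(product)


print(simulate_extension(20))              # no analog: full-length product
print(simulate_extension(20, rdv_site=5))  # stalls three nucleotides after 'R'
```

Changing `clash_position` shows how the stall point follows directly from where the clash occurs on the product strand, which is the essence of the delayed termination model.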

Favipiravir, an orally administered nucleoside analog, also showed a potential therapeutic effect on COVID-19 [65]. Unlike Remdesivir, Favipiravir terminates RNA replication only poorly in the presence of natural nucleotides [66]. It acts mainly as a mutagenic agent toward viral genomic RNA in vivo [67, 68], in contrast to Remdesivir, which impairs the elongation of RNA products. The mechanism by which Favipiravir binds to RdRp and the low efficiency of Favipiravir-RTP incorporation were revealed by the cryo-EM structure of the Favipiravir-bound SARS-CoV-2 RdRp-RNA complex (Fig. 2b) [69]. A recent study reported similar findings, with Favipiravir-RTP bound to the SARS-CoV-2 polymerase in the pre-catalytic state [70].

Molnupiravir, a novel RdRp inhibitor, has recently received much attention because of its very promising phase III study results. In a positive interim analysis of a phase III study of this oral antiviral drug, the risk of hospitalization or death in patients with mild or moderate COVID-19 was reduced by about 50% compared with placebo [71]. When Molnupiravir has been incorporated into the template RNA by the RdRp, it directs the incorporation of A or G, resulting in mutation of the viral RNA genome [59]. Recently, the first structure of SARS-CoV-2 RdRp in complex with a non-nucleotide inhibitor, Suramin, was reported [30]. The RdRp harbors two independent Suramin-binding sites. One Suramin molecule binds to the template RNA-binding site, directly blocking substrate binding. The other binds near the catalytic site, blocking RNA primer binding by steric hindrance (Fig. 2b). This unique binding pattern explains why Suramin is at least 20-fold more potent than the triphosphate form of Remdesivir (RDV-TP) in polymerase activity inhibition assays in vitro.

Promisingly, during the revision of this manuscript, Molnupiravir (Lagevrio) was approved by the Medicines and Healthcare products Regulatory Agency (MHRA) and the FDA for the treatment of established COVID-19 infections [72]. Thanks to recent progress in structural biology, the precise interactions between drugs or drug candidates and the holo-RdRp have been elucidated, and the mechanisms by which these drugs block virus replication by targeting RdRp have been revealed. This structural information will greatly promote structure-based drug design and facilitate the development of the next generation of RdRp inhibitors.

The holo-RdRp is located at the center of the SARS-CoV-2 RTC [29]. In addition, a number of conserved accessory factors, which are thought to coordinate with the holo-RdRp, are also required during the viral life cycle [18]. One of these accessory proteins, nsp13, acts as an RNA helicase that unwinds double-stranded RNA. Chen et al. reported the cryo-EM structure of two molecules of nsp13 in complex with nsp7-nsp8-nsp12 and template-product RNA [73]. The nsp13 molecule that interacts with nsp8-1 is named nsp13-1, and the other is named nsp13-2. The nsp13-1 ZBD interacts with the nsp12 thumb domain and nsp8-1. Meanwhile, the nsp13-1 RecA1 domain abuts nsp7 and the nsp8-1 head (Fig. 2c), while nsp13-2 binds less tightly in the complex than nsp13-1. Aside from the interactions between the nsp13-2 ZBD and nsp8-2, nsp13-2 interacts only with nsp13-1. The RdRp translocates in the 3′ → 5′ direction on the template RNA strand, whereas nsp13 translocates on this strand in the 5′ → 3′ direction. By integrating structural analysis, unbiased molecular dynamics simulations and RNA–protein cross-linking studies, Malone et al. proposed a backtracking mechanism [74] in which nsp13 facilitates reverse RNA translocation, resulting in a single-stranded 3′ segment of the product RNA extruding through the NTP entry tunnel. This mode of backtracking resembles that of cellular DNA-dependent RNA polymerases. Thus, nsp13 may play an important role in enhancing the fidelity of RdRp [75]. More recently, Yan et al. presented a cryo-EM structure of the SARS-CoV-2 RTC that includes nsp9 [28]. The structure reveals that nsp9 binds tightly to the catalytic center of the nsp12 NiRAN domain and inhibits its guanylyltransferase (GTase) activity (Fig. 2c). How the complete RTC executes a series of complicated and finely tuned processes in genome replication, such as mRNA 7MeGpppA2′OMe capping and mismatch repair, remains to be addressed by continued in-depth structural studies.

The structures of S protein

The SARS-CoV-2 S protein is a type I viral fusion protein and mainly comprises two functional subunits, S1 and S2 [76]. The S1 subunit is mainly responsible for receptor binding, and the S2 subunit mediates membrane fusion following proteolytic activation [77,78,79]. The S1 subunit can be further subdivided into the N-terminal domain (S1-NTD), two subdomains (SD1 and SD2) and the C-terminal domain (S1-CTD), which is also known as the receptor-binding domain (RBD) [79] (Fig. 3a). The SARS-CoV-2 S RBD is responsible for recognizing the host receptor ACE2 and contains two subdomains, a receptor-binding motif (RBM) and a core structure (Fig. 3a). The S2 subunit consists of multiple structural components, including a hydrophobic fusion peptide (FP), heptad repeats (HR1 and HR2), and a transmembrane domain (TM). Unlike SARS-CoV and other beta-CoVs in subtype B, the SARS-CoV-2 S protein contains a furin cleavage site between the S1 and S2 subunits [80] (Fig. 3a). In host cells, the S protein is proteolytically cleaved by furin at the PRRAR motif (residues 681–685) into S1 and S2 to form an S1/S2 heterodimer, which is further assembled into the final trimeric spike complex [76, 81]. The S2 subunit contains an S2′ cleavage site, which is cleaved by host proteases such as transmembrane protease serine 2 (TMPRSS2) to expose the fusion peptide for membrane fusion [79]. To date, numerous SARS-CoV-2 S protein structures in both prefusion and post-fusion forms have been reported [77, 78, 80, 82,83,84]. In the prefusion state, the trimeric SARS-CoV-2 S protein primarily adopts two conformations, a closed form with all RBDs in the “down” conformation and an open form with one or multiple RBDs in the “up” conformation (Fig. 3b, c). In the closed form, the receptor-binding site is shielded by S1-NTD, which impedes binding of the receptor ACE2, whereas in the open conformation, one or multiple RBDs expose the receptor-binding site for ACE2 binding [83]. Ke et al. determined the S protein structure on intact SARS-CoV-2 virions by cryo-EM and cryo-ET and found that, among all prefusion S proteins, fully closed trimers account for 31%, while trimers with one or two RBDs open account for 55% and 14%, respectively [83]. ACE2 binding destabilizes the prefusion structure and promotes dissociation of the S1 subunit, thereby facilitating the transition of the S protein from the prefusion to the post-fusion state [85]. Fan et al. solved the cryo-EM structure of the post-fusion state of SARS-CoV-2 S and showed that S2 forms a tightly bound six-helix bundle by rotating HR1-HR2 and the linker region upstream of the HR2 motif [86] (Fig. 3d). The mechanism of S protein-mediated coronavirus attachment and fusion has been described in detail in previous reviews [87,88,89,90].
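As a quick arithmetic check on the in situ populations reported by Ke et al., the sketch below verifies that the three prefusion classes account for all trimers and derives the average number of “up” RBDs per trimer. The only inputs are the percentages quoted in the text; the variable names are hypothetical, and the calculation assumes that the three classes are exhaustive.

```python
# Percentage of prefusion trimers with 0, 1 or 2 RBDs in the "up" conformation (from the text).
POPULATION_PERCENT = {0: 31, 1: 55, 2: 14}
RBDS_PER_TRIMER = 3

total_percent = sum(POPULATION_PERCENT.values())

# Average number of "up" RBDs per prefusion trimer, weighted by population.
mean_up_per_trimer = sum(n_up * pct for n_up, pct in POPULATION_PERCENT.items()) / 100

# Fraction of all RBDs on prefusion trimers that are in the "up" conformation.
fraction_rbd_up = mean_up_per_trimer / RBDS_PER_TRIMER

print(f"classes sum to {total_percent}%")
print(f"mean 'up' RBDs per trimer: {mean_up_per_trimer:.2f}")
print(f"fraction of RBDs 'up': {fraction_rbd_up:.1%}")
```

On these numbers a prefusion trimer carries on average fewer than one exposed RBD, so most receptor-binding sites on the virion surface are hidden at any given moment.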

Fig. 3: The structures of S protein.

a Schematic of SARS-CoV-2 S protein domain architecture. The S1 and S2 subunits are indicated, with a diamond representing the location of the furin cleavage site. SP signal peptide, RBD receptor-binding domain, RBM receptor-binding motif, SD1 subdomain 1, SD2 subdomain 2, FP fusion peptide, HR1 heptad repeat 1, HR2 heptad repeat 2, TM transmembrane region, CT cytoplasmic tail. b Side and top views of the prefusion structure of the SARS-CoV-2 S protein with all RBDs in the “down” conformation (PDB: 6VXX). Protein is shown in cartoon representation, and glycans are shown as sticks. c Side view of the prefusion structure of the SARS-CoV-2 S protein with one RBD in the “up” conformation (PDB: 6XKL). d Side and top views of the post-fusion structure of the SARS-CoV-2 S protein (PDB: 6M3W). e Structure of the SARS-CoV-2 RBD complexed with ACE2 (PDB: 6LZG). ACE2 is shown in yellow. The RBD is shown in cyan. Key contacting residues at the SARS-CoV-2 RBD–ACE2 interface are shown as sticks. f Side and top views of three superimposed cryo-EM structures of SARS-CoV-2 S in complex with nAbs. S2E12 (represented as a cyan surface) binds to the “up” conformation of SARS-CoV-2 S RBD (PDB: 7K4N); S2M11 (represented as a brown surface) binds to the “down” conformation of SARS-CoV-2 S RBD (PDB: 7K43); 4A8 (represented as a magenta surface) binds to the NTD of SARS-CoV-2 S (PDB: 7C2L). g Amino acids mutated in the Omicron variant S protein.

SARS-CoV-2 displays higher transmissibility than other human coronaviruses, which is mainly attributed to the following factors: (a) higher receptor-binding affinity; (b) greater susceptibility to protease cleavage; and (c) the use of additional alternative host cell receptors. Walls et al. found that SARS-CoV-2 S binds to ACE2 with significantly higher affinity than SARS-CoV S [76]. Sequence alignment showed that four of the five key residues at the interface between the SARS-CoV S RBD and host ACE2 are not conserved; the five pairs are Y442SARS-CoV/L455SARS-CoV-2, L472SARS-CoV/F486SARS-CoV-2, N479SARS-CoV/Q493SARS-CoV-2, T487SARS-CoV/N501SARS-CoV-2 and Y491SARS-CoV/Y505SARS-CoV-2 [84, 90,91,92]. Shang et al. demonstrated that N501 and Q493 contribute to the enhanced ACE2-binding affinity of SARS-CoV-2 relative to SARS-CoV by forming additional hydrogen bonds with the RBM main chain, as well as with the K31 and E35 side chains, respectively [77, 78]. Intriguingly, the B.1.1.7 variant carrying the N501Y substitution, which emerged in the UK in September 2020, displays higher transmissibility and spread rapidly around the globe, which may be caused by the enhanced ACE2 affinity of the N501Y S variant compared with the wild-type S protein [93,94,95].
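The "four of five" statement can be reproduced by comparing the residue pairs listed above. The sketch below keys each pair by the SARS-CoV-2 residue and stores the amino acid found at the equivalent SARS-CoV position, so it does not depend on SARS-CoV numbering. It also includes a small parser for substitution labels such as N501Y. The names `INTERFACE`, `non_conserved` and `parse_substitution` are hypothetical.

```python
import re

# SARS-CoV-2 RBD residue -> amino acid at the equivalent SARS-CoV position (pairs from the text).
INTERFACE = {"L455": "Y", "F486": "L", "Q493": "N", "N501": "T", "Y505": "Y"}


def non_conserved(pairs):
    """Return the SARS-CoV-2 residues whose amino acid differs from the SARS-CoV one."""
    return [residue for residue, sars_aa in pairs.items() if residue[0] != sars_aa]


def parse_substitution(label):
    """Split a label such as 'N501Y' into (wild-type residue, position, mutant residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


changed = non_conserved(INTERFACE)
print(f"{len(changed)} of {len(INTERFACE)} interface residues are not conserved: {changed}")

wild_type, position, mutant = parse_substitution("N501Y")
# The wild-type residue in the label should match the interface table entry.
print(f"{wild_type}{position} is an interface residue: {f'{wild_type}{position}' in INTERFACE}")
```

The same parser applies to the other substitutions discussed in this section, such as D614G and P681R.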

As mentioned above, the SARS-CoV-2 S protein is proteolytically cleaved by furin during post-translational maturation between the ER and the Golgi, which significantly decreases its dependence on membrane proteases. The non-covalently linked S protein is more flexible, which facilitates the conformational change to the open state for receptor binding and exposes the S2′ cleavage site to proteases such as TMPRSS2 [81, 96]. Cleavage of the S protein at the S2′ site by cell surface proteases facilitates membrane fusion for viral cell entry. Mutations that promote cleavage at the S1/S2 or S2′ site lead to higher transmissibility. By early April 2020, a SARS-CoV-2 variant containing the spike D614G mutation had spread rapidly and become the dominant form in the pandemic. Korber et al. found that the D614G variant is closely associated with greater infectivity and increased viral load [97]. Yan et al. reported the cryo-EM structure of the D614G variant S protein and found that this mutation makes the S protein more flexible and more susceptible to protease cleavage [83]. Another study demonstrated that the D614G mutation disrupts a salt bridge between D614 and K854, thereby weakening the interaction between the S1 and S2 subunits [84]. In the spring of 2021, the B.1.617 lineage (Delta), which contains a P681R mutation in the S protein, emerged in India and became the dominant strain globally [98]. Saito et al. suggested that the P681R mutation facilitates spike protein cleavage and enhances viral fusogenicity [99].

In addition to its higher affinity for ACE2 and greater susceptibility to cleavage by host proteases, the S protein binds to several other host receptors or co-receptors to facilitate the entry of SARS-CoV-2 into host cells. The N-terminal domain of the S protein specifically interacts with tyrosine-protein kinase receptor UFO (AXL) on the human cell surface [100]. Other proteins, such as neuropilin-1 and asialoglycoprotein receptors, have also been reported to facilitate SARS-CoV-2 cell entry and infectivity [101, 102].

The S protein determines the efficiency with which the virus infects human cells and is pursued as a prominent antiviral drug target. Because of the large interaction interface between the S protein and ACE2, it is hard to block their interaction with small-molecule drugs. Neutralizing antibodies (nAbs) against SARS-CoV-2 S may therefore be the preferred therapeutics. Recently, a number of broad and potent SARS-CoV-2 nAbs have been identified, including nAbs isolated from convalescent COVID-19 patients and nanobodies screened from surface-display libraries. A number of structures of the SARS-CoV-2 S protein or RBD in complex with antibodies have been determined by cryo-EM and X-ray crystallography [83, 95, 103,104,105,106,107,108,109,110]. Most nAbs bind to the RBD in the “up” conformation, which resembles the conformation adopted when interacting with ACE2. Conversely, only a few nAbs can bind to the “down” conformation and thus impede the conformational switching required for viral entry [105, 111] (Fig. 3f). Besides anti-RBD antibodies, an antibody named 4A8, which targets S1-NTD, has also been identified from convalescent COVID-19 patients [112] (Fig. 3f). Recently, two reviews have comprehensively analyzed the epitopes on S recognized by various antibodies [113, 114]. Currently, more than 50 nAb-related clinical trials have been conducted in patients with COVID-19. Among these antibodies, eight RBD-specific nAbs have been authorized by the Food and Drug Administration (FDA) for emergency use [115].

Unfortunately, the continuous evolution of SARS-CoV-2 has led to resistance to vaccines and nAbs. Several SARS-CoV-2 variants carrying certain mutations have been designated as variants of concern (VOCs) by the World Health Organization (WHO) [98]. On 26 November 2021, the WHO designated a new variant, B.1.1.529 (Omicron), as a VOC only 2 days after it was reported [98]. The Omicron variant has accumulated a total of 60 mutations compared to the original SARS-CoV-2 strain [98]. Of the more than 30 mutations in the S protein, 15 are located in the RBD (Fig. 3g). Recently, Xie et al. determined the RBD escape mutation profiles for a total of 247 anti-RBD nAbs by yeast display screening [116]. More than 85% of the tested nAbs are escaped by Omicron. For instance, nAbs whose epitopes lie within the RBM are escaped mainly through six mutations (K417N, N440K, G446S, E484A, Q493K, and G496S). Combination therapy may greatly reduce the probability of immune escape by mutant viruses that arises under single-antibody treatment. High-resolution structures of nAbs bound to the S protein are of vital importance in revealing their epitope distribution and understanding the neutralization mechanism. Structural information on neutralizing antibodies that target different epitopes will be critical for accelerating the development of broadly protective nAb cocktail therapies for COVID-19. Moreover, structure-based design of nAb mutants helps to generate novel nAbs with improved potency and efficacy [117].

In situ structure of SARS-CoV-2 virions

Coronaviruses are enveloped viruses. Their shape is relatively variable, and each virion has a unique structure. Therefore, the structure of the entire virion cannot be solved by single-particle cryo-EM, which limits the acquisition of high-resolution structures of the whole virus. With the improvement of cryo-EM facilities and the optimization of image processing algorithms in recent years, it has become possible to obtain in situ structures at near-atomic resolution by cryo-ET. In the autumn of 2020, three research groups reported the overall structure of purified SARS-CoV-2 virions, which are approximately spherical. Ke et al. measured an average virion diameter of 91 ± 11 nm [118]. The virions may become less spherical after being concentrated by ultracentrifugation through a sucrose cushion [118]. Yao et al. measured virion diameters of 64.8 ± 11.8, 85.9 ± 9.4, and 96.6 ± 11.8 nm along the short, medium, and long axes of the envelope, respectively [119]. The S trimers are randomly distributed on the surface of the envelope (Fig. 4a). In three independent studies, the average number of S trimers on the virion surface was calculated to be between 20 and 40, with reported values of 40 [120], 24 ± 9 [118] and 26 ± 15 [119]. Surprisingly, the S protein on the virion is not perpendicular to the viral membrane but is tilted at different angles, which is significantly different from other enveloped viruses possessing class I fusion proteins (Fig. 4b). A small fraction of S trimers are even tilted by more than 90° toward the membrane [118]. The tilt of the S protein is thought to arise from the flexibility of the hinge region close to the membrane. Interestingly, besides tilting relative to the membrane, S trimers can also move freely on the viral envelope. Yao et al. even observed two S trimers joined together, with their heads and stalks forming a Y-shaped spike [119] (Fig. 4b). The flexibility of the S protein in both orientation and location may help the virus to sense and bind ACE2, allowing one S trimer to bind two or three ACE2 molecules, or two S proteins to bind one ACE2 dimer. Moreover, the N-linked glycans present on the native spike are more complex than those on recombinant S proteins. This extensive glycosylation shields S from host protease digestion and antibody recognition.
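The reported virion diameter and spike counts translate into a rough surface density of S trimers. The sketch below is a back-of-the-envelope estimate that treats the virion as a perfect sphere of 91 nm diameter, which is an idealization given the ellipsoidal axes measured by Yao et al., and it approximates the typical spacing as the square root of the membrane area available per trimer. The function names are hypothetical.

```python
import math


def sphere_area_nm2(diameter_nm):
    """Surface area of a sphere, pi * d^2 (ASSUMPTION: the virion is a perfect sphere)."""
    return math.pi * diameter_nm ** 2


def area_per_trimer_nm2(diameter_nm, n_trimers):
    """Average membrane area available to each S trimer."""
    return sphere_area_nm2(diameter_nm) / n_trimers


DIAMETER_NM = 91               # average virion diameter reported by Ke et al.
TRIMER_COUNTS = (40, 24, 26)   # average S trimers per virion from the three studies

for n_trimers in TRIMER_COUNTS:
    area = area_per_trimer_nm2(DIAMETER_NM, n_trimers)
    spacing = math.sqrt(area)  # rough centre-to-centre spacing if trimers were evenly spread
    print(f"{n_trimers} trimers: ~{area:.0f} nm^2 each, spacing ~{spacing:.0f} nm")
```

For 24 trimers this gives roughly 1000 nm² of membrane per trimer, that is, a spacing of a few tens of nanometres, which illustrates how sparsely the spikes decorate the envelope.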

Fig. 4: In situ structures of SARS-CoV-2 virions.

a Cryo-EM map of SARS-CoV-2 virion structure. The S proteins with different conformations are distributed over the virion surface and can be tilted in different directions (EMD-30430). b A tilted conformation of the S trimer is presented in the upper plots, while a Y-shaped spike pair with two heads and one combined stem is presented in the lower plots. c Side view of the cryo-EM map of SARS-CoV-2 virion structure, displaying RNPs assembled inside the viral envelope. d Side and top views of the in situ structure of the SARS-CoV-2 RNP (EMD-30429). e Tomograms showing SARS-CoV-2 virions in VeroE6 cells (EMD-11865). f Tomograms showing DMVs in SARS-CoV-2-infected VeroE6 cells (EMD-11866). g Different views of the EM structure of the MHV-induced pore complex embedded in the DMV membranes (EMD-11514).

Coronaviruses have the largest genomes among RNA viruses, yet their virions are smaller (Ø = 90 nm) than those of other RNA viruses such as human immunodeficiency virus (Ø = 120–170 nm) [121] and human respiratory syncytial virus (Ø = 150–250 nm) [122]. Considering that coronaviruses pack their large genomes into a supercoiled, dense structure within a relatively small viral particle [123], the RNPs of SARS-CoV-2 inside the envelope appear to be more tightly packed than those of other viruses.
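The packing argument can be made semi-quantitative by dividing the packaged RNA length by the particle volume. The sketch below is illustrative only. The diameters are the values quoted above (mid-range for HIV and RSV), and the ~30 kb SARS-CoV-2 genome length comes from earlier in this article. The HIV-1 and RSV genome lengths (about 9.7 kb with two copies per virion, and about 15.2 kb) are not given in the text and are introduced here as assumptions. Particles are treated as spheres of the outer diameter, ignoring the envelope thickness.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Virion:
    name: str
    diameter_nm: float  # outer diameter; particle idealized as a sphere
    packaged_nt: int    # total nucleotides of genomic RNA per particle

    def nt_per_1000_nm3(self) -> float:
        """Nucleotides of genomic RNA per 1000 nm^3 of particle volume."""
        volume_nm3 = (4.0 / 3.0) * math.pi * (self.diameter_nm / 2.0) ** 3
        return 1000.0 * self.packaged_nt / volume_nm3


VIRIONS = [
    # ~30 kb genome and 90 nm diameter are taken from the article.
    Virion("SARS-CoV-2", 90.0, 30_000),
    # ASSUMPTION (not from the article): ~9.7 kb genome, two copies per virion.
    Virion("HIV-1", 145.0, 2 * 9_700),
    # ASSUMPTION (not from the article): ~15.2 kb genome, one copy per virion.
    Virion("RSV", 200.0, 15_200),
]

for v in VIRIONS:
    print(f"{v.name:<11} ~{v.nt_per_1000_nm3():.1f} nt per 1000 nm^3")
```

With these inputs the SARS-CoV-2 estimate comes out several-fold higher than the other two, which is the sense in which its RNPs are described as tightly packed. The absolute numbers depend strongly on the assumed diameters because volume scales with the cube of the radius.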

The crystal structures of the nucleic acid-binding domains (NTD and CTD) of the nucleocapsid protein have recently been solved [124, 125]. However, the lack of a native structure of the ribonucleoprotein complex limits our understanding of the assembly and function of the coronavirus RNP. Yao et al. found that each SARS-CoV-2 virion contains an average of more than 26 RNPs, of which the membrane-proximal RNPs assemble as a “hexon” and the membrane-free RNPs assemble as a “tetrahedron” (Fig. 4c). They proposed an RNP assembly model in which the native RNPs interact with the RNA in a “beads on a string” pattern, similar to chromatin formation in eukaryotic cells [119]. Using sub-tomogram averaging, Yao et al. solved a 13.1 Å resolution structure of the RNP, which revealed a reverse G-shaped architecture with a diameter of 15 nm and a height of 16 nm (Fig. 4d). This is similar to the 30 Å resolution RNP structure in intracellular virions solved by Klein et al. [126]. The size differs from the reported architecture of the SARS-CoV N protein, which was assembled by crystal packing of a CTD 24-mer [127, 128]. The shape is also distinct from the released MHV RNP structure [129]. At the current resolution, the precise mechanism of RNP assembly is hard to clarify. The structure also provides no further information on the interaction between the N protein and other structural proteins such as M and E [127]. Recently, a 4.3 Å cryo-EM map of the full-length SARS-CoV-2 N protein was released in the Electron Microscopy Data Bank, but the corresponding structural model is still unavailable [130]. This moderate-resolution N protein structure may provide valuable insights into RNP assembly in SARS-CoV-2.
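The “beads on a string” picture invites a simple estimate of how much RNA each bead would have to carry. The sketch below is a rough illustration under stated assumptions: it uses the ~30 kb genome length quoted earlier in this article, treats 26 RNPs per virion as a lower bound (the text says "more than 26"), assumes the genome is shared evenly among RNPs, and approximates each RNP as a cylinder of 15 nm diameter and 16 nm height. None of these simplifications comes from the cited studies, and the helper names are hypothetical.

```python
import math

GENOME_NT = 30_000       # "about 30 kb", as stated earlier in the article
RNPS_PER_VIRION = 26     # lower bound; the text reports "more than 26"
RNP_DIAMETER_NM = 15.0   # from the 13.1 A sub-tomogram average [119]
RNP_HEIGHT_NM = 16.0


def max_nt_per_rnp(genome_nt: int, rnps: int) -> float:
    """Upper bound on nucleotides per RNP, assuming an even split of the genome."""
    return genome_nt / rnps


def cylinder_volume_nm3(diameter_nm: float, height_nm: float) -> float:
    """Volume of a cylinder (assumption: the RNP envelope is roughly cylindrical)."""
    return math.pi * (diameter_nm / 2.0) ** 2 * height_nm


nt_per_rnp = max_nt_per_rnp(GENOME_NT, RNPS_PER_VIRION)
rnp_volume = cylinder_volume_nm3(RNP_DIAMETER_NM, RNP_HEIGHT_NM)
print(f"at most ~{nt_per_rnp:.0f} nt per RNP")
print(f"~{rnp_volume:.0f} nm^3 per RNP, ~{RNPS_PER_VIRION * rnp_volume:.0f} nm^3 for 26 RNPs")
```

Because 26 is a lower bound on the RNP count, the per-RNP figure is an upper bound; more RNPs per virion would mean proportionally less RNA per bead.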

To better understand the life cycle of SARS-CoV-2 in host cells, Klein et al. used cryo-ET to structurally characterize virus assembly, DMV morphology, and extracellular virions in a near-native state [126] (Fig. 4e). DMVs inside SARS-CoV-2-infected cells show an average diameter of 338 nm, which approaches the size of DMVs in SARS-CoV-1-infected VeroE6 cells [131]. The inner and outer membranes of DMVs are separated by 5–10 nm but are clamped together at several sites (Fig. 4f). These connection sites may correspond to the proteinaceous pore complex formed by NSPs. This pore complex is analogous to the hexameric, crown-shaped complex that was revealed in murine hepatitis coronavirus-induced DMVs in infected cells [132]. The DMV pore complex is proposed to provide a pathway for transporting coronaviral RNA products out of the DMVs [132] (Fig. 4g). However, the exact composition and function of this molecular pore remain to be elucidated by high-resolution structures.

Summary and perspectives

As a result of the rapid development of structural biology techniques and the concerted efforts of researchers worldwide, remarkable progress has been made in structural studies of SARS-CoV-2. In addition to the RTC and S protein, the structures of most SARS-CoV-2 proteins have been resolved in the past 2 years. These proteins include (1) the NSPs Mpro [13], PLpro [22, 133], nsp1 [134, 135], nsp9 [136], the nsp10-nsp14 complex [31], the nsp10-nsp16 complex [33] and nsp15 [32]; (2) the structural proteins N [125, 137] and E [138]; and (3) accessory proteins such as ORF8 [43] (Fig. 5). Recently, the structures of ORF3a [41], ORF7a [139], ORF9b [45], and nsp2 [21] have also been reported (Fig. 5).

Fig. 5: High-resolution structures of SARS-CoV-2 proteins.

Crystal structure of SARS-CoV-2 nsp1 (PDB: 7K7P); Cryo-EM structure of the SARS-CoV-2 nsp2 (PDB: 7MSW); crystal structure of the SARS-CoV-2 PLpro with GRL0617 (PDB: 7CMD); crystal structure of the SARS-CoV-2 Mpro in complex with an inhibitor N3 (PDB: 6LU7); crystal structure of the SARS-CoV-2 Nsp9 in complex with a peptide (PDB: 6WC1); crystal structure of the SARS-CoV-2 nsp10 (PDB: 6ZCT); crystal structure of SARS-CoV-2 nsp10 bound to nsp14-exoribonuclease domain (PDB: 7DIY); crystal structure of SARS-CoV-2 nsp10 in complex with nsp16 (PDB: 6W4H); crystal structure of the SARS-CoV-2 nsp15 endoribonuclease (PDB: 6VWW); crystal structure of SARS-CoV-2 nucleocapsid protein N-terminal RNA binding domain (PDB: 6M3M); crystal structure of SARS-CoV-2 nucleocapsid protein C-terminal RNA binding domain (PDB: 6WZO); Cryo-EM structure of SARS-CoV-2 ORF3a (PDB: 7KJR); the crystal structure of the SARS-CoV-2 ORF7a ectodomain (PDB: 7CI3); the crystal structure of SARS-CoV-2 ORF8 accessory protein (PDB: 7JTL); the crystal structure of SARS-CoV-2 ORF9b accessory protein (PDB: 6Z4U).

Structural insight into viral proteins has long accelerated the structure-based development of novel drugs, as exemplified by the anti-influenza drugs Zanamivir [140] and Oseltamivir [141]. Based on the structure of SARS-CoV-2 Mpro in complex with inhibitors, Pfizer developed the second-generation oral SARS-CoV-2 Mpro inhibitor PF-07321332. It showed positive phase III results for the treatment of COVID-19 in combination with Ritonavir, which maintains higher circulating concentrations of PF-07321332 by inhibiting the cytochrome P450 enzymes that metabolize it [142].

PF-07321332 shows an overall structural similarity to the anti-HCV drug Boceprevir, which harbors a rigid P2 dimethylcyclopropylproline residue and a hydrophobic P3 residue (Fig. 6a) [143]. The P2 residue fits well into the S2 subsite and interacts hydrophobically with Met149 and Asp187, while the P3 residue interacts with Met165 in the S3 subsite (Fig. 6b). The P1 c-Bua residue of Boceprevir does not directly contact the SARS-CoV-2 Mpro S1 subsite, which limits its affinity [144]. The P1 residue of PF-07321332 is a glutamine surrogate that mimics the equivalent residue of GC376. As a broad-spectrum covalent inhibitor of cysteine proteases, GC376 has shown promise in treating cats with fatal feline infectious peritonitis (FIP) caused by FIPV and is being investigated as a treatment for COVID-19 [144]. The glutamine surrogate ring of both GC376 and PF-07321332 fits into the S1 pocket and forms three stable hydrogen bonds with Phe140, His163, and Glu166 (Fig. 6c). Unlike other Mpro inhibitors, PF-07321332 contains a nitrile warhead at the P1′ position, which forms a reversible covalent thioimidate adduct with the catalytic Cys145. A trifluoroacetyl capping group was added to the P4 residue to improve oral delivery. As shown in the SARS-CoV-2 Mpro-PF-07321332 complex structure, the trifluoromethyl group forms hydrogen bonds with Gln192 and two ordered water molecules, providing stronger interactions with the S4 subsite than Boceprevir does (Fig. 6b). Collectively, the structure-based approach allowed affinity and physicochemical properties to be optimized from the initial leads in a short time. On November 16, 2021, an application was submitted to the U.S. FDA for emergency use authorization of PF-07321332 in combination with Ritonavir (brand name: Paxlovid) for the treatment of COVID-19. The advent of this promising new agent has offered much hope to people worldwide.
In addition to SARS-CoV-2 Mpro, RdRp and spike protein, several other promising therapeutic targets have been identified in the laboratory, such as PLpro and helicase; structure-based development of antiviral drugs targeting these proteins is ongoing [145, 146].

Fig. 6: The structures of SARS-CoV-2 Mpro in complex with inhibitors.

a Chemical structure of PF-07321332, Boceprevir and GC376. b The comparison of SARS-CoV-2 Mpro-PF-07321332 complex structure with that of SARS-CoV-2 Mpro-boceprevir. c The comparison of SARS-CoV-2 Mpro-PF-07321332 complex structure with that of SARS-CoV-2 Mpro-GC376.

So far, only a few SARS-CoV-2 proteins remain structurally uncharacterized, and most of them are transmembrane proteins (Table 1). The structural studies to date shed light on the biological functions of SARS-CoV-2 proteins and offer new opportunities to develop vaccines and drugs. Nevertheless, some important issues still await elucidation from a structural perspective.

Table 1 A summary of SARS-CoV-2 protein structures that have not been resolved.

At present, the largest available RTC structure contains only nsp7-8-9-12-13 [28]. Although the structures of other potential components such as nsp10, nsp14, nsp15 and nsp16 have been solved, it remains unclear how these subunits participate in the assembly of the complete RTC. Coronaviruses contain the largest single-stranded RNA genomes among all RNA viruses. How the complete RTC executes the series of complicated and finely tuned processes in genome replication, including formation of the mRNA 7MeGpppA2′OMe cap and mismatch repair, remains to be revealed by structural studies. For biosafety reasons, current structural studies of the RTC rely on in vitro recombinant expression and assembly, which hampers the acquisition of a stable RTC. Overcoming the biosafety issues of SARS-CoV-2, or alternatively culturing low-pathogenicity coronaviruses such as MHV to isolate and purify the native RTC, may be a feasible strategy for solving the complete RTC structure.

Although high-resolution crystal structures of the NTD and CTD of the N protein have been solved, and a lower-resolution full-length N protein structure has recently become available [124, 125, 130], the structural basis for the assembly of RNPs has not been clarified. It has been proposed that the assembly of RNPs enables efficient genome packaging in a nucleosome-like manner [126]. A high-resolution structure of the assembled RNP is needed to explain how many N proteins assemble in the native state and how the RNA winds around them. The N protein plays a very important role in virus assembly. Studies have shown that the N protein can interact not only with RNA but also with the M and E proteins [147]. As described in the cryo-ET virion structure, the RNP lies close to the inner face of the viral envelope, indicating that interaction with membrane proteins such as M and E is feasible in terms of spatial proximity [119]. Elucidating how the N protein interacts with the M and E proteins would help to understand the viral assembly process and provide a new opportunity for designing antiviral drugs that interfere with viral assembly.

Structurally, the N protein consists of two ordered domains, the N-terminal domain (NTD) and the C-terminal domain (CTD), which are connected by a flexible linker region (LKR) and flanked by two disordered regions, the N-arm and the C-tail. These disordered regions make high-resolution structure determination of the N protein extremely difficult. In SARS-CoV-2 virions, the N protein may be stabilized by interacting with several molecules, such as the viral genomic RNA and the M, E and S proteins, in an unknown assembly ratio. In addition, the N protein binds nonspecifically to the RNA phosphate backbone, which makes in vitro assembly of a recombinant N protein complex difficult. Alternatively, virus-like particles (VLPs), which assemble spontaneously when these structural proteins are co-expressed, could be a potential source of N-M-E protein complex samples, since VLPs are free of the viral genome and are considered biologically safe [147]. Nevertheless, obtaining the homogeneous, soluble N-M-E protein complexes required for detailed structural studies remains a huge challenge, because detergents can hamper the assembly of such membrane protein complexes. Extensive efforts to improve the stability of these complexes, such as developing specific Fabs or nanobodies, are needed for high-resolution structure determination.

The mechanism of DMV formation during the coronavirus life cycle has not been clearly explained. Previous studies have shown that nsp3 and nsp4 are the main proteins involved in DMV formation [148]. Interestingly, a hexameric pore complex that spans both membranes of MHV-induced DMVs was visualized by cryo-ET [132]. Nsp3 was identified as a major constituent of the pore complex, and nsp4, nsp6 and other unknown proteins may also be involved. It is speculated that the pore complex exports newly synthesized RNA to the cytosol. In a sense, DMVs resemble the nucleus of eukaryotic cells: they serve as an independent platform for mRNA synthesis and genome replication, from which RNA products are continuously exported to the cytoplasm. Similarly, the novel pore complex is analogous to a nuclear pore in that it also spans a double membrane. The pore complex may finely control the entry of raw materials into the DMVs and the exit of products from them. The structure of the pore complex is important for understanding the mechanism of DMV formation and material transport. Moreover, it could provide a structural basis for designing antiviral drugs that interfere with viral replication.

With recent advances in cryo-EM methods, it has become practicable to solve the structures of protein-free RNA. Recently, the Wah Chiu and Rhiju Das groups developed a set of methods for studying RNA structure by cryo-EM [149]. The two groups solved several near-atomic resolution RNA structures [150, 151], including a 3.1 Å cryo-EM structure of the full-length Tetrahymena ribozyme. To date, the 3D structures of key RNA elements in the SARS-CoV-2 genome remain poorly characterized; only the structure of the FSE has been solved [20]. Dozens of highly conserved RNA segments remain in the SARS-CoV-2 genome. Targeting these segments with antisense RNA is a promising strategy for developing therapeutic agents [152,153,154]. Applying cryo-EM to these RNA structures would be of great significance for elucidating SARS-CoV-2 RNA biology and may facilitate the development of genome-disrupting antiviral agents.