Introduction

The devastation caused by the COVID-19 pandemic has led to an extraordinary expansion of research focused on the causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This virus has been classified within the species Severe acute respiratory syndrome-related coronavirus, which belongs to the genus Betacoronavirus of the family Coronaviridae1. The coronavirus family belongs to the order Nidovirales, which includes a rapidly expanding and diverse group of enveloped viruses with a single-stranded RNA genome of positive polarity. Different nidoviruses use similar strategies to organize, express and replicate their genomes. They constitute a monophyletic virus cluster that is characterized by the universal conservation of seven domains in their large replicase gene, which encodes the functions required for viral RNA replication and transcription in the infected cell2. Much of the foundational research on this replicase addressed viruses in related nidovirus families such as the Arteriviridae and, more relevantly, other members of the family Coronaviridae. The features unique and common to viruses within and outside the order Nidovirales have been described in several comprehensive reviews3,4,5. Using the first available genome sequences, early studies unravelled the genome organization and expression strategy used by coronaviruses and other nidoviruses. These studies were followed by pioneering bioinformatics, biochemical and genetic studies that established or confirmed the essential functions of many of the replicase proteins, thereby laying a road map for the currently ongoing SARS-CoV-2 research.

Betacoronaviruses have caused previous deadly epidemics, including the 2003 SARS outbreak and the ongoing Middle East respiratory syndrome (MERS) epidemic, which was first detected in 2012 (ref.6). These zoonotic events inspired earlier efforts to develop coronavirus-specific antiviral drugs6,7,8,9. The onset of the COVID-19 pandemic spurred scientists around the globe to apply their respective expertise to address how SARS-CoV-2 infects humans, avoids or delays host immune responses, copies its genome and expresses its proteins to make new virions. As successful replication directly depends on efficient synthesis of viral RNA, the replicase proteins responsible for this process are obvious antiviral drug targets.

Coronaviruses use an unusually large collection of RNA-synthesizing and RNA-processing enzymes to express and replicate a genome that is two to three times larger than that of most other RNA viruses. The central enzyme of transcription and replication is the RNA-dependent RNA polymerase (RdRp), which synthesizes all viral RNA and is a proven target for antiviral drugs. Successful replication and transcription also entail the use of specific RNA recognition signals to initiate RNA synthesis, the maintenance of RdRp processivity and fidelity, the capping of viral mRNAs to ensure their translation by host ribosomes, and the spatial and temporal regulation of the viral replication cycle within the infected cell. The non-structural proteins that assist the RdRp in performing these functions (Table 1) constitute additional targets for antiviral drug development. Like the RdRp, some of these non-structural proteins are conserved across most RNA viruses, whereas others are unique to coronaviruses3,4,5.

Table 1 Coronavirus replicase proteins and their functions

In this Review, we discuss recent advances in deciphering the molecular mechanisms of coronavirus gene expression and RNA replication, with special focus on SARS-CoV-2. We aim to place the more recent studies in the context of the foundational work, to provide a coherent view of our current understanding of the successive steps in SARS-CoV-2 replication and transcription. We focus on the processes required for viral gene expression and replication: translation, replication organelle formation, and production and capping of genomic RNA (gRNA) and subgenomic RNAs (sgRNAs). We also describe the known and proposed functions of the viral nucleic acid-metabolizing proteins, as revealed by new biochemical, structural and virological studies. We then discuss two well-studied examples of antiviral nucleoside analogues that target the RdRp; owing to space constraints, we do not address therapeutics designed to target viral proteins other than the RdRp. Likewise, we do not extensively cover the mechanisms of viral protein synthesis, or the (proposed) involvement of a substantial number of host factors in coronavirus replication and transcription. We conclude the Review by describing the remaining gaps in our knowledge to help guide new research.

The SARS-CoV-2 infection cycle

To infect a cell, coronaviruses rely on multiple host factors, and the expression patterns of these factors co-determine viral tropism. Delivery of the large RNA genome into the cell and its translation launch a cytoplasmic replication cycle that integrates a remarkable variety of strategies to fine-tune viral gene expression at both the translational and the transcriptional levels. The successive steps that ultimately lead to the release of viral progeny are coordinated temporally and spatially, and rely extensively on the infrastructure and metabolism of the host cell. In this section, we outline the key steps of the SARS-CoV-2 infection cycle.

Entry into the host cell

SARS-CoV-2 entry into the cell depends on several host attachment and entry factors, chief among them the receptor angiotensin-converting enzyme 2 (ACE2), to which the SARS-CoV-2 spike (S) glycoprotein binds (reviewed in refs10,11) (Fig. 1). The fusogenic spike protein consists of two subunits, S1 and S2, which mediate attachment to and fusion with the cell membrane, respectively. Cellular proteases such as transmembrane serine protease 2 (TMPRSS2) cleave the spike protein, a step required to prime its membrane fusion activity12,13.

Fig. 1: The SARS-CoV-2 infection cycle.

To enter a host cell, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein interacts with the cellular surface protein angiotensin-converting enzyme 2 (ACE2), while being cleaved by cellular proteases such as transmembrane serine protease 2 (TMPRSS2) to activate its membrane-fusion capacity. The genomic RNA (gRNA), which is capped on its 5′ end (red circle) and polyadenylated ((A)n) on its 3′ end, is released from the viral particle and — after recruiting host-cell ribosomes — translated into two replicase polyproteins, pp1a and pp1ab. Proteases embedded in viral non-structural protein 3 (nsp3) and nsp5 cleave pp1a and pp1ab into 16 non-structural proteins that assemble into replication–transcription complexes (RTCs). Viral RNA synthesis occurs within double-membrane vesicles that are part of virus-induced membranous replication organelles (Box 1). The RTCs produce new gRNAs and a set of subgenomic mRNAs (sg-mRNAs) that include open reading frames (ORFs) 2–9b, which encode the structural spike, membrane, envelope and nucleocapsid proteins, and also a number of accessory proteins. Newly made gRNAs can be translated to yield additional non-structural proteins, serve as a template for further RNA synthesis or be packaged into new virions. SARS-CoV-2 assembly starts with the coating of gRNAs with nucleocapsid proteins, generating nucleocapsid structures that bud into the endoplasmic reticulum–Golgi intermediate compartment (ERGIC), thereby acquiring a lipid bilayer containing the viral spike, membrane and envelope proteins. Adapted from ref.161, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

Following membrane fusion, the SARS-CoV-2 gRNA is released into the cytosol. The genome is a 5′-capped, single-stranded RNA of 29,870 bases with a 3′ poly(A) tail of variable length14,15. It encodes at least 13 recognized open reading frames (ORFs), organized largely linearly from the 5′ end to the 3′ end (Fig. 2a). The coding part of the gRNA is flanked by a 5′ untranslated region (UTR) and a 3′ UTR of 265 and 337 nucleotides (excluding the poly(A) tail), respectively. The gRNA possesses a number of regulatory sequences and higher-order RNA structures (discussed later) that are involved in its translation, replication and transcription16,17,18,19.

Fig. 2: Regulation of SARS-CoV-2 gene expression on the translational level.

a | Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome organization. Open reading frames (ORFs) are not drawn exactly to scale. The SARS-CoV-2 genomic RNA (gRNA) has a 5′ cap (red circle) followed by a leader sequence (red line) that is shared with all subgenomic mRNAs (sg-mRNAs), and a 3′ poly(A) tail. The 5′-proximal three quarters of the genome encode the replicase polyproteins pp1a and pp1ab, which are cleaved to yield 16 non-structural proteins (nsp1–nsp16; blue from ORF1a and red from ORF1b). The 3′-proximal one quarter of the genome encodes the structural (brown) and accessory (azure) proteins. Structural and accessory proteins are expressed from a nested set of sg-mRNAs, with ORF3c, ORF7b and ORF9b being expressed via ribosomal ‘leaky scanning’. b | RNA motif and structures that promote a frameshift from ORF1a to ORF1b, thereby controlling the synthesis of pp1ab. The key programmed ribosomal frameshifting (PRF)-stimulating RNA structure — a pseudoknot — interacts with the ribosome and induces its pausing, which generates tension in the gRNA template. As a result, ribosomes can slip one nucleotide backwards on the ‘slippery sequence’ (−1 PRF). An attenuating RNA loop located upstream of the slippery sequence also contributes to modulating PRF frequency. c | Model of −1 PRF at the ORF1a–ORF1b junction, showing the regulatory RNA elements inducing a simultaneous −1 shift of the tRNAs bound to the A and P sites of the ribosome, which can then translate ORF1b. The one-letter code for amino acids (circles) is used. A stop sign represents the ORF1a stop codon. E, envelope protein; M, membrane protein; N, nucleocapsid protein; S, spike protein.

Gearing up for RNA synthesis

Like all positive-strand RNA viruses of eukaryotes, SARS-CoV-2 replicates entirely in the cytoplasm. The SARS-CoV-2 gRNA first recruits host ribosomes and serves as mRNA for translation of the two large replicase ORFs, ORF1a and ORF1b, which constitute about three quarters of the genome15 (Fig. 2a). The resulting amino-terminally (N-terminally) collinear replicase polyproteins pp1a and pp1ab are 4,405 and 7,096 amino acids long, respectively. Production of pp1ab depends on a −1 programmed ribosomal frameshift (PRF) just upstream of the ORF1a termination codon, which extends pp1a with the ORF1b-encoded polyprotein. The estimated frameshifting efficiency is 45–70%, resulting in a 1.5–2-fold overexpression of ORF1a-encoded proteins relative to ORF1b-encoded proteins20. Sixteen mature non-structural proteins are released from pp1a and pp1ab by 15 proteolytic cleavages performed by two virus-encoded proteases: the papain-like protease (PLpro) in non-structural protein 3 (nsp3) and the chymotrypsin-like or main protease (Mpro) in nsp5 (Table 1). In this manner, pp1a yields nsp1 to nsp11, whereas pp1ab is cleaved into nsp1 to nsp10 and nsp12 to nsp16 (refs21,22) (Fig. 2a).
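
The processing scheme and the frameshift arithmetic in the paragraph above can be checked with a few lines of bookkeeping. The sketch below is a toy model, not a simulation of translation: it assumes that each boundary between consecutive nsps is one cleavage site, assigns the nsp1/2, nsp2/3 and nsp3/4 junctions to PLpro and all others to Mpro (the usual assignment in the coronavirus literature), and takes the ORF1a:ORF1b product ratio to be simply 1/(PRF efficiency), ignoring differences in protein stability. All function names are hypothetical.

```python
"""Toy bookkeeping for SARS-CoV-2 replicase polyprotein processing (illustrative)."""

# Assumed protease assignment: PLpro (nsp3) cuts the first three junctions,
# Mpro (nsp5) cuts every other junction.
PLPRO_JUNCTIONS = {(1, 2), (2, 3), (3, 4)}


def polyprotein_nsps(frameshifted: bool) -> list[int]:
    """nsp numbers from N to C terminus in pp1ab (frameshifted) or pp1a."""
    if frameshifted:
        # -1 PRF bypasses the ORF1a stop codon, so nsp11 is replaced by nsp12-nsp16.
        return list(range(1, 11)) + list(range(12, 17))
    return list(range(1, 12))


def junctions(nsps: list[int]) -> list[tuple[int, int]]:
    """Cleavage sites are the boundaries between consecutive nsps."""
    return list(zip(nsps, nsps[1:]))


def protease_for(junction: tuple[int, int]) -> str:
    """Protease assumed to cut a given nsp junction."""
    return "PLpro" if junction in PLPRO_JUNCTIONS else "Mpro"


def orf1a_to_orf1b_ratio(prf_efficiency: float) -> float:
    """All ribosomes make ORF1a products; only frameshifted ones make ORF1b products."""
    return 1.0 / prf_efficiency


pp1a_sites = set(junctions(polyprotein_nsps(frameshifted=False)))
pp1ab_sites = set(junctions(polyprotein_nsps(frameshifted=True)))
all_sites = pp1a_sites | pp1ab_sites  # distinct sites across both polyproteins
mature_nsps = set(polyprotein_nsps(False)) | set(polyprotein_nsps(True))
plpro_sites = sum(protease_for(j) == "PLpro" for j in all_sites)

print(f"{len(mature_nsps)} nsps from {len(all_sites)} distinct cleavage sites "
      f"({plpro_sites} PLpro, {len(all_sites) - plpro_sites} Mpro)")
for eff in (0.45, 0.70):
    print(f"PRF efficiency {eff:.0%}: ORF1a/ORF1b ratio ~{orf1a_to_orf1b_ratio(eff):.1f}")
```

Under this naive assumption, efficiencies of 45% and 70% give ratios of about 2.2 and 1.4, close to the 1.5–2-fold range cited above, and the union of the pp1a and pp1ab junctions gives 15 distinct cleavage sites releasing 16 nsps.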

The rapidly released nsp1 mediates the shutdown of host mRNA translation23,24,25, while the other non-structural proteins assemble into the protein complexes that engage in viral RNA synthesis, referred to as replication–transcription complexes (RTCs), whose exact composition remains to be determined. Replication and transcription are driven primarily by the enzymes contained in nsp12, nsp13, nsp14 and nsp16 (Table 1). Nsp12, the subunit containing the RdRp domain, catalyses RNA synthesis with the assistance of nsp7 and nsp8; together, these three subunits form the RdRp holoenzyme (holo-RdRp). Other RTC subunits have supporting roles in the RTC, modulate the host’s innate immune responses or remodel cell membranes into distinctive double-membrane structures known as ‘replication organelles’, which accommodate viral RNA synthesis26,27. The formation of replication organelles typically precedes the exponential phase of viral RNA synthesis and is discussed in Box 1.

RNA synthesis and virion assembly

Using the gRNA as a template, the RTC first produces both a full-length genome complement (the anti-genome) and a set of minus-strand sgRNAs, whose body sequences are derived from the gRNA region downstream of ORF1a and ORF1b (the replicase gene). Whereas the anti-genome serves as a template to produce new gRNA, the minus-strand sgRNAs direct the synthesis of a nested set of subgenomic mRNAs (sg-mRNAs) (discussed later). Although transcription is defined principally as the synthesis of RNA from a DNA template, in this Review we use the term for the synthesis of sg-mRNAs from RNA templates, in keeping with the terminology used in the coronavirus literature.

The sg-mRNAs are crucial for the production of the four coronavirus structural proteins, which are required for virion assembly and egress. New virions were recently reported to leave the cell via lysosomal trafficking rather than the biosynthetic secretory pathway used by many other enveloped viruses28. A number of the sg-mRNAs are used to express so-called accessory proteins, many of which have been implicated in modulating cellular innate immune responses20,29,30 (Fig. 1).

Regulation of SARS-CoV-2 translation

Expression of the SARS-CoV-2 proteins in infected cells depends primarily on the translation of gRNA and the eight ‘canonical’ sg-mRNAs20,29,30. The viral genes fall into three groups (Fig. 2a): replicase ORF1a and ORF1b, which are translated from gRNA, with ORF1b expression depending on −1 PRF; ORFs encoding the four ‘universal’ coronavirus structural proteins (the spike, membrane, envelope and nucleocapsid proteins) (Fig. 1), which are translated from sg-mRNAs; and ORFs encoding accessory proteins, which are translated from the remaining sg-mRNAs and differ widely between coronavirus lineages31 (Fig. 2a). In addition, small (putative) ORFs that overlap with several of the ORFs outlined above have been identified by theoretical and experimental approaches. These small ORFs have also been the subject of nomenclature confusion32, and their expression and biological relevance continue to be investigated32,33,34,35. For this reason, we have included only ORF3c and ORF9b of the small ORFs in the current map of the SARS-CoV-2 genome (Fig. 2a).

Like many RNA viruses, coronaviruses use non-canonical translation mechanisms to expand their coding capacity and fine-tune the expression levels of particular viral proteins36. Specifically, PRF (for ORF1b) and ‘leaky ribosomal scanning’ (for some ORFs in sg-mRNAs) co-regulate SARS-CoV-2 genome expression. In leaky ribosomal scanning, ribosomes load onto the 5′ end of a viral sg-mRNA but bypass the first start codon and initiate translation at a more downstream, internal start codon. The use of leaky scanning has been suggested or demonstrated for SARS-CoV-2 ORF3c33,34, and for ORF7b20,22,37 and ORF9b38 of both SARS-CoV and SARS-CoV-2. In these cases, leaky scanning and expression of the more downstream ORF appear to be promoted by suboptimal translation initiation signals at the upstream start codon.
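
The leaky scanning idea can be made concrete with a small probabilistic model. This is a sketch under stated assumptions, not a description of how SARS-CoV-2 start codons are actually chosen: it scores each AUG with a simplified Kozak rule (a purine at position −3 and a G at position +4, counting the A of AUG as +1), a standard simplification swapped in here, while the per-context initiation probabilities and the RNA sequence are hypothetical. Ribosomes that do not initiate at an AUG keep scanning to the next one; reading frames and ORF lengths are ignored.

```python
"""Toy model of leaky ribosomal scanning (hypothetical parameters)."""

# Hypothetical probabilities that a scanning ribosome initiates at an AUG,
# keyed by a simplified Kozak-context class.
INITIATION_PROB = {"strong": 0.9, "adequate": 0.5, "weak": 0.2}


def kozak_context(mrna: str, i: int) -> str:
    """Classify the AUG at index i by its -3 and +4 nucleotides (A of AUG = +1)."""
    minus3 = mrna[i - 3] if i >= 3 else ""
    plus4 = mrna[i + 3] if i + 3 < len(mrna) else ""
    score = (minus3 in ("A", "G")) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def initiation_profile(mrna: str) -> list[tuple[int, str, float]]:
    """(index, context, fraction of ribosomes initiating) for each AUG, 5'->3'."""
    scanning = 1.0  # fraction of ribosomes still scanning
    profile = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] == "AUG":
            context = kozak_context(mrna, i)
            p = INITIATION_PROB[context]
            profile.append((i, context, scanning * p))
            scanning *= 1.0 - p  # the rest leak past this AUG
    return profile


# Hypothetical 5' region: a weak-context AUG followed by a strong-context AUG.
toy_mrna = "CCUCUAUGUCCAGCCAUGGC"
for index, context, fraction in initiation_profile(toy_mrna):
    print(index, context, round(fraction, 3))
```

With a weak-context AUG upstream, most scanning ribosomes (72% in this toy example) initiate at the downstream AUG, which is the qualitative effect attributed above to suboptimal upstream initiation signals.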

Expression of ORF1b from gRNA depends on −1 PRF occurring just upstream of the ORF1a stop codon39, a highly conserved feature among coronaviruses and other nidoviruses. Termination of SARS-CoV-2 ORF1a translation yields the 4,405-residue-long pp1a, whereas −1 PRF results in its extension to yield the 7,096-residue-long pp1ab. The ORF1a–ORF1b PRF mechanism directs the expression of the key RNA metabolism enzymes of the RTC and regulates the relative expression levels of the proteins encoded by ORF1a and ORF1b. Frameshifting occurs on a specific ‘slippery sequence’ (5′-U UUA AAC-3′, nucleotides 13,462–13,468 in GenBank genome entry MN908947.3, which is followed by GGG in the case of SARS-CoV-2; all genome reference numbers in this Review refer to this sequence): ribosomes translating the UUA and AAC codons of ORF1a can shift one nucleotide backwards, after which translation continues with a CGG codon in ORF1b40,41 (Fig. 2b,c). The importance of maintaining the optimal ratio between ORF1a and ORF1b expression was demonstrated experimentally using SARS-CoV mutants with altered PRF levels, which were found to be dramatically crippled41.
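
The codon bookkeeping of −1 PRF at this site can be reproduced from the ten nucleotides quoted above. The sketch splits 5′-UUUAAACGGG-3′ into codons in the ORF1a (0) frame, which starts at the second U as the spacing ‘U UUA AAC’ indicates, and in the −1 frame. It also checks the property usually invoked for slippery sequences of this X XXY YYZ type, namely that both tRNAs keep their pairing at the first two codon positions after slipping. It does not model the pseudoknot or the ribosome.

```python
"""Codons at the SARS-CoV-2 ORF1a/ORF1b slippery site in the 0 and -1 frames."""

SITE = "UUUAAACGGG"  # slippery heptamer U UUA AAC plus the following GGG (from the text)


def codons(seq: str, offset: int) -> list[str]:
    """Complete codons of seq, read from the given 0-based offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


orf1a_frame = codons(SITE, 1)   # ORF1a frame: UUA (P site), AAC (A site), GGG
minus1_frame = codons(SITE, 0)  # after slipping back by one nucleotide

print("0 frame: ", orf1a_frame)
print("-1 frame:", minus1_frame)

# Simultaneous-slippage model: each tRNA keeps pairing at the first two codon
# positions; only the third position changes.
for before, after in zip(orf1a_frame[:2], minus1_frame[:2]):
    print(f"{before} -> {after}: first two bases kept = {before[:2] == after[:2]}")
```

The third codon of the −1 frame is CGG, the first ORF1b codon named in the text.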

Regulation of PRF efficiency is achieved by the formation of several RNA structures and through interactions of the nascent protein chain with the ribosome. The key PRF-stimulating element is a three-stemmed RNA pseudoknot structure located downstream of the slippery sequence39,40,41,42 (Fig. 2b). This element interacts with the ribosome at the entry of the mRNA channel of the 40S ribosomal subunit and induces translational pausing before −1 PRF; complete unfolding of this tertiary RNA structure is slow and thought to promote ribosomal frameshifting on the viral mRNA40. The position of the ORF1a stop codon, five codons downstream of the frameshift site, may also regulate PRF levels by allowing the pseudoknot to refold, thus preventing trailing ribosomes from continuing along the viral RNA that was unfolded by the leading ribosome40. PRF frequency is further modulated by a translation-attenuating RNA loop upstream of the slippery sequence40, which may either directly inhibit frameshifting42 or force elongating ribosomes to dissociate before reaching the PRF site41. Finally, interactions between specific residues in the nascent viral polyproteins and the ribosome exit tunnel are thought to co-determine PRF efficiency40.

RNA replication and transcription

As described earlier, ORF1a and ORF1b are translated into the pp1a and pp1ab precursors that give rise to 16 non-structural proteins1,2,3,4. Fourteen of these replicase subunits have been ascribed some type of function in coronavirus replication, several with multiple activities, as listed in Table 1. These non-structural proteins have been directly implicated in nucleic acid metabolism, enable or promote the activity of the catalytic non-structural proteins to stimulate RNA synthesis and processing, or participate in the formation of replication organelles. In this Review, we discuss in detail the molecular and biochemical features of these proteins, focusing mostly on nsp7, nsp8, nsp9, nsp10, nsp11, nsp12, nsp13, nsp14 and nsp16, but do not delve into the role of nsp15, which contains a unique uridylate-specific endoribonuclease43 that is conserved in most vertebrate nidoviruses. Nsp15 has been implicated in innate immunity evasion44,45, possibly by shortening the poly(U) stretches that are present at the 5′ end of viral minus-strand RNAs45. In coronaviruses, uridylate-specific endoribonuclease activity is required for efficient replication, but this requirement can be bypassed in host cells with impaired type I interferon sensing or production44,45. In the following subsections, we summarize our current understanding of the mechanisms of coronavirus gRNA and sg-mRNA synthesis, and the involvement of specific viral proteins in controlling these processes. We note that some non-structural proteins have other functions, which we do not mention because they are outside the scope of this Review. For a more in-depth summary of the literature on the functions of each non-structural protein, see Table 1 and references therein.

Continuous and discontinuous RNA synthesis in replication and gene expression

Following gRNA translation and proteolytic maturation of the replicase polyproteins, a relatively complex programme of SARS-CoV-2 RNA synthesis and gene expression is initiated, which depends on the interplay between viral RNA and non-structural proteins on the one hand (Fig. 3), and host-cell proteins and membranes on the other hand (Box 1). A variety of RNA sequences and structural elements in the terminal regions of the coronavirus genome have been implicated in the specific recognition of RNA templates by the coronavirus RTC16,17,18 (Fig. 3a). Long-range RNA–RNA interactions may be important for replication and transcription, although in many cases direct experimental support for their biological relevance remains to be obtained.

Fig. 3: SARS-CoV-2 RNA replication and transcription.

a | The genomic RNA (gRNA) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has cis-acting structures in its 5′ untranslated region (UTR) and 3′ UTR, and engages in long-distance intramolecular RNA–RNA interactions; these structures and interactions are thought to be involved in the regulation of replication and transcription. The 5′ UTR contains five conserved stem–loop (SL) structures, and the 3′ UTR contains a bulged stem–loop (BSL), a (predicted) pseudoknot (PK) and a stem–loop structure with a hypervariable region (HVR). SARS-CoV-2 gRNA cyclization results in complete opening of SL3, where the leader transcription regulatory sequence (TRS-L) resides. b | The gRNA serves as a template for gRNA replication (step 1) and for subgenomic mRNA (sg-mRNA) transcription (step 2); each process requires dedicated minus-strand templates: the anti-genome and a set of minus-strand sgRNAs, respectively. Synthesis of the latter involves a discontinuous step in which the replication–transcription complex (RTC) pauses RNA synthesis after copying one of the body transcription regulatory sequences (TRS-B) and detaches from the template. Subsequently, the RTC relocates to a position near the 5′ end of the gRNA template, where the complement of the TRS-B (anti-TRS-B) in the nascent minus-strand sgRNA base pairs with the TRS-L. This template switch adds the complement of the gRNA leader sequence (anti-leader) to the 3′ end of each minus-strand sgRNA. The minus-strand sgRNAs then serve as templates for sg-mRNA production, which ensures that all coronavirus sg-mRNAs include a 5′-terminal leader sequence of ~75 nucleotides that is identical to the 5′-terminal sequence of the gRNA. Positions of the TRSs and anti-TRSs are schematic and not drawn exactly to scale. Part b adapted from ref.161, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

As outlined earlier, SARS-CoV-2 RNA synthesis can be divided into genome replication and sg-mRNA transcription (Fig. 3b). Replication yields full-length viral plus-strand gRNA, which can be translated into additional replicase polyproteins, serve as a template for additional minus-strand RNA synthesis or be packaged into progeny virions. Transcription produces the nested set of sg-mRNAs used to express the structural and accessory proteins. Replication and transcription both require dedicated minus-strand RNA templates: full-length minus strands serve as templates for gRNA replication, whereas a nested set of minus-strand sgRNAs serve as templates for transcription, as first proposed about a quarter of a century ago46. The sg-mRNAs share the same 3′-terminal sequence and carry a common 5′ leader sequence that is identical to the 5′-terminal 75 nucleotides of the gRNA. The leader is acquired through a discontinuous step (that is, template switching47 during minus-strand sgRNA synthesis), which occurs when the RTC stalls in the 3′-proximal quarter of the gRNA template (Fig. 3b). This interruption mediates RTC detachment and relocation to a position near the 5′ end of the gRNA template (discussed later), where minus-strand synthesis resumes. This process yields a set of nested minus-strand sgRNAs with common 5′-terminal and 3′-terminal sequences, which serve as templates for the synthesis of a complementary set of sg-mRNAs47,48.
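
How discontinuous minus-strand synthesis yields a nested set of transcripts with a common leader and a common 3′ end can be illustrated with string operations. The sketch below is a toy: it works directly on plus-sense sequences rather than modelling the minus strand, treats the first occurrence of the TRS core (5′-ACGAAC-3′, from the text) as TRS-L and every later one as a TRS-B, and uses a short hypothetical genome. Each product joins the genome 5′ end up to the TRS-L to the genome from one TRS-B onwards, so the two TRS copies merge into one.

```python
"""Toy model of discontinuous transcription on plus-sense strings."""

CORE = "ACGAAC"  # SARS-CoV-2 TRS core sequence (from the text)


def find_all(seq: str, motif: str) -> list[int]:
    """Start indices of every (possibly overlapping) occurrence of motif."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def sg_mrnas(genome: str, core: str = CORE) -> list[str]:
    """sg-mRNAs predicted by a single template switch at each TRS-B.

    The first core occurrence is taken as TRS-L and all later ones as TRS-Bs.
    Products are listed smallest first, i.e. in the order in which a
    minus-strand RTC moving from the 3' end would meet their TRS-Bs.
    """
    sites = find_all(genome, core)
    if len(sites) < 2:
        return []
    trs_l, trs_bs = sites[0], sites[1:]
    leader = genome[:trs_l]  # sequence 5' of the TRS-L core
    return [leader + genome[b:] for b in reversed(trs_bs)]


# Hypothetical genome: leader, TRS-L, replicase stand-in, two TRS-B/ORF units, 3' UTR.
LEADER = "GAUUCU"
genome = (LEADER + CORE + "UUUUUUUU"
          + CORE + "GCAUGC"
          + CORE + "GGAUGA"
          + "CCCC")

for mrna in sg_mrnas(genome):
    print(mrna)
```

Every product starts with the leader plus the TRS core and ends with the genome 3′ end, and each shorter product is contained in the 3′ part of the longer one: the nested, 3′-coterminal arrangement described above.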

The presence of the common 5′ leader sequence in coronavirus mRNAs may offer several advantages. Its complement (the anti-leader sequence) provides a conserved starting point for plus-strand RNA synthesis, which can be used to initiate the synthesis of both gRNA and all sg-mRNAs. Although this has not been studied in detail thus far, the common 5′ leader sequence may also serve as a recognition signal for the viral mRNA capping machinery (Box 2). Furthermore, the nsp1 proteins of SARS-CoV, MERS-CoV and SARS-CoV-2 all mediate a translation shut-off in the infected cell49, which is based on their ability to block the ribosomal mRNA entry channel24,25 and induce endonucleolytic cleavage of host mRNAs23. The common 5′ leader sequence present in all coronavirus mRNAs allows them to escape this translation shut-off50 by as yet unknown mechanisms, resulting in simultaneous viral mRNA translation and impairment of host-cell gene expression, including the expression of genes mediating the early responses to virus infection.

Regulation of template switching

The template switching required to extend the ‘body’ of the nascent minus-strand sgRNA with the anti-leader is primarily guided by the body transcription regulatory sequence (TRS-B) elements. These short sequences are found just upstream of the ORFs that encode structural and accessory proteins (except for those expressed through leaky scanning). After a TRS-B sequence has been copied, minus-strand RNA synthesis stalls and the 3′ end of the nascent RNA strand is translocated to reinitiate RNA synthesis at the leader TRS (TRS-L) near the 5′ end of the gRNA template. This step is strongly facilitated by base pairing between the TRS-B complement at the 3′ end of the nascent minus strand (anti-TRS-B) and the TRS-L sequence in the gRNA template, as previously demonstrated for related viruses by site-directed mutagenesis studies51,52,53,54 (Fig. 3b). Coronavirus TRSs comprise a conserved core sequence (5′-ACGAAC-3′ in the case of SARS-CoV and SARS-CoV-2) that is flanked by sequences of variable length that may also contribute to the base pairing interaction with the TRS-L region20,29,30. In addition to the strength of the RNA duplex formed with the TRS-L region, other factors may co-determine the relative activity of a TRS-B element, and consequently the level at which the corresponding sg-mRNA is produced. These factors include the position of a TRS-B relative to the 3′ end of the gRNA template, flanking RNA sequences51 and the local or overall RNA structure of the gRNA template16. Genome cyclization driven by long-distance RNA–RNA interactions (Fig. 3a) was recently proposed to expose the TRS-L for base pairing during discontinuous minus-strand synthesis16, as previously postulated for arteriviruses55.
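
Because the anti-TRS-B is the complement of the TRS-B, the number of base pairs it can form with the TRS-L (aligned antiparallel) equals the number of positions at which the TRS-B and TRS-L sequences are identical. The sketch below counts Watson–Crick pairs as a crude proxy for duplex strength; it ignores G·U wobble pairs, stacking energies and RNA structure, and its flanking sequences are hypothetical. Only the core motifs 5′-ACGAAC-3′ and 5′-AAGAAC-3′ come from this Review.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """5'->3' sequence of the strand complementary to rna."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))


def duplex_pairs(trs_l: str, trs_b: str) -> int:
    """Watson-Crick pairs between the TRS-L and the anti-TRS-B.

    Both inputs are plus-sense and written 5'->3' over the same span. The
    nascent minus strand carries reverse_complement(trs_b); aligned
    antiparallel, its base opposite trs_l[i] is the complement of trs_b[i].
    """
    anti_trs_b = reverse_complement(trs_b)
    opposite = anti_trs_b[::-1]  # minus-strand bases read 3'->5', facing trs_l 5'->3'
    return sum(COMPLEMENT[a] == b for a, b in zip(trs_l, opposite))


print(duplex_pairs("ACGAAC", "ACGAAC"))  # canonical core vs canonical core
print(duplex_pairs("ACGAAC", "AAGAAC"))  # TRS-B-like ORF7b site from the text
# Hypothetical flanks: an identical flank adds pairs, a divergent one does not.
print(duplex_pairs("CUAAACGAACUU", "CUAAACGAACUU"))
print(duplex_pairs("CUAAACGAACUU", "GGAUACGAACAA"))
```

The single mismatch in 5′-AAGAAC-3′ costs one pair relative to the canonical core, which is one simple way to see why such TRS-B-like sites might be used at low frequency.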

The series of ‘stop-or-go decisions’ at the consecutive TRS-B elements encountered by the minus strand-transcribing RTC is thought to fine-tune the relative abundances of the various sg-mRNAs, which remain largely similar throughout the course of infection56. The TRS-L is effectively ‘merged’ with one of the TRS-B elements in each of the sg-mRNAs, which places the ORF downstream of that TRS-B at the 5′-proximal position in the sg-mRNA and allows it to be accessed by host ribosomes. Thus, coronavirus sg-mRNAs are nested and, except for the smallest species, polycistronic. However, they are presumed to be functionally monocistronic, with translation restricted to the ORF most proximal to the 5′ end of the RNA, except for sg-mRNAs on which leaky ribosomal scanning gives access to a second ORF.
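
The ‘stop-or-go’ idea can be expressed as a sequential probability model. In this sketch, an illustration with hypothetical parameters rather than a fitted model, the minus-strand RTC meets the TRS-B elements in order from the 3′ end of the gRNA, switches template at the i-th one with probability p_i and otherwise continues; complexes that pass every TRS-B are counted as full-length products, which glosses over the open question of whether replicating and transcribing RTCs differ. The function name `stop_or_go` is hypothetical.

```python
def stop_or_go(switch_probs: list[float]) -> list[float]:
    """Fraction of minus strands ending at each TRS-B, plus read-through.

    switch_probs[i] is the hypothetical probability of template switching at
    the i-th TRS-B met by the RTC, counted from the 3' end of the gRNA. The
    last element of the result is the fraction that passes every TRS-B.
    """
    remaining = 1.0  # fraction of RTCs still copying the template
    fractions = []
    for p in switch_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("switching probabilities must lie in [0, 1]")
        fractions.append(remaining * p)
        remaining *= 1.0 - p
    fractions.append(remaining)
    return fractions


# Hypothetical: eight TRS-Bs, each with a 20% chance of triggering a switch.
result = stop_or_go([0.2] * 8)
for rank, frac in enumerate(result[:-1], start=1):
    print(f"TRS-B {rank} from the 3' end: {frac:.3f}")
print(f"full-length read-through: {result[-1]:.3f}")
```

With equal probabilities at every TRS-B, the sgRNA nearest the 3′ end is the most abundant, qualitatively consistent with the smallest sg-mRNA being the most abundant transcript; real TRS-B activities differ, for the reasons listed above.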

The canonical 5′-ACGAAC-3′ TRS core sequence occurs only nine times in the SARS-CoV-2 genome (in the TRS-L and eight TRS-Bs), coordinating the production of eight sg-mRNA species (RNAs 2–9)20,29,30,57 (Fig. 2a). The smallest of these (mRNA 9) encodes the nucleocapsid protein and is by far the most abundant transcript30,56. For most sg-mRNAs, transcript abundance strongly correlates with ribosome footprint density, indicating that they are translated with similar efficiencies, in line with their largely identical 5′ UTRs, which start with the 75-nucleotide common leader sequence (Fig. 2a). A separate sg-mRNA for ORF7b expression has also been detected, derived from template switching at a TRS-B-like sequence (5′-AAGAAC-3′) located just upstream of ORF7b. Although the effective contribution of this TRS-B-like sequence to ORF7b expression may be limited20, this example shows how the (low-frequency) use of TRS-B-resembling sequences may yield additional subgenomic transcripts.
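
Searching a sequence for the canonical TRS core and for near-matches such as the ORF7b-associated 5′-AAGAAC-3′ amounts to a windowed comparison. The sketch below reports every window within a chosen Hamming distance of 5′-ACGAAC-3′; the one-mismatch cut-off and the input sequence are hypothetical, and a real analysis would also weigh flanking sequences and genome position, as discussed earlier in this section.

```python
CORE = "ACGAAC"  # canonical SARS-CoV-2 TRS core (from the text)


def scan_trs_like(seq: str, core: str = CORE, max_mismatches: int = 1):
    """Yield (index, window, mismatches) for windows close to the core motif."""
    k = len(core)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, core))
        if mismatches <= max_mismatches:
            yield i, window, mismatches


# Hypothetical sequence containing two canonical cores and one AAGAAC variant.
toy_seq = "GGACGAACUUAAGAACCCAUGACGAACG"
for index, window, mismatches in scan_trs_like(toy_seq):
    label = "canonical core" if mismatches == 0 else "TRS-B-like"
    print(index, window, label)
```

Setting `max_mismatches=0` restricts the scan to exact core matches, the kind of count behind the nine occurrences mentioned above.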

SARS-CoV-2 in-depth transcriptomics

Recently, the use of different highly sensitive techniques to study the SARS-CoV-2 transcriptome has identified numerous ‘non-canonical’ subgenomic transcripts20,29,30,57. These derive from TRS-L-dependent transcription, in which the TRS-L is, for example, fused to downstream TRS-B-like sequences located in the middle of known ORFs; from large or local deletions generated without the apparent involvement of TRSs; or from the generation of (possibly) defective RNAs that may interfere with replication of the full-length genome by competing for the viral RdRp and other crucial replication factors, as described for several other coronaviruses58. These RNA species may in part derive from RNA recombination, which occurs at high frequency in coronaviruses59,60,61. The most widely accepted model for recombination in RNA viruses, similarity-assisted copy-choice RNA recombination, strongly resembles the mechanism of coronavirus discontinuous minus-strand sgRNA synthesis47. Recombination involving host RNAs has also been invoked to explain gene acquisition during the evolution of coronaviruses and other nidoviruses3. It has been hypothesized that TRS-B elements serve as recombination hotspots60,61, and that RNA secondary structures promote template switching in a TRS-independent manner61. Together, in-depth transcriptomics and ribosome-profiling experiments have revealed a complex landscape of SARS-CoV-2 RNAs and (potential) proteins that extends well beyond the ‘canonical’ gene expression programme based on translation of the gRNA and the canonical sg-mRNAs20,29,30,57. Similar observations were made for other coronaviruses62,63. The additional transcripts may serve to express previously unknown small ORFs, truncated proteins or fused (partial) gene products, but their potential roles in SARS-CoV-2 replication and pathogenesis remain to be thoroughly investigated.

Balancing replication and transcription

It is unknown whether the composition of RTCs engaging in synthesis of minus-strand gRNA versus minus-strand sgRNAs is identical. Interactions with specific protein factors may govern the balance between replication and transcription. Two examples of transcription-specific protein functions have been documented in arteriviruses, which are distant coronavirus relatives in the order Nidovirales that also use discontinuous RNA synthesis to generate a nested set of sg-mRNAs64. The N-terminal subunit of the arterivirus replicase, nsp1, controls the accumulation of gRNA and sg-mRNAs by determining the levels at which their respective minus-strand templates are produced65. Specifically, mutagenesis of the N-terminal zinc-finger domain of nsp1 fully abrogated sg-mRNA synthesis, whereas gRNA production by such mutants increased 2.5–3-fold66. A serendipitous mutation just downstream of the zinc-binding domain of the helicase subunit also decreased arterivirus transcription and increased replication67,68, a finding that may be relevant to a recent hypothesis69,70 postulating that helicase-induced RTC backtracking contributes to the interruption of minus-strand RNA synthesis and/or to template switching (discussed later).

The role of the nucleocapsid protein in RNA synthesis

Although the primary role of the coronavirus nucleocapsid protein is gRNA encapsidation, it has also been implicated in a variety of other functions and interactions, including the regulation or modulation of viral replication and transcription. However, the interpretation of the supporting evidence is often complicated by the generally strong affinity of the nucleocapsid protein for RNA71,72,73. Both nonspecific RNA binding and binding to specific RNA sequences, including the TRS, have been reported, but often on the basis of in vitro assays using purified nucleocapsid protein (reviewed in ref.71). A human coronavirus 229E RNA replicon lacking the nucleocapsid-encoding gene (and all other structural protein genes) retained the capability to replicate itself and synthesize sg-mRNAs74. This finding, and the fact that nucleocapsid-protein expression promotes viral replication75,76, suggest that the nucleocapsid protein has a modulatory rather than an essential role in coronavirus RNA synthesis. In line with this notion, the launching of coronavirus replication from in vitro-generated gRNA can be enhanced by co-expression of the nucleocapsid protein71. The nucleocapsid proteins of different coronaviruses interact with numerous other proteins, including coronavirus replicase subunits such as nsp3 (ref.77) and host cell factors such as the RNA helicase DDX1 (ref.78). In the latter case, a complex formed between DDX1 and phosphorylated nucleocapsid protein was proposed to control the balance between replication and transcription by modulating the level of template switching at the successive TRS-B elements encountered by the RTC. The SARS-CoV-2 nucleocapsid protein was also shown to promote the cooperative association of the nsp7–nsp8–nsp12 complex with poly(U) RNA in vitro, thereby possibly facilitating initiation and/or elongation of viral RNA synthesis79.

The replication–transcription complex

The nsp7–nsp8–nsp12 holo-RdRp is the central component of the coronavirus RTC, and investigating the molecular basis of its RNA-synthesizing activity will facilitate rational drug design. Akin to other polynucleotide polymerases, the RdRp catalyses the incorporation of ribonucleoside triphosphates (NTPs) into a nascent ‘product’ RNA using the information provided by the template RNA. However, maintaining the integrity of their genetic information poses a particular challenge to coronaviruses, as their genomes are approximately 30 kb long. This size is a burden given the so-called error threshold, which demarcates the genome size above which the long-term survival of an RNA virus species would be tenuous80,81. Thus, to preserve the integrity of their genetic information, coronaviruses have evolved mechanisms to mitigate the impact of nucleotide misincorporations during RNA synthesis3. This section highlights recent biochemical and structural studies of how coronaviruses orchestrate their replication and transcription, with the added aim of enhancing our understanding of the druggable SARS-CoV-2 proteome.
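The error-threshold argument can be made concrete with Eigen's classical quasispecies approximation, which is not taken from this Review: a master sequence of length L, copied with a per-nucleotide error rate μ and enjoying a selective advantage σ over its mutants, persists only if σ(1 − μ)^L > 1, which gives L_max = ln σ / −ln(1 − μ) ≈ ln σ / μ. The sketch below evaluates this bound; the values of μ and σ are illustrative assumptions, not measured coronavirus parameters.

```python
import math


def max_genome_length(mu: float, sigma: float) -> float:
    """Eigen's error threshold: the largest genome length L (nucleotides)
    for which sigma * (1 - mu)**L > 1 still holds.

    mu    -- per-nucleotide copying error rate (0 < mu < 1)
    sigma -- selective superiority of the master sequence (sigma > 1)
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1")
    # Solve sigma * (1 - mu)**L = 1 for L; log1p keeps precision for tiny mu.
    return math.log(sigma) / -math.log1p(-mu)


if __name__ == "__main__":
    # Illustrative (assumed) parameters, not measured values.
    for mu in (1e-4, 1e-5):
        limit = max_genome_length(mu, sigma=10.0)
        print(f"mu={mu:g}: L_max ~ {limit:,.0f} nt; 30 kb genome below limit: {limit > 30_000}")
```

With σ = 10, an error rate of 10⁻⁴ per nucleotide caps the genome at roughly 23 kb, whereas a tenfold lower error rate raises the ceiling to roughly 230 kb, which illustrates why a ~30 kb genome benefits from fidelity-enhancing mechanisms.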

Molecular mechanism of RNA synthesis

Structural and bioinformatics work classified the coronavirus polymerase subunit (nsp12) into three domains: an N-terminal nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain (residues 1–250), an interface region between the NiRAN domain and the RdRp domain (residues 251–398) and the core RdRp domain (residues 399–932)82,83 (Fig. 4). The core RdRp domain assumes an architecture analogous to a cupped right hand composed of three subdomains: the fingers, palm and thumb84 (Fig. 4b). Within the core RdRp domain, the active site and its surroundings are built from seven functional elements, known as motifs A–G, which are highly conserved across positive-stranded RNA viruses84 (Fig. 4d).
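The domain boundaries given above can be encoded as a simple lookup, for example to annotate mutations or drug-contacting residues. This is a minimal sketch that assumes exactly the residue ranges quoted in this section; the table and function names are hypothetical.

```python
# Residue ranges for SARS-CoV-2 nsp12 as quoted in the text (1-based, inclusive).
NSP12_DOMAINS = (
    (1, 250, "NiRAN"),
    (251, 398, "interface"),
    (399, 932, "RdRp"),
)


def nsp12_domain(residue: int) -> str:
    """Return the name of the nsp12 domain that contains a residue number."""
    for start, end, name in NSP12_DOMAINS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} lies outside nsp12 (1-932)")


if __name__ == "__main__":
    # Ser861, the residue that clashes with incorporated remdesivir.
    print(nsp12_domain(861))  # RdRp
```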

Fig. 4: Architecture of SARS-CoV-2 RNA replication and transcription complexes.

a | Surface-rendered representation illustrating two molecules of non-structural protein 13 (nsp13) bound to the replication–transcription complex (nsp13₂–RTC) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Depicted are the nsp7–nsp8–nsp12 RNA-dependent RNA polymerase (RdRp) holoenzyme bound to a double-stranded RNA (dsRNA) scaffold and to two molecules of nsp13: nsp13 thumb (nsp13T) and nsp13 fingers (nsp13F). The 1B, RecA1 and RecA2 domains of nsp13 are highlighted. b | The nsp12 RdRp domain adopts the canonical conformation of a ‘cupped right hand’ and its structural motifs are named accordingly. The nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain constitutes the amino-terminal 250 residues of nsp12. c | View of the nsp13₂–RTC complex rotated by 90° compared with part a. d | Zoomed-in view of the RdRp active site and its conserved structural motifs, except for motif G, which was excluded from the illustration for clarity. The active site is depicted with an incoming nucleotide, modelled using the pre-incorporation structures of the hepatitis C virus RdRp (Protein Data Bank entry 4WTL162). Residues involved in both chelating the catalytic Mg²⁺ ions and orientating the incoming nucleotide are shown in a stick representation. Protein Data Bank entry 6XEZ69 was used in preparation of this figure. NTP, nucleoside triphosphate; p-RNA, product RNA; t-RNA, template RNA; ZBD, zinc-binding domain.

Studies of smaller RdRps, amenable to X-ray crystallography, have provided a wealth of information on the role of these conserved structural motifs during the nucleotide addition cycle84,85. Initial nucleotide recognition is mediated by nonspecific charge–charge interactions of the nucleotide substrate with a series of positively charged Lys and Arg residues in motifs D and F of the nsp12 RdRp domain86. Molecular dynamics simulations of the hepatitis C virus RdRp indicate that the nucleotide diffuses into the central cleft through the NTP entry channel until it reaches the active site located in the main channel86. The ribose and base moieties of the nucleotide subsequently flip into the active site, stabilized by hydrogen-bonding interactions with residues in motifs A and B and by base-specific interactions with residues in motif F, to form a Watson–Crick base pair with the template nucleotide86,87,88 (Fig. 4d). Correct positioning of the incoming nucleotide stabilizes the two catalytic Mg²⁺ ions that are necessary for the condensation reaction via interactions with the α and β phosphates of the incoming nucleotide, the product RNA 3′-hydroxy group and the catalytic Asp residues of motif C87,89. Closure of the RdRp active site via the stabilizing interactions of motifs A and B with the base enables the acid–base chemistry that drives the attack of the deprotonated product RNA 3′-hydroxy group on the α-phosphorus atom of the incoming NTP, resulting in nucleotide addition and the release of pyrophosphate87,89.

The conformational state immediately after catalysis, in which the product RNA 3′ base occupies the incoming-nucleotide site, is referred to as the ‘pre-translocated state’ (Fig. 5). Conversion into the ‘post-translocated’ state requires the release of pyrophosphate via opening of the active site through a subtle rotation of motif A84. Entry into the post-translocated state resets the active site for the next nucleotide addition cycle. A generalized kinetic scheme indicates that forward translocation is driven by the high nucleotide concentrations in the cellular milieu, which saturate the incoming-nucleotide site (Fig. 5). Failure to translocate would arrest the RTC and lead to the termination of RNA synthesis unless the translocation impediment is cleared. As we discuss later, one mechanism of action of remdesivir, an RdRp inhibitor used for COVID-19 treatment, can be summarized as hindering RdRp translocation through a steric clash between its ribose 1′-nitrile group and nsp12.
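The argument that high NTP concentrations drive forward translocation can be illustrated with a minimal equilibrium model. This is an assumption of the sketch, not the specific kinetic scheme cited above: the complex interconverts between the pre-translocated (PRE) and post-translocated (POST) states with equilibrium constant K_tr = [POST]/[PRE], and only POST, whose incoming-nucleotide site is free, can bind an NTP, with dissociation constant K_d. Because NTP binding captures the POST state, the fraction of complexes left in PRE falls as [NTP] rises. All parameter values are hypothetical.

```python
def fraction_pre_translocated(ntp_uM: float, k_tr: float, kd_uM: float) -> float:
    """Equilibrium fraction of RdRp complexes in the pre-translocated state.

    Assumed three-state model: PRE <-> POST <-> POST.NTP, where
    k_tr  = [POST] / [PRE]            (translocation equilibrium constant)
    kd_uM = [POST][NTP] / [POST.NTP]  (NTP dissociation constant, in uM)
    """
    if ntp_uM < 0:
        raise ValueError("NTP concentration must be non-negative")
    if k_tr <= 0 or kd_uM <= 0:
        raise ValueError("k_tr and kd_uM must be positive")
    # Relative weights of the states: PRE = 1, POST = k_tr, POST.NTP = k_tr * [NTP] / Kd
    return 1.0 / (1.0 + k_tr * (1.0 + ntp_uM / kd_uM))


if __name__ == "__main__":
    # Hypothetical values: an unfavourable translocation step (k_tr = 0.1), Kd = 100 uM.
    for ntp in (0, 10, 100, 1000):
        print(ntp, round(fraction_pre_translocated(ntp, k_tr=0.1, kd_uM=100.0), 3))
```

Lowering k_tr mimics a steric barrier to translocation, such as the one remdesivir is proposed to impose; in this model, raising [NTP] partly compensates by trapping the post-translocated state.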

Fig. 5: The nucleotide addition cycle.

Model of the interactions between non-structural protein 13 thumb (nsp13T) and nsp12 during RNA synthesis. Zoomed-in views of the active site during the nucleotide addition cycle are presented above or below each panel. In the presence of a natural, cognate nucleoside triphosphate (NTP), the incoming NTP forms a phosphodiester bond with the product RNA (p-RNA) (top). Nsp13T is not engaged with the template RNA (t-RNA), and forward translocation of the nsp12 RNA-dependent RNA polymerase (RdRp) on the t-RNA strand is unimpeded. Incorporation of a non-natural or non-cognate nucleotide perturbs the active site, which activates the helicase nsp13 to induce RdRp backtracking, as shown in the last panel (bottom). nsp13₂, nsp13T–nsp13F; PPi, pyrophosphate; RTC, replication–transcription complex.

The enigmatic NiRAN domain of coronavirus nsp12

The NiRAN domain (Fig. 4b) attracted considerable interest following revelations that it is an essential enzymatic domain that is absent from RNA viruses outside the order Nidovirales82. Early investigations revealed that the NiRAN domain of the RdRp of the arterivirus equine arteritis virus possesses a self-nucleotidylating activity (NMPylation), which may prime the domain to transfer a nucleoside monophosphate (NMP) to another protein or nucleic acid substrate82. This activity is essential for nidovirus replication, as mutations that catalytically inactivate the NiRAN domains of equine arteritis virus and SARS-CoV were lethal82. Recent studies of the coronavirus NiRAN domain suggest that it might function as an RNA ligase, serve as a guanylyltransferase that catalyses the transfer of guanosine monophosphate (GMP) during mRNA capping (Box 2) or transfer an NMP to another viral protein to serve in protein-primed initiation of RNA synthesis82. The NiRAN domain may perform several of these activities, depending on the substrate and context. Its possible role in the capping mechanism is discussed in Box 2, whereas an RNA ligase activity of NiRAN has not been demonstrated so far.

The NiRAN domain’s possible role in protein-primed RNA synthesis warrants further exploration. Members of another clade of positive-strand RNA viruses, the order Picornavirales, initiate RNA synthesis using the protein primer viral protein genome-linked (VPg). A dinucleotide (UpU) that is covalently linked to a hydroxy group of a tyrosine, serine or threonine residue in VPg serves to prime RNA synthesis on the poly(A)-tailed template90. Consistent with the earlier work on equine arteritis virus82, the NiRAN domains of SARS-CoV, human coronavirus 229E and SARS-CoV-2 display higher specificity in vitro for UTP over GTP (UMPylation activity over GMPylation activity)91,92. Given this UTP specificity, it was posited that NiRAN could facilitate the UMPylation of a priming protein to initiate minus-strand RNA synthesis at the 3′ poly(A) tail of the gRNA template92. Recent in vitro evidence indicates that nsp9, a single-stranded RNA-binding protein, is a substrate for UMPylation at or near its N terminus92. Consistent with this finding, mutagenesis of the N-terminal residues of nsp9 severely affected viral replication. These results, combined with a structure of nsp9 bound to the NiRAN domain in an inhibited state93, implicate nsp9 in the initiation of RNA synthesis. Given observations that other RTC components and purification contaminants (such as proteins from bacterial expression systems) can be NMPylated in vitro91,92, further studies are needed to define the biologically relevant repertoire of substrates targeted by NiRAN’s NMPylation activity.

The expanded replication machinery

Replication and transcription are assumed to be executed by various subcomplexes, which include the holo-RdRp associated with other non-structural protein enzymes and accessory subunits. These viral enzymes are required to promote the fidelity of RNA synthesis, equip viral mRNAs with a 5′ cap structure and orchestrate the template switching needed for sgRNA synthesis. The interplay between these subunits and their interactions with regulatory viral RNA elements coordinate the timely replication and expression of the coronavirus genome, and provide a platform for continuous coronavirus evolution.

The holo-RdRp

To faithfully replicate the coronavirus genome, an arsenal of factors is needed to enhance the processivity of the RTC and repair errors in RNA synthesis4. Early biochemical experiments revealed that the primer-extension processivity of SARS-CoV RdRp (nsp12) is greatly increased in the presence of nsp7 and nsp8, cementing their role as essential subunits of the holo-RdRp complex94. Advancements in cryo-electron microscopy led to the seminal structure of the SARS-CoV apo-holo-RdRp complex83 and more recently to several structures of the SARS-CoV-2 RTC95,96,97,98. In the presence of an RNA duplex, the N termini of two nsp8 subunits form ordered helices that have nonspecific ionic interactions with the RNA backbone, illuminating the importance of nsp8 for enhanced RdRp processivity95 (Fig. 4a,c). In addition to enhancing processivity, nsp8 interacts in vitro with various other non-structural proteins thought to assist the RTC99,100. Thus, nsp8 is posited to be important for forming higher-order RTCs that couple replication and transcription with template unwinding (nsp13), proofreading (nsp10, nsp12, nsp13 and nsp14) and RNA capping (nsp10, nsp13, nsp14 and nsp16).

The exoribonuclease activity of nsp14

Coronaviruses encode a proofreading activity that is absent from most other RNA viruses, including nidoviruses with small genomes such as arteriviruses3,22,101. This activity resides in the N-terminal exoribonuclease (ExoN) domain of nsp14, which together with nsp10 forms an RNA proofreading complex102,103,104 that is presumed to promote faithful replication of large nidovirus genomes22. In some coronaviruses, such as murine hepatitis virus (MHV) and SARS-CoV, nsp14-ExoN knockout yielded crippled but viable viruses exhibiting a mutator phenotype105,106,107. Remarkably, equivalent ExoN-inactivating substitutions completely abolish MERS-CoV and SARS-CoV-2 replication, suggesting a function for ExoN in primary RNA synthesis108. Because of its role in enhancing the fidelity of genome replication, nsp14 ExoN can promote antiviral drug resistance. For example, in an in vitro assay, its 3′-5′ exoribonuclease activity can efficiently excise ribavirin, an antiviral nucleoside analogue, from the 3′ end of an RNA substrate, indicating that the enzyme may promote high-level resistance to certain nucleoside analogue antiviral drugs109. Recent cryo-electron microscopy structures have revealed the molecular basis for how the ExoN domain of the nsp10–nsp14 complex interacts with double-stranded RNA containing a 5′ overhang and a one-nucleotide mismatch at the 3′ end110. The mismatched nucleotide enters the shallow ExoN active site and interacts with conserved catalytic residues via its ribose 3′-hydroxy and 2′-hydroxy groups. In addition, the double-stranded portion of the RNA interacts with both the nsp10 N terminus and nsp14-ExoN residues outside the catalytic site. These structures provide direct visualization of how ExoN recognizes its preferred mismatched RNA substrate110.

Colliding motors: how the helicase induces RdRp backtracking

Nsp13 is an SF1B-family RNA helicase that is essential for coronavirus replication111,112,113,114,115. Biochemical assays revealed that its 5′-3′ nucleic acid unwinding activity is enhanced twofold in the presence of nsp12 (ref.116). Nsp13 makes stable interactions with the SARS-CoV-2 RTC, which enabled the structural determination of nsp13 bound to the RTC (nsp13₂–RTC)69,117. Single-particle image classification revealed that the major particle class consists of two molecules of nsp13 bound to the RTC. Both nsp13 molecules (nsp13F (fingers) and nsp13T (thumb)) have extensive interactions with the holo-RdRp, whereas only one copy (nsp13T) is bound to the RNA scaffold at the 5′ end of the template RNA (Fig. 4a,c). Nsp13T, bound to the template strand downstream of the RdRp active site, is positioned to translocate in the opposite direction relative to the RdRp69 (Fig. 5). The opposing directionalities of nsp13 and the RdRp are hypothesized to trigger a translocation conflict. Forward translocation of nsp13T on the template strand was proposed to lead to the reverse threading of the RdRp on the product RNA strand69. The reverse movement of polymerases relative to their nucleic acid substrate is well known for all cellular RNA polymerases and is termed ‘backtracking’118 (Fig. 5). The role of the second nsp13, nsp13F, is poorly understood, but it has been proposed to regulate the unwinding activity of the substrate-bound nsp13T (ref.117).
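The geometry of the proposed translocation conflict can be illustrated with a deliberately crude one-dimensional cartoon; the lattice, step rules and function name are assumptions of this sketch, not a published model. Template positions are indexed from the 5′ end, the RdRp reads the template 3′→5′ (towards lower indices), and nsp13T, bound downstream on the template, translocates 5′→3′ (towards higher indices), so the two motors converge. Once they meet, the sketch simply lets the helicase win and push the RdRp backwards, which corresponds to the reverse threading proposed above; it ignores when and how nsp13T actually engages the template.

```python
from typing import List


def simulate_motor_conflict(rdrp_pos: int, helicase_pos: int, steps: int) -> List[int]:
    """Toy lattice cartoon of the proposed nsp13T-RdRp translocation conflict.

    Template positions are indexed 5'->3' (0 = template 5' end). The RdRp
    moves toward lower indices (it reads the template 3'->5'); nsp13T sits
    downstream at helicase_pos < rdrp_pos and moves toward higher indices.
    Returns the RdRp position after each step; rising values mean the RdRp
    is being pushed backwards (backtracking).
    """
    if helicase_pos >= rdrp_pos:
        raise ValueError("nsp13T must bind downstream (5'-ward) of the RdRp")
    trajectory = []
    for _ in range(steps):
        # RdRp advances one nucleotide if the next template position is free.
        if rdrp_pos - helicase_pos >= 2:
            rdrp_pos -= 1
        if rdrp_pos - helicase_pos >= 2:
            # Space remains: nsp13T advances freely toward the RdRp.
            helicase_pos += 1
        else:
            # Motors are adjacent: in this cartoon nsp13T always wins and
            # pushes the RdRp back by one nucleotide.
            helicase_pos += 1
            rdrp_pos += 1
        trajectory.append(rdrp_pos)
    return trajectory


if __name__ == "__main__":
    print(simulate_motor_conflict(rdrp_pos=10, helicase_pos=4, steps=6))  # [9, 8, 8, 9, 10, 11]
```

The RdRp first advances (positions 9 and 8), stalls on contact, and is then driven backwards, which is the qualitative behaviour the backtracking hypothesis invokes.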

A perspective on the role of backtracking in proofreading by the RdRp

Similarly to its role in cellular RNA polymerases, backtracking may be essential for excision of misincorporated nucleotides in coronavirus RNA synthesis69,70,119,120. Indeed, behaviours consistent with backtracking have been observed in single-molecule magnetic tweezer experiments for the SARS-CoV-2 RdRp and RdRps from the Φ6 bacteriophage and poliovirus, illuminating the potentially widespread nature of backtracking in the viral realm119,120,121. Furthermore, recent evidence indicates that the NTP entry channel can accommodate a single-stranded product RNA 3′ overhang, which mirrors the backtracking product70. Molecular dynamics simulations further showed that entry into the backtracking state occurs when a misincorporated RNA base flips from the pre-translocated state towards the mouth of the NTP entry channel70. Subsequently, the engagement of nsp13 with the template RNA would enhance the backtracking activity and offer a means to control entry into a long-lived backtracked state (Fig. 5).

Backtracking may grant nsp14 ExoN access to any misincorporated nucleotide at the 3′ end of nascent product RNAs, thereby coupling proofreading with RNA synthesis69,70. Alternatively, as suggested by the recent cryo-electron microscopy structures of nsp10–nsp14 bound to a double-stranded RNA substrate with a 5′ overhang and a 3′ nucleotide mismatch110, misincorporation events may lead the RdRp to release the mismatch-containing RNA duplex, thereby granting ExoN access for proofreading. Additionally, it is envisaged that backtracking could have a role in discontinuous RNA synthesis by exposing the anti-TRS-B at the 3′ end of the nascent minus-strand sgRNAs and mediating template switching69,70 (Fig. 3b). This hypothesis is supported by a mutation in the arterivirus helicase that does not affect genome replication but abolishes all sgRNA transcription68,122.

Coupling of backtracking with nsp14-ExoN activity could also explain how coronaviruses excise non-natural nucleotides from their nascent product RNA. Genetic loss-of-function experiments indicated that nsp14 ExoN mitigates the effect of nucleoside and base analogues such as ribavirin and fluorouracil, respectively, in the betacoronavirus MHV102,105,106. These inhibitors are ineffective for treating SARS-CoV, MERS-CoV and SARS-CoV-2 infections, highlighting that the protection conferred by nsp14 ExoN is of clinical concern in the hunt for promising nucleoside analogues106,123. Combination therapy approaches may prove fruitful given the likely synergy between nsp14-ExoN and nsp12 (RdRp) inhibition. Therefore, it is pertinent to better understand how nsp14 is recruited for excision and repair124. In addition, shedding light on why some nucleoside analogues, such as remdesivir and molnupiravir (discussed in the next section), are effective could provide valuable insight into the design of novel antiviral nucleoside analogues for monotherapy.

A rational design of RdRp inhibitors

Although vaccines against SARS-CoV-2 have shown remarkable efficacy, COVID-19 continues to spread and affect communities globally. The reasons for this are multiple, and include vaccine shortages, public vaccine hesitancy, reduced vaccine effectiveness in immunosuppressed people and the emergence of new virus variants. It is therefore anticipated that SARS-CoV-2 will become endemic125, potentially evolving in the human host and leading to gradual or more sudden reductions of vaccine efficacy. Given such concerns, the search for drugs against SARS-CoV-2 and related viruses remains a priority in the research community. In this section, we discuss the mechanisms of action of two RdRp inhibitors, remdesivir and molnupiravir, that show clinical benefit in treating COVID-19.

Mechanisms of action of remdesivir and molnupiravir

Nucleoside analogues, which can target the RdRp, are common antiviral therapeutics125. Currently, remdesivir and molnupiravir are two antiviral drugs used to treat COVID-19 (ref.126). Studies indicate that treatment with remdesivir decreases the duration of the infection in hospitalized patients127. Biochemical evidence demonstrates that the SARS-CoV-2 RdRp preferentially incorporates the active triphosphate form of remdesivir (Fig. 6a) over its natural counterpart ATP, and can incorporate the triphosphate form of molnupiravir (Fig. 6b) in place of its natural counterpart CTP126,128,129,130,131. Once incorporated, neither inhibitor induces immediate pausing of RNA synthesis, in contrast to classical chain terminators126,129,131 (Fig. 6c,e). Initial studies had suggested that remdesivir inhibits RNA synthesis via a delayed chain termination mechanism126,130,132,133. Delayed chain termination occurs when remdesivir impedes RdRp translocation following a steric clash between its 1′-nitrile group and nsp12 Ser861, which occurs when remdesivir reaches the fourth position from the 3′ end of the product RNA126,132,133 (Fig. 6f). This steric inhibition is overcome in vitro even at subphysiological NTP concentrations, indicating that it is unlikely to be the major inhibitory hurdle for viral replication in living cells. Instead, recent data suggest that remdesivir may impair replication when incorporated into the template strand following an initial round of viral RNA synthesis134. In the template strand, remdesivir hinders the incorporation of the incoming complementary nucleotide. This mode of activity has been termed ‘template-dependent inhibition’134 (Fig. 6d). Following the eventual incorporation of this incoming nucleotide, a second potential checkpoint was proposed, in which remdesivir would bias the RdRp towards the pre-translocated state, although direct evidence for this is lacking134.
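The position counting behind delayed chain termination can be made explicit with a toy model, an assumption of this sketch rather than a structural simulation. The product strand is built 5′→3′ from a hypothetical intended sequence, one adenosine is replaced by remdesivir (written 'R'), and synthesis halts when R reaches the fourth position from the 3′ end, that is, after three further nucleotides have been added; a flag stands in for the high-NTP conditions under which this barrier is overcome. The sequence, index and function name are illustrative.

```python
from typing import Tuple

# Position of incorporated remdesivir (3'-terminal nucleotide = 1) at which
# its 1'-nitrile group clashes with nsp12 Ser861, as described in the text.
CLASH_POSITION_FROM_3PRIME = 4


def extend_with_remdesivir(intended: str, rdv_index: int,
                           barrier_overcome: bool = False) -> Tuple[str, bool]:
    """Build the product strand 5'->3', replacing intended[rdv_index] (an A) by 'R'.

    Returns the synthesized product and whether synthesis stalled.
    """
    if intended[rdv_index] != "A":
        raise ValueError("remdesivir is an adenosine analogue; choose an A position")
    product = []
    for i, nucleotide in enumerate(intended):
        product.append("R" if i == rdv_index else nucleotide)
        if i >= rdv_index:
            position_from_3prime = len(product) - rdv_index
            if position_from_3prime == CLASH_POSITION_FROM_3PRIME and not barrier_overcome:
                # Translocation is blocked; the complex stays pre-translocated.
                return "".join(product), True
    return "".join(product), False


if __name__ == "__main__":
    print(extend_with_remdesivir("GCAUGCAUGC", rdv_index=2))  # ('GCRUGC', True)
    print(extend_with_remdesivir("GCAUGCAUGC", rdv_index=2, barrier_overcome=True))  # ('GCRUGCAUGC', False)
```

In the stalled product, three nucleotides (U, G, C) follow R, reflecting the delay between incorporation and arrest that gives delayed chain termination its name.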

Fig. 6: Mechanisms of inhibition of coronavirus RNA synthesis by remdesivir and molnupiravir.

a | Chemical structure of remdesivir triphosphate, the activated form of remdesivir (Veklury). The ribose 1′-nitrile group is highlighted. b | Chemical structure of molnupiravir triphosphate, the activated form of molnupiravir (EIDD-2801/MK-4482). c | Schematic showing initial recognition and incorporation of a nucleotide analogue. ‘+1’ refers to the position of the nucleotide analogue before incorporation, whereas ‘−1’ refers to the position of the nucleotide analogue following catalysis and movement into the post-translocated register. d | Remdesivir exerts its inhibitory effect through ‘delayed chain termination’ and ‘template-dependent inhibition’126,134. Delayed chain termination impedes RNA-dependent RNA polymerase (RdRp) translocation through a steric clash that occurs when remdesivir reaches the fourth position (see part f) from the 3′ end of the product RNA (p-RNA) strand (red strand). Delayed chain termination is alleviated by high nucleoside triphosphate (NTP) concentrations, leading to the proposal of template-dependent inhibition, which occurs when remdesivir is positioned at the RdRp active site in the template RNA (t-RNA) strand. e | Molnupiravir perturbs replication through ‘lethal mutagenesis’, by enabling the indiscriminate incorporation of either ATP (A) or GTP (G) when it is positioned in the t-RNA strand129,131. f | Structural analysis of the delayed chain termination mechanism. Following incorporation of remdesivir and its translocation to the −3 active site position, the ribose 1′-nitrile group is positioned to collide with the side chain of non-structural protein 12 (nsp12) Ser861 (expected clash shown by inhibitory arrows). Addition of the next nucleotide triggers the steric clash of the Ser861 side chain and the remdesivir nitrile group, forcing the growing RNA chain to populate the ‘pre-translocated’ state133. High NTP concentrations may alleviate the energetic cost imposed by the translocation barrier by occupying the +1 site and thereby driving the forward movement of the RdRp. The structural models are based on Protein Data Bank entries 7B3B, 7B3C and 7B3D133.

Like remdesivir, molnupiravir is a prodrug that is converted in cells into its triphosphate form, thereby serving as a nucleotide analogue. Molnupiravir inhibits replication through lethal mutagenesis of the genomes of multiple viruses, including SARS-CoV-2 (refs135,136,137,138). Molnupiravir treatment presents a high barrier to resistance in cell culture assays135,136. Importantly, like remdesivir, molnupiravir evades the coronavirus nsp14-ExoN proofreading activity136. Unlike remdesivir, molnupiravir is delivered orally, which, combined with its high barrier to resistance and potent antiviral activity, led to its pursuit as an alternative therapeutic for COVID-19 (ref.139). Molnupiravir triphosphate is a cytidine analogue; once incorporated into viral RNA, it indiscriminately serves as a template for the incorporation of either adenine or guanine, which explains the G>A and C>U transition mutations observed in coronaviruses exposed to molnupiravir129 (Fig. 6e). Two recently resolved cryo-electron microscopy structures of molnupiravir base-paired with adenine or guanine revealed the structural basis of molnupiravir-mediated lethal mutagenesis131.
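The two-step logic of lethal mutagenesis can be sketched as a toy simulation under simplifying assumptions that are not taken from the cited studies: strands are aligned position by position (antiparallel orientation is ignored), the analogue (written 'M') enters the minus strand only where it replaces C opposite a G, at a hypothetical per-site probability, and an M in the template then directs A or G with equal probability. Under these assumptions every new mutation in the progeny plus strand is a G>A transition; C>U changes arise by the same route when M is incorporated during plus-strand synthesis, which this sketch does not model. All names and the sequence are illustrative.

```python
import random
from typing import List, Tuple

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_plus_to_minus(plus: str, p_analogue: float, rng: random.Random) -> List[str]:
    """Minus-strand synthesis: 'M' replaces C opposite G with probability p_analogue."""
    minus = []
    for base in plus:
        partner = PAIR[base]
        if partner == "C" and rng.random() < p_analogue:
            partner = "M"  # cytidine-like incorporation of the analogue
        minus.append(partner)
    return minus


def copy_minus_to_plus(minus: List[str], rng: random.Random) -> str:
    """Plus-strand synthesis: an 'M' in the template directs A or G at random."""
    return "".join(rng.choice("AG") if base == "M" else PAIR[base] for base in minus)


def mutations(original: str, progeny: str) -> List[Tuple[int, str, str]]:
    """List (position, reference base, new base) for every changed position."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(original, progeny)) if a != b]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducibility
    genome = "AUGGCGUACGGAUCCGGU" * 5  # hypothetical plus-strand sequence
    progeny = copy_minus_to_plus(copy_plus_to_minus(genome, 0.5, rng), rng)
    for pos, ref, alt in mutations(genome, progeny):
        print(f"{ref}{pos + 1}{alt}")  # only G>A changes are possible here
```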

The RdRp readily incorporates remdesivir and molnupiravir because they closely mimic natural nucleotides. Therefore, these compounds do not significantly affect the initial round of RNA synthesis after incorporation, a feature that likely reduces their recognition and excision by nsp14 ExoN, which surveys the fidelity of RTCs.

Overcoming the proofreading barrier in antiviral drug design

Designing nucleoside analogues that escape the proofreading activity of nsp14 ExoN is a trial-and-error endeavour, because it is challenging to pinpoint the chemical properties that would confer nucleotide mimicry. A more rational approach could entail targeting the enzymatic activity of ExoN or its interfaces with nsp10 or the RTC. The ExoN activity is essential for SARS-CoV-2 and MERS-CoV replication, but not for all members of the betacoronavirus clade108. Nsp14-ExoN inactivation in MHV and SARS-CoV, although not lethal, enhances the susceptibility of the virus to nucleoside analogues, highlighting the benefits of dual-inhibition strategies in coronaviruses105. One concern is the potential for off-target effects when the ExoN active site is targeted with a small-molecule inhibitor, owing to its structural similarity to cellular DEDD-family exonucleases. Designing inhibitors against the interface of nsp10 with nsp14 ExoN and nsp16 has attracted interest given that viral replication is abrogated in interface mutants140,141.

A long-standing research interest is the characterization of how the proofreading complex, nsp10–nsp14, interacts with the RTC. Pull-down experiments using a series of truncated proteins indicated that nsp12 interacts with nsp14 and its subdomains109. Recent structural work showed that SARS-CoV-2 nsp10–nsp14 can be recruited to the RTC by forming a covalent link with nsp9, which is bound to the nsp12 NiRAN domain124. Prior observations in MHV motivated the use of an nsp9–nsp10 fusion protein in that study, as ablation of the protease cleavage site between nsp9 and nsp10 in MHV maintains a viable phenotype. This nsp9–nsp10 cleavage-site mutant, however, showed a pronounced overall defect in RNA synthesis, and its propagation was severely compromised142. Given the crippled phenotype of the MHV nsp9–nsp10 cleavage-site mutant142, it remains to be shown whether the same interaction with nsp10–nsp14 can occur when nsp9 and nsp10 are separated. The recent structural analysis of the SARS-CoV-2 nsp12–nsp9–nsp10–nsp14 complex did not reveal any features that would suggest that the incorporation of an nsp9–nsp10 fusion protein affects RTC assembly124. Probing the role of nsp10–nsp14 in greater detail will benefit from single-molecule experiments using reconstituted RTC components. This approach could also test whether nsp14 ExoN alleviates backtracking as proposed69,70. Furthermore, both the engagement of proteins with RTCs and the proposed proofreading activity of nsp14 ExoN will have to be demonstrated in vivo.

Future perspectives

Although the surge of new coronavirus research has expanded our understanding of the molecular mechanisms of SARS-CoV-2 replication and gene expression, the foundation of our knowledge of these processes is built primarily on previous research on other coronaviruses and the distantly related arteriviruses. Corroborating our understanding will likely reveal properties and processes shared between viral species, which are of potential value for the design of pan-coronavirus inhibitors. More specifically, it is pertinent to unravel poorly understood intricacies of the spatiotemporal regulation of RNA synthesis in coronaviruses. The complete repertoire of host-cell factors that assist the coronavirus infection cycle remains unknown, as does how these factors may, for example, be subverted to assist in the formation of replication organelles or contribute to the formation of RTCs. The spatial segregation of coronavirus replication in virus-induced membranous organelles, a feature shared among positive-strand RNA viruses, appears to be a prerequisite for successful virus propagation, although it remains to be elucidated where in the cell RNA synthesis occurs during the earliest phase of infection.

Downstream of replication organelle formation, key questions include how RNA synthesis is primed on its templates and how the two major synthesis pathways are regulated. Such regulation must be achieved by the concerted action of the replicase proteins and may be further assisted by host factors, whose role in these pathways is relatively unexplored. Regulating RNA synthesis necessitates the faithful maintenance of the encoded genetic information, as unwanted mutations can alter the RNA elements required for processes such as template switching or lead to the production of nonsense transcripts. Yet to be worked out in detail is how the coronavirus proofreading complex coordinates its activity with the polymerase, leading to the excision of misincorporated RNA nucleotides and nucleotide analogues. Understanding how mutations accumulate during replication and how they are corrected can inform us about the evolution of drug resistance mutations and aid the design of inhibitors that directly target the replicase complex. We hope that such considerations may guide research that will shape our response to future deadly outbreaks of coronaviruses, a consequence of the encroaching footprint of humanity on the natural world.