Structures and functions of coronavirus replication–transcription complexes and their relevance for SARS-CoV-2 drug design

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has killed millions of people and continues to cause massive global upheaval. Coronaviruses are positive-strand RNA viruses with an unusually large genome of ~30 kb. They express an RNA-dependent RNA polymerase and a cohort of other replication enzymes and supporting factors to transcribe and replicate their genomes. The proteins performing these essential processes are prime antiviral drug targets, but drug discovery is hindered by our incomplete understanding of coronavirus RNA synthesis and processing. In infected cells, the RNA-dependent RNA polymerase must coordinate with other viral and host factors to produce both viral mRNAs and new genomes. Recent research aiming to decipher and contextualize the structures, functions and interplay of the subunits of the SARS-CoV-2 replication and transcription complex proteins has burgeoned. In this Review, we discuss recent advancements in our understanding of the molecular basis and complexity of the coronavirus RNA-synthesizing machinery. Specifically, we outline the mechanisms and regulation of RNA translation, replication and transcription. We also discuss the composition of the replication and transcription complexes and their suitability as targets for antiviral therapy.

The devastation caused by the COVID-19 pandemic has led to an extraordinary expansion of research focused on the causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This virus has been classified within the species Severe acute respiratory syndrome-related coronavirus, which belongs to the genus Betacoronavirus of the family Coronaviridae 1 . The coronavirus family belongs to the order Nidovirales, which includes a rapidly expanding and diverse group of enveloped viruses with a single-stranded RNA genome of positive polarity. Different nidoviruses use similar strategies to organize, express and replicate their genomes. They constitute a monophyletic virus cluster that is characterized by the universal conservation of seven domains in their large replicase gene, which encodes the functions required for viral RNA replication and transcription in the infected cell 2 . Much of the foundational research on this replicase addressed viruses in related nidovirus families such as the Arteriviridae and, more relevantly, other members of the family Coronaviridae. The features unique and common to viruses within and outside the order Nidovirales have been described in several comprehensive reviews [3][4][5] . Using the first available genome sequences, early studies unravelled the genome organization and expression strategy used by coronaviruses and other nidoviruses. These studies were followed by pioneering bioinformatics, biochemical and genetic studies that established or confirmed the essential functions of many of the replicase proteins, thereby laying a road map for the currently ongoing SARS-CoV-2 research.
Betacoronaviruses have caused previous deadly epidemics, including the 2003 SARS outbreak and the ongoing Middle East respiratory syndrome (MERS) epidemic, which was first detected in 2012 (Ref. 6 ). These zoonotic events inspired earlier efforts to develop coronavirus-specific antiviral drugs [6][7][8][9] . The onset of the COVID-19 pandemic spurred scientists around the globe to apply their respective expertise to address how SARS-CoV-2 infects humans, avoids or delays host immune responses, copies its genome and expresses its proteins to make new virions. As successful replication directly depends on efficient synthesis of viral RNA, the replicase proteins responsible for this process are obvious antiviral drug targets.
Coronaviruses use an unusually large collection of RNA-synthesizing and RNA-processing enzymes to express and replicate a genome that is two to three times larger than that of most other RNA viruses. The central enzyme of transcription and replication is the RNA-dependent RNA polymerase (RdRp), which synthesizes all viral RNA and is a proven target for antiviral drugs. Successful replication and transcription also entail the use of specific RNA recognition signals to initiate RNA synthesis, sustaining RdRp processivity and fidelity, viral mRNA capping to ensure translation by host ribosomes, and the spatial and temporal regulation of the viral cycle within the infected cell. The non-structural proteins that assist the RdRp to perform these functions (Table 1) constitute additional targets for antiviral drug development. Like the RdRp, some of these non-structural proteins are conserved across most RNA viruses, whereas others are unique to coronaviruses [3][4][5] .
In this Review, we discuss recent advances in deciphering the molecular mechanisms of coronavirus gene expression and RNA replication, with special focus on SARS-CoV-2. We aim to contextualize the more recent studies with the foundational work, to provide a coherent view of our current understanding of the successive steps in SARS-CoV-2 replication and transcription. We focus on the processes required for viral gene expression and replication: translation, replication organelle formation, and production and capping of genomic RNA (gRNA) and subgenomic RNAs (sgRNAs). We also describe the known and proposed functions of the viral nucleic acid-metabolizing proteins, as revealed by new biochemical, structural and virological studies. We then discuss two well-studied examples of antiviral nucleoside analogues that target the RdRp; owing to space constraints, we do not address therapeutics designed to target viral proteins other than the RdRp. Likewise, we do not extensively cover the mechanisms of viral protein synthesis, or the (proposed) involvement of a substantial number of host factors in coronavirus replication and transcription. We conclude the Review by describing the remaining gaps in our knowledge to help guide new research.

The SARS-CoV-2 infection cycle
To infect a cell, coronaviruses use multiple host factors, whose expression patterns therefore co-determine viral tropism. Delivery into the cell and translation of the large RNA genome launches a cytoplasmic replication cycle that integrates a remarkable variety of strategies to fine-tune viral gene expression on both the translational level and the transcriptional level. The successive steps that ultimately lead to the release of viral progeny are coordinated temporally and spatially, and rely extensively on the infrastructure and metabolism of the host cell. In this section, we outline the key steps of the SARS-CoV-2 infection cycle.
Entry into the host cell. SARS-CoV-2 entry into the cell depends on several host attachment and entry factors, chief among them the receptor angiotensin-converting enzyme 2 (ACE2), to which the SARS-CoV-2 spike (S) glycoprotein binds (reviewed in Refs 10,11 ) (Fig. 1). The fusogenic spike protein consists of two parts, S1 and S2, which mediate attachment to and fusion with the cell membrane, respectively. Cellular proteases such as transmembrane serine protease 2 cleave the spike protein, a step required to prime its membrane fusion activity 12,13 .
Following membrane fusion, the SARS-CoV-2 gRNA is released into the cytosol. The genome is a 5′-capped, single-stranded RNA of 29,870 bases with a 3′ poly(A) tail of variable length 14,15 . It encodes at least 13 recognized open reading frames (ORFs), organized largely linearly from the 5′ end to the 3′ end (Fig. 2a). The coding part of the gRNA is flanked by a 5′ untranslated region (UTR) and a 3′ UTR of 265 and 337 nucleotides (excluding the poly(A) tail), respectively. The gRNA possesses a number of regulatory sequences and higher-order RNA structures (discussed later) that are involved in its translation, replication and transcription [16][17][18][19] .

Replication and transcription
Process of amplification of the viral genome through a full-length minus-strand intermediate serving as template and synthesis of subgenomic mRNAs through a set of subgenomic minus-strand templates.

To enter a host cell, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein interacts with the cellular surface protein angiotensin-converting enzyme 2 (ACE2), while being cleaved by cellular proteases such as transmembrane serine protease 2 (TMPRSS2) to activate its membrane-fusion capacity. The genomic RNA (gRNA), which is capped on its 5′ end (red circle) and polyadenylated ((A)n) on its 3′ end, is released from the viral particle and, after recruiting host-cell ribosomes, translated into two replicase polyproteins, pp1a and pp1ab. Proteases embedded in viral non-structural protein 3 (nsp3) and nsp5 cleave pp1a and pp1ab into 16 non-structural proteins that assemble into replication-transcription complexes (RTCs).

proteins 20 . Sixteen mature non-structural proteins are released from pp1a and pp1ab following 15 proteolytic cleavages performed by the virus-encoded papain-like protease (PLpro) in non-structural protein 3 (nsp3) and chymotrypsin-like or main protease (Mpro) in nsp5 (Table 1). In this manner, pp1a yields nsp1 to nsp11, whereas pp1ab is cleaved into nsp1 to nsp10 and nsp12 to nsp16 (Refs 21,22 ) (Fig. 2a).
The rapidly released nsp1 mediates the shutdown of the translation of host mRNAs [23][24][25] , while the other non-structural proteins form protein complexes, yet to be definitively determined, that engage in viral RNA synthesis and are referred to as the replication-transcription complexes (RTCs). Replication and transcription are driven primarily by the enzymes contained in nsp12, nsp13, nsp14 and nsp16 (Table 1). Nsp12, the subunit 26,27 . The formation of replication organelles typically precedes the exponential phase of viral RNA synthesis and is discussed in Box 1.
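The division of labour between the two polyproteins can be restated as simple set bookkeeping. The following Python sketch only encodes what the text says (pp1a yields nsp1 to nsp11; pp1ab yields nsp1 to nsp10 and nsp12 to nsp16); the variable and function names are illustrative assumptions, and no cleavage-site positions or protein sequences are modelled.

```python
# Bookkeeping sketch of the polyprotein products named in the text.
# All identifiers here are hypothetical; only the nsp numbering comes from the Review.

# pp1a (ORF1a only) yields nsp1..nsp11.
PP1A_PRODUCTS = [f"nsp{i}" for i in range(1, 12)]

# pp1ab (ORF1a + ORF1b, made after the -1 frameshift) yields nsp1..nsp10 and nsp12..nsp16.
PP1AB_PRODUCTS = [f"nsp{i}" for i in range(1, 17) if i != 11]


def nsp_number(name):
    """Return the integer in a name such as 'nsp12'."""
    return int(name[3:])


def mature_nsps():
    """All distinct mature non-structural proteins, in numeric order."""
    return sorted(set(PP1A_PRODUCTS) | set(PP1AB_PRODUCTS), key=nsp_number)


def only_in_pp1ab():
    """Products that require the frameshift, i.e. those encoded by ORF1b."""
    return sorted(set(PP1AB_PRODUCTS) - set(PP1A_PRODUCTS), key=nsp_number)


def only_in_pp1a():
    """Products released from pp1a but not from pp1ab."""
    return sorted(set(PP1A_PRODUCTS) - set(PP1AB_PRODUCTS), key=nsp_number)


if __name__ == "__main__":
    print(len(mature_nsps()), "mature non-structural proteins")
    print("frameshift-dependent:", only_in_pp1ab())
    print("pp1a-specific:", only_in_pp1a())
```

Seen this way, the enzymes highlighted above as drivers of RNA synthesis (nsp12, nsp13, nsp14 and nsp16) all fall in the pp1ab-only set.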

RNA synthesis and virion assembly.
Using the gRNA template, RNA synthesis by the RTC starts with producing both a full-length genome complement (the antigenome) and a set of minus-strand sgRNAs, which are derived from the gRNA region downstream of ORF1a and ORF1b (the replicase gene). Whereas the antigenome serves as a template to produce new gRNA, the minus-strand sgRNAs direct the synthesis of a nested set of subgenomic mRNAs (sg-mRNAs) (discussed later).
Although transcription is defined principally as the synthesis of RNA from a DNA template, in this Review we use the term to describe the synthesis of sg-mRNAs from RNA templates, to conform with the terminology used in the coronavirus literature. The sg-mRNAs are crucial for the production of the four coronavirus structural proteins, which are required for virion assembly and egress. New virions were recently reported to leave the cell via lysosomal trafficking rather than the biosynthetic secretory pathway used by many other enveloped viruses 28 . A number of the sg-mRNAs are used to express so-called accessory proteins, many of which have been implicated in modulating cellular innate immune responses 20,29,30 (Fig. 1).

Regulation of SARS-CoV-2 translation
Expression of the SARS-CoV-2 proteins in infected cells depends primarily on the translation of gRNA and the eight 'canonical' sg-mRNAs 20,29,30 . The viral genes fall into three groups (Fig. 2a): replicase ORF1a and ORF1b, which are translated from gRNA, with ORF1b expression depending on −1 programmed ribosomal frameshifting (PRF); ORFs encoding the four 'universal' coronavirus structural proteins (the spike, membrane, envelope and nucleocapsid proteins) (Fig. 1), which are translated from sg-mRNAs; and ORFs encoding accessory proteins, which are translated from the remaining sg-mRNAs but differ widely between coronavirus lineages 31 (Fig. 2a). In addition, small (putative) ORFs that overlap with several of the ORFs outlined above were identified by theoretical and experimental approaches. These small ORFs were also the subject of nomenclature confusion 32 , and their expression and biological relevance continue to be investigated [32][33][34][35] . In this light, we have included only ORF3c and ORF9b of the small ORFs in the current map of the SARS-CoV-2 genome (Fig. 2a).
Like many RNA viruses, coronaviruses use noncanonical translation mechanisms to expand their coding capacity and fine-tune the expression levels of particular viral proteins 36 . Specifically, PRF (for ORF1b) and 'leaky ribosomal scanning' (for some ORFs in sg-mRNAs) co-regulate SARS-CoV-2 genome expression. In leaky ribosomal scanning, ribosomes load onto the 5′ end of the viral sg-mRNAs, but initiate translation from a more downstream, internal start codon. The use of leaky scanning has been suggested or demonstrated for SARS-CoV-2 ORF3c 33,34 , and for ORF7b 20,22,37 and ORF9b 38 of both SARS-CoV and SARS-CoV-2. In these cases, leaky scanning and expression of the more downstream ORF appear to be promoted by the suboptimal nature of upstream translation initiation signals.
Expression of ORF1b from gRNA depends on −1 PRF occurring just upstream of the ORF1a stop codon 39 , a highly conserved feature among coronaviruses and other nidoviruses. Termination of SARS-CoV-2 ORF1a translation yields the 4,405-residue-long pp1a, and −1 PRF results in extension to yield the 7,096-residue-long pp1ab. The ORF1a-ORF1b PRF mechanism directs the expression of the key RNA metabolism enzymes of the RTC and regulates the relative expression levels of the proteins encoded by ORF1a and ORF1b. Frameshifting occurs on a specific 'slippery sequence' (5′-U UUA AAC-3′, followed by GGG in the case of SARS-CoV-2; nucleotides 13,462-13,468 in GenBank genome entry MN908947.3, to which all genome reference numbers in this Review refer): ribosomes translating the UUA and AAC codons of ORF1a can shift one nucleotide backwards, and translation then continues with a CGG codon in ORF1b 40,41 (Fig. 2b,c). The importance of maintaining the optimal ratio between ORF1a expression and ORF1b expression was demonstrated experimentally with use of SARS-CoV mutants with altered PRF levels, which were found to be dramatically crippled 41 .
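The change of reading frame at the slippery sequence can be followed codon by codon. The Python sketch below uses only the ten nucleotides given in the text (U UUA AAC followed by GGG); the function names and the `codons_before_slip` parameter are illustrative assumptions, and the model ignores everything that controls how often the shift happens (the pseudoknot, the attenuator loop and the nascent chain).

```python
# Minimal sketch of -1 programmed ribosomal frameshifting on the slippery sequence.
# Only the 10-nt context quoted in the text is used; all names are hypothetical.

SLIPPERY_CONTEXT = "UUUAAACGGG"  # 5'-U UUA AAC-3' followed by GGG
ORF1A_FRAME_OFFSET = 1           # ORF1a codons here are UUA, AAC, GGG


def codons(rna, start):
    """Split rna into complete triplets, beginning at index start."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def read_in_orf1a_frame(rna=SLIPPERY_CONTEXT, offset=ORF1A_FRAME_OFFSET):
    """Codons decoded when no frameshift occurs (translation stays in ORF1a)."""
    return codons(rna, offset)


def read_with_minus1_shift(rna=SLIPPERY_CONTEXT, offset=ORF1A_FRAME_OFFSET,
                           codons_before_slip=2):
    """Codons decoded when the ribosome slips back one nucleotide.

    The first codons_before_slip codons (UUA and AAC) are decoded in the
    ORF1a frame; decoding then resumes one nucleotide upstream of where
    the next in-frame codon would have started.
    """
    decoded = codons(rna, offset)[:codons_before_slip]
    resume_at = offset + 3 * codons_before_slip - 1  # the "-1" of -1 PRF
    return decoded + codons(rna, resume_at)


if __name__ == "__main__":
    print("ORF1a frame:", read_in_orf1a_frame())
    print("after -1 PRF:", read_with_minus1_shift())
```

In this toy model the unshifted read continues with GGG, whereas the shifted read continues with CGG, the first ORF1b codon named in the text.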
Regulation of PRF efficiency is achieved by the formation of several RNA structures and through interactions of the nascent protein chain with the ribosome.
The key PRF-stimulating element is a three-stemmed RNA pseudoknot structure located downstream of the slippery sequence [39][40][41][42] (Fig. 2b). This element interacts with the ribosome at the entry of the mRNA channel of the 40S ribosomal subunit and induces translational pausing before −1 PRF; complete unfolding of this tertiary RNA structure is slow and thought to promote ribosomal frameshifting on the viral mRNA 40 .

Open reading frames (ORFs) are not drawn exactly to scale. The SARS-CoV-2 genomic RNA (gRNA) has a 5′ cap (red circle) followed by a leader sequence (red line) that is shared with all subgenomic mRNAs (sg-mRNAs), and a 3′ poly(A) tail. The 5′-proximal three quarters of the genome encode the replicase polyproteins pp1a and pp1ab, which are cleaved to yield 16 non-structural proteins (nsp1-nsp16; blue from ORF1a and red from ORF1b). The 3′-proximal one quarter of the genome encodes the structural (brown) and accessory (azure) proteins. Structural and accessory proteins are expressed from a nested set of sg-mRNAs, with ORF3c, ORF7b and ORF9b being expressed via ribosomal 'leaky scanning'. b | RNA motif and structures that promote a frameshift from ORF1a to ORF1b, thereby controlling the synthesis of pp1ab. The key programmed ribosomal frameshifting (PRF)-stimulating RNA structure, a pseudoknot, interacts with the ribosome and induces its pausing, which generates tension in the gRNA template. As a result, ribosomes can slip one nucleotide backwards on the 'slippery sequence' (−1 PRF). An attenuating RNA loop located upstream of the slippery sequence also contributes to modulating PRF frequency. c | Model of −1 PRF at the ORF1a-ORF1b junction, showing the regulatory RNA elements inducing a simultaneous −1 shift of the tRNAs bound to the A and P sites of the ribosome, which can then translate ORF1b. The one-letter code for amino acids (circles) is used. A stop sign represents the ORF1a stop codon. E, envelope protein; M, membrane protein; N, nucleocapsid protein; S, spike protein.

Nucleocapsid
Complex of genomic RNA and nucleocapsid proteins that forms the core of a coronavirus particle.

Pseudoknot
A structural motif that arises as a result of base pairing between the loop of an RNA hairpin and a complementary single-stranded (unpaired) region within RNA.
The position of the ORF1a stop codon, five codons downstream of the frameshift site, may also regulate PRF levels by allowing the pseudoknot to refold, thus preventing trailing ribosomes from continuing along the viral RNA that was unfolded by the leading ribosome 40 . PRF frequency is further modulated by a translation-attenuating RNA loop upstream of the slippery sequence 40 , which may either directly inhibit frameshifting 42 or force elongating ribosomes to dissociate before reaching the PRF site 41 . Finally, interactions between specific residues in the nascent viral polyproteins and the ribosome exit tunnel are thought to co-determine PRF efficiency 40 .

Box 1 | Coronavirus replication organelles
Coronavirus infection induces the extensive remodelling of endoplasmic reticulum (ER) membranes 26,27,[163][164][165] . This yields a complex vesiculotubular network of convoluted membranes (CMs) and unusual double-membrane vesicles (DMVs), in part interconnected through their outer membranes. Collectively, these structures are referred to as 'replication organelles', which presumably help to organize viral RNA synthesis in time and space. Moreover, replication organelles may hinder detection of the negative-strand RNA and double-stranded RNA intermediates of viral RNA synthesis by innate immunity receptors. The formation of double-membrane structures can be induced by co-expression of coronavirus transmembrane non-structural protein 3 (nsp3), nsp4 and nsp6 (Refs 148,149 ). Coronavirus replication organelles can be visualized by (cryo) electron tomography, as shown for Middle East respiratory syndrome coronavirus-infected cells in the figure, part a (tomogram slice) and part b (surface-rendered model) 27 . These images show large numbers of DMVs, commonly 200-300 nm in diameter; double-membrane spherules (DMSs; arrowheads) of unknown function and having a diameter of ~80 nm; CMs; and ER. Metabolic labelling (with 3H-uridine) of newly synthesized coronavirus RNA in combination with autoradiography and electron microscopy identified DMVs as the main platform for coronavirus RNA synthesis 27 . This notion was further supported by the abundant immunolabelling of DMVs for double-stranded RNA 164 and the presence of 'fibrillar material' in their interior 150,164,166 . However, for many years, coronavirus DMVs were observed as fully sealed compartments 27,164,165 , lacking a connection to the cytosol that could be used to export genomic RNA and subgenomic mRNAs, thus maintaining controversy about the exact site of coronavirus RNA synthesis.
Recently, cryotomography of coronavirus-infected cells was used to study replication organelles as closely as possible to their native state 150,166 . This approach revealed that murine hepatitis virus DMVs contain membrane-spanning structures (see the figure, part c). Subsequent subtomogram averaging revealed a hexameric, crown-shaped pore complex (total mass ~3 MDa) surrounding a central channel that would be wide enough to allow RNA transport 150 (see the figure, part d). Use of a recombinant murine hepatitis virus expressing GFP-tagged nsp3 revealed that this protein, which is the largest replicase subunit (~2,000 residues), is a major component of the DMV pore complex 150 . On the cytosolic side of the pore, protein assemblies strongly resembling coronavirus nucleocapsid structures were visualized (see the figure, part e), suggesting that newly made RNA is encapsidated following its export from DMVs 150 (see the figure, part f). The localization of the coronavirus replication-transcription complex within DMVs and its potential association with the DMV pore complex remain to be investigated in more detail. Size bars are 250 nm (part a) and 50 nm (parts c and e). Parts a and b adapted from Ref. 27.

RNA replication and transcription
As described earlier, ORF1a and ORF1b are translated into the pp1a and pp1ab precursors that give rise to 16 non-structural proteins [1][2][3][4] . Fourteen of these replicase subunits have been ascribed some type of function in coronavirus replication, several with manifold activities, as listed in Table 1. These non-structural proteins either have been directly implicated in nucleic acid metabolism or enable or promote the activity of the catalytic non-structural proteins, to stimulate RNA synthesis and processing or participate in the formation of replication organelles. In this Review, we discuss in detail the molecular and biochemical features of these proteins, focusing mostly on nsp7, nsp8, nsp9, nsp10, nsp11, nsp12, nsp13, nsp14 and nsp16, but do not delve into the details of the role of nsp15, which includes a unique uridylate-specific endoribonuclease 43 that is conserved in most vertebrate nidoviruses.
Nsp15 has been implicated in innate immunity evasion 44,45 , possibly by shortening the poly(U) stretches that are present at the 5′ end of viral minus-strand RNAs 45 . In coronaviruses, uridylate-specific endoribonuclease activity is required for efficient replication, but that requirement can be bypassed in host cells with depressed type I interferon sensing or production 44,45 . In the following subsections we summarize our current understanding of the mechanisms of coronavirus gRNA and sg-mRNA synthesis, and the involvement of specific viral proteins in controlling these processes. We note that some non-structural proteins have other functions, which we do not mention because they are outside the scope of this Review. For a more in-depth summary of the literature on the functions of each non-structural protein, see

Continuous and discontinuous RNA synthesis in replication and gene expression.
Following gRNA translation and proteolytic maturation of the replicase polyproteins, a relatively complex programme of SARS-CoV-2 RNA synthesis and gene expression is initiated, which depends on the interplay between viral RNA and non-structural proteins on the one hand (Fig. 3), and host-cell proteins and membranes on the other hand (Box 1). A variety of RNA sequences and structural elements in the terminal regions of the coronavirus genome have been implicated in the specific recognition of RNA templates by the coronavirus RTC [16][17][18] (Fig. 3a). Long-range RNA-RNA interactions may be important for replication and transcription, although in many cases direct experimental support for their biological relevance remains to be obtained. As outlined earlier, SARS-CoV-2 RNA synthesis can be divided into genome replication and sg-mRNA transcription (Fig. 3b). Replication yields full-length viral plus-strand gRNA, which can be translated into additional replicase polyproteins, serve as a template for additional minus-strand RNA synthesis or be packaged into progeny virions. Transcription produces the nested set of sg-mRNAs used to express the structural and accessory proteins. Replication and transcription both require dedicated minus-strand RNA templates; full-length minus strands serve as a template for gRNA replication, whereas a nested set of minus-strand sgRNAs serve as templates for transcription, as first proposed about a quarter of a century ago 46 . The sg-mRNAs have the same 3′-terminal sequence, and carry a common 5′ leader sequence that is identical to the 5′-terminal 75 nucleotides of the gRNA. The leader derives from a discontinuous step (that is, from template switching 47 during minus-strand sgRNA synthesis), which occurs when the RTC stalls at the 3′-proximal quarter of the gRNA template (Fig. 3b). This interruption mediates RTC detachment and relocation to a position near the 5′ end of the gRNA template (discussed later), where minus-strand synthesis resumes. This process yields a set of nested minus-strand sgRNAs with common 5′-terminal and 3′-terminal sequences, which serve as templates for the synthesis of a complementary set of sg-mRNAs 47,48 .
The presence of the common 5′ leader sequence in coronavirus mRNAs may offer several advantages. Its complement (the anti-leader sequence) offers a conserved starting point for plus-strand RNA synthesis, which can be used to initiate the synthesis of both gRNA and all sg-mRNAs. Although not studied in detail thus far, the common 5′ leader sequence may also serve as a recognition signal for the viral mRNA capping machinery (Box 2). Furthermore, the nsp1 proteins of SARS-CoV, MERS-CoV and SARS-CoV-2 all mediate a translation shut-off in the infected cell 49 , which is based on their ability to block the ribosomal mRNA entry channel 24,25 and induce endonucleolytic cleavage of host mRNAs 23 . The common 5′ leader sequence present in all coronavirus mRNAs allows escape from translation shut-off 50 , by as yet unknown mechanisms, resulting in simultaneous viral mRNA translation and impairment of host-cell gene expression, including of genes mediating the early responses to virus infection.
Regulation of template switching. The template switching required to extend the 'body' of the nascent minus-strand sgRNA with the anti-leader is primarily guided by the body transcription regulatory sequence (TRS-B) elements. These short sequences are found just upstream of the ORFs that encode structural and accessory proteins (except for those expressed through leaky scanning). After copying of a TRS-B sequence, minus-strand RNA synthesis stalls and the 3′ end of the nascent RNA strand is translocated to reinitiate RNA synthesis at the leader TRS (TRS-L) near the 5′ end of the gRNA template. This step is strongly facilitated by a base pairing interaction between the TRS-B complement at the 3′ end of the nascent minus strand (anti-TRS-B) and the TRS-L sequence in the gRNA template, as demonstrated previously in related viruses by site-directed mutagenesis studies [51][52][53][54] (Fig. 3b). Coronavirus TRSs comprise a conserved core sequence (5′-ACGAAC-3′ in the case of SARS-CoV and SARS-CoV-2) that is flanked by sequences of variable length that may also contribute to the base pairing interaction with the TRS-L region 20,29,30 . In addition to the strength of the RNA duplex that is formed with the TRS-L region, other factors may co-determine the relative activity of a TRS-B element, and consequently the level at which the corresponding sg-mRNA is produced. These factors include the relative position of a TRS-B with respect to the 3′ end of the gRNA template, flanking RNA sequences 51 and the local or overall RNA structure of the gRNA template 16 . Genome cyclization driven by long-distance RNA-RNA interactions (Fig. 3a) was recently proposed to expose the TRS-L for base pairing during discontinuous minus-strand synthesis 16 , similarly to what was previously postulated for arteriviruses 55 .
The series of 'stop-or-go decisions' at the consecutive TRS-B elements encountered by the minus strand-transcribing RTC is thought to fine-tune the relative abundances of the various sg-mRNAs, which remain largely similar throughout the course of infection 56 . The TRS-L is effectively 'merged' with one of the TRS-B elements in each of the sg-mRNAs, thus positioning the ORF downstream of that TRS-B at the 5′-proximal position in the sg-mRNA and allowing it to be accessed by host ribosomes. Thus, coronavirus sg-mRNAs are nested and, except for the smallest species, polycistronic. However, they are presumed to be functionally monocistronic, with translation being restricted to the ORF most proximal to the 5′ end of the RNA, except in the case of sg-mRNAs on which leaky ribosomal scanning occurs to access a second ORF. The canonical 5′-ACGAAC-3′ TRS core sequence occurs only nine times in the SARS-CoV-2 genome (in TRS-L and eight TRS-Bs), coordinating the production of eight sg-mRNA species (RNAs 2-9) 20,29,30,57 (Fig. 2a). The smallest of these (mRNA 9) encodes the nucleocapsid protein and is by far the most abundant transcript 30,56 . For most sg-mRNAs, transcript abundance strongly correlates with ribosome footprint densities, indicating that they are translated with similar efficiencies, in line with the fact that their 5′ UTRs starting with the 75-nucleotide common leader sequence are largely identical (Fig. 2a). The detection of a separate sg-mRNA to express ORF7b was reported, derived from template switching at a TRS-B-like sequence (5′-AAGAAC-3′) located just upstream of ORF7b. Although the effective contribution of this TRS-B-like sequence to ORF7b expression may be limited 20 , this exemplifies how the (low-frequency) use of TRS-B-resembling sequences may yield additional subgenomic transcripts.
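The net outcome of TRS-guided template switching, a nested set of sg-mRNAs that share the leader and the 3′ end, can be mimicked with a few lines of Python. This is a deliberately simplified sketch: the toy genome below is invented for illustration (only the 5′-ACGAAC-3′ core comes from the text), the function names are assumptions, and the model works directly on plus-strand sequence instead of simulating minus-strand synthesis, base pairing strength or transcript abundance.

```python
# Toy model of leader-body fusion at TRS elements. The sequence segments are
# made up for illustration; only the TRS core sequence is taken from the text.

TRS_CORE = "ACGAAC"

TOY_GENOME = (
    "GGUUUAUACC"      # stand-in for the leader upstream of TRS-L
    + TRS_CORE        # TRS-L
    + "UUUCCCGGGUUU"  # stand-in for the replicase gene
    + TRS_CORE        # first TRS-B
    + "AUGGCUUAA"     # toy ORF
    + TRS_CORE        # second TRS-B
    + "AUGCCCUAG"     # toy ORF
    + "GGAAAA"        # toy 3' end
)


def find_trs(genome, core=TRS_CORE):
    """Return the start index of every occurrence of the core (overlaps allowed)."""
    positions = []
    i = genome.find(core)
    while i != -1:
        positions.append(i)
        i = genome.find(core, i + 1)
    return positions


def subgenomic_mrnas(genome, core=TRS_CORE):
    """Fuse the leader to the genome downstream of each TRS-B.

    The first TRS is treated as TRS-L and every later one as a TRS-B. Each
    product carries a single copy of the core at the junction, mirroring the
    'merging' of TRS-L with one TRS-B described in the text.
    """
    trs_l, *trs_bs = find_trs(genome, core)
    leader = genome[:trs_l]
    return [leader + genome[b:] for b in trs_bs]


if __name__ == "__main__":
    for rna in subgenomic_mrnas(TOY_GENOME):
        print(len(rna), rna)
```

Each transcript produced this way starts with the same leader and ends with the same 3′ sequence, and each shorter transcript is contained within the body of the longer one, which is the sense in which the set is 'nested'.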

SARS-CoV-2 in-depth transcriptomics.
Recently, the use of different highly sensitive techniques to study the SARS-CoV-2 transcriptome has identified numerous 'non-canonical' subgenomic transcripts 20,29,30,57 . These derive from TRS-L-dependent transcription, with the TRS-L, for example, being fused to downstream TRS-B-like sequences located in the middle of known ORFs; from large or local deletions generated without the apparent involvement of TRSs; or from the generation of (possibly) defective RNAs that may interfere with replication of the full-length genome by competing for the viral RdRp and other crucial replication factors, as described in several other coronaviruses 58 . These RNA species may in part derive from RNA recombination, which occurs at high frequency in coronaviruses [59][60][61] . The most accepted model for recombination in RNA viruses, similarity-assisted copy-choice RNA recombination, bears strong resemblance to the mechanism of coronavirus discontinuous minus-strand sgRNA synthesis 47 . Recombination involving host RNAs has also been invoked to explain gene acquisition during the evolution of coronaviruses and other nidoviruses 3 . It was hypothesized that TRS-B elements serve as recombination hotspots 60,61 , and that RNA secondary structures promote template switching in a TRS-independent manner 61 . Together, in-depth transcriptomics and ribosome-profiling experiments have revealed a complex landscape of SARS-CoV-2 RNAs and (potential) proteins, which extends well beyond the 'canonical' gene expression programme based on translation of the gRNA and canonical sg-mRNAs 20,29,30,57 . Similar observations were made in other coronaviruses 62,63 . The additional transcripts may serve to express previously unknown small ORFs, truncated proteins or fused (partial) gene products, but their potential roles in SARS-CoV-2 replication and pathogenesis remain to be thoroughly investigated.
Balancing replication and transcription. It is unknown whether the composition of RTCs engaging in synthesis of minus-strand gRNA versus minus-strand sgRNAs is identical. Interactions with specific protein factors may govern the balance between replication and transcription. Two examples of transcription-specific protein functions have been documented in arteriviruses, which are distant coronavirus relatives in the order Nidovirales that also use discontinuous RNA synthesis to generate a nested set of sg-mRNAs 64 . The N-terminal subunit of the arterivirus replicase, nsp1, controls the accumulation of gRNA and sg-mRNAs by determining the levels at which their respective minus-strand templates are produced 65 . Specifically, mutagenesis of the N-terminal zinc-finger domain of nsp1 fully abrogated sg-mRNA synthesis, whereas gRNA production by such mutants increased 2.5-3-fold 66 . A serendipitous mutation just downstream of the zinc-binding domain of the helicase subunit also decreased arterivirus transcription and increased replication 67,68 , a finding that may be relevant to a recent hypothesis 69,70 postulating that helicase-induced RTC backtracking contributes to the interruption of minus-strand RNA synthesis and/or to template switching (discussed later).

The role of the nucleocapsid protein in RNA synthesis.
The primary role of the coronavirus nucleocapsid protein is gRNA encapsidation, but it has also been implicated in a variety of other functions and interactions, including in regulating or modulating viral replication and transcription, although the interpretation of the supporting evidence is often complicated by the generally strong affinity of the nucleocapsid protein for RNA [71][72][73] . Both nonspecific RNA binding and binding to specific RNA sequences, including the TRS, have been reported, but often based on in vitro assays using purified nucleocapsid protein (reviewed in Ref. 71 ). A human coronavirus 229E RNA replicon lacking the nucleocapsid-encoding gene (and all other structural protein genes) retained the capability to replicate itself and synthesize sg-mRNAs 74 . This finding, and the fact that nucleocapsid-protein expression promotes viral replication 75,76 , suggests that the nucleocapsid protein has a modulatory rather than an essential role in coronavirus RNA synthesis. In line with this notion, the launching of coronavirus replication from in vitro-generated gRNA can be enhanced by co-expression of the nucleocapsid protein 71 . The nucleocapsid proteins of different coronaviruses interact with numerous other proteins, including with coronavirus replicase subunits such as nsp3 (Ref. 77 ) and host cell factors such as the RNA helicase DDX1 (Ref. 78 ). In the latter case, a complex formed between DDX1 and phosphorylated nucleocapsid protein was proposed to control the balance between replication and transcription by modulating the level of template switching at the successive TRS-B elements encountered by the RTC. The SARS-CoV-2 nucleocapsid protein was also shown to promote the cooperative association of the nsp7-nsp8-nsp12 complex with poly(U) RNA in vitro, thereby possibly facilitating initiation and/or elongation of viral RNA synthesis 79 .

Fig. 3 | SARS-CoV-2 RNA replication and transcription. a | The genomic RNA (gRNA) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has cis-acting structures at its 5′ untranslated region (UTR) and 3′ UTR, and engages in long-distance intramolecular RNA-RNA interactions; these structures and interactions are thought to be involved in the regulation of replication and transcription. The 5′ UTR contains five conserved stem-loop (SL) structures, and the 3′ UTR contains a bulged stem-loop (BSL), a (predicted) pseudoknot (PK) and a stem-loop structure with a hypervariable region (HVR). SARS-CoV-2 gRNA cyclization results in complete opening of SL3, where the leader transcription regulatory sequence (TRS-L) resides.
b | The gRNA serves as a template for gRNA replication (step 1) and for subgenomic mRNA (sg-mRNA) transcription (step 2); each process requires dedicated minus-strand templates: the anti-genome and a set of minus-strand sgRNAs, respectively. Synthesis of the latter involves a discontinuous step in which the replication-transcription complex (RTC) pauses RNA synthesis after copying one of the body transcription regulatory sequences (TRS-B), and detaches from the template. Subsequently, the RTC relocates to a position near the 5′ end of the gRNA template, where the complement of the TRS-B (anti-TRS-B) in the nascent minus-strand sgRNA engages in base pairing with the TRS-L. This template switch leads to the addition of the complement of the gRNA leader sequence (anti-leader) to the 3′ end of each of the minus-strand sgRNAs, which are used as templates for sg-mRNA production, thereby ensuring that all coronavirus sg-mRNAs include a 5′-terminal leader sequence of ~75 nucleotides that is identical to the 5′-terminal sequence of the gRNA. Positions of the TRSs and anti-TRSs are schematic and not drawn exactly to scale. Part b adapted from Ref. 161 , CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

Similarity-assisted copy-choice RNA recombination
A process of template switching during replication of an RNA virus genome that is guided by local sequence complementarity between the nascent strand and an alternative template, resulting in progeny genomes of mixed ancestry.

RNA replicon
A self-replicating RNA molecule, derived from a viral genome, that contains the replicase gene but lacks at least one essential structural protein gene and thus is unable to produce infectious progeny.
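The discontinuous step of minus-strand sgRNA synthesis described in the legend of Fig. 3b can be illustrated with a toy string model. The sketch below is only a schematic under stated assumptions: the mini-genome, the five-nucleotide leader and the function names are invented for illustration, ACGAAC is assumed as the TRS core, and the first TRS copy is simply treated as the TRS-L. It shows why every sg-mRNA ends up with the same 5′ leader and 3′ end.

```python
# Toy model of discontinuous minus-strand synthesis (leader-body fusion).
# The mini-genome below is invented for illustration; it is not viral sequence.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_trs_sites(genome, trs):
    """Return the start position of every TRS copy; the first is treated as TRS-L."""
    sites, start = [], genome.find(trs)
    while start != -1:
        sites.append(start)
        start = genome.find(trs, start + 1)
    return sites


def make_sg_mrna(genome, trs, body_index):
    """Return the sg-mRNA made when the template switch occurs at TRS-B number body_index (1-based)."""
    sites = find_trs_sites(genome, trs)
    if not 1 <= body_index < len(sites):
        raise ValueError("body_index must select an existing TRS-B")
    trs_l, trs_b = sites[0], sites[body_index]
    # Minus-strand synthesis starts at the genome 3' end and pauses after copying
    # the chosen TRS-B, so the nascent strand carries the anti-TRS-B at its 3' end.
    nascent_minus = reverse_complement(genome[trs_b:])
    # After relocating to the TRS-L, synthesis resumes and adds the anti-leader.
    anti_leader = reverse_complement(genome[:trs_l])
    minus_sgrna = nascent_minus + anti_leader
    # The minus-strand sgRNA is then copied into the plus-strand sg-mRNA.
    return reverse_complement(minus_sgrna)


LEADER = "GGAUC"   # hypothetical leader upstream of the TRS-L
TRS = "ACGAAC"     # assumed TRS core sequence
GENOME = LEADER + TRS + "UUUU" + TRS + "AUGCCC" + TRS + "AUGGGG" + "AAAA"
```

Calling `make_sg_mrna(GENOME, TRS, 1)` and `make_sg_mrna(GENOME, TRS, 2)` yields two transcripts of different length that share the same leader and the same 3′ end, which is the nested-set arrangement the legend describes.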

The replication-transcription complex
The nsp7-nsp8-nsp12 holo-RdRp is the central component of the coronavirus RTC, and investigating the molecular basis of its RNA-synthesizing activity will facilitate rational drug design. Akin to other polynucleotide polymerases, the RdRp catalyses the incorporation of ribose nucleoside triphosphates (NTPs) into a nascent 'product' RNA using the information provided by the template RNA. However, maintaining the integrity of the template poses significant challenges to coronaviruses as their genomes are approximately 30 kb long, which is a burden considering the so-called error threshold that demarcates the genome size above which the long-term survival of an RNA virus species would be tenuous 80,81 . Thus, to preserve the integrity of their genetic information, coronaviruses have evolved mechanisms to mitigate the impact of nucleotide misincorporations during RNA synthesis 3 . This section highlights recent biochemical and structural studies of how coronaviruses orchestrate their replication and transcription, with the added aim of enhancing our understanding of the druggable SARS-CoV-2 proteome.

Box 2 | Capping of the SARS-CoV-2 RNA
RNA capping is required for efficient translation of eukaryotic mRNAs and is also thought to protect the viral RNA from host immune response and nucleases 167,168 . However, although synthesis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA occurs in the cytoplasm of the infected cell, endogenous capping enzymes are mostly sequestered in the nucleus 167 . To address this complication, the coronavirus replicase acquired enzymatic domains to perform the four activities required for mRNA capping in the cytoplasm. Following transcription, the postulated sequence of capping steps, verified in vitro 93 , is as follows: the γ-phosphate of the 5′-triphosphate of the RNA is hydrolysed and released (see the figure). This RNA triphosphatase (TPase) activity is mediated by the helicase non-structural protein 13 (nsp13) using the same ATPase site as required for its translocation and unwinding activity 115,169 . The identity of the capping enzyme, defined as the protein that transfers the cap, guanosine monophosphate (GMP), to the RNA, eluded researchers for several decades, until it was discovered that the nidovirus RNA-dependent RNA polymerase (RdRp)-associated nucleotidyltransferase (NiRAN) domain of the RdRp of the arterivirus equine arteritis virus, a distant coronavirus relative, possesses nucleotidyltransferase activity and can be covalently modified with GMP and UMP upon the addition of GTP and UTP, respectively 82 . This finding led to the proposal that the NiRAN domain is a guanylyltransferase (GMPylase) and transfers the covalently linked GMP to the now exposed 5′ β-phosphate of RNA. The GMPylase activity of the SARS-CoV-2 nsp12 NiRAN towards an RNA substrate was recently verified biochemically 93 . Following the capping reaction, N7 of the guanine base is methylated (CH3).
The bifunctional nsp14 performs the N7-methyltransferase (N7-MTase) activity using S-adenosylmethionine (SAM) as the methyl donor and releasing the reaction by-product S-adenosylhomocysteine (SAH) 158,159 . The resulting intermediate cap structure is called 'cap 0' or 'm7GpppN-RNA', and is presumed to protect viral RNAs from 5′-3′ exonucleases and to secure ribosomal recognition for translation 167,168 . Nsp10, which binds and stimulates the exonuclease activity of nsp14, does not affect nsp14's MTase activity 158 . The crystal structures of the SARS-CoV nsp10-nsp14 complex revealed that nsp10 is bound to the exonuclease domain but not to the MTase domain of nsp14 and that the two domains are linked by a flexible hinge 109,157 . Notably, the structures revealed that the active site represents a new fold for MTases. This finding is helpful in developing inhibitors specifically targeting the nsp14 MTase activity without collateral inhibition of host methylases. The crystal structure of nsp10-nsp14 revealed the basis of the MTase reaction by capturing the SAM donor and a capped base (GpppN) acceptor in the active site 157 , potentially facilitating drug development. The last step of the capping pathway is the methylation of the ribose 2′-O of the first nucleotide of the mRNA. This 'cap 1' product or m7GpppN1m is formed by a separate MTase, nsp16. The stimulatory subunit, nsp10, is crucial for the activity of this 2′-O-methyltransferase (2′-O-MTase), and, as for the nsp14 N7-MTase, SAM serves as the methyl donor for this final step 158,160 . Nsp10-nsp16 will only methylate m7GpppA-RNA substrates 158 . 2′-O-methylation is crucial for the escape of the virus from sensing by innate immunity receptors that can activate the type I interferon signalling pathway, and allows the virus to 'camouflage' its mRNA as host mRNAs 170 . The NiRAN nucleotidyltransferase activity 82 and nsp14 MTase activity 159 are essential for viral replication.
Therefore, the enzymes in the capping pathway are attractive targets for antiviral drugs (reviewed in Ref. 168 ). It is currently unclear how each protein coordinates its individual activity towards the capping pathway or whether a larger capping complex exists to facilitate this process.
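The order of the four capping activities outlined in Box 2 can be summarized as a small lookup table. This is a bookkeeping sketch only: the intermediate labels `ppN-RNA` and `GpppN-RNA`, the shorthand `m7GpppNm-RNA` for the cap 1 product and the function name are assumptions made for illustration, and the model ignores kinetics, cofactors other than those named in the box, and any larger capping complex.

```python
# Bookkeeping sketch of the postulated capping sequence from Box 2.
# Keys are substrates; values are (enzyme and activity, product).
# Intermediate names are shorthand chosen for this example.

CAPPING_STEPS = {
    "pppN-RNA": ("nsp13 (RNA triphosphatase)", "ppN-RNA"),
    "ppN-RNA": ("nsp12 NiRAN (guanylyltransferase)", "GpppN-RNA"),
    "GpppN-RNA": ("nsp14 (N7-methyltransferase, SAM donor)", "m7GpppN-RNA"),        # cap 0
    "m7GpppN-RNA": ("nsp10-nsp16 (2'-O-methyltransferase, SAM donor)", "m7GpppNm-RNA"),  # cap 1
}


def capping_route(start="pppN-RNA"):
    """Follow the pathway from a starting 5' end and return (final product, steps taken)."""
    route, state = [], start
    while state in CAPPING_STEPS:
        enzyme, product = CAPPING_STEPS[state]
        route.append((state, enzyme, product))
        state = product
    return state, route


if __name__ == "__main__":
    final, route = capping_route()
    for substrate, enzyme, product in route:
        print(f"{substrate} --[{enzyme}]--> {product}")
    print("final 5' end:", final)
```

Because each enzyme is keyed by its substrate, the table also encodes the dependency noted in the box that 2′-O-methylation by nsp10-nsp16 acts only on an N7-methylated cap.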
Molecular mechanism of RNA synthesis. Structural and bioinformatics work classified the coronavirus polymerase subunit (nsp12) into three domains: an N-terminal nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain (residues 1-250), an interface region between the NiRAN domain and the RdRp domain (residues 251-398) and the core RdRp domain (residues 399-932) 82,83 (fig. 4). The core RdRp domain assumes an architecture analogous to a cupped right hand composed of three subdomains: the fingers, palm and thumb 84 (fig. 4b). Within the core RdRp domain, the active site is further subdivided into seven functional features, known as motifs A-G, which are highly conserved across positive-stranded RNA viruses 84 (fig. 4d).
Studies of smaller RdRps, amenable to X-ray crystallography, have provided a wealth of information on the role of these conserved structural motifs during the nucleotide addition cycle 84,85 . Initial nucleotide recognition is mediated by nonspecific charge-charge interactions of the nucleotide substrate with a series of positively charged Lys and Arg residues in motifs D and F of the nsp12 RdRp domain 86 . Molecular dynamics simulations of the hepatitis C virus RdRp indicate that the nucleotide diffuses into the central cleft through the NTP entry channel, until it reaches the active site located in the main channel 86 . The nucleotide ribose and base moieties subsequently flip into the active site through stabilizing hydrogen-bonding interactions with residues in motifs A and B and base-specific interactions with residues in motif F to form a Watson-Crick base pair with the template nucleotide [86][87][88] (fig. 4d). Correct positioning of the incoming nucleotide stabilizes the two catalytic Mg2+ ions that are necessary for the condensation reaction via interactions with the α and β phosphates of the incoming nucleotide, the product RNA 3′-hydroxy group and the catalytic Asp residues of motif C 87,89 . Closure of the RdRp active site via the stabilizing interactions of motifs A and B with the base enables the acid-base chemistry that drives the attack of the deprotonated product RNA 3′-hydroxy on the α-phosphorus atom of the incoming NTP, resulting in nucleotide addition and the release of pyrophosphate 87,89 .
The conformational state immediately after catalysis, in which the product RNA 3′ base occupies the incoming-nucleotide site, is referred to as the 'pre-translocated' state (fig. 5). The conversion into the 'post-translocated' state mandates the release of pyrophosphate via opening of the active site through a subtle rotation of motif A 84 . Entry into the post-translocated state resets the active site for the next nucleotide addition cycle. A generalized kinetic scheme indicates that forward translocation is driven by the high nucleotide concentrations in the cellular milieu, which saturate the incoming-nucleotide site (fig. 5). Failure to translocate would arrest the RTC and lead to the termination of RNA synthesis unless the translocation impediment is cleared. As we discuss later, one mechanism of action of remdesivir, an RdRp inhibitor used for COVID-19 treatment, can be summarized as hindering RdRp translocation through steric effects of its base in the active site.
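The statement that saturating NTP concentrations drive forward translocation can be made concrete with a minimal equilibrium model. This is an illustrative assumption, not the kinetic scheme of fig. 5: it supposes that the pre-translocated and post-translocated states interconvert rapidly with equilibrium constant K, that NTP binds only the post-translocated state with dissociation constant K_d, and it neglects catalysis. Both K and K_d are hypothetical symbols introduced here.

```latex
% Three-state sketch: pre, post, and post with NTP bound (post.NTP).
% K and K_d are assumed symbols, not quantities taken from the Review.
\begin{align}
[\mathrm{post}] &= K\,[\mathrm{pre}] \\
[\mathrm{post{\cdot}NTP}] &= \frac{[\mathrm{NTP}]}{K_d}\,[\mathrm{post}] \\
f_{\mathrm{pre}} &= \frac{[\mathrm{pre}]}{[\mathrm{pre}] + [\mathrm{post}] + [\mathrm{post{\cdot}NTP}]}
                 = \frac{1}{1 + K\left(1 + \dfrac{[\mathrm{NTP}]}{K_d}\right)}
\end{align}
```

Under these assumptions the fraction of complexes in the pre-translocated state falls towards zero as [NTP] rises well above K_d, whereas anything that lowers K, such as a steric impediment to translocation, increases that fraction at a given NTP concentration.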

The enigmatic NiRAN domain of coronavirus nsp12.
The NiRAN domain (fig. 4b) attracted considerable interest following revelations that it is an essential enzymatic domain that is absent in RNA viruses outside the order Nidovirales 82 . Early investigations revealed that the NiRAN domain of the RdRp of the arterivirus equine arteritis virus possesses a self-nucleotidylating activity (NMPylation), which may prime the domain to transfer a nucleoside monophosphate (NMP) to another protein or nucleic acid substrate 82 . This activity is essential for nidovirus replication, as mutations that catalytically inactivate the NiRAN domains of equine arteritis virus and SARS-CoV were lethal 82 . Recent studies of the coronavirus NiRAN domain suggest that it might function as an RNA ligase, serve as a guanylyltransferase that catalyses the transfer of guanosine monophosphate (GMP) during mRNA capping (box 2) or transfer an NMP to another viral protein to serve in protein-primed initiation of RNA synthesis 82 . It is possible that the NiRAN domain performs multiple activities or each of these activities, depending on the substrate and context. The possible role of the NiRAN in the capping mechanism is discussed in box 2, and the RNA ligase activity of NiRAN has not been demonstrated so far. The NiRAN domain's possible role in protein-primed RNA synthesis warrants further exploration. Members of another clade of positive-strand RNA viruses, the order Picornavirales, initiate RNA synthesis using the protein primer viral protein genome-linked (VPg). A dinucleotide (UpU) that is covalently linked to a hydroxy group of a tyrosine, serine or threonine residue in VPg serves to prime RNA synthesis on the poly(A)-tailed template 90 . Consistent with the earlier work on equine arteritis virus 82 , the NiRAN domains of SARS-CoV, human coronavirus 229E and SARS-CoV-2 display higher specificity in vitro for UTP over GTP (UMPylation activity over GMPylation activity) 91,92 .
Given this UTP specificity, it was posited that NiRAN could facilitate the UMPylation of a priming protein to initiate minus-strand RNA synthesis at the 3′ poly(A) tail of the gRNA template 92 . Recent in vitro evidence indicates that nsp9, a single-stranded RNA-binding protein, is a substrate for UMPylation at or near its N terminus 92 . Consistent with this finding, mutagenesis of the N-terminal residues of nsp9 severely affected viral replication. These results, combined with a structure of nsp9 bound to the NiRAN domain in an inhibited state 93 , implicate nsp9 in the initiation of RNA synthesis. Given observations that other RTC components and purification contaminants (such as proteins from bacterial expression systems) can be NMPylated in vitro 91,92 , further studies are needed to define the biologically relevant repertoire of substrates targeted by NiRAN's NMPylation activity.

Error threshold
The size limit of a viral genome, above which too many mutations accumulate to sustain long-term viability of the virus.
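The error threshold can be illustrated with the standard quasispecies-style estimate. The symbols are assumptions introduced for this sketch and do not appear in the Review: μ is the per-nucleotide misincorporation rate per copying event, L is the genome length, Q is the probability that a copy is error-free, and σ is the replicative advantage of the error-free sequence over its mutants.

```latex
% Illustrative estimate; mu, L, Q and sigma are assumed symbols.
\begin{align}
Q &= (1-\mu)^{L} \approx e^{-\mu L} \\
\sigma Q > 1 \;&\Longrightarrow\; L < L_{\max} \approx \frac{\ln \sigma}{\mu}
\end{align}
```

As a purely illustrative calculation, for a genome of L = 3 × 10^4 nucleotides an assumed rate of μ = 10^-4 gives μL = 3 and Q ≈ 0.05, so only about one copy in twenty would be error-free, whereas an assumed rate of μ = 10^-6 gives μL = 0.03 and Q ≈ 0.97. These rates are chosen only to show the scaling, not taken from measurements, but they indicate why a ~30 kb RNA genome is expected to depend on mechanisms that lower the effective error rate.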
The expanded replication machinery
Replication and transcription are assumed to be executed by various subcomplexes, which include the holo-RdRp associated with other non-structural protein enzymes and accessory subunits. These viral enzymes are required to promote the fidelity of RNA synthesis, equip viral mRNAs with a 5′ cap structure and orchestrate the template switching needed for sgRNA synthesis. The interplay between these subunits and their interactions with regulatory viral RNA elements coordinate the timely replication and expression of the coronavirus genome, and provide a platform for continuous coronavirus evolution.
The holo-RdRp. To faithfully replicate the coronavirus genome, an arsenal of factors is needed to enhance the processivity of the RTC and repair errors in RNA synthesis 4 . Early biochemical experiments revealed that the primer-extension processivity of SARS-CoV RdRp (nsp12) is greatly increased in the presence of nsp7 and nsp8, cementing their role as essential subunits of the holo-RdRp complex 94 . Advancements in cryo-electron microscopy led to the seminal structure of the SARS-CoV apo-holo-RdRp complex 83 and more recently to several structures of the SARS-CoV-2 RTC [95][96][97][98] . In the presence of an RNA duplex, the N termini of two nsp8 subunits form ordered helices that have nonspecific ionic interactions with the RNA backbone, illuminating the importance of nsp8 for enhanced RdRp processivity 95 (fig. 4a,c). In addition to enhancing processivity, nsp8 interacts in vitro with various other non-structural proteins thought to assist the RTC 99,100 . Thus, nsp8 is posited to be important for forming higher-order RTCs that couple replication and transcription with template unwinding (nsp13), proofreading (nsp10, nsp12, nsp13 and nsp14) and RNA capping (nsp10, nsp13, nsp14 and nsp16).

The exoribonuclease activity of nsp14.
Coronaviruses encode a unique proofreading activity that is not found in other RNA viruses, including in nidoviruses with small genomes such as arteriviruses 3,22,101 . This activity is encoded in the N-terminal exoribonuclease (ExoN) domain of nsp14, which together with nsp10 forms an RNA proofreading complex 102-104 that is presumed to promote faithful replication of large nidovirus genomes 22 . In some coronaviruses, such as murine hepatitis virus (MHV) and SARS-CoV, nsp14-ExoN knockout yielded crippled but viable viruses exhibiting a mutator phenotype [105][106][107] . In addition to its role in enhancing the fidelity of genome replication, nsp14 ExoN can promote antiviral drug resistance. For example, in an in vitro assay, its 3′-5′ exoribonuclease activity can efficiently cleave ribavirin, an antiviral nucleoside analogue, from the 3′ end of an RNA substrate, indicating that the enzyme may promote high-level resistance to certain nucleoside analogue antiviral drugs 109 . Recent cryo-electron microscopy structures have revealed the molecular basis for how the ExoN domain of the nsp10-nsp14 complex interacts with double-stranded RNA containing a 5′ overhang and a one-nucleotide mismatch at the 3′ end 110 . The mismatched base enters the shallow ExoN active site and interacts with conserved catalytic residues via its 3′-hydroxy and 2′-hydroxy groups. In addition, the double-stranded portion of the RNA interacts with both the nsp10 N terminus and nsp14-ExoN residues outside the catalytic site. These structures provide direct visualization of recognition by ExoN of its preferred mismatched RNA substrate 110 .

Colliding motors: how the helicase induces RdRp backtracking.
Nsp13 is an SF1B-family RNA helicase that is essential for coronavirus replication [111][112][113][114][115] . Biochemical assays revealed that its 5′-3′ nucleic acid unwinding activity is enhanced twofold in the presence of nsp12 (Ref. 116 ). Nsp13 makes stable interactions with the SARS-CoV-2 RTC, which enabled the structural determination of nsp13 bound to the RTC (nsp13₂-RTC) 69,117 . Single-particle image classification revealed that the major particle class consists of two molecules of nsp13 bound to the RTC. Both nsp13 molecules (nsp13F (fingers) and nsp13T (thumb)) have extensive interactions with the holo-RdRp, whereas only one copy (nsp13T) is bound to the RNA scaffold at the 5′ end of the template RNA (fig. 4a,c). Nsp13T, bound to the template strand downstream of the RdRp active site, is positioned to translocate in the opposite direction relative to the RdRp 69 (fig. 5). The opposing directionalities of nsp13 and the RdRp are hypothesized to trigger a translocation conflict. Forward translocation of nsp13T on the template strand was proposed to lead to the reverse threading of the RdRp on the product RNA strand 69 . The reverse movement of polymerases, relative to their nucleic acid substrate, is well known for all cellular RNA polymerases and is termed 'backtracking' 118 (fig. 5). The role of the second nsp13, nsp13F, is poorly understood but it has been proposed to regulate the unwinding activity of the substrate-bound nsp13T (Ref. 117 ).

A perspective on the role of backtracking in proofreading by the RdRp.
Similarly to its role in cellular RNA polymerases, backtracking may be essential for excision of misincorporated nucleotides in coronavirus RNA synthesis 69,70,119,120 . Indeed, behaviours consistent with backtracking have been observed in single-molecule magnetic tweezer experiments for the SARS-CoV-2 RdRp and RdRps from the Φ6 bacteriophage and poliovirus, illuminating the potentially widespread nature of backtracking in the viral realm [119][120][121] . Furthermore, recent evidence indicates that the NTP entry channel can accommodate a single-stranded product RNA 3′ overhang, which mirrors the backtracking product 70 .
Molecular dynamics simulations further showed that entry into the backtracking state occurs when a misincorporated RNA base flips from the pre-translocated state towards the mouth of the NTP entry channel 70 . Subsequently, the engagement of nsp13 with the template RNA would enhance the backtracking activity and offer a means to control entry into a long-lived backtracked state (fig. 5).
Backtracking may grant nsp14 ExoN access to any misincorporated nucleotide at the 3′ end of nascent product RNAs, thereby coupling proofreading with RNA synthesis 69,70 . Alternatively, as suggested by the recent cryo-electron microscopy structures of nsp10-nsp14 bound to a double-stranded RNA substrate with a 5′ overhang and a 3′ nucleotide mismatch 110 , misincorporation events may lead the RdRp to release the mismatch-containing RNA duplex, thereby granting ExoN access for proofreading. Additionally, it is envisaged that backtracking could have a role in discontinuous RNA synthesis by exposing the anti-TRS-B at the 3′ end of the nascent minus-strand sgRNAs and mediating template switching 69,70 (fig. 3b). This hypothesis is supported by a mutation in the arterivirus helicase that does not affect genome replication but abolishes all sgRNA transcription 68,122 .
Coupling of backtracking with nsp14-ExoN activity could also explain how coronaviruses excise non-natural nucleotides from their nascent product RNA. Genetic loss of function experiments indicated that nsp14 ExoN mitigates the effect of nucleoside and base analogues such as ribavirin and fluorouracil, respectively, in the betacoronavirus MHV 102,105,106 . These inhibitors are ineffective for treating SARS-CoV, MERS-CoV and SARS-CoV-2 infections, highlighting that the protection conferred by nsp14 ExoN is of clinical concern in the hunt for promising nucleoside analogues 106,123 . Combination therapy approaches may yield fruitful outcomes given the likely synergy between nsp14-ExoN and nsp12 (RdRp) inhibition. Therefore, it is pertinent to better understand how nsp14 is recruited for excision and repair 124 . In addition, shedding light on why some nucleoside analogues, such as remdesivir and molnupiravir (discussed in the next section), are effective could provide valuable insight into the design of novel antiviral nucleoside analogues for monotherapy.

A rational design of RdRp inhibitors
Although vaccines against SARS-CoV-2 have shown remarkable efficacy, COVID-19 continues to spread and affect communities globally. The reasons for this are multiple, and include vaccine shortages, public vaccine hesitancy, reduced vaccine effectiveness in immunosuppressed people and the emergence of new virus variants. It is therefore anticipated that SARS-CoV-2 will become endemic 125 , potentially evolving in the human host and leading to gradual or more sudden reductions of vaccine efficacy. Given such concerns, the search for drugs against SARS-CoV-2 and related viruses remains a priority in the research community. In this section, we discuss the mechanisms of action of two RdRp inhibitors, remdesivir and molnupiravir, that show clinical benefit in treating COVID-19.

Mechanisms of action of remdesivir and molnupiravir.
Nucleoside analogues, which can target the RdRp, are common antiviral therapeutics 125 . Currently remdesivir and molnupiravir are two antiviral drugs used to treat COVID-19 (Ref. 126 ). Studies indicate that treatment with remdesivir decreases the duration of the infection in hospitalized patients 127 . Biochemical evidence demonstrates that the SARS-CoV-2 RdRp preferably incorporates remdesivir ( fig. 6a) rather than its natural analogue adenosine and can incorporate molnupiravir ( fig. 6b) rather than its natural analogue cytidine 126,[128][129][130][131] . Once incorporated, neither inhibitor induces immediate pausing of RNA synthesis, in contrast to classical chain terminators 126,129,131 (fig. 6c- . 6f). This steric inhibition is surmounted in vitro in the presence of subphysiological concentrations of NTPs, indicating it is unlikely the major inhibitory hurdle for viral replication in living cells. Instead, recent data suggest that remdesivir may impair replication when incorporated into the template strand following an initial round of viral RNA synthesis 134 . In the template strand, remdesivir hinders the incorporation of the incoming nucleotide. This mode of activity has been termed 'template-dependent inhibition' 134 ( fig. 6d). Following the eventual incorporation of this incoming nucleotide, a second potential checkpoint was proposed, in which remdesivir would bias the RdRp towards the pre-translocated state, although direct evidence for this is lacking 134 . Like remdesivir, molnupiravir is a prodrug that is converted in cells into its triphosphate form, thereby serving as a nucleotide analogue. Molnupiravir inhibits replication through lethal mutagenesis of the genomes of multiple viruses, including SARS-CoV-2 (Refs 135-138 ). Molnupiravir treatment presents a high barrier to resistance in cell culture assays 135,136 . Importantly, like remdesivir, molnupiravir escapes from the coronavirus nsp14-ExoN proofreading activity 136 . 
Unlike remdesivir, molnupiravir is delivered orally, which, combined with its high barrier to resistance and potent antiviral activity, led to its pursuit as an alternative therapeutic for COVID-19 (Ref. 139 ). Molnupiravir triphosphate is a cytidine analogue that exerts its effect by indiscriminately serving as a template for the incorporation of either adenine or guanine, thus explaining the observation of the transition mutations G>A and C>U in coronaviruses exposed to molnupiravir 129 (fig. 6e). Two recently resolved cryo-electron microscopy structures of molnupiravir base-paired with adenine or guanine revealed the structural basis of molnupiravir-mediated lethal mutagenesis 131 .
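The link between ambiguous templating by molnupiravir and the observed G>A and C>U transitions can be traced with a short enumeration. The sketch rests on simplifying assumptions that are labelled in the code: molnupiravir monophosphate (written M) is treated strictly as a cytidine analogue that is incorporated opposite G, a templating M directs incorporation of either A or G as stated in the text, and mutations are reported relative to the plus strand. The function and variable names are invented for this example.

```python
# Enumerate plus-strand mutations that follow from one M incorporation event.
# Assumption: M is incorporated only opposite a template G (cytidine analogue).
# From the text: a templating M directs incorporation of either A or G.

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
M_DIRECTS = ("A", "G")


def plus_strand_mutations():
    """Return the set of plus-strand changes expected after M is copied once."""
    mutations = set()

    # Case 1: M enters the minus strand opposite a plus-strand G.
    # Copying that minus strand places A or G at the original G position.
    for new_plus_base in M_DIRECTS:
        if new_plus_base != "G":
            mutations.add("G>" + new_plus_base)

    # Case 2: M enters the plus strand opposite a minus-strand G, which
    # corresponds to a plus-strand C. Copying that plus strand places A or G
    # in the new minus strand, and its plus-strand copy carries the complement.
    for new_minus_base in M_DIRECTS:
        new_plus_base = WATSON_CRICK[new_minus_base]
        if new_plus_base != "C":
            mutations.add("C>" + new_plus_base)

    return mutations


if __name__ == "__main__":
    print(sorted(plus_strand_mutations()))
```

Under these assumptions the enumeration yields only the two transitions named in the text; it does not model incorporation frequencies or any additional pairing behaviour of the analogue.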
The RdRp possesses high selectivity for remdesivir and molnupiravir due to their excellent mimicry of natural nucleotides. Therefore, these compounds do not significantly affect the initial round of RNA synthesis after incorporation, a feature which likely reduces their recognition and excision by nsp14 ExoN surveying the fidelity of RTCs.
Overcoming the proofreading barrier in antiviral drug design. Designing nucleoside analogues that escape the proofreading activity of nsp14 ExoN is a trial-and-error endeavour since it is challenging to pinpoint chemical properties that would lead to nucleotide mimicry. A more rational approach could entail targeting the enzymatic activity of ExoN or its interfaces with nsp10 or the RTC. The ExoN activity is essential for SARS-CoV-2 and MERS-CoV replication, but it is not vital for viral propagation across the betacoronavirus clade 108 . Inactivation of nsp14 ExoN in MHV and SARS-CoV, although not lethal, enhances the susceptibility of the virus to nucleoside analogues, highlighting the benefits of dual-inhibition strategies in coronaviruses 105 . One concern is the potential for off-target effects when the ExoN active site is being targeted with a small-molecule inhibitor, due to structural similarities to other cellular DEDD-family exonucleases. Designing inhibitors against the interface of nsp10 with nsp14 ExoN and nsp16 has attracted interest given that viral replication is abrogated in interface mutants 140,141 .
A long-standing research interest is the characterization of how the proofreading complex, nsp10-nsp14, interacts with the RTC. Pull-down experiments using a series of truncated proteins indicated that nsp12 interacts with nsp14 and its subdomains 109 . Recent structural work showed that SARS-CoV-2 nsp10-nsp14 can be recruited to the RTC by forming a covalent link with nsp9, which is bound to the nsp12 NiRAN domain 124 . Prior observations in MHV inspired the rationale for using an nsp9-nsp10 fusion protein in that study, as ablation of the protease cleavage site between nsp9 and nsp10 in MHV maintains a viable phenotype. This nsp9-nsp10 protease cleavage mutant, however, experienced a pronounced overall defect in RNA synthesis, and the propagation of this mutant was severely compromised 142 . Given the crippled phenotype of the MHV nsp9-nsp10 cleavage-site mutant 142 , it remains to be shown whether the same interaction with nsp10-nsp14 can occur when nsp9 and nsp10 are separated. The recent structural analysis of the SARS-CoV-2 nsp12-nsp9-nsp10-nsp14 complex did not reveal any features that would suggest that the incorporation of an nsp9-nsp10 fusion protein affects RTC assembly 124 . Probing the role of nsp10-nsp14 in greater detail will benefit from single-molecule experiments using reconstituted RTC components. This approach could also test whether nsp14 ExoN alleviates backtracking as proposed 69,70 . Furthermore, both the engagement of proteins with RTCs and the proposed proofreading activity of nsp14 ExoN will have to be demonstrated in vivo.

Future perspectives
Although the surge of new coronavirus research has expanded our understanding of the molecular mechanisms of SARS-CoV-2 replication and gene expression, the foundation of our knowledge of these processes is built primarily on previous research of other coronaviruses and the distantly related arteriviruses. Corroborating our understanding will likely reveal properties and processes shared between viral species, which are of potential value for the design of pan-coronavirus inhibitors. More specifically, it is pertinent to unravel poorly understood intricacies of spatiotemporal regulation of RNA synthesis in coronaviruses. The complete repertoire of host-cell factors involved in assisting the coronavirus infection cycle remains unknown, as does how these factors may, for example, be subverted to assist in the formation of replication organelles or contribute to the formation of RTCs. The spatial segregation of coronavirus replication in virus-induced membranous organelles appears to be a requisite for successful virus propagation, a feature shared among positive-strand RNA viruses, although it remains to be elucidated where in the cell RNA synthesis occurs during the earliest phase of infection.
Downstream of replication organelle formation, key questions include how RNA synthesis is primed on its templates and how regulation of the two major synthesis pathways is orchestrated. Such regulation must be achieved by the concerted action of the replicase proteins and may be further assisted by host factors, whose role in these pathways is relatively unexplored. Regulating RNA synthesis necessitates the faithful maintenance of the encoded genetic information, as unwanted mutations can alter the RNA elements required for processes such as template switching and can lead to the production of nonsense transcripts. Yet to be worked out in detail is how the coronavirus proofreading complex coordinates its activity with the polymerase, leading to the excision of misincorporated RNA nucleotides and nucleotide analogues. Understanding how mutations accumulate during replication and how they are corrected can inform us on the evolution of drug resistance mutations and aid the design of inhibitors that directly target the replicase complex. We hope that such considerations may guide research that will shape our response to future deadly outbreaks of coronaviruses, a consequence of the encroaching footprint of humanity on the natural world.
Published online 25 November 2021