Introduction

Coronaviruses (CoVs) are a highly diverse family of enveloped positive-sense single-stranded RNA viruses. They infect humans, other mammals and avian species, including livestock and companion animals, and are therefore not only a challenge for public health but also a veterinary and economic concern. The family Coronaviridae belongs to the suborder Cornidovirineae within the order Nidovirales. It includes the subfamily Orthocoronavirinae, which consists of four genera: alphacoronavirus, betacoronavirus, gammacoronavirus and deltacoronavirus. Whereas alphacoronaviruses and betacoronaviruses exclusively infect mammalian species, gammacoronaviruses and deltacoronaviruses have a wider host range that includes avian species. Human and animal coronavirus infections mainly result in respiratory and enteric diseases1,2.

Human coronaviruses, such as HCoV-229E and HCoV-OC43, have long been known to circulate in the population and they, together with the more recently identified HCoV-NL63 and HCoV-HKU1, cause seasonal and usually mild respiratory tract infections associated with symptoms of the ‘common cold’. In strong contrast, severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV) and SARS-CoV-2, which have emerged in the human population over the past 20 years, are highly pathogenic. By infecting bronchial epithelial cells, pneumocytes and upper respiratory tract cells in humans, SARS-CoV, MERS-CoV and SARS-CoV-2 infections can develop into severe, life-threatening respiratory pathologies and lung injuries for which no specific prophylactic or therapeutic treatment has been approved to date.

The initial steps of coronavirus infection involve the specific binding of the coronavirus spike (S) protein to the cellular entry receptors, which have been identified for several coronaviruses and include human aminopeptidase N (APN; HCoV-229E), angiotensin-converting enzyme 2 (ACE2; HCoV-NL63, SARS-CoV and SARS-CoV-2) and dipeptidyl peptidase 4 (DPP4; MERS-CoV). The expression and tissue distribution of entry receptors consequently influence viral tropism and pathogenicity. During the intracellular life cycle (Fig. 1), coronaviruses express and replicate their genomic RNA to produce full-length copies that are incorporated into newly produced viral particles. Coronaviruses possess remarkably large RNA genomes flanked by 5′ and 3′ untranslated regions that contain cis-acting secondary RNA structures essential for RNA synthesis. At the 5′ end, the genomic RNA features two large open reading frames (ORFs; ORF1a and ORF1b) that occupy two-thirds of the capped and polyadenylated genome. ORF1a and ORF1b encode 15–16 non-structural proteins (nsp), of which 15 compose the viral replication and transcription complex (RTC) that includes, amongst others, RNA-processing and RNA-modifying enzymes and an RNA proofreading function necessary for maintaining the integrity of the >30 kb coronavirus genome3. ORFs that encode structural proteins and interspersed ORFs that encode accessory proteins are transcribed from the 3′ one-third of the genome to form a nested set of subgenomic mRNAs (sg mRNAs). Coronavirus accessory proteins are highly variable sets of virus-specific proteins that display limited conservation even within individual species but they are principally thought to contribute to modulating host responses to infection and are determinants of viral pathogenicity4,5. Nevertheless, the molecular functions of many accessory proteins remain largely unknown owing to the lack of homologies to accessory proteins of other coronaviruses or to other known proteins6.

Fig. 1: The coronavirus virion and life cycle.

a | The coronavirus virion consists of structural proteins, namely spike (S), envelope (E), membrane (M), nucleocapsid (N) and, for some betacoronaviruses, haemagglutinin-esterase (not shown). The positive-sense, single-stranded RNA genome (+ssRNA) is encapsidated by N, whereas M and E ensure its incorporation in the viral particle during the assembly process. S trimers protrude from the host-derived viral envelope and provide specificity for cellular entry receptors. b | Coronavirus particles bind to cellular attachment factors and specific S interactions with the cellular receptors (such as angiotensin-converting enzyme 2 (ACE2)), together with host factors (such as the cell surface serine protease TMPRSS2), promote viral uptake and fusion at the cellular or endosomal membrane. Following entry, the release and uncoating of the incoming genomic RNA subject it to the immediate translation of two large open reading frames, ORF1a and ORF1b. The resulting polyproteins pp1a and pp1ab are co-translationally and post-translationally processed into the individual non-structural proteins (nsps) that form the viral replication and transcription complex. Concordant with the expression of nsps, the biogenesis of viral replication organelles consisting of characteristic perinuclear double-membrane vesicles (DMVs), convoluted membranes (CMs) and small open double-membrane spherules (DMSs) creates a protective microenvironment for viral genomic RNA replication and transcription of subgenomic mRNAs (sg mRNAs) comprising the characteristic nested set of coronavirus mRNAs. Translated structural proteins translocate into endoplasmic reticulum (ER) membranes and transit through the ER-to-Golgi intermediate compartment (ERGIC), where interaction with N-encapsidated, newly produced genomic RNA results in budding into the lumen of secretory vesicular compartments. Finally, virions are secreted from the infected cell by exocytosis. Key steps inhibited by compounds that are currently being validated and which represent attractive antiviral targets are highlighted in red. An, 3′ polyA sequence; cap, 5′ cap structure; dsRNA, double-stranded RNA; L, leader sequence; RdRP, RNA-dependent RNA polymerase.

Despite the previous public health emergencies caused by the SARS-CoV and MERS-CoV outbreaks and the impact of the ongoing SARS-CoV-2 pandemic on society and human health, intervention strategies to combat coronavirus infections are only in their early stages and await proof of clinical efficacy. Their development relies on a deeper understanding of the basic mechanisms of coronavirus gene functions and of the molecular interactions with host factors. Since the discovery of the first coronavirus (avian infectious bronchitis virus) in the 1930s7 and the discovery of the first human coronaviruses (HCoV-229E and HCoV-OC43) in the 1960s8,9, the coronavirus research field has made substantial progress in understanding the basic principles of coronavirus replication and pathogenesis (Box 1). This advancement was accelerated after the emergence of SARS-CoV in 2002 and MERS-CoV in 2012 and has broadened our view of coronaviruses as zoonotic pathogens that can severely affect human health. Moreover, the unprecedented speed and technical progress of coronavirus research that became evident within a few months after the appearance of SARS-CoV-2 at the end of 2019 has led to a rapidly growing understanding of this newly emerging pathogen and of its associated disease, COVID-19. In this Review, we discuss key aspects of coronavirus biology and their implications for SARS-CoV-2 infections as well as the treatment and prevention strategies.

Entry of coronaviruses

Coronavirus S proteins are homotrimeric class I fusion glycoproteins that are divided into two functionally distinct parts (S1 and S2) (Fig. 2). The surface-exposed S1 contains the receptor-binding domain (RBD) that specifically engages the host cell receptor, thereby determining virus cell tropism and pathogenicity. The transmembrane S2 domain contains heptad repeat regions and the fusion peptide, which mediate the fusion of viral and cellular membranes upon extensive conformational rearrangements10,11,12. Shortly after the 2002–2003 SARS-CoV outbreak, ACE2 was identified as the functional receptor that enables infection by SARS-CoV13. The high genomic and structural homology between the S proteins of SARS-CoV and SARS-CoV-2 (76% amino acid identity) supported the identification of ACE2 as the cell-surface receptor for SARS-CoV-2 (refs12,14,15,16). Remarkably, essential SARS-CoV contact residues that interact with ACE2 were highly conserved in SARS-CoV-2 as well as in members of the species Severe acute respiratory syndrome-related coronavirus that use ACE2 or have similar amino acid side chain properties14,15,17,18,19. These data were corroborated by the atomic resolution of the interface between the SARS-CoV-2 S protein and ACE2 (refs16,19,20,21). By contrast, the bat Severe acute respiratory syndrome-related coronavirus RaTG13 S sequence (93.1% nucleotide identity to SARS-CoV-2) shows conservation of only one out of six amino acids directly involved in ACE2 binding, even though, based on the entire genomic sequence, RaTG13 is the closest relative of SARS-CoV-2 known to date (96.2%)14 (Box 2).

Fig. 2: Severe acute respiratory syndrome-related coronavirus spike sequence variation.

a | Schematic illustration of coronavirus spike, indicating domain 1 (S1) and domain 2 (S2). The receptor-binding motif (RBM) is located on S1 and the fusion peptide (FP), heptad repeat 1 (HR1), HR2 and the transmembrane (TM) domains are located on S2. The cleavage sites are indicated. The colour code designates conserved spike regions surrounding the angiotensin-converting enzyme 2 (ACE2)-binding domain among severe acute respiratory syndrome-related coronaviruses (SARSr-CoVs) and high amino acid sequence variations within the site of receptor interaction. b | Amino acid alignment of human SARS-CoV-2 (Wuhan-Hu-1) and SARS-CoV (Frankfurt-1), bat (RaTG13, RmYN02, CoVZC45 and CoVZXC21) and pangolin (MP789, P1E) SARSr-CoVs. The spike gene sequences were aligned using MUSCLE with the default settings and codon alignment, and were then translated into amino acids using MEGA7, version 7.0.26. The alignment was coloured according to percentage amino acid similarity with a BLOSUM62 score matrix. The insertion of a polybasic cleavage site (PRRAR, amino acids 681 to 685) in Wuhan-Hu-1 is indicated, and similar insertions are depicted in bat SARSr-CoV RmYN02. c | Within the spike sequence, the ACE2 receptor-binding motif (amino acids 437 to 509, black line) is depicted. The spike contact residues for ACE2 interaction are marked with asterisks.
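The identity percentages quoted in the text (for example, 76% amino acid identity between the two S proteins) are computed over aligned sequences. The following is a minimal sketch of one common way to compute pairwise percentage identity from two rows of an alignment. It is not the procedure used for the figure, which was coloured by BLOSUM62 similarity rather than identity. The function name and the toy rows are hypothetical, and skipping gapped columns is an assumption, as tools differ in how they choose the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment rows (not real spike sequence).
row_1 = "MKT-LLVA"
row_2 = "MKSALL-A"
print(f"{percent_identity(row_1, row_2):.1f}% identity")
```

Because the denominator convention changes the result, identity values from different studies are comparable only when the same convention and the same alignment region are used.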

These data suggest that, much like during the evolution of SARS-CoV, frequent recombination events between severe acute respiratory syndrome-related coronaviruses that coexist in bats probably favoured the emergence of SARS-CoV-2 (ref.22). Indeed, predicted recombination breakpoints divide the S gene into three parts. The middle part (nucleotides 1,030–1,651 of the S gene, encompassing the RBD-coding region) is most similar to SARS-CoV and bat severe acute respiratory syndrome-related coronaviruses WIV1 and RsSHC014, all of which use human ACE2 as a cellular entry receptor23. However, the 5′ and 3′ parts of the SARS-CoV-2 S gene (nucleotides 1–1,029 and 1,651–3,804, respectively), which encode the amino-terminal and carboxy-terminal parts of the S protein, are more closely related to severe acute respiratory syndrome-related coronaviruses ZC45 and ZXC21. These observations highlight the importance of recombination as a general mechanism that contributes to coronavirus diversity and that might therefore drive the emergence of future pathogenic human coronaviruses from bat reservoirs. This emphasizes the need for surveillance to determine the breadth of diversity of severe acute respiratory syndrome-related coronaviruses, to evaluate how frequently recombination events take place in the field and to understand which virus variants have the potential to infect humans. Increased surveillance is thus instrumental to improve our preparedness for future outbreaks of severe acute respiratory syndrome-related coronaviruses.

Besides receptor binding, the proteolytic cleavage of coronavirus S proteins by host cell-derived proteases is essential to permit fusion24,25. SARS-CoV has been shown to use the cell-surface serine protease TMPRSS2 for priming and entry, although the endosomal cysteine proteases cathepsin B (CatB) and CatL can also assist in this process24,25,26,27,28. Concordantly, the simultaneous inhibition of TMPRSS2, CatB and CatL efficiently prevents SARS-CoV entry into in vitro cell cultures29. TMPRSS2 is expressed in the human respiratory tract and thus strongly contributes to both SARS-CoV spread and pathogenesis. Notably, SARS-CoV-2 entry relies mainly on TMPRSS2 rather than on CatB and CatL, as inhibition of TMPRSS2 was sufficient to prevent SARS-CoV-2 entry in lung cell lines and primary lung cells15,30. These data support the evaluation of the TMPRSS2 inhibitors camostat mesylate and nafamostat mesylate in clinical trials, since in vitro studies have demonstrated their potent antiviral activity against emerging coronaviruses, including SARS-CoV-2 (refs29,31,32).

Given these similarities in receptor usage and cleavage requirements, it is surprising that SARS-CoV and SARS-CoV-2 display marked differences in virus replication efficiency and spread. SARS-CoV primarily targets pneumocytes and lung macrophages in lower respiratory tract tissues, where ACE2 is predominantly expressed, consistent with the lower respiratory tract disease resulting from SARS-CoV infection and the limited viral spread33,34,35. By contrast, SARS-CoV-2 replicates abundantly in upper respiratory epithelia, where ACE2 is also expressed, and is efficiently transmitted36,37,38.

Different host cell tropism, replication kinetics and transmission of SARS-CoV and SARS-CoV-2 might be determined by S protein–ACE2 binding affinities. For example, it has been reported that the S protein and ACE2 binding affinity is correlated with disease severity in SARS-CoV infections18. The affinity of the SARS-CoV-2 RBD to ACE2 has been shown to be similar to16,19 or stronger than20,30 that of the SARS-CoV RBD. However, the binding affinity of the entire SARS-CoV-2 S protein to ACE2 seems to be equal to or lower than that of SARS-CoV, suggestive of a less exposed RBD16,28,30. In addition to ACE2, attachment and entry factors, such as cellular glycans and integrins or neuropilin 1, may also have an impact on the observed phenotypic differences of SARS-CoV and SARS-CoV-2 (refs39,40,41,42,43).

A peculiar feature of the SARS-CoV-2 S protein is the acquisition of a polybasic cleavage site (PRRAR) at the S1–S2 boundary, which permits efficient cleavage by the prototype proprotein convertase furin. Cleavage results in enhanced infection and has been proposed to be a key event in SARS-CoV-2 evolution as efficient S protein cleavage is required for successful infection and is a main determinant in overcoming species barriers10,11,12,15,16,28,30,44,45,46. This pre-processing of the SARS-CoV-2 S protein by furin may contribute to the expanded cell tropism and zoonotic potential and might increase transmissibility16,46. Importantly, such cleavage sites have not been identified in other members of the Sarbecovirus subgenus46. However, there are multiple instances of furin-like cleavage site acquisitions that occurred independently during coronavirus evolution and similar cleavage sites are present in other human coronaviruses such as HCoV-HKU1 (ref.47), HCoV-OC43 (ref.48) and MERS-CoV49. Recently, an independent insertion of amino acids (PAA) at the same region of the S protein has been identified in the bat coronavirus RmYN02 (ref.50). Such independent insertion events highlight the zoonotic potential of bat severe acute respiratory syndrome-related coronaviruses and may increase the possibility of future outbreaks.
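A simple way to screen sequences for candidate polybasic sites of the kind described above is a motif search. The sketch below assumes the commonly cited minimal furin-type pattern Arg-X-X-Arg (R-X-X-R), which is not stated in the text and is used here only for illustration. The function name and both peptide fragments are hypothetical toy inputs, and a motif match does not by itself show that a site is cleaved.

```python
import re

# Assumption: minimal furin-type motif Arg-X-X-Arg. The pattern is wrapped in a
# lookahead so that overlapping matches are all reported.
FURIN_LIKE = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched tetrapeptide) for each R-X-X-R match."""
    return [(m.start() + 1, m.group(1)) for m in FURIN_LIKE.finditer(protein.upper())]


# Hypothetical toy fragments: one carrying a PRRAR-like insertion, one without it.
with_insertion = "TQTNSPRRARSVAS"
without_insertion = "TQTNSRSVAS"

print(find_furin_like_sites(with_insertion))
print(find_furin_like_sites(without_insertion))
```

A stricter pattern, or a dedicated cleavage-site predictor, would be needed before drawing any biological conclusion from such a scan.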

The importance of coronavirus S protein-mediated receptor binding and temporally coordinated conformational rearrangements that result in membrane fusion makes this process a prime target of innate and adaptive antiviral responses. Notably, a screen involving several hundred interferon-stimulated genes identified lymphocyte antigen 6 family member E (Ly6E) as a potent inhibitor of coronavirus fusion51. Ly6E-mediated inhibition of coronavirus entry was demonstrated for various coronaviruses, including SARS-CoV-2, and seems to have pivotal importance in protecting the haematopoietic immune cell compartment in a mouse model of coronavirus infection. Moreover, the exposure of S protein on the surface of the virion results in the induction of specific neutralizing humoral immune responses52. Coronavirus S proteins are heavily glycosylated, which promotes immune evasion by shielding epitopes from neutralizing antibodies16,53,54. Nevertheless, sera from patients with SARS and COVID-19 can neutralize SARS-CoV and SARS-CoV-2, respectively15,28. Several specific or cross-reactive antibodies that bind the SARS-CoV-2 S protein have been recently reported and their administration to infected patients could potentially provide immediate protection55,56,57,58. Human monoclonal antibodies from previous hybridoma collections from SARS-CoV S protein-immunized transgenic mice55 or from the memory B cell repertoire of convalescent patients with SARS and COVID-19 have been shown to either directly interfere with RBD–ACE2 interaction55,57,58,59 or to destabilize intermediate pre-fusion conformations upon binding different epitopes55,56. Taken together, the exploitation of a combination of multiple neutralizing antibodies that do not compete for overlapping epitopes may not only result in synergistic improvements but also impede the appearance of escape mutations.

Viral gene expression and RNA synthesis

Genome translation

The release of the coronavirus genome into the host cell cytoplasm upon entry marks the onset of a complex programme of viral gene expression, which is highly regulated in space and time. The translation of ORF1a and ORF1b from the genomic RNA produces two polyproteins, pp1a and pp1ab, respectively. The latter results from a programmed –1 ribosomal frameshift at the short overlap of ORF1a and ORF1b4. Ribosome profiling revealed that the efficiency of the frameshift between ORF1a and ORF1b lies between 45% and 70% in the case of SARS-CoV-2 (ref.60), similar to that measured for mouse hepatitis virus (MHV)61. This determines the stoichiometry between pp1a and pp1ab, with pp1a being approximately 1.4–2.2 times more expressed than pp1ab60. Sixteen non-structural proteins are co-translationally and post-translationally released from pp1a (nsp1–11) and pp1ab (nsp1–10, nsp12–16) upon proteolytic cleavage by two cysteine proteases that are located within nsp3 (papain-like protease; PLpro) and nsp5 (chymotrypsin-like protease) (Fig. 3). The protease residing in nsp5 is frequently referred to as 3C-like protease (3CLpro), because of its similarities to the picornaviral 3C protease, or as main protease (Mpro), because it is responsible for proteolytic processing of the majority of polyprotein cleavage sites. Proteolytic release of nsp1 is known to occur rapidly62, which enables nsp1 to target the host cell translation machinery63,64,65. Nsp2–16 compose the viral RTC and are targeted to defined subcellular locations where interactions with host cell factors determine the course of the replication cycle66,67,68. Nsp2–11 are believed to provide the necessary supporting functions to accommodate the viral RTC, such as modulating intracellular membranes, host immune evasion and providing cofactors for replication, whereas nsp12–16 contain the core enzymatic functions involved in RNA synthesis, RNA proofreading and RNA modification4,67. 
RNA synthesis is performed by the nsp12 RNA-dependent RNA polymerase (RdRP) and its two cofactors nsp7 and nsp8, the latter with proposed primase or 3′-terminal adenylyltransferase activity4,67,69,70. Notably, nsp14 provides a 3′–5′ exonuclease activity that assists RNA synthesis with a unique RNA proofreading function71. The coronavirus capping machinery, which is not yet fully elucidated, is composed of nsp10, which functions as a cofactor, nsp13, which provides the RNA 5′-triphosphatase activity, and nsp14 and nsp16, which perform the functions of N7-methyltransferase and 2′-O-methyltransferase, respectively67,72,73,74. Notably, one key enzyme typically involved in the formation of the 5′ cap structure, the guanylyltransferase, has not yet been identified in coronaviruses.
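The relationship between frameshift efficiency and polyprotein stoichiometry quoted above can be reproduced with simple arithmetic. The sketch below assumes a deliberately simplified model that is not spelled out in the text: every ribosome that initiates on the genomic RNA translates ORF1a, and a fraction f (the frameshift efficiency) continues into ORF1b, so that ORF1a is translated 1/f times as often as ORF1b. The function name is hypothetical.

```python
def orf1a_to_orf1b_ratio(frameshift_efficiency: float) -> float:
    """Ratio of ORF1a translation to ORF1b translation under the simplified model.

    Assumption: all ribosomes translate ORF1a and a fraction f of them
    frameshift into ORF1b, so the ratio is 1 / f.
    """
    if not 0.0 < frameshift_efficiency <= 1.0:
        raise ValueError("frameshift efficiency must be in the interval (0, 1]")
    return 1.0 / frameshift_efficiency


# Lower and upper bounds of the efficiency range reported for SARS-CoV-2.
for f in (0.45, 0.70):
    print(f"frameshift efficiency {f:.0%}: ratio {orf1a_to_orf1b_ratio(f):.1f}")
```

Under this model, efficiencies of 70% and 45% give ratios of about 1.4 and 2.2, which matches the range stated in the text. The same model implies that the share of ribosomes terminating at the end of ORF1a is 1 − f; the sketch reports the translation ratio because that is the quantity consistent with the quoted range.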

Fig. 3: Coronavirus polyprotein processing and non-structural proteins.

Coronavirus polyprotein processing and domains of non-structural proteins (nsp) are illustrated for severe acute respiratory syndrome-related coronaviruses. Proteolytic cleavage of the polyproteins pp1a and pp1ab is facilitated by viral proteases residing in nsp3 (PLpro) and nsp5 (Mpro). PLpro proteolytically releases nsp1, nsp2, nsp3 and the amino terminus of nsp4 from the polyproteins pp1a and pp1ab (indicated by the blue arrows). Mpro proteolytically releases nsp5–16 and the carboxy terminus of nsp4 from the polyproteins pp1a and pp1ab (indicated by the red arrows)176. Conserved domains and known functions are schematically depicted for nsp1–16 (refs4,66,67,177). DMV, double-membrane vesicle; DPUP, Domain Preceding Ubl2 and PLpro; EndoU, endoribonuclease; ExoN, exoribonuclease; HEL, helicase; Mac I–III, macrodomains 1–3; Mpro, main protease; NiRAN, nidovirus RdRP-associated nucleotidyltransferase; NMT, guanosine N7-methyltransferase; OMT, ribose 2′-O-methyltransferase; PLpro, papain-like protease; Pr, primase or 3′-terminal adenylyl-transferase; RdRP, RNA-dependent RNA polymerase; TM, transmembrane domains; Ubl, ubiquitin-like domain; Y, Y and CoV-Y domain; ZBD, zinc-binding domain.

The establishment of the viral RTC is crucial for virus replication and thus a promising target for antivirals against SARS-CoV-2. One such target is Mpro, which resides in nsp5. Mpro releases the majority of nsps from the polyproteins and is essential for the viral life cycle. Furthermore, as Mpro is very sequence specific, compounds that structurally mimic those cleavage sites can specifically target the viral protease with little or no impact on host cellular proteases75,76,77. Based on structural analysis of the protein, multiple research groups have successfully developed lead compounds that block Mpro function in cell culture assays, thus providing frameworks that could aid in rapid drug discovery75,77.

RNA synthesis

Viral genomic replication is initiated by the synthesis of full-length negative-sense genomic copies, which function as templates for the generation of new positive-sense genomic RNA. These newly synthesized genomes are used for translation to generate more nsps and RTCs or are packaged into new virions. A hallmark of coronaviruses and most members of the order Nidovirales is the discontinuous viral transcription process, first proposed by Sawicki and Sawicki78, that produces a set of nested 3′ and 5′ co-terminal subgenomic RNAs (sgRNAs)78,79 (Fig. 4). During negative-strand RNA synthesis, the RTC interrupts transcription following the encounter of transcription regulatory sequences (TRSs) that are located upstream of most ORFs in the 3′ one-third of the viral genome. At these TRS elements, also called TRS ‘body’ (TRS-B), the synthesis of the negative-strand RNA stops and is re-initiated at the TRS adjacent to a leader sequence (TRS-L) located at about 70 nucleotides from the 5′ end of the genome78,79,80,81,82. This discontinuous step of coronavirus RNA synthesis involves the interaction between complementary TRSs of the nascent negative-strand RNA (negative-sense TRS-B) and the positive-strand genomic RNA (positive-sense TRS-L). Upon re-initiation of RNA synthesis at the TRS-L region, a negative-strand copy of the leader sequence is added to the nascent RNA to complete the synthesis of negative-strand sgRNAs. The discontinuous step of negative-strand RNA synthesis results in the production of a set of negative-strand sgRNAs that are then used as templates to synthesize a characteristic nested set of positive-sense sg mRNAs that are translated into structural and accessory proteins. Although the coronavirus sg mRNAs are structurally polycistronic, it is assumed that they are functionally monocistronic and that only the first ORF at the 5′ end, which is absent in the next smaller sgRNA, is translated from each sgRNA78,81.

Fig. 4: Coronavirus replication and discontinuous transcription.

Schematic depiction of coronaviral RNA synthesis. Full-length positive-sense genomic RNA is used as a template to produce both full-length negative-sense copies for genome replication and subgenomic negative-sense RNAs (–sgRNA) to produce the subgenomic mRNAs (sg mRNA). The negative-strand RNA synthesis involving a template switch from a body transcription regulatory sequence (TRS-B) to the leader TRS (TRS-L) is illustrated to produce one sg mRNA. This process can take place at any TRS-B and will collectively result in the production of the characteristic nested set of coronaviral mRNAs.

The TRS elements for SARS-CoV-2 have already been determined by RNA sequencing analyses of viral RNAs80,83. Like for SARS-CoV, the consensus TRS core of SARS-CoV-2 is 5′-ACGAAC-3′ and eight sg mRNAs have been shown to be produced in SARS-CoV-2-infected cells (sg mRNAs 2–9). In addition to canonical sgRNAs, recent reports also determined the existence of numerous non-canonical RNA products of discontinuous transcription, including fusions of the 5′ leader sequence to unexpected 3′ sites, TRS-L independent long-distance fusions, and local fusions resulting in small deletions mainly in the structural and accessory genes60,80. However, it remains to be determined whether all of these non-canonical sgRNAs truly arise by discontinuous transcription or whether they represent RNAs that result from recombination. Nevertheless, similar findings were previously reported for other coronaviruses, including MHV61 and HCoV-229E81, which indicates an enhanced coding potential for coronaviruses80. Overall, these unexpected fusion events may drive coronavirus evolution through variant generation, and novel ORFs could encode additional accessory proteins that are involved in either viral replication or modulation of the host immune response60,80.
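The outcome of canonical discontinuous transcription, a nested set of sg mRNAs that share the 5′ leader and the 3′ end, can be illustrated with a small string model. The sketch below uses the consensus TRS core quoted above (5′-ACGAAC-3′) and works only on the positive-sense sequence, so it shows the products rather than the negative-strand template switch itself. The toy genome, the function names and the rule that the first core occurrence is TRS-L are assumptions made for illustration. In real genomes, not every occurrence of the core is a functional TRS.

```python
TRS_CORE = "ACGAAC"  # consensus TRS core quoted in the text


def find_trs_sites(genome: str, core: str = TRS_CORE) -> list[int]:
    """Return the 0-based start position of every occurrence of the TRS core."""
    sites = []
    start = genome.find(core)
    while start != -1:
        sites.append(start)
        start = genome.find(core, start + 1)
    return sites


def nested_sg_mrnas(genome: str, core: str = TRS_CORE) -> list[str]:
    """Model leader-body joining on the positive strand.

    Assumption: the first core occurrence is TRS-L and every later occurrence
    is a TRS-B. Each product is the leader (through TRS-L) followed by the
    genome sequence downstream of one TRS-B.
    """
    sites = find_trs_sites(genome, core)
    if len(sites) < 2:
        return []
    leader = genome[: sites[0] + len(core)]
    return [leader + genome[site + len(core):] for site in sites[1:]]


# Hypothetical toy genome: leader, TRS-L, spacer, then two TRS-B/ORF units.
TOY_GENOME = (
    "TTAA" + TRS_CORE + "GGGGCCCC"
    + TRS_CORE + "ATGAAATAA"
    + TRS_CORE + "ATGCCCTAG" + "TTTT"
)

for product in nested_sg_mrnas(TOY_GENOME):
    print(product)
```

Each product begins with the same leader and ends with the same 3′ sequence, and the products become progressively shorter, which is the nested, co-terminal arrangement described for coronavirus sg mRNAs.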

The RdRP residing in nsp12 is the centrepiece of the coronavirus RTC and has been suggested as a promising drug target as it is a crucial enzyme in the virus life cycle, both for replication of the viral genome and for transcription of sgRNAs. The structure of the SARS-CoV-2 RdRP nsp12 and its cofactors nsp7 and nsp8 has been elucidated and shows a high degree of conservation to the SARS-CoV structure69,84,85. The amino acid sequences of the SARS-CoV and SARS-CoV-2 RdRPs show >95% similarity, with most changes located in the nidovirus RdRP-associated nucleotidyltransferase domain, which, despite being a genetic marker of Nidovirales, has yet to be functionally elucidated69. The structural similarities of the RdRP active site, including conserved key amino acid residues, with other positive-sense RNA viruses suggest the possibility of repurposing known drugs that are effective against other RNA viruses69. One of the most promising candidates is the phosphoramidate remdesivir (RDV), which, in its triphosphate form, acts as a substrate for viral RdRPs and competes with ATP86. RDV has shown potential as an antiviral agent against a broad range of RNA viruses, including Filoviridae (for example, Ebola virus), Paramyxoviridae (for example, Nipah virus) and Pneumoviridae (for example, respiratory syncytial virus) as well as other coronaviruses, including SARS-CoV and MERS-CoV86,87. The RdRP of SARS-CoV-2 selectively incorporates RDV over ATP, which subsequently results in a delayed-chain termination86,88. In contrast to classic nucleoside analogues that lead to immediate termination of the synthesis reaction after incorporation, the RdRP continues for three nucleotides after RDV has been incorporated before the chain is terminated. Nucleotide analogues like RDV may have limited efficacy owing to the proofreading function of the exonuclease domain contained in nsp14 (ExoN)89.
The corrective function that is exerted by ExoN is not only responsible for maintaining the stability of the coronavirus genome but also enables the excision of erroneous mutagenic nucleotides71,89. The mode of action observed for RDV might be an explanation for its increased efficiency over other nucleoside analogues as the delayed-chain termination could lead to improved evasion from the proofreading function of nsp14. The current model suggests steric hindrance as a likely reason for termination, disturbing the positioning of the RNA and thus hampering the translocation to the next position86,88. RDV was shown to reduce virus replication of SARS-CoV-2 in vitro90 and was demonstrated to restrict clinical symptoms of SARS-CoV-2 in rhesus macaques upon early pre-symptomatic treatment91. However, a recent randomized, double-blind, placebo-controlled clinical trial in humans with severe COVID-19 showed limited clinical efficacy of RDV treatment92 and further studies will be necessary. Another promising candidate is the purine analogue favipiravir (FPV), which has been shown to effectively target multiple RNA viruses93. Although the mechanism of action is not yet completely understood, a recent study of the in vitro mechanism of FPV suggested a combination of chain termination, slowed RNA synthesis and lethal mutagenesis as the mode of action against SARS-CoV-2, which indicates that FPV might be used to effectively restrict viral replication93. Indeed, results of an experimental pilot study showed that using FPV as treatment against COVID-19 led to increased recovery and faster viral clearance times in treated patients compared to control treatments94. Clinical studies with both RDV and FPV are currently ongoing and will establish whether these compounds are effective antivirals to treat coronavirus infections93.
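The difference between immediate and delayed chain termination described for RDV above can be made concrete with a toy calculation. The sketch below is only a counting model under stated assumptions: positions are 1-based, the analogue is incorporated once, and synthesis stops a fixed number of nucleotides later (three for the RDV-like case, zero for a classic immediate terminator). The function name and the 20-nucleotide template are hypothetical, and the model says nothing about the structural basis of termination or about ExoN activity.

```python
from typing import Optional


def product_length(template_length: int,
                   analogue_position: Optional[int],
                   delay: int) -> int:
    """Length of the RNA product in a toy model of chain termination.

    analogue_position is the 1-based position at which the analogue is
    incorporated (None means that no analogue is incorporated). After
    incorporation, the polymerase adds `delay` further nucleotides and stops.
    """
    if template_length < 1 or delay < 0:
        raise ValueError("template_length must be >= 1 and delay must be >= 0")
    if analogue_position is None:
        return template_length  # uninterrupted, full-length synthesis
    if not 1 <= analogue_position <= template_length:
        raise ValueError("analogue_position must lie within the template")
    return min(analogue_position + delay, template_length)


# Hypothetical 20-nucleotide template, analogue incorporated at position 5.
immediate = product_length(20, 5, delay=0)  # classic immediate terminator
delayed = product_length(20, 5, delay=3)    # three further nucleotides, as described for RDV
print("immediate terminator product length:", immediate)
print("delayed terminator product length:", delayed)
print("nucleotides between analogue and 3' end:", delayed - 5)
```

In the delayed case the analogue is no longer the 3′-terminal nucleotide of the product, which is consistent with the suggestion in the text that delayed termination could improve evasion of nsp14 proofreading.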

Expression of structural and accessory proteins

The ORFs encoding the structural proteins (that is, S protein, envelope (E) protein, membrane (M) protein and nucleocapsid (N) protein) are located in the 3′ one-third of coronavirus genomes. Interspersed between these ORFs are the ORFs encoding the so-called accessory proteins. The structural proteins of SARS-CoV-2 have not yet been assessed in terms of their role in virus assembly and budding. In general, coronavirus structural proteins assemble and assist in the budding of new virions at the endoplasmic reticulum (ER)-to-Golgi compartment, and the virions are suggested to exit the infected cell by exocytosis95,96,97. However, recent evidence shows that betacoronaviruses, including MHV and SARS-CoV-2, instead egress from infected cells via the lysosomal trafficking pathway98. During this process, viral interference with lysosomal acidification, lysosomal enzyme activity and antigen presentation was demonstrated.

At least five ORFs encoding accessory proteins have been reported for SARS-CoV-2: ORF3a, ORF6, ORF7a, ORF7b and ORF8 (GenBank entry NC_045512.2) as well as potentially ORF3b99 and ORF9b100, the latter of which is probably expressed as a result of leaky scanning of the sgRNA of the nucleocapsid protein80,99,101. In addition, ORF10 has been postulated to be located downstream of the N gene. However, not all of these ORFs have been experimentally verified yet and the exact number of accessory genes of SARS-CoV-2 is still debated80,102. For example, in the case of ORF10, recent sequencing data questioned whether ORF10 is actually expressed, as the corresponding sgRNA could only be detected once in the entire dataset80. Furthermore, using proteomics approaches, the ORF10 protein has not been found in infected cells100,102, whereas ribosome profiling data suggested that ORF10 may be translated60.

The accessory genes display a high variability among coronavirus groups and usually show no sequence similarity with other viral and cellular proteins. Although they are not required for virus replication in cell culture4,5, they are, to some extent, conserved within the respective virus species and are suspected to have important roles in the natural host. Indeed, in the case of SARS-CoV, it was shown that at least ORF3b, ORF6 and ORF9b function as interferon antagonists6,102,103,104. There are some notable differences between the accessory genes of SARS-CoV-2 and SARS-CoV, with the latter having a total of eight described accessory genes (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8a, ORF8b and ORF9b). In SARS-CoV-2, ORF3b contains a premature stop codon and is thus substantially shorter than the SARS-CoV variant. Although there are indications that ORF3b could exhibit its interferon antagonistic function also in a truncated form99, it has not yet been found to be expressed at the protein level in virus-infected cells100,102. SARS-CoV-2 ORF8 shows an especially low homology to SARS-CoV ORF8. The coding sequence of SARS-CoV ORF8 went through a gradual deletion over the course of the SARS-CoV epidemic. Whereas the early isolates from human patients contained a full-length ORF8, a deletion of 29 nucleotides was observed in all SARS-CoV strains during the middle-to-late stages. This deletion caused the split of one full-length ORF8 into two truncated gene products, ORF8a and ORF8b. Furthermore, less frequent deletion events were also observed, including an 82-nucleotide deletion and a 415-nucleotide deletion, which led to a complete loss of ORF8 (refs105,106), suggesting a possible benefit of SARS-CoV ORF8 deletions in vivo. Notably, however, reconstitution of SARS-CoV ORF8 by reverse genetics was associated with slightly increased fitness in cell culture106. 
Recently, SARS-CoV-2 ORF8 was reported to bind to major histocompatibility complex and mediate its degradation in cell culture107. This indicates that SARS-CoV-2 ORF8 might mediate a form of immune evasion, which is not the case for the split SARS-CoV ORF8a or ORF8b107. Interestingly, a mutant SARS-CoV-2 strain found in Singapore displayed a deletion of 382 nucleotides in the region of ORF8, spanning most of the ORF and the adjacent TRS. This may indicate a tendency towards host adaptation and decreased pathogenicity108 or, alternatively, that the ORF8 protein is dispensable in humans, whereas it is required in the natural host.
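Why a 29-nucleotide deletion can split one ORF into two truncated ORFs follows from simple frame arithmetic: 29 is not a multiple of three, so the reading frame shifts downstream of the deletion. The following minimal sketch shows this on an invented toy sequence (not the SARS-CoV ORF8 sequence). All names (FULL, MUTANT, first_orf, orfs_in_order) are hypothetical, and the scan simply takes the first ATG and reads to the first in-frame stop codon, which is a simplification of real translation initiation.

```python
# Toy illustration: an out-of-frame deletion splits one ORF into two.
# The sequences are invented for this sketch and are NOT viral sequences.
STOPS = {"TAA", "TAG", "TGA"}

UPSTREAM = "ATG" + "GCT" * 4            # 15 nt, start codon plus 4 codons
DELETED = "GAA" * 9 + "GC"              # 29 nt, removed in the mutant
DOWNSTREAM = "GCTTAGCATGAAAAAAAAATAA"   # 22 nt, ends with the original stop

FULL = UPSTREAM + DELETED + DOWNSTREAM
MUTANT = UPSTREAM + DOWNSTREAM          # same sequence minus the 29 nt


def first_orf(seq, start=0):
    """Return (begin, end) of the first ATG-initiated ORF at or after `start`.

    `end` is exclusive and includes the stop codon; None if no ORF is found.
    """
    pos = seq.find("ATG", start)
    while pos != -1:
        for i in range(pos, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                return pos, i + 3
        pos = seq.find("ATG", pos + 1)
    return None


def orfs_in_order(seq):
    """Collect non-overlapping ORFs while scanning from 5' to 3'."""
    found, start = [], 0
    while True:
        hit = first_orf(seq, start)
        if hit is None:
            return found
        found.append(hit)
        start = hit[1]


if __name__ == "__main__":
    print("full-length:", orfs_in_order(FULL))
    print("29-nt deletion:", orfs_in_order(MUTANT))
```

In the toy full-length sequence a single ORF runs from the first ATG to the final stop codon. After the 29-nucleotide deletion, the shifted frame meets a premature stop codon, and a downstream ATG in the original frame opens a second, shorter ORF that ends at the original stop, analogous in principle to the ORF8a and ORF8b situation described above.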

Replication compartments

Primary interactions between nsps and host cell factors during the early coronavirus replication cycle initiate the biogenesis of replication organelles66,109,110. Although mechanisms underlying replication organelle formation are not fully understood, the concerted role of the membrane-spanning nsp3, nsp4 and nsp6 has been implicated in diverting host endomembranes into replication organelles111,112,113. Detailed electron microscopy investigations have described the phenotypic appearance and extent of membrane modifications induced by coronaviruses to accommodate viral replication. Coronavirus infection, like infection with many other positive-sense RNA viruses, manifests in the generation of ER-derived and interconnected perinuclear double-membrane structures such as double-membrane vesicles (DMVs), convoluted membranes and the recently discovered double-membrane spherules112,114,115,116. Interestingly, these structures are highly dynamic and develop during the viral life cycle114,117. Even though replicase subunits (notably SARS-CoV nsp3, nsp5 and nsp8) have been shown to be anchored on convoluted membranes, to date, the specific location of viral RNA synthesis remains the most intriguing unanswered question114,117. Double-stranded RNA (dsRNA), commonly considered a viral replication intermediate, segregates into the DMV interior97,114,118. Consistently, viral RNA synthesis was shown to occur within DMVs by using metabolic labelling of newly synthesized viral RNA in the context of SARS-CoV, MERS-CoV and infectious bronchitis virus infections116. Although, until recently, no openings towards the cytosol had been observed97,114, molecular pores involving nsp3 were demonstrated to span DMVs in MHV-infected cells118.
These newly identified structures, which were also observed in SARS-CoV-2-infected cells, provide a connection between the dsRNA-containing DMV interior and the cytosol, thereby hypothetically rendering newly synthesized viral RNAs available for translation and encapsidation into nascent virions118. They also provide new opportunities to experimentally address the origin, fate and trafficking routes of viral RNAs contained in DMVs.

Replication organelles are a conserved and characteristic feature of coronavirus replication and, consistent with suggested roles of rewired intracellular membranes in the context of other positive-sense RNA virus infections, they provide a propitious niche with adequate concentrations of macromolecules necessary for RNA synthesis while preventing the exposure of viral replication intermediates to cytosolic innate immune sensors95,119. The functional dissection of coronavirus replication organelles has proven challenging as their contributions to viral fitness and pathogenesis are indistinguishable from functions provided by enzymes of the RTC, which are anchored on the membranes of the replication organelle120,121,122. Nevertheless, recent studies revealed the overall composition of the coronavirus RTC, with nsp2–nsp16 and the nucleocapsid protein comprising the viral components68,123. Moreover, several genetic and proteomic screening approaches aimed at deciphering essential coronavirus–host interactions and the RTC microenvironment identified supportive roles of the ER and the early secretory system as well as related vesicular trafficking pathways for efficient replication68,124,125,126 and provided a comprehensive list of cellular proteins that are in close proximity to the coronaviral RTC68,127,128,129. Collectively, these studies, in combination with advanced electron microscopy, provide a basis for future studies to dissect the microarchitecture of the coronaviral RTC in relation to remodelled ER-derived membranes and to functionally link those structures to processes taking place in close proximity to the RTC such as translation, replication and transcription of viral RNA.

Virus–host interactions and host response

A successful intracellular coronavirus life cycle invariably relies on critical molecular interactions with host proteins that are repurposed to support the requirements of the virus. This includes host factors required for virus entry (such as the entry receptor and host cell proteases), factors required for viral RNA synthesis and virus assembly (such as ER and Golgi components and associated vesicular trafficking pathways) and factors required for the translation of viral mRNAs (such as critical translational initiation factors)68,124,125,126,127,128,129.

A first systematic expression study of SARS-CoV-2 proteins and subsequent affinity purification followed by mass spectrometry identified more than 300 potential coronavirus–host protein interactions. Although obtained outside the context of a SARS-CoV-2 infection, the interactors of individually overexpressed SARS-CoV-2 proteins pointed to several cellular processes, reminiscent of those used by other coronaviruses, that are likely to also be involved in the SARS-CoV-2 life cycle130. Importantly, 69 compounds, either FDA approved or at different stages of clinical development, that target putative SARS-CoV-2 protein interactors were highlighted, a subset of which efficiently prevented SARS-CoV-2 replication in vitro. These systematic screening approaches of large compound libraries that target host proteins provide a means of rapidly identifying (repurposed) antiviral drugs and accelerating their clinical availability131. However, a detailed functional characterization of conserved host pathways that promote coronavirus replication will guide the development of efficacious targeted therapeutics against coronavirus infections.

In addition, coronaviruses efficiently evade innate immune responses. Virus–host interactions in this context are multifaceted and include strategies to hide viral pathogen-associated molecular patterns, such as replication intermediates (dsRNA), that may be sensed by cytosolic pattern recognition receptors132,133. DMVs have been proposed to shield dsRNA and sites of viral RNA synthesis; however, experimental proof supporting this idea has not yet been obtained. The coronaviral RTC also contributes to innate immune evasion through several nsp-encoded functions. These include PLpro-mediated deubiquitylation activity134,135, de-ADP-ribosylation by nsp3-encoded macro domains136, RNA-modifying enzymatic activities such as 5′-cap N7-methylation and 2′-O-methylation (nsp14 and nsp16, respectively)74,137,138, and exonuclease139 and endoribonuclease140,141 activities (nsp14 and nsp15, respectively). Although these mechanisms have been elucidated in considerable detail for several prototype coronaviruses, data for SARS-CoV-2 are not yet available.

Besides the well-conserved functions residing in the nsps that comprise the RTC, additional mechanisms to counteract innate immune responses are known for coronaviruses. For example, nsp1 is rapidly proteolytically released from pp1a and pp1ab and affects cellular translation in the cytoplasm to favour viral mRNAs over cellular mRNAs, and thereby also decreases the expression of type I and III interferons and of other host proteins of the innate immune response. Indeed, a first structural and functional analysis of SARS-CoV-2 nsp1 showed binding of nsp1 to ribosomes and nsp1-mediated impairment of translation64. Furthermore, several coronavirus accessory proteins are known to affect innate immune responses, most prominently the MHV NS2 and MERS-CoV ORF4b proteins, which have 2′,5′-phosphodiesterase activity to antagonize the OAS–RNase L pathway142. Although this activity is not predicted for any accessory protein of SARS-CoV or SARS-CoV-2, the ORF3b, ORF6 and N proteins of SARS-CoV have been shown to interfere at multiple levels of the cellular interferon signalling pathway, thereby efficiently inhibiting innate immune responses103. Interestingly, an initial report recently suggested a similar role of SARS-CoV-2 ORF3b as an effective interferon antagonist99. Although this property remains to be demonstrated in the context of viral infection, these results suggest that SARS-CoV-2 shares some preserved accessory protein activities with SARS-CoV that interfere with antiviral host responses.

Coronavirus biology and COVID-19

Our knowledge of SARS-CoV-2 replication, gene function and host interactions is accumulating at unprecedented speed and it will be important to link those findings to the disease induced by SARS-CoV-2 infection, COVID-19. Thus, there is a need to establish experimental systems, such as representative animal models to study the transmission and pathogenicity of SARS-CoV-2, primary airway epithelial cultures and organoids to study SARS-CoV-2 replication and host responses to infection in relevant cell types, and reverse genetics systems to study the specific gene functions of SARS-CoV-2 (Table 1). These tools will be instrumental to understanding how the molecular biology of SARS-CoV-2 affects the development of COVID-19.

Table 1 Opportunities and limitations of current SARS-CoV and SARS-CoV-2 model systems

As we currently understand, SARS and COVID-19 are a consequence of virus-encoded functions and delayed interferon responses and, in severe cases, they are associated with dysregulated immune responses and immunopathologies143,144. Indeed, rapid and uncontrolled viral replication of SARS-CoV has been demonstrated to evade the host innate immune activation during its initial steps. As a consequence, the increase in aberrant pro-inflammatory responses and immune cell infiltration in the lungs provokes tissue damage and contributes to the clinical manifestation of SARS145.

Consistently, host responses, such as cytokine expression, that are known to drive inflammation and immunopathologies have been assessed in studies that revealed that SARS-CoV-2 considerably affects the transcriptional landscape of infected cells by inducing inflammatory cytokine and chemokine signatures38,146,147. Although interferon responses have been shown to potently impair SARS-CoV-2 replication, only moderate induction of type I interferon, type II interferon and interferon-stimulated genes was reported38,147.

Together, these effects may translate into strong and dysregulated pro-inflammatory responses, while cells display low innate antiviral defence activation as revealed by single-cell transcriptomic studies of nasopharyngeal and bronchial patient samples38,146,148,149. In severe COVID-19 cases, as opposed to mild cases, aberrant recruitment of inflammatory macrophages and infiltration of T lymphocytes, including cytotoxic T cells, as well as of neutrophils have been measured in the lung146,149. The accumulating evidence of dysregulated pro-inflammatory responses during SARS-CoV-2 infections has led to the use of immune modulators to inhibit hyperactivated pathogenic immune responses143,144,150,151.

Conclusions

In contrast to the SARS-CoV epidemic of almost 20 years ago, improved technologies, such as transcriptomics, proteomics, single-cell RNA sequencing, global single-cell profiling of patient samples, advanced primary 3D cell cultures and rapid reverse genetics, have been valuable tools to understand and tackle SARS-CoV-2 infections. Furthermore, several existing animal models initially established for SARS-CoV are applicable to study SARS-CoV-2 and will help to identify the critical viral and host factors that impact on COVID-19. We need to understand why SARS-CoV-2, in contrast to SARS-CoV, is replicating so efficiently in the upper respiratory tract and which viral and host determinants decide whether COVID-19 patients will develop mild or severe disease152,153,154. Finally, we need to put the first encouraging studies on SARS-CoV-2 into the context of coronavirus biology to develop efficacious strategies to treat COVID-19 and to develop urgently needed vaccines.