Expression of a gene in the human genome is a multistep and heavily regulated process that resembles a production line. Protein-coding genes are transcribed almost exclusively by RNA polymerase II (RNAPII). During transcription, quality-control checkpoints are implemented to ensure that a gene is properly recognized and transcribed. A number of factors (epigenetic enzymes, chromatin remodellers, transcription factors and activators–coactivators) ensure gene recognition and RNAPII progression on the genic template. The progression of RNAPII, which includes RNAPII initiation, pause–release, elongation and the termination of transcription, occurs in sync with co-transcriptional events (that is, 5′ capping, splicing and polyadenylation). The end result of gene transcription and RNA processing is the generation of a mature RNA, in which coding exons are fused in a linear order that depends on the isoform of the gene. Mature mRNA is subsequently exported from the nucleus into the cytoplasm, where it is directed to ribosomes for translation. The canonical model of translation initiation starts with recognition of the 7-methylguanylate cap on the 5′ end of most eukaryotic mRNAs by the initiation factor eIF4, which recruits a pre-initiation complex that comprises the 40S ribosomal subunit and several eukaryotic initiation factors (eIF3, eIF1, eIF1A and the ternary complex eIF2–GTP–Met–tRNAiMet). This complex then scans continuously from the 5′ to the 3′ end for the first initiation codon in an optimal context (the RCCAUGG Kozak sequence, in which R stands for purine)1. Once the start codon of a gene is read by the initiator tRNAMet, translation progresses and ends when a stop codon in the mRNA (UAA, UAG or UGA) is recognized by release factors. Depending on the subcellular localization of a given protein, co- and post-translational events might take place to sort proteins to their destinations.
In brief, this is the conventional eukaryotic production line through which a gene makes a protein ready to be used in the cell.
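The scanning step of the canonical model can be illustrated with a minimal sketch. It assumes a toy mRNA string and treats the RCCAUGG consensus as a strict pattern, which is a simplification (real start-site selection tolerates deviations from the consensus). The function name `first_kozak_start` is hypothetical and is used here only for illustration.

```python
import re

# Kozak consensus as given in the text: R C C A U G G, where R is a purine (A or G).
# Assumption: the consensus is matched exactly; weaker contexts are ignored.
KOZAK = re.compile(r"[AG]CCAUGG")


def first_kozak_start(mrna: str):
    """Return the 0-based index of the A of the first AUG found in an
    RCCAUGG context when reading 5' to 3', or None if there is none."""
    rna = mrna.upper().replace("T", "U")
    match = KOZAK.search(rna)
    # The AUG begins three nucleotides into the seven-nucleotide motif.
    return None if match is None else match.start() + 3


# Toy transcript (not a real gene): the first AUG lies in a poor context,
# the second one sits inside ACCAUGG.
toy_mrna = "CUUAUGUCGCCACCAUGGCAUAA"
first_aug_any_context = toy_mrna.find("AUG")
kozak_aug = first_kozak_start(toy_mrna)
```

In this toy transcript the first AUG in any context and the first AUG in the consensus context differ, which is the situation that leaky scanning (discussed later in the text) exploits.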

To overcome their small genomes and increase their coding capacity, viruses have evolved to co-opt the transcriptional, epigenetic and translational mechanisms of the infected host cell. To generate protein diversity, viruses can adopt the existing mechanisms of the host (for example, alternative splicing) or use unique strategies. Here we describe the diverse ways by which viral genomes give rise to genes and proteins that deviate from the canonical framework of human genes, restricting our analyses to eukaryotes and their viruses.

Small-genome solutions to big problems

A main strategy to increase the number of coded proteins from a small genome is the use of overlapping or overprinted genes. Nucleic acid sequences can simultaneously encode two or more proteins in alternative reading frames (ARFs). To synthesize these proteins, unconventional transcriptional (‘copying’) or translational (‘reading’) events need to take place (Fig. 1). Although a comprehensive characterization of gene overprinting in large mammalian genomes is lacking, estimates on the basis of simulating codon use2 or ribosome footprinting3 suggest that only 1% of human genes are overprinted. By contrast, gene overlapping is very common among viruses. Despite differences in the size and structure of viral genomes, 53% of sequenced viral genomes contain at least one pair of genes that overlap for more than 50 nucleotides4. Genes that originate by overprinting often encode accessory proteins that feature short sequences and can provide a selective advantage for viruses5,6,7. Many overlapping genes are fixed in viral genomes because of their functions as host antagonists, such as those that affect the interferon response of the host8,9, suppress RNA interference10, and induce apoptosis of host cells11. In addition, as a mutation in an overlapping genomic region affects both the canonical and the overprinted genes, overlapping genes may also serve as a safety mechanism that protects the virus from deleterious mutations. However, because proteins that are encoded by gene overprinting are often enriched in disordered regions and show a tendency to have no known homologues12,13, many overprinting viral proteins are poorly characterized.
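To make the idea of overprinting concrete, the following is a minimal sketch that lists ATG-to-stop open reading frames in the three forward frames of a DNA string and reports pairs in different frames that share nucleotides. It is an illustration under simplifying assumptions (forward strand only, AUG starts only, a made-up toy sequence), and the function names `find_orfs` and `overlapping_pairs` are hypothetical; it is not the method used in the cited surveys.

```python
from itertools import combinations

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward
    strand. `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def overlapping_pairs(orfs, min_overlap=1):
    """Return (orf_a, orf_b, shared_nt) for ORFs in different frames that
    share at least `min_overlap` nucleotides."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[2] == b[2]:
            continue  # same frame: nested or serial, not overprinted
        shared = min(a[1], b[1]) - max(a[0], b[0])
        if shared >= min_overlap:
            pairs.append((a, b, shared))
    return pairs


# Toy sequence (hypothetical): a frame-0 ORF with a shorter frame-2 ORF inside it.
toy = "ATGGCATGCCCAAATGACTAAGG"
orfs = find_orfs(toy)
pairs = overlapping_pairs(orfs)
```

Raising `min_overlap` to 51 would mirror the "more than 50 nucleotides" threshold quoted above, although a genome-scale survey would also need to consider both strands and annotated gene boundaries.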

Fig. 1: The host and virus adopt different strategies for gene expression as a result of differences in genome size.

Left, in organisms with a large genome, expression of a cellular gene typically follows a linear pathway that leads to the synthesis of the respective canonical protein product. Right, viruses, which are confined by their much-smaller genome sizes, use unconventional pathways that mostly involve transcription-level (decoding multiple messages (mRNA)) or translation-level regulation to generate several protein products from a single locus.

Another challenge that is inherent to a small genome is a lack of regulatory space for maintaining the correct stoichiometry and temporality of the expression of overprinted proteins. To overcome these limitations, viruses use several methods that include (1) intrinsic cis and trans regulation of polymerase and other enzymatic activities and (2) a codependency on host functions. We summarize the most relevant strategies used by viruses for expanding the coding and regulatory potentials of their overlapping genes, focusing mostly on viruses that are human pathogens and that represent current and future threats.

Expression of overlapping genes

Copying multiple messages

One set of strategies used by viruses to increase the efficiency of their small genomes involves transcriptional mechanisms that generate several mRNAs from overprinted coding sequences.

Transcriptional slippage

Transcriptional slippage is a process in which several overlapping transcripts are generated from the same gene via viral RNA polymerase stuttering, which results in the incorporation (and, occasionally, the deletion) of one or more nucleotides in the transcript (Fig. 2a). Sequences that are prone to transcriptional slippage include homopolymeric A/T tracts, the U6A motif in human immunodeficiency virus (HIV)14, and the UC-rich slippery sequence in the paramyxoviruses15. The efficiency of transcriptional slippage is regulated by the stability and length of the nascent RNA relative to the template RNA, as well as by the structure of RNA-dependent RNA polymerase (RdRp)15. Owing to the frameshift caused by the insertion of nucleotides, the translation of overlapping transcripts typically results in proteins with a common N terminus but different C termini. Aside from using transcriptional slippage to generate mRNAs in different reading frames, some viruses also use it to polyadenylate their mRNAs16.

Fig. 2: Small-genome solutions to expanding coding potential.

a, Polymerase frameshifting, in which backward or forward slippage of RNA polymerase (pol) results in nucleotide insertions or deletions, and generates a heterogeneous population of viral mRNAs. b, PRFs lead to the synthesis of viral proteins from several reading frames. c, Leaky scanning, in which the ribosome scans through and skips an AUG start codon that is typically located in a less-optimal sequence context, and initiates at a downstream start codon. d, Generation of noncanonical sites of translation initiation through upstream ORFs or non-AUG start codons. In start-snatching, an upstream AUG start codon is obtained via cap-snatching of host RNA (which enables the translation of novel proteins on the basis of both host and viral genetic information).

Transcriptional slippage was first identified in the synthesis of V proteins from the phosphoprotein (P) gene in Parainfluenza virus 5 (previously known as Simian virus 5)17, and has subsequently been observed in other pathogenic RNA viruses, mostly members of the Mononegavirales, including viruses in the Paramyxoviridae (such as Sendai virus) and Filoviridae (such as ebolavirus). Positive-strand viruses in the Potyviridae18 and Flaviviridae19 families have also been described as using this mechanism. In paramyxoviruses, transcriptional slippage can occur when RdRp encounters a ‘slippery’ sequence of 3′-UUUUUUCCC-5′ in the P gene and stutters at a cytidine in this sequence15. The polymerase then backtracks and realigns the newly synthesized mRNA with the template by non-destabilizing G:U base-pairing, which results in G insertions. The possible number of G insertions is limited to six by a sequence that contains adenosine that is located immediately upstream of the slippery site (as A:A base-pairing is not tolerated)20. In Sendai virus, at least three distinct mRNAs of the P gene are produced by transcriptional slippage. The unedited mRNA encodes P protein, which is a component of RdRp that regulates transcriptional fidelity and limits antiviral responses21,22. mRNAs with +1 G or +2 G insertions code for two accessory proteins (V and W, respectively), both of which regulate viral replication kinetics and the activation of host responses23,24. Additionally, the unique hexameric genome-packaging rule of paramyxoviruses might regulate the efficiency of mRNA editing mediated by transcriptional slippage20,25, as it has been shown that mRNA editing is at its most extensive when the cytidine at which the RdRp stutters is in position 2 or 5 in a hexamer, which suggests that N proteins might remain in close proximity to RdRp during transcription26.
Further examples of transcriptional slippage occur in ebolaviruses and Marburg viruses27, both of which belong to the Filoviridae family. In ebolavirus, transcriptional slippage occurs at a 30% frequency on a stretch of seven uridines in the glycoprotein (GP) gene and results in the insertion of one or two additional adenines in the mRNA28,29,30,31. The unedited transcript translates into a nonstructural and secreted glycoprotein28, and the +1 A and +2 A shifts result in an extended glycoprotein that bears a transmembrane domain and a small soluble glycoprotein, respectively28. More recently, deep mRNA sequencing has revealed other possible polyuridine transcriptional slippage sites in the GP, NP, VP30 and L mRNAs of ebolavirus27, which suggests that there may be more uncharacterized polypeptide species expressed than has previously been believed.
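The consequence of polymerase stuttering for the encoded protein can be sketched as follows. This is a toy simulation, assuming a made-up mRNA and a hypothetical editing position; it does not reproduce any real P or GP gene sequence. The helper names `edit_transcript` and `translate` are illustrative. The sketch inserts one or two extra G residues at the editing site and translates each transcript with the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from position 0 until the first stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def edit_transcript(mrna, site, base, n):
    """Insert n copies of `base` at index `site`, mimicking non-templated
    nucleotide addition by a stuttering polymerase."""
    return mrna[:site] + base * n + mrna[site:]


# Toy transcript (hypothetical); the editing site follows the GGG run at index 9.
toy_mrna = "AUGGCUGGGCAUUGACCUAAUAUGA"
unedited = translate(toy_mrna)
plus_one = translate(edit_transcript(toy_mrna, 9, "G", 1))
plus_two = translate(edit_transcript(toy_mrna, 9, "G", 2))
```

The three products share the residues encoded upstream of the editing site and diverge after it, which corresponds to the common N terminus and distinct C termini described for proteins made from edited transcripts.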

RNA splicing

RNA splicing is a commonly used and tightly regulated eukaryotic mechanism of generating distinct mature transcripts from a single gene, and has also been exploited by several families of viruses that replicate in the host nucleus, such as members of the Adenoviridae and Parvoviridae (DNA viruses), retroviruses, and members of the Bornaviridae and Orthomyxoviridae (RNA viruses). However, because of the more compact nature of viral genomes, splicing in viruses—unlike in humans—often serves to express overprinted genes.

In the segmented RNA genome of influenza A viruses (IAV), splicing occurs in viral segments 8 (which encodes the NS gene), 7 (which encodes the M gene) and 2 (which encodes the PB1 gene). Depending on the viral strain, up to three or four unique mRNAs can be generated from segments 8 and 7, respectively. The noncanonical proteins that are produced by splicing are involved in important functions, such as the nuclear export of viral RNA and host adaptation32,33. Importantly, the splicing of segments 7 and 8 is regulated by an array of viral and host factors that includes trans regulators of splicing, such as NS1-BP, HNRNPK34, SRSF1 (also known as SF2/ASF)35, SRSF336 and protein kinase CLK136. Finally, cis-regulatory RNA secondary structures at the 3′ splice site of segment 7 have been suggested to be potential regulators of splicing efficiency in IAV37,38, and a determinant of host tropism37.

Circular RNA is a relatively stable and exonuclease-resistant RNA that is produced by backsplicing, and has recently been identified39 across many viruses—including members of the gammaherpesvirus family (Epstein–Barr virus and Kaposi sarcoma virus) and the oncogenic human papillomaviruses. The functions of circular RNA in viruses are largely unknown, but a recent study has shown that knockdown of the E7 circular RNA produced by human papillomavirus 16 using short hairpin RNA inhibits oncogenic transformation of infected cells40.

Reading multiple messages

Other mechanisms used by viruses to expand the set of proteins expressed from their small genomes include those that act at the level of mRNA translation, which allow for the expression of multiple overprinted proteins from one mRNA.

Programmed ribosome frameshifting

Programmed ribosomal frameshifts (PRFs) (Fig. 2b) occur when elongating ribosomes slip by one base upstream (5′, known as a −1 PRF) or downstream (3′, known as a +1 PRF), thus shifting the ribosomal reading frame. PRFs allow for the expression of overprinted proteins from the same mRNA and can also serve to regulate the stoichiometry of viral proteins. There are two prerequisites for a −1 PRF: (1) a slippery site with the sequence motif XXXYYYZ (in which X is any three identical nucleotides, Y represents U or A, and Z is A, C or U (although with some exceptions, such as GGU); as has previously been reviewed in detail41,42) and (2) a downstream pseudoknot structure that comprises two stems and a connecting loop as a stimulatory element for ribosomal pausing at the slippery site43,44. In +1 PRFs, ribosome pausing is also directed by the presence of rare or ‘hungry’ codons at the slippery site, which shifts the ribosomal A site onto a more abundant codon to resume elongation.
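The first prerequisite for a −1 PRF, the XXXYYYZ slippery heptamer, lends itself to a simple pattern search. The sketch below assumes the motif exactly as defined above (X: three identical nucleotides; YYY: AAA or UUU; Z: A, C or U) and does not handle the exceptions mentioned in the text, nor does it test for the downstream pseudoknot. The function name `find_slippery_sites` and the test sequences are illustrative assumptions.

```python
import re

# XXX = any nucleotide repeated three times, YYY = AAA or UUU, Z = A, C or U.
# The lookahead lets overlapping candidate heptamers be reported as well.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2(AAA|UUU)[ACU]))")


def find_slippery_sites(rna):
    """Return (index, heptamer) for every XXXYYYZ candidate in the sequence."""
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]


# Short toy sequences (hypothetical) used to exercise the pattern.
hits = find_slippery_sites("GCGUUUAAACGGG")
hits_u6a = find_slippery_sites("UUUUUUA")
hits_bad_z = find_slippery_sites("UUUAAAG")
```

A candidate found this way is only a starting point: as the text notes, a stimulatory structure downstream of the heptamer is also required, so a realistic screen would pair this search with RNA structure prediction.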

Much of our early understanding of −1 PRFs came from studies of the Rous sarcoma virus7 and HIV-145, in both of which the structural protein precursor (Gag) and the enzyme precursor (Pol) are translated from the same viral mRNA. Gag is produced through conventional translation. A −1 PRF midway through Gag synthesis occurs in 2–10% of translating ribosomes and results in a fusion protein that is known as Gag–Pol, which is later cleaved by viral proteases to generate full-length Pol5,46,47. PRFs also have an important role in members of the Coronaviridae (for example, severe acute respiratory syndrome coronavirus (SARS-CoV), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and Middle East respiratory syndrome coronavirus) and Flaviviridae (for example, West Nile virus)48,49. In the Coronaviridae, the replicase gene is organized into two partially overlapping open reading frames (ORFs) known as ORF1a and ORF1ab that encode polyprotein 1a and the fused polyprotein 1a–1b, respectively, the latter of which is generated by a −1 PRF. This frameshift event occurs at a frequency of 14–27%50, and has been suggested as a mechanism that maintains the ratio of ORF1a to ORF1ab51. Unlike members of the Retroviridae, SARS-CoV contains an atypical three-stem pseudoknot and an additional, structurally conserved attenuator sequence that is 5′ of the PRF signal50,51,52, which has been shown to control the frequency of −1 PRFs in coronaviruses51,52. Notably, lowering the efficiency of frameshifts markedly reduces viral replication and infectivity6,51,53,54,55, which underscores the importance of the −1 PRF for these viruses. Importantly, host factors have been identified that interfere with virus PRFs. For instance, the human protein C19Orf66, which was first identified for its inhibitory effect on the replication of dengue virus56, has been shown to inhibit −1 PRFs in Gag–Pol synthesis57.
C19Orf66 has further been shown to exhibit broad-spectrum activity in blocking PRFs in HIV-2, Rous sarcoma virus, human T lymphotropic virus and mouse mammary tumour virus57. Whether C19Orf66 functions only by limiting PRFs requires investigation, but targeting PRF factors could provide a selective and powerful antiviral strategy.

Leaky scanning

In ribosomal leaky scanning, the ribosome skips a translation initiation site (especially if this site is located in the context of a weak Kozak sequence) and initiates at a downstream one (Fig. 2c). Many viruses—including retroviruses58, paramyxoviruses59, papillomaviruses60 and bunyaviruses—adopt leaky scanning to express several proteins from one transcript61,62. In pandemic strains of HIV, a bicistronic mRNA transcript encodes a conserved upstream, small 81-amino-acid protein known as Vpu, which confers a fitness advantage by degrading the CD4 viral receptor and enhancing virion release58,63,64,65,66. The bypassing of the Vpu start codon leads to initiation on a downstream start codon, which results in the synthesis of the viral envelope protein58. In the segmented RNA genome of IAV, leaky scanning can generate four proteins in addition to the canonical protein that is encoded by segment 267. For example, a downstream AUG leads to the synthesis of PB1-F2, a protein that localizes to mitochondria and elicits a pro-inflammatory and pro-apoptotic effect on host cells11,68,69,70.

Translation of upstream ORFs

Although viruses have a relatively short 5′ untranslated region, an increasing body of evidence suggests that upstream ORFs that are led by upstream start codons (AUGs) can be translated (Fig. 2d). Upstream translation has widely been observed in DNA viruses and positive- and negative-sense RNA viruses, as well as in mammalian genomes71,72,73,74,75,76,77,78,79,80,81. Upstream ORFs in viruses have been suggested to have two major functional consequences. First, and similar to mammalian upstream ORFs78,79,80,81,82, many viral upstream ORFs suppress the translation of the downstream canonical ORF. For instance, in ebolavirus, an upstream ORF of the L gene (which is important for replication and RNA capping) suppresses the translation of the L ORF under normal conditions and enhances it under stress conditions75. This bimodal regulation fine tunes the synthesis of L protein and helps to maintain optimal polymerase activity75. Similarly, upstream ORFs can regulate the expression of viral proteins in coronaviruses (such as murine hepatitis virus and bovine coronaviruses) and in several DNA viruses (such as hepatitis B virus and human cytomegalovirus)72,74,75,76,77. Second, the products of upstream ORFs can be involved in regulating virulence and tropism. In the monopartite genome of enteroviruses, a highly conserved upstream ORF partially overprints the canonical polyprotein ORF71 and encodes a putative transmembrane protein that facilitates viral release and invasion of echovirus 7 in human gut epithelial cells71.

Initiation of translation from non-AUG codons

The translation of many virus genes has been shown to initiate on noncanonical start codons that are typically found upstream of the canonical AUG codon81 (Fig. 2d). These noncanonical start codons fall mainly into two categories. First, a near-cognate start codon that normally varies by one nucleotide from AUG can be recognized by the initiator tRNAiMet at the P site of the ribosome. For instance, the polycistronic P/C mRNA of Sendai virus and parainfluenza virus type 1 encodes five proteins (P, C, C′, Y1 and Y2) from overlapping ORFs. The C′ protein is generated by the efficient initiation of translation from an upstream non-AUG codon (ACG for Sendai virus and GUG for parainfluenza virus type 1), and therefore has an N-terminal extension compared to the C protein83,84. Similar uses of non-AUG start codons (most frequently CUG, and sometimes GUG) have been identified in viruses that infect a wide range of hosts, including murine leukaemia virus85, human T cell lymphotropic virus type 186, influenza virus87, soil-borne wheat mosaic virus88 and equine infectious anaemia virus89. Second, the non-AUG start codon can be recognized by a non-methionine tRNA. In this case, the initiator tRNAiMet is not required and translation initiates in the A site. This leads to proteins that start with non-methionine amino acids, which have mainly been identified in insect viruses90,91.
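The first category, near-cognate start codons, can be enumerated directly from its definition as codons that differ from AUG at a single position. The short sketch below does only that; it says nothing about how efficiently each codon is used, which depends on sequence context. The function name `near_cognate_codons` is hypothetical.

```python
def near_cognate_codons(start="AUG", alphabet="ACGU"):
    """Return all codons that differ from `start` at exactly one position."""
    return sorted(
        start[:i] + base + start[i + 1:]
        for i in range(len(start))
        for base in alphabet
        if base != start[i]
    )


codons = near_cognate_codons()
```

The resulting set of nine codons contains the ACG, GUG and CUG codons named in the text.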

Start-snatching to generate hybrid proteins

Translation in eukaryotic cells requires the recognition of the 5′ methyl-7-guanosine (m7G) cap on mRNA. Segmented negative-sense RNA viruses in the order Bunyavirales and the families Orthomyxoviridae (for example, IAV) and Arenaviridae (for example, Lassa virus) do not encode capping enzymes, but instead rely on a process known as ‘cap-snatching’ to access cap-dependent translation. In this process, viral polymerase binds to the m7G cap of host RNA and cleaves off a short stretch (7–20 nucleotides in the case of IAV and about 7 nucleotides for Lassa virus) of host capped-RNA92,93. These host-derived fragments are then used as a primer to initiate the transcription of viral mRNAs94. As a consequence, mRNAs of segmented negative-sense RNA viruses exist as genetic hybrids, in which 5′ sequence heterogeneity is provided by snatched host-derived sequences92,95,96,97.

Instead of merely providing a m7G cap, cap-snatched host sequences that bear AUGs also allow segmented negative-sense RNA viruses to express cryptic ORFs within their 5′ untranslated regions (known as upstream viral ORFs). This process has been termed ‘start-snatching’ (Fig. 2d). During IAV infection, about 12% of host-derived cap-snatched sequences bear AUG start codons that confer translation. Depending on the reading frame of the host-derived AUG with respect to the viral RNA, these codons initiate the synthesis of either host–virus chimeric N-terminally extended viral proteins or novel polypeptides (up to 80 amino acids in length) that are overprinted with the major viral ORF98. Start-snatching and the genesis of upstream viral ORFs may be a way for segmented negative-sense RNA viruses to sample evolutionary space before gene functionalization. A recent study has shown that some strains of IAV have evolved to encode an AUG start codon in the untranslated region of the nucleoprotein segment. Expression of this N-terminally extended nucleoprotein increases viral virulence99.
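The dependence of the start-snatching outcome on reading frame can be sketched in a few lines. The example assumes a toy hybrid mRNA made of a hypothetical host-derived leader joined to a hypothetical viral 5′ end, considers only an AUG that lies wholly within the host leader, and uses the illustrative function name `classify_snatched_start`. It classifies the host-derived AUG by its frame relative to the main viral start codon and by whether a stop codon intervenes.

```python
STOPS = {"UAA", "UAG", "UGA"}


def classify_snatched_start(host_leader, viral_mrna, viral_start):
    """Classify the first AUG in a cap-snatched host leader.

    viral_start is the index of the main viral AUG within viral_mrna.
    Assumption: an AUG spanning the host-virus junction is not considered.
    """
    hybrid = host_leader + viral_mrna
    aug = host_leader.find("AUG")
    if aug == -1:
        return "no host-derived AUG"
    main_start = len(host_leader) + viral_start
    # Check every complete codon that ends at or before the viral start codon.
    for i in range(aug, main_start - 2, 3):
        if hybrid[i:i + 3] in STOPS:
            return "terminates before the viral ORF"
    if (main_start - aug) % 3 == 0:
        return "N-terminally extended viral protein"
    return "out-of-frame polypeptide overlapping the viral ORF"


# Toy viral 5' end (hypothetical); its main AUG starts at index 8.
viral = "AGCAAAGCAUGGCUUAA"
in_frame = classify_snatched_start("GCAUGG", viral, 8)
out_of_frame = classify_snatched_start("GCAUGGC", viral, 8)
stopped = classify_snatched_start("AUGUAAG", viral, 8)
no_aug = classify_snatched_start("GCGCGC", viral, 8)
```

A single extra nucleotide in the snatched leader switches the outcome from an N-terminal extension to an out-of-frame product, which illustrates why the heterogeneous host-derived leaders described above can yield both classes of protein.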

Additional mechanisms

Genome compaction in viruses has driven additional mechanisms that do not rely on genic overprinting to express several proteins from a single locus, which have previously been reviewed81 and are summarized in Box 1.

Lessons for the development of therapeutic agents

A fundamental principle that underlies the development of antiviral drugs is to evaluate the benefit (for example, infection suppression) versus the cost (for example, off-target effects or toxicity on the host) provided by a drug (Fig. 3a). Two general strategies are currently used to combat microbial infections: training the host by vaccination and using small-molecule inhibitors to target the virus or the host. Here we provide perspectives on how common features of noncanonical viral gene expression could serve as a starting point for the development of antiviral therapies.

Fig. 3: Strategy for therapeutic and prophylactic development of novel antiviral agents.

a, A balance between viral inhibition and host toxicity underlies therapeutic development. Targeting viral-specific functions or host functions that are more important (in a given time frame) to the virus than the host paves the way for the generation of therapeutic agents. b, Cis-acting nucleic-acid structural elements that are involved in unconventional viral expression mechanisms (such as pseudoknots in PRFs, and stem loops in polymerase slippage sites and IRES) can be directly targeted by small molecules, host factors and antisense oligonucleotides (AON) or indirectly targeted by modulating the related host factors. c, Targeting of virus-specific processes in gene expression (such as cap-snatching and RdRp) that are shared among viruses and not found in hosts offers a high specificity for antiviral agents. The targeting of host dependencies that are used by several viruses provides an alternative route to pan-viral therapeutic agents.

ARFs as vaccination targets

A goal of vaccination is to generate broadly protective antibodies and/or cross-reactive T cells that are directed against viral targets. However, the design of effective and universal vaccines is often hampered by rapid changes of viral antigens through mutation, recombination or re-assortment. For instance, antigenic drift and shift in the surface glycoproteins of IAV have hampered the development of a universal vaccine against influenza virus100. Thus, a major challenge remains to find ideal vaccination targets that are both highly immunogenic and genetically constrained from mutation owing to potential fitness loss.

ARFs have long been neglected as potential candidates for vaccine or drug development, and might provide a solution to this conundrum. ARFs (such as overprinted ORFs) feature an overall low synonymous divergence101,102,103, and are therefore expected to be relatively constrained from accumulating mutations (as mutations in these regions are likely to disrupt more than one viral protein). Importantly, proteins encoded by ARFs have been shown to be abundantly synthesized during infections104,105,106,107 and can be efficiently processed through class-I MHC processing pathways and induce cytotoxic T lymphocyte responses108,109,110.

The use of ARFs as epitopes has been proposed for HIV108,111,112,113, influenza virus110 and some cancers109, and has several major advantages. First, ARFs in simian immunodeficiency virus and HIV contribute greatly to CD8+ T cell responses in infected individuals and trigger a stronger cytotoxic T lymphocyte response compared to epitopes that target the canonical proteins108,114. The potential of ARFs as epitopes is further substantiated by the observation that codon-optimized recombinant HIV vaccines (in which ARFs are disrupted or skewed) trigger a reduced cytotoxic T lymphocyte response compared to non-codon-optimized vaccines112. Second, cytotoxic T lymphocyte responses to at least some ARF epitopes do not drive viral escape113 and presentation of ARF epitopes has been associated with favourable clinical outcomes111. Finally, overprinting ORFs tend to be highly conserved among strains of the same virus, as in IAV98. Taken together, these findings suggest that ARFs and overprinting ORFs present potential antigen candidates for the development of new vaccines and for therapies based on chimeric antigen receptor T cells115.

Targeting viral nucleic acid structures

Many viruses rely on the presence of cis-acting structural elements in their genomes for protein expression. These elements tend to be highly conserved, and have both structural and sequence-specific properties; they therefore present excellent targets for drug development (Fig. 3b). These strategies require precise knowledge of the sequence and structure of the nucleic acid target region, as well as its viral and host binding partners.

Structure-targeting drugs can be designed following two strategies. First, a drug can disrupt or alter the structure of a cis element. For example, a compound (known as ligand 43) discovered from an in silico small-molecule screen has been shown to specifically inhibit −1 PRFs in SARS-CoV by altering the plasticity of a viral RNA pseudoknot116,117,118. Second, a drug can inhibit cofactor binding to a structural element. For example, benzimidazole (a potential inhibitor of hepatitis C virus (HCV)119,120) functions by widening the interhelical angle in the viral internal ribosomal entry site (IRES), which results in reduced interaction with ribosome subunits and thus the inhibition of translation121,122.

In theory, the high conservation at structure and sequence levels makes viral cis elements ideal targets for antisense oligonucleotides, which work by disrupting structure formation or by inducing degradation of the RNA through recruitment of RNase H. Indeed, the first drug approved by the US Food and Drug Administration (fomivirsen) for treating cytomegalovirus retinitis in individuals infected with HIV is an antisense drug. Several other antisense-based antiviral drugs against HIV, HCV, ebolavirus and Marburg virus have entered clinical trials. However, antisense oligonucleotide technology has some caveats. Besides considerations of delivery method (which have previously been reviewed123), virus escape can occur. For example, an antisense oligonucleotide inhibitor (ISIS-14803) of HCV that targets the IRES has been shown to exert selective pressure on the IRES sequence124,125. This resulted in mutations accumulating in the virus in patients during a phase-I clinical trial, although no mutations were detected at the antisense oligonucleotide binding site124. Taken together, these data suggest that the design of drugs based on antisense oligonucleotides requires a careful analysis of the surrounding structures. Alternatively, it may be necessary to use multiplex delivery of antisense oligonucleotides (that is, to target several regions of the structure at the same time), such that compensatory escape mutations will be unable to take hold.

Targeting virus-specific mechanisms of gene expression

Many viruses rely on their own proxies of host enzymes (for example, the capping machinery of the Coronaviridae) or pathways (for example, the cap-snatching of the Orthomyxoviridae) to express viral proteins (Box 2). Inhibitory drugs against these virus-specific proteins and pathways should achieve high specificity for the virus with minimal effect on the host (Fig. 3a).

Cap-snatching, which is used only by influenza viruses and other segmented negative-sense viruses, presents one such targetable pathway. To date, at least three small-molecule antiviral agents (favipiravir, pimodivir and baloxavir) that target the PB1, PB2 and PA subunits, respectively, of the influenza viral polymerase trimer have entered clinical development (as has previously been reviewed126). Baloxavir has been approved for treating influenza virus infections in the USA and Japan, and was generated through rational design against the cap-dependent endonuclease active site of the IAV PA protein127. Baloxavir has been shown to effectively inhibit cap-snatching activities in both IAV and influenza B virus127, and has broader antiviral effects than current standard-of-care anti-influenza drugs128,129. Success with these drugs may pave the way for the development of antiviral agents against other highly pathogenic cap-snatching viruses.

Conserved protein domains across viral families might provide targets for broader-acting antiviral agents (Fig. 3c). For example, RdRp is essential to RNA viruses and shares a similar 3D structural conformation130 and mechanism of action across species, which suggests that drugs that target RdRp could have activities in different viral families. Favipiravir—which was initially discovered on the basis of its antiviral activity against IAV—has been shown to exhibit antiviral activity against other RNA viruses, including viruses that cause fatal haemorrhagic fevers (arenaviruses, peribunyaviruses and filoviruses)131.

Although viral-targeting drugs offer high specificity, a potential issue is the acquisition of drug-resistant mutations in the viral targets. In the case of baloxavir, IAV recovered from 1.1 to 19.5% of patients treated with the drug developed up to 138 compensatory mutations132. A possible solution is combination therapy: because the targets of combination therapy are often located in different pathways or proteins, it is more difficult for the virus to acquire resistance than under monotherapy. Indeed, combination therapies have been shown to slow down the acquisition of resistance and yield effective viral clearance133, as exemplified by the combinatorial ‘highly active antiretroviral therapy’ (HAART) used in controlling HIV infections134, as well as similar strategies used in the treatment of cancers135 and multidrug-resistant bacterial infections (as has previously been reviewed136).

Unfortunately, most drugs, whether developed by academic or commercial institutions, are developed as single agents, and face a range of legal and regulatory issues that might hamper their use in the testing of combination therapies. Thus, a shift in drug-development paradigms towards a more collaborative environment among research bodies and clinicians is imperative for the future development of combinatorial strategies.

Host dependencies as targets of pan-viral therapies

Although the high mutation rates of viruses suggest an unlimited evolutionary potential, a virus that is fully co-adapted to its host will have very few neutral sites in its genome137, which locks the virus into evolutionary stasis and limits marked divergence over the long term. In support of this, an analysis of HBV genomes recovered from prehistoric periods has shown that these viruses were only 1.3–3% divergent from modern circulating strains138,139. This suggests that targeting host dependencies, which can result from indirect or direct interactions between a virus and its host, is a viable strategy for antiviral development (Fig. 3c).
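To make a figure such as '1.3–3% divergent' concrete, the following sketch computes an uncorrected pairwise divergence (the fraction of differing sites) between two pre-aligned sequences. This is only one simple way to express divergence; the text does not state how the cited studies calculated their values, so the method, the function name and the toy sequences here are assumptions for illustration.

```python
def percent_divergence(seq_a, seq_b):
    """Uncorrected pairwise divergence (%) between two aligned sequences.

    Positions with a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped; the remaining sites are compared one by one.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy aligned fragments (hypothetical, not real viral sequences).
ancient = "ACGTACGTACGTACGTACGT"
modern = "ACGTACGTACGTACGTACGA"
print(f"{percent_divergence(ancient, modern):.1f}% divergent")
```

A low value over a long time span, as reported for the ancient HBV genomes, is what would be expected if most sites in the genome are under constraint.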

When considering the inhibition of a host dependency, a trade-off exists between viral inhibition and the potential disruption of host cellular functions. A parallel can be observed with cancer therapeutic agents: cancer cells that are heavily reliant on essential host functions can be killed by short-term or partial inhibition of these functions (for example, with topoisomerase or proteasome inhibitors), while causing minimal long-term damage to the patient. The ideal therapeutic targets for viral infections would be host factors upon which viruses heavily depend, and the short-term or partial inhibition of which over the course of an infection is well tolerated by the host. Furthermore, if commonalities in host dependencies exist among different viruses, targeting these dependencies might allow the development of broad-spectrum or pan-viral therapeutic agents. This could contribute to combating newly emerging infections that lack efficient antiviral therapies (for example, as in the current COVID-19 pandemic).

Direct dependencies

Viral proteins or RNA may directly interact with host factors to give rise to direct dependencies. The identification of direct host dependencies requires knowledge of host–viral protein–protein and protein–nucleic acid interactions that are shared and important among different viral families. The inhibition of these proteins or processes is therefore likely to have broad-spectrum antiviral effects.

Several viral species require a common set of host factors (collectively known as the IRES trans-acting factors) for viral IRES translation. The inhibition of these factors therefore blocks replication of viruses from several unrelated families. For example, the inhibition of the host ribosome-binding protein RACK1 (receptor for activated C kinase 1, which is co-opted by many viruses in IRES-mediated translation140) effectively inhibited HCV and herpes simplex virus infection with no significant effect on the viability or proliferation of the human host cells140,141.

Another host dependency is protein localization to the endoplasmic reticulum, which is shared by several evolutionarily distant viruses such as IAV, HIV and dengue virus142. As predicted, treatment with small-molecule inhibitors of SEC61 (a protein complex that mediates co-translational translocation in the endoplasmic reticulum and endoplasmic reticulum–Golgi intermediate compartments) suppressed replication of all three of these viruses in vitro142. Different iterations of SEC61 inhibitors have been shown to effectively suppress Zika virus and coronavirus replication in vitro143,144. Further work is needed to evaluate their activity in vivo, but the underlying general concept is that viruses have a strong requirement, in a small temporal window of active infection, for oxidative folding and modification associated with apical trafficking142,143,144. Along similar lines, host glycosylation enzymes (which are extensively used for viral surface protein modification) have inspired the development of vaccines and therapeutic agents: for example, the use of glycans as vaccine adjuvants for HIV145,146 and antiviral drugs (zanamivir and oseltamivir) for IAV.

Indirect dependencies

An indirect host dependency arises from indirect functional interactions between the virus and a host protein or process. One example of such a dependency is the importance of the host splicing machinery for viruses that replicate in the cytosol. For instance, infections with SARS-CoV-2 have been shown to cause a marked increase in spliceosome components in host cells147. Viruses can disrupt host splicing function by triggering nucleo-cytoplasmic translocation and the sequestering of spliceosome components (in the case of rotavirus148,149, which has previously been reviewed150) or by inducing changes in splicing patterns of host cellular genes (in the case of influenza virus149, Zika virus151, human cytomegalovirus152, and in hepatitis B virus- and HCV-related hepatocellular carcinoma153).

The therapeutic targeting of alternative splicing by small molecules or protein inhibitors and antisense oligonucleotides has been proposed in the treatment of cancer, on the basis of the observation of pro-oncogenic isoforms generated by defective alternative splicing (as previously reviewed154,155). Altering the splice pattern of a receptor for viral entry using antisense oligonucleotides could generate a decoy receptor and prevent infection. Overall, the pervasive involvement of host splicing machinery in viral gene expression suggests that modulation of splicing might serve as a promising antiviral therapeutic strategy.


Viruses use a diverse array of noncanonical transcriptional and translational strategies to greatly expand the coding potential of, and add novel functionality to, their small genomes. However, to do so they have relied on unique enzymatic activities or become dependent on host functions. Viral enzymes that have no homology with human enzymes represent ideal targets for the development of virus-specific inhibitors. Host dependencies are also valuable targets as—in many cases—these dependencies exist broadly across different viruses. We surmise that future developments in our biochemical and detailed mechanistic understanding of how viruses make proteins will inform the development of therapeutic agents and vaccines.