Main

The pioneering work that established pseudoknots as a genuine folding motif in RNA was carried out in the laboratories of Cornelis Pleij, Krijn Rietveld and Leendert Bosch in the early 1980s. These authors were investigating how it was possible for the 3′ ends of some plant virus genomes to possess a number of the functional characteristics of transfer RNAs (tRNAs) yet lack an obvious clover-leaf secondary structure. By applying a 'pseudoknot building principle', it became clear how these viral RNAs could fold into L-shaped structures resembling tRNAs1,2. A seminal paper from the same authors3 subsequently defined the general principles of pseudoknot folding and provided the first examples of pseudoknots in other RNAs — the central pseudoknot of Escherichia coli 16S ribosomal RNA (rRNA) and a pseudoknot present in group I introns.

Since then, many more pseudoknots have been discovered, and they are associated with a remarkably diverse range of biological activities (Supplementary information S1, S2 (tables); reviewed in Refs 4–11). They have particularly important roles in the replication cycles of numerous animal and plant viruses, including, in humans, the flavivirus hepatitis C virus (HCV)12, the coronavirus responsible for severe acute respiratory syndrome (SARS-CoV)13, the oncogenic retrovirus T-cell lymphotropic virus types I and II14 and certain strains of HIV15.

The function of a viral pseudoknot is linked logically to its location in the genome (Fig. 1; Supplementary information S1 (table)). So, in non-coding regions (NCRs) of positive-strand RNA viruses (in which the genomic RNA serves as the mRNA template for translation and then as a template for replication), pseudoknots act in the regulation of initiation of protein synthesis and in template recognition by the viral replicase. By contrast, in coding regions they modulate the elongation and termination steps of translation. Fewer pseudoknots have been documented in the mRNAs of viruses with DNA genomes, but several DNA bacteriophage mRNAs are known to encode pseudoknots16. In the 5′ NCRs, these motifs have roles in the regulation of translation initiation, whereas in the coding region they affect translation elongation. Pseudoknots have also been described in the catalytic RNAs of some RNA satellite viruses, where they have a role in genome replication17.

Figure 1: RNA pseudoknots in virus gene expression.

A schematic of a generic RNA virus genome is shown. Viral pseudoknots have been described in the 5′ non-coding region (NCR), the coding region, the intergenic region (IGR) and the 3′ NCR, where they function in various steps of the replication cycle. Although the majority of examples are from positive-strand RNA viruses, pseudoknots also have a role in the replication cycles of certain DNA viruses, satellite RNA viruses and viroids. For simplicity, viral pseudoknots involved in long-range interactions (including virus genome circularization) or possessing catalytic activity are not shown, but are discussed in the text.

Here, we review selected, well-characterized examples of pseudoknots in virus genomes, with an emphasis on structure–function relationships, highlighting recent advances in our understanding of pseudoknot conformation at high resolution and exploiting, where relevant, our improved knowledge of ribosome architecture.

What is an RNA pseudoknot?

As defined originally3, a pseudoknot is a structure formed upon base-pairing of a single-stranded region of RNA in the loop of a hairpin to a stretch of complementary nucleotides elsewhere in the RNA chain (Fig. 2). Such pseudoknots, referred to as hairpin type (H-type) pseudoknots, have two base-paired stem regions (S1 and S2) and, depending on the number of loop bases that participate in the pseudoknotting interaction18, two or three single-stranded loops (L1, L2 and L3). In most (>85%; Refs 19, 20) H-type pseudoknots, L2 is absent or very short, and the base-paired stems stack coaxially to form a quasi-continuous helix. In these structures, L1 spans S2 and crosses the deep groove of the helix, whereas L3 spans S1 and crosses the shallow groove (Fig. 2).
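
The defining property described above, a second stem whose base pairs interleave with those of the first rather than nesting inside them, can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not a published algorithm: it assumes an extended dot-bracket notation in which '(' and ')' mark S1 and '[' and ']' mark S2, and the function names `parse_pairs` and `has_pseudoknot`, as well as the toy structure strings, are hypothetical.

```python
from typing import List, Tuple


def parse_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return base pairs (i, j) with i < j from extended dot-bracket notation.

    Assumed notation: '(' ')' for one stem, '[' ']' for a second stem
    that may cross the first, '.' for unpaired nucleotides.
    """
    closer_to_opener = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def has_pseudoknot(pairs: List[Tuple[int, int]]) -> bool:
    """True if any two pairs (i, j) and (k, l) interleave as i < k < j < l."""
    for index, (i, j) in enumerate(pairs):
        for k, l in pairs[index + 1:]:
            if i < k < j < l:
                return True
    return False


# Toy H-type layout, 5' to 3': S1, L1, S2, L2, S1', L3, S2'
h_type = "((((..[[[[.))))....]]]]"
# Orthodox hairpin: all pairs are nested
hairpin = "((((....))))"
```

With these toy strings, `has_pseudoknot(parse_pairs(h_type))` is true and the same call on `hairpin` is false. The quadratic pair comparison is chosen for clarity; it is adequate for structures of the size discussed here.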

Figure 2: RNA pseudoknot structure.

a | Various structural motifs have been described in RNA149. Orthodox secondary structures consist of base-paired regions (stems) connected by single-stranded loops at stem termini (hairpin loop), in the body of a stem (bulge (B) or interior (I) loop) or at the junction of several stems (multibranched (M) loop). Pseudoknots are considered to be tertiary structures and form when bases in a loop pair with a single-stranded region elsewhere. The hairpin type (H-type) pseudoknot is by far the most common; in this case, the tertiary interaction involves bases in the loop of a hairpin. The resultant structure contains two stem regions, S1 and S2, connected by single-stranded loops. In many cases, no unpaired bases are present between the two stems (the length of L2 is zero), and the stems stack coaxially to give a quasi-continuous helix. b | The secondary structure of the pseudoknot of the ribosomal frameshifting signal of simian retrovirus 1 (SRV-1) is shown alongside three-dimensional views of the nuclear magnetic resonance model74. The stems are shown as surface representations and the loops as ribbons (all structural images were prepared using PyMol). The polarity and handedness of the double helix lead to inequivalence of the loops, with L1 crossing the deep groove and L3 crossing the shallow groove. S1 is blue, S2 is red, L1 is yellow and L3 is green. L2 is not present in the example shown.

The geometry of the pseudoknot is such that when S2 is six or seven base pairs in length, L1 can be as short as a single nucleotide3,21. However, in some pseudoknots, the loops are much longer and include their own secondary structure elements. Pseudoknots are also formed upon base-pairing of single-stranded bulge (B), interior (I) and multibranched (M) loops with complementary regions elsewhere in the RNA (which themselves can be constrained in a secondary structure, for example in intramolecular hairpin–loop–hairpin–loop (H–H or so-called kissing loop) interactions)22. However, unless all of the loop nucleotides are paired, these pseudoknots are generally considered as H-type pseudoknots, as the additional base-pairing interaction is viewed as a substructure within the loop. For this reason, the B, I and M nomenclature (as well as H–H) is not extensively used, and most structures are referred to as H-type pseudoknots and often simply as pseudoknots. An additional issue regarding nomenclature is loop numbering. In the early pseudoknot literature, most examples did not possess unpaired residues between the two stems, so the convention was to name the groove-spanning loops L1 and L2 (now L1 and L3). Here, we have opted for the L1, L2 and L3 nomenclature, which is more generally applicable.

As will be discussed in more detail below, the biological properties of pseudoknots are intimately linked to their structural features4,5,6,7,18,23. For example, the geometry of the junction between the stems and the interactions that can occur between the constituent loops and stems are often of great functional relevance21,24,25,26,27. Indeed, for many viral pseudoknots, much of the primary sequence is unimportant for function, as long as the conformation and overall stability of the structure are maintained. Where precise nucleotide-sequence requirements have been identified, this is likely to reflect a specific structural necessity, although additional roles might be possible (for example, base-specific recognition by proteins).

Pseudoknots and translation

Internal ribosome entry. Most eukaryotic cellular mRNAs are translated in a cap-dependent manner, with the 40S subunit and associated initiation factors scanning along the mRNA until the start codon (AUG) is reached28. Efficient translation also requires mRNA circularization, which is brought about by the interaction of the 3′-end poly(A) tail-binding protein (PABP) with initiation factor 4G (eIF4G), itself held at the 5′ cap by the cap-binding factor eIF4E (Ref. 29). However, the genomes of many positive-strand RNA viruses lack a cap, a poly(A) tail or both, and translation initiation involves non-standard mechanisms30,31.

One such example is cap-independent internal ribosome entry, in which the ribosome is recruited internally to a structured region of the mRNA (the internal ribosome entry site (IRES), usually located in the 5′ NCR) and often directly to the start codon. Pseudoknots have been identified in a number of IRESs, and their function is best exemplified in the IRESs of the flavivirus HCV and the dicistrovirus cricket paralysis virus (CrPV) (Fig. 3). Unlike the mammalian picornaviral IRESs, which retain a requirement for many or most canonical initiation factors, the mammalian 40S subunit can associate with the HCV IRES in the absence of translation initiation factors, although the formation of the 48S complex (with the initiation codon locked into the mRNA-binding cleft of the small subunit) requires the participation of the eIF2–GTP–Met-tRNAi ternary complex and eIF3 (reviewed in Ref. 32). The HCV IRES forms a defined secondary structure that contains two major hairpins (domains 2 and 3) and an essential pseudoknot structure12 at the base of domain 3 (domain 3e/f) (Fig. 3). IRES function requires domains 2 and 3, but initial binding of the 40S subunit is mediated principally by the basal region of domain 3, including stem–loop 3d, with a modest contribution from the pseudoknot33,34. Comparisons of HCV IRES–40S (Ref. 35) and IRES–80S (Ref. 36) complexes have revealed that the overall appearance of the IRES is similar in the two complexes36. In the 80S complex, the pseudoknot corresponds to an L-shaped density located at the mRNA exit channel in the vicinity of ribosomal protein S5 (rpS5) (Fig. 3; Supplementary information S3 (figure)). The assignment of the pseudoknot to the L-shaped density is based on molecular modelling and is consistent with recent crosslinking data37. The three-dimensional (3D) structure of the HCV pseudoknot is not available, but in essence it is an H-type pseudoknot in which L1 is long and highly structured, and includes the entirety of domains 3a–3e. Why a pseudoknot is present at this key location of the IRES is unclear, but it might be linked to a capacity to bind rpS5 (Ref. 37). This protein is present at the mRNA exit channel and also associates with IRES domain 2 (Refs 36, 38). Boehringer and colleagues propose that the HCV IRES domains function synergistically to position the AUG into the ribosomal peptidyl (P) site, coupled to movement of the pseudoknot36. In this model, a conformational change of the four-way junction (which includes domains 3a and 3c; Fig. 3) pivoted around domain 3d is transmitted to the pseudoknot, which moves towards the mRNA exit channel, positioning the AUG correctly into the ribosomal P site and allowing subunit joining. The movement could be potentiated by eIF3 binding, as this multisubunit initiation factor makes intimate contacts with the HCV IRES39.

Figure 3: Pseudoknots and internal ribosome entry.

a | A secondary structure representation of the hepatitis C virus (HCV) internal ribosome entry site (IRES) with the pseudoknot shown in blue. b | A surface representation of the human 80S ribosome (grey) in complex with the HCV IRES (red) derived from the cryo-electron microscopy (cryo-EM) structure36. Density corresponding to the pseudoknot is indicated in blue. c | A secondary structure representation of the Plautia stali intestine virus (PSIV) IRES is shown above a ribbon representation of the RNA (domains 1 and 2) derived from the crystal structure27. Domain 3 remains to be solved. The secondary structure of the cricket paralysis virus (CrPV) IRES (not shown) is similar. d | A surface representation of the yeast 80S ribosome (grey) is shown in complex with the CrPV IRES (red) derived from the cryo-EM structure26. Below is a fit of the density to the modelled CrPV IRES showing the interactions that occur between the various domains and ribosomal components. Ribosomal proteins are cyan, 25S ribosomal RNA (rRNA) is purple and 18S rRNA is brown.

Pseudoknots also have an essential role in the function of the intergenic region (IGR) IRES of CrPV40 and other dicistroviruses41. This remarkable IRES, only 200 nucleotides in length, has been described as an RNA-based translation factor42 because it recruits ribosomes and activates translation without the involvement of initiation factors or initiator tRNA. The ribosomal 40S and 60S subunits bind directly to the IRES43, which then occupies the ribosomal intersubunit space of the 80S complex and interacts with key components that form the ribosomal aminoacyl (A), P and exit (E) sites.

The three defined domains of this IRES have distinct functional tasks (Fig. 3). Domain 1 contributes to interactions with the 60S subunit in the E- and P-site regions, and domain 2 interacts with the 40S subunit at the E site. Domain 3, which is located predominantly between the A and P sites and is in a similar orientation to ribosome-bound tRNAs, places its most 3′ nucleotide triplet (GCU) into the decoding region of the A site11. Here, it binds the anticodon of tRNAAla, which is brought to the ribosome as part of the ternary complex (elongation factor 1A (eEF1A)–GTP–tRNAAla). Subsequently, tRNAAla is pseudotranslocated (without peptide-bond formation) by eEF2 into the P site, allowing delivery of the next tRNA into the A site and authentic elongation to begin41,43,44.

Modelling and structural analysis of the CrPV IRES and IRESs from related viruses, including Plautia stali intestine virus (PSIV)26,27,45,46, has revealed that the structure is dominated by three H-type pseudoknots (Fig. 3), one per domain. Pseudoknot PKI, which essentially forms all of domain 3, is characterized by the possession of an AU-rich S1 and short loops. The folding of the rest of the IRES is dominated by interactions between pseudoknots PKII and PKIII (Fig. 3). The pseudoknot of domain 2, PKIII, is a nested pseudoknot in that it is entirely contained within L3 of the pseudoknot of domain 1, PKII. Cryo-electron microscopy (cryo-EM) and X-ray crystallography studies26,27 (reviewed in Ref. 11) have indicated a complex folding strategy that forces two small hairpins of PKIII, present as substructures in L1 (stem–loop (SL) IV) and L3 (SL V), to project from a central core and emerge on the same side of the structure to make vital interactions with rpS5 at the E site46,47,48. Pivotal to this folding strategy are pseudoknot loop–helix interactions. In PKIII, for example, an L1–S2 major-groove interaction positions SL IV, and a second interdomain interaction occurs between the minor groove of S2 and four L1 bases of the adjacent pseudoknot PKII, which stabilizes the core. The geometry of the overall fold is such that S2 of PKII stacks on S2 of PKIII to create a wedge-shaped section that occupies the mRNA channel and directs PKI into the decoding site.

Recruitment of the first aminoacyl tRNA to the ribosome requires the participation of eEF2 (Ref. 49). The activation of eEF2 has been linked to an IRES-mediated destabilization of a conserved cellular pseudoknot that is present in helix 18 of the small subunit rRNA (the so-called 530-loop pseudoknot; Supplementary information S2 (table)). This region of rRNA is involved in the enhancement of translational accuracy and tRNA binding50, and its destabilization (probably by the pseudoknot of domain 3) is likely to be crucial to the IRES mechanism.

The structure of the IGR IRESs illustrates how pseudoknots can be used to direct the global folding of an RNA sequence. The pseudoknot motif naturally provides the potential for coaxial stacking of constituent helices, but also the opportunity for additional stacking with helices present in constituent loops. This allows longer helical domains to be generated, a common feature in the organization of global RNA structures51. If the pseudoknot is itself nested in another pseudoknot, additional helical stacking possibilities are created. Superimposed on this is the capacity of the single-stranded loops to interact with constituent stems to add stability, or with other regions of RNA to promote packing of adjacent helices. The compact and complex fold of the IGR IRESs brought about by the nested pseudoknots is a strategy similar to that used to fold certain ribozyme cores, discussed in more detail below. There is no known ribozyme activity associated with IRESs, however, which indicates that this is a general folding strategy that can be used to satisfy different mechanistic requirements.

Autoregulation. One of the first viral pseudoknots to be described16 is encoded by T-even bacteriophages (such as T2, T4 and T6) and functions in translational autoregulation of the gene 32 protein (gp32), a single-stranded DNA-binding protein that mainly functions in replication of the viral double-stranded genomic DNA. The pseudoknot is located some 40 nucleotides upstream of the translation start site (AUG) of the gp32 mRNA and acts as a specific binding site for gp32 itself. At low protein concentrations, pseudoknot-bound gp32 does not overlap the ribosome-binding site, but as protein levels increase, cooperative binding of multiple copies occurs, nucleated at the pseudoknot-bound gp32 (Ref. 52). This assembly blocks access to the Shine–Dalgarno sequence and so represses translation. Unfortunately, molecular details of the gp32–pseudoknot interaction are lacking, but the structure of the pseudoknot isolated from bacteriophage T2 has been solved by nuclear magnetic resonance (NMR)53. It is a classic H-type pseudoknot with short loops and coaxially stacked stems, albeit with S2 rotated by 18° with respect to S1 to relieve close phosphate–phosphate contacts at the junction while preserving the stabilizing effects of base stacking. Although the T2 pseudoknot possesses only a single nucleotide in L1 (an A), this is stereochemically feasible as the distance between the two phosphates across the deep groove of A-form RNA reaches a minimum when six or seven base pairs of S2 are bridged3,21.

Although the gene 32 system represents the only known viral example of pseudoknot involvement in the autoregulation of translation initiation, there are related examples in cellular mRNAs (Supplementary information S2 (table)). For example, translational repression of the ribosomal S4 α mRNA operon54,55 and autoregulation of ribosomal protein S15 synthesis56 require specific binding of the respective proteins to pseudoknots in the 5′ untranslated region (UTR) of their own mRNAs.

Frameshifting. RNA pseudoknots in coding regions are principally associated with sites of programmed −1 ribosomal frameshifting. This is a translational mechanism used by many viruses to coordinately express two proteins from a single mRNA at a defined ratio7,57. During elongation, ribosomes decode the mRNA in triplet steps and the reading frame is accurately maintained. However, in frameshifting, the ribosome is forced to shift one nucleotide backwards into an overlapping reading frame and translate an entirely new sequence of amino acids (Fig. 4). In retroviruses, frameshifting at the overlap of the gag and pol open-reading frames (ORFs) allows expression of the viral Gag–Pol polyprotein and sets a defined cytoplasmic Gag:Gag–Pol ratio that is optimized for virion assembly and packaging of reverse transcriptase58. In other RNA viruses, frameshifting allows expression of RNA-dependent RNA polymerases (RdRps)59. Maintaining a precise efficiency of frameshifting has been shown to be crucial to the replication of HIV-1 (Ref. 60) and the retrovirus-like double-stranded RNA virus of yeast, L-A61. Similarly, in other RNA viruses, changing the stoichiometry of non-frameshifted and frameshifted products is also likely to be detrimental. In SARS-CoV, for example, components of the viral replication machinery present in the viral polyproteins pp1a and pp1ab (which are expressed by frameshifting) are predicted to form a heterodimer with a stoichiometry of 8:1 (Refs 62, 63), a ratio that is consistent with the natural level of frameshifting64,65. For these reasons, frameshifting has emerged as a potential target for antiviral therapeutics.
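
The consequence of a −1 shift for the encoded protein can be illustrated with a short Python sketch. This is a toy model under stated assumptions rather than a description of any particular virus: the mRNA string, the function name `translate` and the parameter `slip_after` (the index of the last zero-frame codon decoded before the ribosome steps back one nucleotide) are all hypothetical, and the standard genetic code is assumed.

```python
from itertools import product
from typing import Optional

# Standard genetic code, built in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna: str, slip_after: Optional[int] = None) -> str:
    """Translate from index 0 until a stop codon or the end of the mRNA.

    If slip_after is given, the codon starting at that index is decoded
    normally, and the next codon then starts one nucleotide earlier than
    usual (a -1 frameshift), so the two codons overlap by one nucleotide.
    """
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
        # Advance two nucleotides instead of three at the slip site.
        i += 2 if i == slip_after else 3
    return "".join(protein)


# Hypothetical mRNA; indices 5-11 spell the heptamer UUUAAAC.
mrna = "AUGGGUUUAAACGGGUAAGCCUGUAA"
zero_frame_product = translate(mrna)            # terminates at the first UAA
shifted_product = translate(mrna, slip_after=9)  # continues in the -1 frame
```

For this toy sequence the zero-frame product is MGLNG, whereas the frameshifted product is MGLNRVSL: the two share an amino terminus and diverge after the slip site, which is how a single mRNA can yield two proteins whose ratio is set by the frameshift efficiency.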

Figure 4: Pseudoknots and ribosomal frameshifting.

a | The overlapping coding sequences open reading frame 1a (ORF1a) and ORF1b of the genome of the coronavirus infectious bronchitis virus (IBV) are shown above the minimal frameshift-promoting sequences of this virus. The pseudoknot promotes frameshifting at the slippery sequence, indicated by a jagged arrow. b | A representation of the stalled, pseudoknot-engaged rabbit 80S ribosome is shown derived from the cryo-electron microscopy structure71. The 60S subunit is light grey and the 40S subunit is dark grey. The peptidyl (P)-site transfer RNA (tRNA) stalled in the complex is coloured turquoise, the eukaryotic translocase, elongation factor 2 (eEF2), is purple and the pseudoknot structure is red. Below is a schematic of the stalled ribosome. Engagement with the pseudoknot generates a frameshifting intermediate in which the ribosome is stalled during translocation with eEF2 bound, generating tension in the mRNA that bends the P-site tRNA in a (+) sense direction. As a result, the anticodon–codon interaction breaks over the slippery sequence, allowing a spring-like relaxation of the tRNA in a (−) sense direction. c | A close-up view of the pseudoknot (PK) in the stalled complex, with the ribosomal components (rpS0, rpS2, rpS3 and rpS9) in close proximity highlighted. d | The beet western yellows virus (BWYV) pseudoknot73 is illustrated to show examples of features that might confound the ribosomal helicase70. Shown from left to right are: a secondary structure model of the pseudoknot, with U13 drawn to indicate its extrusion from the helix; a ribbon representation of the X-ray structure; the L3–S1 (loop 3–stem 1) triplex interaction, with S1 shown as a transparent surface and the helix within as a purple ribbon; and the BWYV junction quadruple interaction, with hydrogen bonds shown as dashed lines. A triplex also forms at the junction of the two stems (not highlighted). RACK1, receptor for activated C kinase 1. Panels b and c are modified with permission from Ref. 71 © MacMillan Publishers Ltd.

A typical frameshift signal has two essential elements: a heptanucleotide 'slippery' sequence, at which the ribosome-bound tRNAs slip into the −1 frame, and an adjacent mRNA secondary structure, most often an H-type pseudoknot14,66, that stimulates this slippage process (Fig. 4). How the pseudoknot promotes frameshifting is not completely understood, but the mechanism is likely to be linked to the helicase activity of the ribosome, with the pseudoknot presenting an unusual topology that resists unwinding47,67,68,69,70,71. Takyar and colleagues have shown that the prokaryotic 70S ribosome can itself act as a helicase to unwind mRNA secondary structures before decoding70, with the active site located between the head and shoulder of the 30S subunit. Prokaryotic ribosomal proteins S3, S4 and S5 that line the mRNA entry tunnel are implicated in the helicase activity70. Recent cryo-EM images of mammalian 80S ribosomes paused at a coronavirus frameshift-promoting pseudoknot revealed that the ribosomes become stalled during the translocation phase of the elongation cycle with eEF2 bound and in the act of transferring peptidyl-tRNA from the A site to the P site71. Furthermore, the P-site tRNA is structurally distorted and in a spring-like conformation. Consistent with its proposed function, the pseudoknot is present at the mRNA entry channel in close proximity to the proteins (rpS3, rpS9 and rpS2) that are likely to form important elements of the mammalian 80S helicase (Fig. 4). Namy and colleagues71 suggest a mechanical model of frameshifting in which the pseudoknot resists unwinding by the helicase, compromising translocation by putting tension on the mRNA, leading to bending of the tRNA anticodon and, ultimately, repositioning the tRNA on the slippery sequence in the −1 reading frame. The pseudoknot is also in close proximity to the ubiquitous ribosomal regulatory protein RACK1 (receptor for activated C kinase 1)72. It is not known whether this protein, or the recruited kinase, has a role in frameshifting, but this could in principle provide a route to regulation of frameshifting during virus infection.
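
Candidate slippery heptanucleotides of the kind described above can be located by a simple scan. The sketch below is illustrative only and rests on an assumption drawn from the wider frameshifting literature rather than from this text: that the heptamer has the form X XXY YYZ (spaces mark zero-frame codons), so that both tRNAs can re-pair in the −1 frame. The function name `find_slippery_sites`, its `frame` parameter and the test sequence are hypothetical, and real signals also depend on the downstream stimulatory structure, which is not modelled here.

```python
from typing import List


def find_slippery_sites(mrna: str, frame: int = 0) -> List[int]:
    """Return start indices of heptamers matching the assumed X XXY YYZ form.

    A window N1..N7 is reported when N1 = N2 = N3 and N4 = N5 = N6, and
    when N2 lies on a codon boundary of the given reading frame, so that
    N2N3N4 and N5N6N7 are the codons decoded before the slip.
    """
    sites = []
    for start in range(len(mrna) - 6):
        window = mrna[start:start + 7]
        x_run = window[0] == window[1] == window[2]
        y_run = window[3] == window[4] == window[5]
        in_frame = (start + 1 - frame) % 3 == 0
        if x_run and y_run and in_frame:
            sites.append(start)
    return sites


# Hypothetical mRNA with zero-frame codons AUG GCU UUA AAC GGG UAA.
example = "AUGGCUUUAAACGGGUAA"
sites = find_slippery_sites(example)
```

Here the scan reports a single site, at index 5 (the heptamer UUUAAAC). Shifting the same sequence by one nucleotide moves the heptamer out of the zero frame, and it is then found only when `frame=1` is supplied, which reflects the requirement that the slip occur from defined zero-frame codons.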

X-ray crystallography and NMR analysis of frameshift-promoting pseudoknots, coupled with functional studies, have revealed features that could account for an intrinsic resistance to unwinding by the ribosomal helicase (Fig. 4). Chief among these is the presence of extensive minor-groove triplex interactions between S1 and the crossing L3, first observed in the pseudoknot of the luteovirus beet western yellows virus (BWYV; Fig. 4)73 and subsequently in related luteoviruses and the gag/pro pseudoknot of simian retrovirus 1 (Ref. 74). The S1–L3 RNA triplex is likely to be the first feature encountered by the elongating ribosome, and the presence of the 'third strand' would conceivably confound unwinding, at least temporarily.

Another feature of functional importance is the architecture of the junction between the constituent pseudoknot helices. Several non-canonical interactions have been described in and around the junction, including base triples, base quadruples and loop–loop Hoogsteen base pairing as well as distortions such as over-rotation of the stems, helical displacement and bending7,75 (Fig. 4). The first NMR study of a frameshift-promoting pseudoknot, from the gag/pro overlap of mouse mammary tumour virus, had indeed indicated a requirement for a specific bent conformation of the pseudoknot, brought about, in part, by the presence of an intercalated unpaired adenosine residue between the two stems76. However, closely related pseudoknots with coaxially stacked stems can clearly stimulate high levels of frameshifting74,77 and it now seems likely that the specific interactions and resultant architectures of the helical junction that are required for frameshifting are strongly context dependent75. Nevertheless, there is no doubt that the junction conformation is crucial to function and might also present a kinetic or thermodynamic barrier to unwinding78,79.

Several cellular homologues of viral frameshift signals have been identified and appear to be derived from endogenous retroviruses80,81,82 (although some recently identified candidates may have a different origin83). The pseudoknots within these genes are fundamentally similar in structure to, and retain the functionality of, their viral progenitors, and have a role in the regulation of gene expression during embryogenesis. Another cellular pseudoknot with close structural similarity to viral frameshift pseudoknots is present in bacterial transfer messenger RNA (tmRNA)84,85,86. This RNA, which contains four pseudoknots (PKI–IV), functions by binding to, and ultimately rescuing, ribosomes stalled on damaged mRNAs by a process termed trans translation86. PKI of tmRNA shows considerable similarity to the BWYV family of pseudoknots, including short stems and loops, extensive base stacking, stabilization of the stem junction by a triplex, exclusion of a uridine at the junction of the two stems and insertion of L3 into the minor groove of S1 (Ref. 84). PKI functions at a different site on the ribosome than frameshift-promoting pseudoknots, however, and its precise role is uncertain. A growing number of pseudoknots have been shown to interact with the ribosome, and it is clear that they can influence function from different sites. This is illustrated in Supplementary information S3 (figure), in which the locations on the ribosome of the pseudoknots involved in IRES function, frameshifting and trans translation are compared.

Pseudoknots can also promote +1 ribosomal frameshifting. The best-characterized examples come from the cellular gene ornithine decarboxylase antizyme87. However, pseudoknot-dependent +1 frameshifting also appears to be used in the expression of certain structural proteins of bacteriophage PSA (phage of Scott A), which infects Listeria monocytogenes88, although this remains the only viral example so far.

Stop codon readthrough. In the gammaretroviruses, which are typified by murine leukaemia virus (MuLV), gag and pol are in the same reading frame and are separated by a UAG stop codon. Here, Gag–Pol synthesis requires periodic misreading of the stop codon as a sense (Gln) codon, a process known as termination codon suppression or readthrough. Efficient readthrough is promoted by an H-type pseudoknot spaced precisely eight nucleotides downstream of the stop codon89,90. Retroviral readthrough and frameshift signals are thus similar in terms of overall organization, with the recoding site (a stop codon or a slippery sequence) separated from the pseudoknot by a spacer of similar length. There is little variation in the size of gammaretroviral pseudoknots and considerable primary sequence conservation, especially in the spacer region and L3 of the pseudoknot. Many of these bases are functionally essential89,91,92, but have not yet been linked to an explicit structural role92,93. No high-resolution structures of these pseudoknots are available as yet, and it remains to be seen whether they interact with the ribosome in a manner similar to the pseudoknots involved in frameshifting. However, it is plausible that interactions of the readthrough pseudoknot with the helicase or other ribosomal components could modulate release-factor access or activity and promote misreading of the stop codon by the near-cognate tRNAGln. It has already been shown that sequestration of eRF1, in this case as a result of direct binding by the MuLV reverse transcriptase, can lead to increased readthrough94.
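
The outcome of readthrough, as opposed to frameshifting, can be sketched in the same spirit. The Python example below is a toy illustration under stated assumptions: the mRNA string, the function name `translate` and its `readthrough` flag are hypothetical, the standard genetic code is assumed, and only the first in-frame UAG is suppressed (decoded as Gln, Q). The pseudoknot itself is not modelled.

```python
from itertools import product

# Standard genetic code, built in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna: str, readthrough: bool = False) -> str:
    """Translate in frame from index 0 until a stop codon is obeyed.

    With readthrough=True, the first in-frame UAG is decoded as Gln (Q)
    and translation continues in the same frame to the next stop codon.
    """
    protein = []
    suppressed = False
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino = CODON_TABLE[codon]
        if amino == "*":
            if readthrough and codon == "UAG" and not suppressed:
                amino = "Q"        # stop codon misread as a sense codon
                suppressed = True
            else:
                break              # normal termination
        protein.append(amino)
    return "".join(protein)


# Hypothetical mRNA with codons AUG GGC UAG GGA UCU UAA.
mrna = "AUGGGCUAGGGAUCUUAA"
terminated_product = translate(mrna)
extended_product = translate(mrna, readthrough=True)
```

For this toy sequence, normal termination yields MG and readthrough yields MGQGS. Unlike the frameshift case, the extended product stays in the original reading frame, which is why gag and pol can be arranged in the same frame in the gammaretroviruses.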

Pseudoknots in the 3′ NCR

Translation and replication. Several positive-strand RNA viruses harbour pseudoknots in the 3′ NCRs (Supplementary information S1 (table)). The best studied examples come from plant viruses and, in particular, the pseudoknot that induces the formation of the tRNA-like structure (TLS) at the end of the genome of turnip yellow mosaic virus1,2 (TYMV; Fig. 5). This short H-type pseudoknot forms part of the aminoacyl acceptor arm of the TLS, stacking against an adjacent hairpin (itself the equivalent of the tRNA T-stem–loop) to generate a quasi-continuous helix that mimics the acceptor arm of tRNA and that can be adenylated and aminoacylated with Val95. NMR studies of this region of the TLS have revealed that the single-stranded loops of the pseudoknot introduce minimal distortion in the A-form helical shape of the molecule. The major groove comfortably accommodates the crossing L1 residues, and L3 is closely anchored to S1 through triplex interactions centred around the adenosine residue in L3. Indeed, this study was the first to reveal the potential for interplay between pseudoknot loops and stem regions21.

Figure 5: Pseudoknots and transfer RNA-like structures.

a | The 3′ end of the turnip yellow mosaic virus (TYMV) genomic RNA. In the upper panel, the predicted secondary structure is shown, with pseudoknotting interactions indicated by dashed red lines. Below is a secondary structure representation of the folded molecule, showing the transfer RNA (tRNA)-like structure (TLS) and the pseudoknots in the acceptor arm and upstream of the TLS (UPK). The ribbon representation (boxed) is derived from the nuclear magnetic resonance structure of the acceptor arm pseudoknot21. b | For comparison, secondary structure representations of tRNAPhe are also shown. D, dihydrouridine modified bases; T, ribothymidine base.

The tRNA mimicry accounts for the reactivity of the 3′ end of the viral genome with several enzymes that recognize tRNA. These include the CCA nucleotidyl transferase, which adds a 3′ terminal A to complete the 3′ CCA end of the genome upon infection; valyl tRNA synthetase, which aminoacylates the 3′ end with Val; the elongation factor eEF1A, which binds to the TLS to give a viral RNA–eEF1A–GTP ternary complex; and the viral replicase p69/p206 (Refs 96, 97).

An elegant relationship has been unearthed between the TLS-specific reactivities and the TYMV life cycle. Upon entry into the cell, the 3′ CCA end is completed and aminoacylated, at which point translation of the input virus genome, an obligatory step in the production of the virus replicase, is stimulated synergistically by the 5′ cap and the TLS97. The stimulation of translation is maximal when the genome is aminoacylated and is linked to the formation of a viral RNA–eEF1A–GTP ternary complex97. How eEF1A enhances translation is not known, but the available evidence hints at an unexpected involvement of this elongation factor in the process of translation initiation. It has been suggested98 that binding of the viral RNA ternary complex to the A-site of initiating ribosomes could stimulate initiation at the 5′ end (perhaps in a manner similar to the pseudoknot of the IGR IRES domain 3 discussed above), but recent evidence argues against this specific mechanism99. Nevertheless, close proximity of the virus genome ends during translation (the circularization observed for most cellular mRNAs) could offer an explanation for the translational synergy afforded by the 5′ cap and the 3′ TLS–eEF1A complex.

The TLS of TYMV is also crucial in the switch between translation and replication. As has been elegantly illustrated for bacteriophage Qβ100 and poliovirus101,102, the movement of ribosomes in the 5′→3′ direction is incompatible with negative-strand RNA synthesis, in which the RNA polymerase travels 3′→5′. Thus, following an initial burst of translation, the viral mRNA must be cleared of ribosomes to allow replication from the 3′ end. Negative-strand synthesis in TYMV is initiated from the second C of the 3′ CCA triplet following pseudoknot-dependent binding of the replicase to the TLS103,104. It has been demonstrated recently that this reaction is inhibited by the binding of eEF1A to the valylated TLS105. So, translation is favoured until the levels of viral RdRp are sufficient to compete with eEF1A for binding to the TLS or perhaps until genomes are sequestered into vesicular sites of virus replication, which might be free of competing eEF1A and aminoacyl tRNA synthetases. The TLS might also be involved in genome packaging: in the related brome mosaic virus, viral RNAs lacking the TLS fail to assemble into virions106.

In TYMV, a second pseudoknot is present immediately upstream of the TLS, and this pseudoknot also contributes to translational enhancement, probably by acting as a spacing element to present the functional TLS to enzymes97 (Fig. 5). The presence of such upstream pseudoknots is not uncommon and indeed, tobamo-, hordei-, furo-, pomo- and certain tymoviruses boast clusters of pseudoknots (between two and seven) in this region107, which is termed the upstream pseudoknot domain (UPD). So, in tobacco mosaic virus (TMV), for example, a total of five pseudoknots are present in the 3′ UTR, three of which form the quasi-continuous double helical stalk of the UPD (5′ PKIII, PKII, PKI 3′) and two of which (5′ PKb, PKa 3′) are present in the TLS108. In the TLS, PKa forms part of the acceptor arm and PKb forms the central core, imposing the tRNA-like shape and orienting the UPD109. PKb is a B-type pseudoknot and forms a Y-shaped three-way junction that is proposed from modelling to have some structural similarity to the ribozyme of hepatitis delta virus (HDV; see below), although it displays no catalytic activity109.

Despite the complexity of folding of the TMV TLS, the UPD seems to have usurped its role, at least in terms of translational enhancement110. Although the TLS can be aminoacylated with His and can bind eEF1A22, the major player is the UPD, in which the highly conserved PKII and PKI can be crosslinked to eEF1A, independently of TLS aminoacylation111, and form an essential element of the promoter for negative-strand synthesis112. The UPD can also bind to the heat-shock chaperone HSP101, although the protein does not appear to be essential for efficient replication of TMV113. It seems that there is a general requirement for the binding of certain cellular proteins to the 3′ end that does not necessarily have to be mediated solely by a TLS.

Another model system used to study the role of viral NCR pseudoknots in facilitating translation, replication and the switch between the two is tomato bushy stunt virus (TBSV)114,115,116,117. The TBSV genome is uncapped and lacks a poly(A) tail. It recruits the translation machinery initially to the 3′ NCR, a process requiring a highly structured 3′ cap-independent translational enhancer (3′ CITE), comprising a Y-shaped hairpin domain upstream of a 3′-end pseudoknot. It is thought that upon infection, the pseudoknot is folded (the closed conformation) and translation factors gather on the 3′ CITE. Translation initiation complexes are then transferred to the vicinity of the 5′ NCR by virtue of genome circularization, mediated, at least in part, by a kissing-loop interaction between a stem–loop in the 3′ CITE and a partner close to the 5′ end of the genome (SL III). Translation complexes then scan to the first AUG and begin protein synthesis. Subsequently, the pseudoknot at the 3′ end of the genome is destabilized, possibly by binding of the accumulating viral RdRp, yielding an 'open' complex that is compatible with negative-strand synthesis117. Another pseudoknot (PK-TD1) in the 5′ NCR is also required for efficient replication, although its mechanism of action is not fully understood.

Pseudoknots in non-coding regions have also been documented in animal RNA viruses, with the vast majority known to be essential for virus replication (Supplementary information S1 (table)). Mostly, they function in translation, genome replication or the switch between the two. However, they are not well characterized structurally (at high resolution) and few details of the molecular interactions in which they participate are available.

Pseudoknots and ribozymes

Hepatitis delta virus. HDV is a satellite RNA virus of humans that replicates in association with a helper virus, hepatitis B virus17. The circular, single-stranded RNA genome of HDV is replicated through an intermediate, the antigenome, by a double rolling-circle mechanism that requires self-cleavage by closely related genomic and antigenomic versions of a ribozyme. The HDV ribozymes have been extensively studied as a model for the mechanism of catalytic RNAs9 and fold into similar structures, characterized by a nested double pseudoknot that helps form the catalytic site and brings great stability to the RNA118 (Supplementary information S4 (figure)). The nested double-pseudoknot motif is also common in cellular ribozymes (Supplementary information S2 (table)), for example, the HDV-like ribozyme present in an intron of the gene that encodes human cytoplasmic polyadenylation element binding protein 3 (Ref. 119) and the glmS riboswitch ribozyme present in the 5′ UTR of the gene that encodes glucosamine-6-phosphate synthase in numerous Gram-positive bacteria120,121. The intricate connectivity that results from nesting two pseudoknots within each other makes it possible for these short RNAs to adopt complex and stable 3D folds.

A viral telomerase pseudoknot

Telomerases are ribonucleoprotein complexes responsible for the maintenance of telomeres10. In addition to a specialized reverse transcriptase (TERT), an RNA component (telomerase RNA (TR)) is present and includes the RNA template for telomere addition and a highly conserved pseudoknot necessary for telomerase activity. Like frameshift-promoting pseudoknots, the TR pseudoknot has extensive triplex interactions at the junction of the two stems25 (Supplementary information S5 (figure)). It has been proposed that the pseudoknot region makes contacts with the TERT122 and is needed for telomere repeat-addition processivity123. Indeed, a switch between the pseudoknot and a partially unfolded form might be important in the translocation of telomerase during telomere addition124,125,126. High levels of telomerase activity are detected in a large range of cancers and are closely associated with the immortalization process127,128. Recently, a chicken TR homologue, vTR, has been described in the oncogenic herpesvirus Marek's disease virus129. Deletion of the two copies of vTR from the virus genome led to reduced incidence and severity of lymphoma in infected chickens, indicating that vTR has the attributes of an oncogene130. An understanding of how vTR supports lymphomagenesis in genetic and molecular terms has the potential to yield new insights into telomerase function in cancer131.

The versatile pseudoknot

The development of sophisticated algorithms (Box 1; Table 1) to scan genomic sequences for RNA structural motifs has highlighted the prevalence of the pseudoknot fold in the RNA universe, indicating evolutionary selection. A plausible explanation for this abundance is that pseudoknots offer functional versatility. Some pseudoknots (for example the BWYV frameshift pseudoknot) are extensively stabilized by both Watson–Crick and non-Watson–Crick interactions73 and are more stable than an equivalent hairpin (containing a base-paired stem that is equivalent to S1 plus S2). Thus, a very stable motif can be included in an RNA when space — either in terms of genomic coding capacity or molecular dimensions — is at a premium. At the other end of the spectrum, many pseudoknots are considerably less stable than their equivalent hairpin and could conceivably act as regulatory switches, oscillating between stem–loop and pseudoknot conformations in response to environmental signals132. Pseudoknots also offer binding sites for proteins or single-stranded loops of RNA. The often extensive intra- and intermolecular contacts that pseudoknots engage in provide many targets for such interactions. Indeed, the in vitro selection of RNA aptamers that bind various biomolecules often generates pseudoknotted RNAs (Box 2). Pseudoknotting can also be the most efficient way of folding RNAs in an active conformation (for example, ribozymes).
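What distinguishes a pseudoknot from the equivalent hairpin can be stated compactly: a structure is pseudoknotted when two base pairs (i, j) and (k, l) cross, that is, i < k < j < l. The following minimal sketch checks this condition on structures written in extended dot-bracket notation, where a second bracket type marks the second stem. It is not one of the prediction programmes listed in Table 1, and the function names and toy structures are illustrative assumptions.

```python
from typing import List, Tuple

# Bracket types allowed in extended dot-bracket notation (assumed set).
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
Pair = Tuple[int, int]


def parse_pairs(structure: str) -> List[Pair]:
    """Return base pairs (i, j), i < j, sorted by i (0-based)."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = []
    for pos, char in enumerate(structure):
        if char in BRACKETS:
            stacks[char].append(pos)
        elif char in closers:
            stack = stacks[closers[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if any(stacks.values()):
        raise ValueError("unclosed opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every pair of base pairs with i < k < j < l."""
    crossings = []
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:  # sorted input guarantees i < k
            if k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknotted(structure: str) -> bool:
    return bool(crossing_pairs(parse_pairs(structure)))


# Toy H-type layout: S1 in round brackets, S2 in square brackets.
h_type = "((((...[[[[..))))....]]]]"
hairpin = "((((....))))"
print(is_pseudoknotted(h_type), is_pseudoknotted(hairpin))
```

The quadratic comparison is adequate for single motifs; scanning whole genomes would call for the dedicated algorithms referred to in the text.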

Table 1 Examples of pseudoknot prediction programmes

Long-range interactions are also facilitated by this motif to organize global folding (for example, in the ribosome itself; Supplementary information S2 (table)) and to link separate domains of RNA together. In bacteriophage Qβ, although the viral replicase is anchored at a site some 1.2 kb from the 3′ end of the genome where replication initiates, an adjacent pseudoknot forms a long-range interaction that brings the 3′ end to the active site of the polymerase133. Such connections also offer the potential for global regulation of gene expression. In barley yellow dwarf virus, L3 of the frameshift-promoting pseudoknot is over 4 kb in length (of a 5.6-kb genome) and links the frameshift region with the 3′ end of the genome134. It has been suggested that disruption of this long-range contact is a means to regulate ribosome and replicase traffic on the viral genome135.

Especially in RNA viruses, there is often enormous selective pressure, and genomes that are optimized for viral fitness rapidly dominate. Although it is plain that some or all of the features of pseudoknots discussed above could be reproduced in other ways, it might be that this can only be achieved at considerable cost to fitness.

Future perspectives

Although 25 years have passed since the first pseudoknot was described1, much remains to be discovered. Foremost among the tasks ahead is to determine the true frequency of these motifs in natural RNAs. The majority of pseudoknots described in this article were identified as part of the normal process of scientific research, but computational approaches to their identification will be of increasing relevance. To maximize the proportion of genuine candidate pseudoknots identified in such screens, a better understanding of the thermodynamic parameters that govern pseudoknot formation will be invaluable. Additionally, we will need more information about the structure and function of pseudoknots. Until recently, atomic resolution structural information has been restricted to short, often very stable, pseudoknots. The cryo-EM and crystal structures of the IGR IRESs of CrPV26 and PSIV27 were huge steps forward, but similar monumental efforts will be required to solve those structures in which the pseudoknot orchestrates the folding of long and complex domains, as is found in many viral NCRs. Several viral pseudoknots remain poorly characterized, both in terms of structure and biological activity. A number of these have been shown to interact with proteins, both viral (including RdRp) and cellular, yet molecular details are lacking. Furthermore, we do not know whether pseudoknots are ever substrates for viral or cellular helicases present in infected cells; such activities could offer an extra level of control of virus gene expression and replication. The use of pseudoknots in antiviral strategies should also be considered more broadly. Antisense oligomers targeting the SARS frameshift-promoting pseudoknot have marked antiviral activity136, and pseudoknotted aptamers have been shown to block HIV replication (Box 2).

The study of viral pseudoknots will continue to excite and challenge researchers. It is unquestionable that many more pseudoknots remain to be discovered, and it is highly likely that some of these will possess novel activities or have unprecedented functions. The field has come a long way since the early experiments of Pleij and his collaborators, yet there is still much to be discovered about these fascinating motifs.