Enteroviruses comprise a large group of mammalian pathogens that includes poliovirus. Pathology in humans ranges from sub-clinical to acute flaccid paralysis, myocarditis and meningitis. Until now, all of the enteroviral proteins were thought to derive from the proteolytic processing of a polyprotein encoded in a single open reading frame. Here we report that many enterovirus genomes also harbour an upstream open reading frame (uORF) that is subject to strong purifying selection. Using echovirus 7 and poliovirus 1, we confirmed the expression of uORF protein in infected cells. Through ribosome profiling (a technique for the global footprinting of translating ribosomes), we also demonstrated translation of the uORF in representative members of the predominant human enterovirus species, namely Enterovirus A, B and C. In differentiated human intestinal organoids, uORF protein-knockout echoviruses are attenuated compared to the wild-type at late stages of infection where membrane-associated uORF protein facilitates virus release. Thus, we have identified a previously unknown enterovirus protein that facilitates virus growth in gut epithelial cells—the site of initial viral invasion into susceptible hosts. These findings overturn the 50-year-old dogma that enteroviruses use a single-polyprotein gene expression strategy and have important implications for the understanding of enterovirus pathogenesis.
Enteroviruses are ubiquitous worldwide, highly infectious and environmentally stable. Whereas many infections are mild or asymptomatic, some serotypes can cause severe and even fatal disease. Symptoms include fever; hand, foot and mouth disease; myocarditis; viral meningitis; encephalitis; acute haemorrhagic conjunctivitis and acute flaccid paralysis. Although the enterovirus that causes poliomyelitis has been eradicated from much of the globe, other emerging enteroviruses can cause severe polio-like symptoms1. The Enterovirus genus belongs to the Picornaviridae family. Members have monopartite linear positive-sense single-stranded RNA genomes of about 7.4 kb that are encapsidated into non-enveloped icosahedral virions. Thirteen species (Enterovirus A–J and Rhinovirus A–C) and more than 70 serotypes have so far been defined. The virus genome contains a single long open reading frame (ORF), which is translated as a large polyprotein that is cleaved to produce the viral capsid and non-structural proteins2 (Fig. 1a). The 3ʹ end of the genome is polyadenylated and contains signals that are involved in replication and genome circularization. The 5ʹ end is covalently bound to a viral protein VPg and the 5ʹ untranslated region (UTR) harbours an internal ribosome entry site (IRES) that mediates cap-independent translation.
The enterovirus IRES comprises several structured RNA domains denoted II to VI (Fig. 1a). Ribosome recruitment requires the eukaryotic initiation factors eIF2, eIF3, eIF4A, eIF4G, eIF4B and eIF1A, but not the cap-binding protein eIF4E (ref. 3). Domain VI (dVI) comprises a stem-loop containing a highly conserved AUG codon (586AUG in poliovirus) in a poor initiation context. The dVI AUG plays an important role in stimulating the attachment of 43S ribosomal pre-initiation complexes to the viral mRNA, which then scan or otherwise migrate to the polyprotein initiation site downstream (743AUG in poliovirus)4,5,6. In poliovirus type 1 (PV1), the dVI AUG is followed by a 65-codon upstream ORF (uORF) that overlaps the polyprotein ORF (ppORF) by 38 nucleotides (nt) and some other enteroviruses contain a similarly positioned uORF3,7. However, several earlier studies have indicated that the dVI AUG is not utilized for initiation6,7,8 and the 6.5–9.0 kDa protein that might result from uORF translation has never been detected in enterovirus-infected cells. The ‘spacer’ sequence between dVI and the polyprotein AUG contains little obvious RNA structure and is not particularly well-conserved at the nucleotide level. Despite three decades of research, its function remains unknown.
Here we performed a comparative analysis of >3,000 enterovirus sequences, from which we show that the uORF is largely conserved in major enterovirus groups and the encoded amino acids are subject to strong purifying selection, thus indicating that it encodes a functional protein. We used ribosome profiling to demonstrate translation of the uORF in three enterovirus species. Moreover, we show that knocking out the expression of the uORF protein (termed UP) significantly attenuates virus growth in differentiated mucosa-derived human intestinal organoids but not in standard cell culture systems, which suggests a specific role for UP during the establishment of productive virus infection in gut epithelia in the initial stages of virus invasion of susceptible hosts.
We obtained all full-length enterovirus sequences from GenBank, clustered these according to species and identified the dVI AUG in each. Sequences were defined as harbouring the uORF if the ORF beginning with this AUG codon overlapped the 5ʹ end of the ppORF and contained at least 150 nt upstream of the polyprotein AUG codon. The majority of Enterovirus A, B, E, F and G sequences and around half the Enterovirus C sequences contain an intact uORF (Fig. 1b). In contrast, the uORF is absent from Rhinovirus A, B and C, and Enterovirus D sequences. Clades that lack the uORF, particularly the rhinoviruses, tend to have a much shorter spacer between the dVI AUG and the polyprotein AUG (Fig. 1c). Although Enterovirus D sequences have a mid-sized spacer (Fig. 1c), the dVI AUG-initiated ORF has just 5 codons in 437 of 442 sequences and there is no alternative uORF beginning at a different site. Where present, the translation of the uORF in Enterovirus A, B, C, E, F and G would produce a peptide of 56–76 amino acids, 6.5–9.0 kDa, with an isoelectric point (pI) of 8.5–11.2 (median values by group; Supplementary Table 1 and Supplementary Fig. 1a). The 3ʹ quarter of the uORF overlaps the ppORF in either the +1 or +2 frame (Supplementary Table 1), thus resulting in different C-terminal tails in UP.
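The uORF-presence criterion described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names (`orf_end`, `has_uorf`), the 0-based coordinates and the toy sequence are assumptions made for illustration, and the positions of the dVI AUG and the polyprotein AUG are supplied by the caller rather than identified from an alignment.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_end(seq, start):
    """Return the index just past the first in-frame stop codon, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def has_uorf(genome, dvi_aug, pp_aug, min_upstream=150):
    """Apply the two criteria stated in the text (hypothetical helper).

    genome  : nucleotide sequence (DNA or RNA alphabet)
    dvi_aug : 0-based index of the A of the dVI AUG
    pp_aug  : 0-based index of the A of the polyprotein AUG
    """
    seq = genome.upper().replace("T", "U")
    if seq[dvi_aug:dvi_aug + 3] != "AUG":
        raise ValueError("dvi_aug does not point at an AUG codon")
    end = orf_end(seq, dvi_aug)
    if end is None:
        return False
    # Criterion 1: the ORF runs past the polyprotein AUG (overlaps the ppORF).
    overlaps = end > pp_aug
    # Criterion 2: at least `min_upstream` nt of the ORF lie upstream of that AUG.
    long_enough = (pp_aug - dvi_aug) >= min_upstream
    return overlaps and long_enough


# Toy example: a 62-codon ORF whose stop codon lies downstream of position 157.
toy_genome = "ATG" + "GCT" * 60 + "TAA" + "GCT" * 10
print(has_uorf(toy_genome, dvi_aug=0, pp_aug=157))
```

The sketch does not treat an in-frame arrangement specially, whereas the uORFs described here overlap the ppORF in the +1 or +2 frame; a frame check could be added if that distinction matters.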
Although the uORF is not present in all sequences, we wanted to ascertain whether, where present, it is subject to purifying selection at the amino acid level. To test this, we used MLOGD9 and codeml10. Codeml measures the ratio of non-synonymous to synonymous substitutions (dN/dS) across a phylogenetic tree; dN/dS < 1 indicates selection against non-synonymous substitutions, which is a strong indicator that a sequence encodes a functional protein. The application of codeml to within-species uORF alignments (excluding the overlap region) resulted in dN/dS estimates in the range of 0.04 to 0.22 for Enterovirus A, B, C, E, F and G (Supplementary Table 1). MLOGD uses a principle similar to dN/dS but also accounts for the higher likelihood of conservative amino acid substitutions (that is, similar physico-chemical properties) than non-conservative substitutions in biologically functional polypeptides. Three-frame ‘sliding window’ analysis of full-genome alignments using MLOGD revealed a strong coding signature in the ppORF (as expected) and also in the uORF, with this result independently replicated for each of the six enterovirus species (Fig. 1d and Supplementary Fig. 1b).
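To make the distinction between synonymous and non-synonymous changes concrete, the following Python sketch tallies the two kinds of codon difference between a pair of aligned coding sequences. It is only a crude illustration and not a dN/dS estimate: codeml fits a codon model by maximum likelihood over a phylogenetic tree and normalizes by the numbers of synonymous and non-synonymous sites, none of which is attempted here. The function name `count_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 nt long")
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")
    syn = nonsyn = 0
    for i in range(0, len(a), 3):
        codon_a, codon_b = a[i:i + 3], b[i:i + 3]
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# GCT -> GCC keeps alanine (synonymous); AAA -> AGA changes Lys to Arg.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

An excess of synonymous over non-synonymous differences in such a tally points in the same direction as dN/dS < 1, but only a site-normalized, tree-aware estimate supports the quantitative values reported here.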
To evaluate the significance of UP in virus infection, we first utilized an infectious clone of echovirus 7 (EV7), a member of the species Enterovirus B. In EV7, the predicted UP is 8.0 kDa, has a pI of 10.5 and a predicted transmembrane domain (Fig. 2a). A set of mutant virus genomes was created and tested for RNA infectivity, virus titre, plaque size, stability of the introduced mutations, competitive growth with wild-type (WT) virus and relative IRES activity in a dual luciferase reporter system. Mutants with premature termination codons (PTC) introduced at uORF codons 5 or 29 (EV7-Loop and EV7-PTC, respectively; Fig. 2b and Supplementary Fig. 2a) behaved similarly to WT EV7 in all tested assays (Fig. 2c,d and Supplementary Fig. 3a), which indicates that UP is not required in the context of the susceptible RD cell line.
Consistent with previously published poliovirus data4,7, mutation of the EV7 dVI AUG (591AUG) to AAG with a compensatory 615A-to-U mutation to maintain the stem-loop base-pairing (EV7-mAUG; Fig. 2b and Supplementary Fig. 2a) resulted in a substantial decrease in IRES activity to 15% of the WT IRES activity (Fig. 2d). This decrease in IRES activity could explain the attenuated virus growth, which was followed by 100% reversion after the second passage in RD cells (Fig. 2c). Consequently, the EV7-mAUG mutant was not used for further uORF studies.
We next sought to determine whether UP is expressed during virus infection. To facilitate this, a version of EV7 designed to produce C-terminally Strep-tagged UP and a corresponding PTC control (EV7-StrUP and EV7-StrUP-PTC, respectively; Fig. 2b and Supplementary Fig. 2b) were created. IRES activity decreased to 80% of the WT for both EV7-StrUP and EV7-StrUP-PTC (Fig. 2d); nevertheless, this did not noticeably affect the RNA infectivities, virus titres, plaque sizes or stability of the introduced mutations (Fig. 2c). We then infected RD cells with WT or mutant viruses at a high multiplicity of infection (m.o.i.) and analysed the cell lysates by immunoblotting. A protein of the expected size was detected 6 and 8 hours postinfection (h.p.i.) in lysates from cells infected with EV7 or EV7-StrUP viruses (using anti-UP and anti-Strep antibodies, respectively), but not for cells infected with the PTC mutants, thus confirming the expression of UP (Fig. 3a,b, upper panels). Further analyses confirmed that the introduced mutations did not affect the accumulation of the virus VP3 structural protein (Fig. 3a,b, lower panels; Supplementary Fig. 11a) or virus growth kinetics in one-step growth curves in RD cells (Fig. 3c).
To further study virus gene expression, we infected RD cells with WT EV7 and performed ribosome profiling at 4 and 6 h.p.i. Ribosome profiling maps the footprints of actively translating 80S ribosomes but not scanning or pre-initiation ribosomes. Ribosome profiling sequencing (Ribo-Seq) data quality was assessed as previously described (Supplementary Fig. 4)11. For these libraries, ribosome-protected fragment (RPF) 5ʹ ends mapped predominantly to the first nucleotide positions of codons (phase 0) (Supplementary Fig. 4c), thus allowing the robust identification of the reading frame in which translation is taking place. Within the ppORF, RPFs mapped predominantly to the first nucleotide positions of polyprotein codons (blue phase; Fig. 3d). However, within the non-overlapping portion of the uORF, RPFs mapped predominantly to the first nucleotide positions of uORF codons (green phase; Fig. 3d and Supplementary Fig. 4d), thus confirming translation of the uORF. Ribosome density in the uORF was comparable to ribosome density in the ppORF (Fig. 3d,e). With our Ribo-Seq protocol, a peak in RPF density is frequently observed on initiation sites (Supplementary Fig. 4a). Consistent with this, the first peak in the green phase mapped to the precise location of the dVI AUG codon (Fig. 3d and Supplementary Fig. 4d).
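The phasing argument above can be sketched in a few lines of Python. This is an illustrative calculation under stated assumptions, not the analysis code used for the Ribo-Seq libraries: the function name `phase_fractions` and the example coordinates are hypothetical, and RPF 5ʹ-end positions are assumed to be already mapped to genome coordinates such that phase 0 corresponds to the first nucleotide of a codon, as described in the text.

```python
from collections import Counter


def phase_fractions(five_prime_ends, orf_start, orf_end):
    """Fraction of RPF 5' ends in each codon phase (0, 1, 2) of one ORF.

    five_prime_ends : iterable of genome positions of RPF 5' ends
    orf_start       : position of the first nucleotide of the ORF
    orf_end         : position just past the region to be analysed
    """
    counts = Counter(
        (pos - orf_start) % 3
        for pos in five_prime_ends
        if orf_start <= pos < orf_end
    )
    total = sum(counts.values())
    if total == 0:
        return {0: 0.0, 1: 0.0, 2: 0.0}
    return {phase: counts[phase] / total for phase in range(3)}


# Hypothetical reads: three in phase 0, one in phase 1, one outside the ORF.
print(phase_fractions([100, 103, 106, 107, 50], orf_start=100, orf_end=110))
```

Calling the function once with the uORF start and once with the ppORF start, each restricted to the non-overlapping part of the respective ORF, would show which reading frame dominates in each region.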
To confirm translation of the uORF in other enteroviruses, we performed ribosome profiling with PV1 and enterovirus A71 (EV-A71), members of the species Enterovirus C and A, respectively. Poliovirus type 1 is a causative agent of poliomyelitis, whereas EV-A71 is one of the major causative agents of hand, foot and mouth disease. Both viruses have the potential to cause severe neurological disease. For both PV1 and EV-A71, the uORF initiation site within the 5ʹ RNA structure and the properties of UP are similar to those of EV7 (Fig. 4a,e). The predicted UP is 7.2 kDa with a pI of 9.2 for PV1, and 8.8 kDa with a pI of 9.5 for EV-A71, and both UPs are predicted to contain a transmembrane domain.
The growth characteristics of PV1 were found to be similar to those of EV7—reaching complete cytopathic effect at 7–8 h.p.i. at a high m.o.i. On the other hand, the growth of EV-A71 was slower, with complete cytopathic effect at 10–11 h.p.i. Consistent with this, the accumulation of VP3 in infected cells was fastest in PV1 (4 h.p.i.), followed by EV7 (6 h.p.i.) and slowest in EV-A71 (8 h.p.i.; Fig. 4f). Hence, for PV1 we used 4 and 6 h.p.i. as time points for ribosome profiling, whereas for EV-A71 we used 5 and 7.5 h.p.i. Ribo-Seq data quality was assessed as before (Supplementary Figs. 5,6). In PV1, the uORF is in the +2 frame relative to the ppORF and, once again, within the non-overlapping portion of the uORF RPFs mapped predominantly to the first nucleotide positions of uORF codons (orange phase; Fig. 4b and Supplementary Fig. 5d). Similarly to EV7, the PV1 uORF was found to be efficiently translated (Fig. 4b,c) and the first peak in the orange phase mapped precisely to the dVI AUG codon (Fig. 4b and Supplementary Fig. 5d). The uORF in EV-A71 is also in the +2 frame relative to the ppORF. Ribosome-protected fragment density in the uORF phase (orange) was substantially lower for EV-A71 than for EV7 and PV1 (Fig. 4g,h and Supplementary Fig. 6d). Nonetheless, the first peak in the orange phase mapped precisely to the dVI AUG codon (Fig. 4g and Supplementary Fig. 6d), which indicates that the uORF is also translated in EV-A71, but probably at a lower efficiency than in EV7 and PV1.
Following the strategy used for EV7, we designed a version of PV1 to produce C-terminally HA-tagged UP and corresponding PTC and Loop mutant controls (PV1-HA, PV1-HA-PTC and PV1-HA-Loop; Supplementary Fig. 7a). Tagging resulted in moderate attenuation of the virus (Supplementary Fig. 7b). A protein of the expected size was detected in lysates from PV1-HA-infected RD cells 11 h.p.i. but not for cells infected with WT PV1 or the PTC or Loop mutants, thus confirming the expression of UP during virus infection (Fig. 4d, upper panels). There were no major differences in the accumulated levels of the virus VP3 structural protein between the different viruses (Fig. 4d, lower panels; Supplementary Fig. 11c).
We next investigated the possible effects of UP during infection in other cell lines and experimental conditions. In initial tests, we found no difference between WT and PTC mutants in any of the permissive cell lines (MA104, HEK293T, HeLa, CaCo2, Huh7 and HGT) tested at various m.o.i., even after the induction of an antiviral state through treatment with interferon. In an earlier analysis using a mouse-pathogenic poliovirus mutant Mah(L), Slobodskaya and colleagues found that a 103-nt deletion in the 5ʹ UTR (∆S mutant)—which truncated the uORF to 31 codons and fused it in-frame with the ppORF—resulted in no attenuation; in contrast, mutation of the dVI AUG abrogated neurovirulence, presumably due to its effect on IRES activity12. Thus, we hypothesized that UP might instead play a role at the primary site of infection, namely the gastrointestinal tract, which for many enteroviruses is the critical site of virus amplification before dissemination and further progression of systemic infection13.
The mouse model for enterovirus infection has several limitations: (1) it requires substantial virus adaptation and immunodeficient or receptor-transgenic mouse strains, (2) it does not closely mimic human disease and (3) although it is a good model for neurovirulence studies, the low sensitivity of the mouse alimentary tract to enterovirus precludes the examination of the enteric stage of virus replication. Thus, to investigate a possible role of UP in the gastrointestinal tract, we utilized a recently developed human intestinal epithelial organoid platform to examine the potential effects of UP in differentiated organoids14. We generated three-dimensional organoids derived from distal small bowel (that is, terminal ileum) mucosal biopsies of patients. Following the establishment of cultures, organoids were trypsinized and cultured to form differentiated monolayers. Differentiation into epithelial cell subsets, predominantly consisting of absorptive enterocytes, was achieved by withdrawal of Wnt agonists as previously reported15,16 (Fig. 5a) and tested by quantitative reverse transcription-PCR (qRT–PCR; Supplementary Fig. 8). Monolayers were then infected with either WT or mutant viruses. At the later time points we observed a 75–90% reduction in EV7-Loop or EV7-PTC titres compared to WT EV7 titres (P = 4.6 × 10^−5 and 5.5 × 10^−5 at 36 and 48 h.p.i., respectively, when combined over the two patients; Fig. 5b). For EV7, EV7-Loop and EV7-PTC viruses, the initial infection (6–9 h.p.i.) was restricted to 5–20% of the organoid monolayer (Fig. 5c), which later progressed to complete cytopathic effect by 24–48 h.p.i. (Fig. 5d).
To investigate the cause of the growth defect of UP mutants in differentiated human intestinal organoids, we first quantified viral protein and RNA in the 36 and 48 h.p.i. samples. Even after normalizing with protein or RNA, UP mutant titres were still below WT titres (mean fold difference = 0.24; P = 0.00022, 0.000017 (titre/protein) and 0.00016, 0.000016 (titre/RNA) at 36 and 48 h.p.i., respectively, when combined over patients; Fig. 5e). Given that UP contains a predicted transmembrane region, we hypothesized that it may play a role in virus release from membranes. Therefore, we subjected the same samples to Triton X-100 detergent treatment (Fig. 5f). This had little effect on WT titres (mean fold increase = 1.2) but resulted in increased UP mutant titres (mean fold increase = 2.4) with the change in mutant titres differing significantly from the change in WT titres (P = 0.00087 and 0.000056 at 36 and 48 h.p.i., respectively, when combined over patients; Fig. 5f). The lysed cells from 48 h.p.i. samples were also tested for Triton X-100-mediated virus release. Consistent with the previous results, the change in mutant titres (mean fold increase = 5.2) was significantly different from the change in WT titres (mean fold increase = 2.0; P = 0.000086 when combined over patients; Fig. 5g).
To further test our hypothesis that UP facilitates the disruption of organoid-derived membranes, we performed membrane flotation assays for virus-containing media derived from infected differentiated organoid cultures. At 36 h.p.i., the ratio of membrane-bound to free virus for the EV7-PTC and EV7-Loop mutants exceeded that of WT EV7 by a mean of 3.1 fold (P = 0.026; two-tailed t-test, comparing patient 1 and 2 WT against the four mutant samples; Fig. 5h). In contrast, no membrane-associated virus was detected when the assay was repeated for RD cell-derived EV7 (Fig. 5h, green curve), which explains why no difference between WT and UP-knockout virus titres was observed for these cells. We also compared the neutralization of organoid-derived membrane fractions and RD cell-derived virus by treating with EV7 neutralization serum and/or through the prevention of receptor-mediated attachment using anti-DAF antibody17. The flotated membrane fractions were only partially neutralized in all three assays, whereas neutralization of RD cell-derived virus was significantly more efficient (Fig. 6a). In addition, the neutralization serum and anti-DAF antibody acted synergistically for RD cell-derived virus (P = 0.0004 and 0.0007 for each independent treatment compared to the combined treatment; two-tailed t-tests; Fig. 6a). However, this was not the case for the flotated membrane fractions (Fig. 6a), which suggests that non-neutralized membrane-associated virus enters cells by a route that does not involve receptor binding—for example, via membrane fusion. Together, these results suggest that UP plays a role in the release of virus particles from membranous components.
Given that the detectable accumulation of UP in infected cells coincides with a strong cytopathic effect that leads to autofluorescence, we investigated the subcellular localization of UP in transfected HeLa cells as well as in a stably expressing HeLa cell line. This revealed an endoplasmic reticulum-associated pattern that was confirmed by co-staining with calnexin, an endoplasmic reticulum marker (Fig. 6b and Supplementary Fig. 9). We also confirmed the membrane association of UP through subcellular fractionation of UP-expressing HeLa cells and subsequent analysis of the fractions (Fig. 6c).
As a result of variations in translational speed (including pausing and potential stacking behind ribosomes initiating at the polyprotein AUG), in addition to nuclease, ligation and PCR biases introduced during library preparation, ribosome profiling may not provide an accurate estimation of protein expression levels, particularly for short ORFs11. Therefore, to investigate the relative level of uORF expression, we used dual luciferase reporter constructs where the 2A-FFLuc cassette was placed either in the uORF or ppORF reading frame just downstream of the ppORF initiation codon (Fig. 6d). HeLa cells were transfected with the reporter construct with or without co-transfection of T7-transcribed infectious EV7 RNA. The ratio of uORF to ppORF expression did not change greatly over time and, consistent with the poor initiation context of the uORF, ppORF translation was 19–23 times more efficient than uORF translation in the context of virus infection (Fig. 6e). The encoding of UP in a separate ORF may be a strategy to allow the expression of UP at a level very different from that of the polyprotein products. As expected, IRES activity (both in the uORF and ppORF reading frames) increased relative to cap-dependent translation as infection progressed (Fig. 6f).
To test whether other members of the family Picornaviridae might also harbour undiscovered proteins encoded by alternative ORFs, we applied our comparative genomic methods to other picornaviruses, which revealed putative additional protein-coding ORFs in ten Picornaviridae clades (Supplementary Figs. 13–21).
The data presented here demonstrate the existence of an additional protein UP that is encoded by the enterovirus genome. The molecular biology of enteroviruses has been studied for over 50 years, not least because poliovirus is such an important pathogen18. Even before the poliovirus genome was first sequenced in 1981 (refs. 19,20), all viral polypeptides were presumed to derive from the single polyprotein21. The uORF product was probably overlooked due to its small size and low expression levels. On the other hand, the function of the apparent spacer region between the IRES and the polyprotein initiation site was perplexing, particularly as it is absent in rhinoviruses.
Our analysis now demonstrates that, at least in EV7 (Enterovirus B), this region encodes a small protein, UP, that is not required for basic replication but plays an important role in virus growth in gut epithelial cells, the site of initial viral invasion into a susceptible host. Ribosome profiling revealed uORF translation in three enterovirus species and UP expression in EV7 and PV1 was further confirmed by western blotting. Comparative genomic analysis shows that the uORF is predominantly present in Enterovirus A, B, E, F and G and around half of Enterovirus C isolates and, where present, is subject to a strong purifying selection. In contrast, the uORF is absent from rhinoviruses, which infect the upper respiratory tract instead of the gastrointestinal tract, consistent with UP playing a specific role in gut epithelial cells. Some of the enteroviruses that lack UP have in fact been shown to be respiratory viruses22. It is possible that, in enteric viruses that lack UP, its function may be taken over by another membrane-associated protein such as the viroporin 2B; alternatively, it may simply be that their replication or tropism is such that a UP function is not required. Interestingly, the majority of poliovirus type 2 and 3 sequences have only a truncated uORF (mode lengths of 38 and 18 codons, respectively), too short to meet our definition of uORF presence. However, most available sequences (203 of 229) of PV1—the most common serotype—have an intact uORF.
Previous cell-free translation studies indicated that the dVI AUG was not utilized (or was only utilized infrequently) as an initiation site in WT PV1, although it can be ‘activated’ if its initiation context is artificially enhanced7,23. Using an in vitro reconstituted translation system, only trace amounts of 48S complexes were observed to form at the PV1 dVI AUG and the EV-A71 dVI AUG was not recognized3. Interestingly, the formation of 48S initiation complexes was observed at much higher levels at the uORF AUG in bovine enterovirus (Enterovirus E), where the dVI stem-loop is less stable. In the context of a cell-free translation system, the same authors found detectable but very inefficient 80S ribosomal complex formation at the PV1 dVI AUG and even lower amounts at the EV-A71 dVI AUG3,24. Ribo-Seq analysis allowed us to study ribosome occupancy throughout the 5ʹ UTR, in a cellular context and in the context of virus infection. In contrast to much of the previous work conducted in vitro, this revealed efficient translation of the uORF in EV7 and PV1, and a low level of translation in EV-A71.
Although non-enveloped viruses have traditionally been assumed to exit cells via cell lysis, recognition of non-lytic release pathways of either free or membrane-bound virus particles is increasing25,26,27. The late stage of the UP-knockout defect, however, suggests that it is unrelated to non-lytic release. Enteroviruses also subvert the host autophagy pathway—leading to intracellular double- or single-membraned virus-containing vesicles—and can be released from cells in various membrane-bound forms26,27,28. The intriguing decrease in UP-knockout virus titres observed at late stages of organoid infections, their rescue following detergent treatment and the increased proportion of membrane-associated virus in the absence of UP, suggest the importance of UP as a membrane disruptor to facilitate virus particle release from vesicles particular to gut epithelial cell infection.
These data overturn the long-established dogma of a single-polyprotein gene expression strategy in the enteroviruses and open a new window on our understanding of enterovirus molecular biology and pathogenesis. An increased understanding of the precise role(s) of UP in different enterovirus species and the differences between Enterovirus C isolates that contain or lack an intact uORF may lead to new virus control strategies; UP knockout mutants may even have applications as attenuated virus vaccines.
Cells and viruses
RD cells (human rhabdomyosarcoma cell line; ATCC, CCL-136), HEK293T cells (human embryonic kidney cell line; ATCC, CRL-3216) and HeLa cells (ATCC, CCL-2) were maintained at 37 °C in DMEM media supplemented with 10% fetal bovine serum (FBS), 1 mM L-glutamine and antibiotics. All cells were mycoplasma tested and authenticated by deep sequencing.
The cDNA of EV7 (strain Wallace; GenBank accession number AF465516.1, with silent substitution 1687G-to-A) was sourced from M. Lindberg and was cloned downstream of a T7 RNA promoter. The cDNA of WT PV1 (strain Mahoney; GenBank accession number V01149.1, with substitutions 2133C-to-T and 2983A-to-G) was sourced from B. Semler (University of California) and was cloned downstream of a T7 RNA promoter with a hammerhead ribozyme at the 5′ end as previously described29. Enterovirus EV-A71 strain B2 MS/7423/87 (GenBank accession number MG432108) was plaque-purified using RD cells, sequenced, titrated on RD cells and used for ribosome profiling infections. The EV7, PV1 and mutant viruses were rescued via transfection of RD cells with T7-transcribed RNAs using Lipofectamine 2000 (Invitrogen). RNA infectivity was assessed by an infectious centre assay in which RD monolayers were overlaid with dilutions of a suspension of RNA-transfected RD cells, incubated for 3 h, overlaid with 1.5% low melting point agarose (Invitrogen) in DMEM containing 1% FBS and incubated for 48 h at 37 °C until the formation of plaques. Alternatively, to collect recovered viruses, the transfection medium was replaced with DMEM containing 1% FBS and incubated for a further 20 h until 100% cytopathic effect was observed. Virus stocks were amplified on RD cells, cleared by centrifugation, purified through a 0.22 µm filter, titrated on RD cells and used for subsequent infections. All mutant viruses were also passaged at least three times at a low m.o.i. (0.01–0.1). The final virus stocks were used for RNA isolation and RT–PCR analysis to confirm the presence or reversion of the introduced mutations.
For the mammalian expression of UP, the coding sequence of EV7 UP, EV7 UP with a C-terminal Strep-tag or PV1 UP with a C-terminal HA-tag was inserted into the vector pCAG-PM (ref. 30) using AflII and PacI restriction sites. The resulting constructs, designated pCAG-UP, pCAG-StrUP and pCAG-HA-UP, respectively, were confirmed by sequencing.
All EV7 (Supplementary Fig. 2) and PV1 (Supplementary Fig. 7) mutations were introduced using site-directed mutagenesis of the pT7-EV7 or pT7-PV1 infectious clone, respectively, and confirmed by sequencing. For Strep-tagged EV7 and HA-tagged PV1, the uORF/ppORF overlap was duplicated and synonymously mutated to avoid recombination (see Supplementary Fig. 2b and 7a, respectively, for details). The resulting plasmids were linearized with XhoI (EV7) or EcoRI (PV1) before T7 RNA transcription.
To assess the relative IRES activity in a reporter system, the pSGDLuc vector was used to design a cassette with a cap-dependent Renilla luciferase gene followed by 748 nt of 5ʹ-terminal EV7 sequence (entire 5ʹ UTR and first two ppORF codons) fused in-frame with the 2A firefly luciferase gene31. To assess IRES activity in the uORF reading frame, the 2A firefly luciferase gene was fused after the 7th nucleotide of the ppORF. The resulting plasmids were linearized with BamHI before T7 RNA transcription.
RNA transcript preparation
Transcription reactions were performed using the MEGAscript T7 transcription kit (Ambion). Transcription reactions (10 µl) were incubated for 1 h at 37 °C and terminated by treatment with DNase I for 15 min at 37 °C.
Reporter assay for relative IRES activity
HEK293T cells were transfected in triplicate with Lipofectamine 2000 reagent (Invitrogen), using the protocol in which suspended cells are added directly to the RNA complexes in 96-well plates. For each transfection, 100 ng purified T7 RNA (RNA Clean and Concentrator, Zymo Research) plus 0.3 µl Lipofectamine 2000 in 20 µl Opti-MEM (Gibco) supplemented with RNaseOUT (Invitrogen; diluted 1:1,000 in Opti-MEM) were added to each well containing 10^5 cells. Transfected cells in DMEM supplemented with 5% FBS were incubated at 37 °C for 16 h. Firefly and Renilla luciferase activities were determined using the Dual Luciferase Stop & Glo Reporter Assay System (Promega). IRES activity was calculated as the ratio of Firefly (IRES-dependent translation) to Renilla (cap-dependent translation), normalized by the same ratio for the WT EV7 sequence. Three independent experiments were performed to confirm the reproducibility of the results. For the temporal analysis of the ppORF:uORF expression ratio, a similar protocol was used but with HeLa cells to support EV7 replication. EV7 infection was achieved by co-transfection with capped T7 EV7 RNA (150 ng per transfection) and the released virus was titrated by plaque assays on RD cells.
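The normalization described above (Firefly-to-Renilla ratio, divided by the same ratio for the WT EV7 sequence) can be written as a short Python sketch. The function name `relative_ires_activity`, the choice to average per-well ratios across replicate wells and the example readings are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean


def relative_ires_activity(sample_wells, wt_wells):
    """IRES activity of a construct relative to the WT construct.

    Each argument is a list of (firefly, renilla) readings, one pair per
    replicate well. Firefly reports IRES-dependent translation and Renilla
    reports cap-dependent translation.
    """
    sample_ratio = mean(firefly / renilla for firefly, renilla in sample_wells)
    wt_ratio = mean(firefly / renilla for firefly, renilla in wt_wells)
    return sample_ratio / wt_ratio


# Hypothetical triplicate readings (arbitrary luminescence units).
wt = [(1000.0, 500.0), (1100.0, 550.0), (900.0, 450.0)]
mutant = [(300.0, 1000.0), (330.0, 1100.0), (270.0, 900.0)]
print(relative_ires_activity(mutant, wt))
```

A value of 1.0 corresponds to WT activity; values below 1.0 indicate reduced IRES-dependent translation relative to cap-dependent translation.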
Virus competition assay
Dual infection/competition assays were performed in duplicate on RD cells using mutant and WT EV7 at either equal or 9:1 ratios and a total m.o.i. of 0.1. Mono-infections with WT or mutant viruses were used as controls. Media collected from infected plates were used for five blind passages using 1:10,000 volume of obtained virus stock (corresponding to m.o.i. 0.05–0.2). RNA was isolated from passages 1 and 5 using Direct-zol RNA MicroPrep (Zymo Research) and used for RT–PCR and Sanger sequencing of the fragment containing the mutated region of the virus genome. The final chromatograms were compared and evaluated based on three RT–PCR products from each analysed sample (Supplementary Fig. 3b).
SDS–PAGE and immunoblotting
Lysates from virus-infected or pCAG-transfected cells were analysed by SDS-polyacrylamide gel electrophoresis (SDS–PAGE) using standard 12% SDS–PAGE to resolve virus structural proteins and precast Novex 10–20% tricine protein gels (Thermo Fisher) to resolve UP. Proteins were then transferred to 0.2 µm nitrocellulose membranes and blocked with 4% Marvel milk powder in PBS. Immunoblotting of the enterovirus VP3 structural protein was performed using Enterovirus pan monoclonal antibody (Thermo Fisher) at a 1:1,000 dilution. A custom rabbit polyclonal antibody raised against C-terminal UP peptide CPPRKPEPMRLG (GenScript), an anti-Strep mouse antibody (Abcam) and an anti-HA mouse antibody (Abcam) were used for the detection of EV7 UP, EV7 Strep-tagged UP and PV1 HA-tagged UP, respectively. The following antibodies were used for cellular targets: anti-tubulin (Abcam), anti-VDAC1 (Abcam), anti-GAPDH (Ambion) and anti-calnexin (Merck). To ensure synchronicity of infection, a high m.o.i. was used for virus infections. Immunoblots were imaged and analysed on a LI-COR imager. The original LI-COR scans and quantifications are shown in Supplementary Fig. 11.
Sampling, preparation and infection of human intestinal organoid monolayers
Following ethical approval (REC-12/EE/0482) and informed consent, intestinal biopsies were collected from the terminal ileum of patients undergoing routine endoscopies. All patients included had macroscopically and histologically normal mucosa. Biopsy samples were processed immediately and intestinal epithelial organoids generated from isolated crypts following an established protocol15,16.
To form differentiated monolayers for infection, 48-well plates or IBIDI 8-well chamber slides were collagen-coated 2 h before cell seeding. Mature intestinal organoids were washed with PBS containing 0.5 mM EDTA and dissociated in 0.5% Trypsin-EDTA. Trypsinization was inactivated by FBS and clumps of cells were removed using a 40 µm cell strainer. Cells were seeded at 1.4 × 10^5 per well and grown in proliferation media16. After 24 h, cells were maintained in differentiation media14 and differentiation was allowed to proceed for 5 d before infection. The differentiation of monolayers was confirmed by qPCR measurement of stem cell (leucine-rich repeat-containing G-protein coupled receptor 5), mature enterocyte (alkaline phosphatase) and epithelial cell (villin) marker transcripts at Days 0, 3 and 5. Relative fold changes were assessed with the 2^(−ΔΔCT) method using the hypoxanthine phosphoribosyltransferase 1 transcript for normalization.
Monolayers cultured in 48-well plates were infected in triplicate at a m.o.i. of 10 at 37 °C for 1 h, washed twice with serum-free media and overlaid with 250 µl differentiation media. Aliquots of media corresponding to half the volume were taken at the indicated time points and clarified by centrifugation at 6,000 g for 5 min. The lysed cell debris at 48 h.p.i. was collected using 250 µl differentiation media. All collected samples were titrated on RD cell monolayers using plaque assays as readouts. The 48 h.p.i. virus stocks were used for RNA isolation and RT–PCR analysis to confirm the presence of the introduced mutations.
Analysis of samples collected from infected human intestinal organoid monolayers
Samples collected at 36 and 48 h.p.i. were used for EV7 RNA and VP3 quantification. The amount of EV7 RNA was determined by qRT–PCR. A 20 µl aliquot of each sample was mixed with 4 × 10^6 PFUs purified Sindbis virus (SINV) stock, which was used for normalization and to control the quality of RNA isolation. RNA was extracted using the Qiagen QIAamp viral RNA mini kit. Reverse transcription was performed using the QuantiTect reverse transcription kit (Qiagen) with virus-specific reverse primers for SINV (GTTGAAGAATCCGCATTGCATGG) and EV7 (CACCGAATGCGGAGAATTTACC). EV7 and SINV-specific primers were used to quantify corresponding virus RNAs; the primer efficiency was within 95–105%. Quantitative PCR was performed in triplicate using SsoFast EvaGreen Supermix (Bio-Rad) in a ViiA 7 Real-time PCR system (Applied Biosystems) for 40 cycles with two steps per cycle. The results were normalized to the amount of SINV RNA in the same sample. Fold differences in RNA concentration were calculated using the 2^(−ΔΔCT) method. Protein analysis from the same samples was performed by western blotting using Enterovirus pan monoclonal antibody at a 1:500 dilution. VP3-specific bands were quantified using LI-COR imager software. The EV7 titres were normalized to either the RNA or protein quantities and further normalized to the mean value of the WT EV7 samples. The same set of samples was subjected to treatment with Triton X-100 at a final concentration of 1% or PBS as a control, titrated by plaque assay and presented as the ratio of Triton X-100-treated to PBS-treated values.
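The fold-difference calculation can be sketched as follows, with the SINV spike-in acting as the reference transcript. This is a minimal illustration that assumes an amplification efficiency of 100% (the text reports 95–105%); the function and argument names are hypothetical.

```python
def fold_change_ddct(ct_ev7_sample, ct_sinv_sample, ct_ev7_control, ct_sinv_control):
    """Relative EV7 RNA level by the 2^(-ddCT) method.

    Each argument is a mean threshold cycle (CT). SINV is the spiked-in
    reference; 'control' is the sample used as the baseline (for example WT).
    Assumes perfect doubling per cycle (efficiency of 100%).
    """
    # dCT: EV7 signal relative to the SINV spike-in within each sample.
    delta_sample = ct_ev7_sample - ct_sinv_sample
    delta_control = ct_ev7_control - ct_sinv_control
    # ddCT: difference between the sample and the baseline.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)
```

For instance, a sample whose EV7 CT is two cycles later than the baseline, with identical SINV CT values, corresponds to a fourfold lower RNA level.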
Membrane flotation assay of organoid-derived viruses
Differentiated human intestinal organoid cultures were infected with EV7 and mutants. At 36 h.p.i., media were collected, clarified by centrifugation at 6,000 g for 5 min and aliquots were titrated with or without Triton X-100 pre-treatment. EV7 derived from infected RD cells (m.o.i. of 1) collected at 20 h.p.i. in serum-free media and clarified by centrifugation at 6,000 g for 5 min was used as a control. Samples were then used for the flotation assay in an iodixanol gradient as described by Vogt et al.32 with minor modifications. Briefly, each sample was mixed with 1.5 ml 0.25 M sucrose in PBS and 1.5 ml iodixanol (Sigma) resulting in 30% iodixanol concentration. A discontinuous iodixanol gradient consisting of 1 ml 60%, 3 ml 30% (containing the sample), 4 ml 20% and 4 ml 10% iodixanol was layered and centrifuged at 200,000 g for 16 h at 4 °C in a SW41Ti rotor. A total of 15 fractions (∼800 µl each) were collected using a fractionator. Each fraction was titrated by plaque assay on RD cells. The resulting titres were normalized with the total amount of virus in each sample and plotted.
Virus neutralization was performed by mixing virus sample corresponding to 50–500 PFUs (with appropriate dilution for counting input PFUs) with a 1:400 dilution of EV7 neutralization serum (Batch nr. 2/69, The Standards Laboratory, Central Public Health Laboratory), incubating the mixture at room temperature for 30 min and then plating on monolayers of RD cells for plaque formation. The neutralization assay via prevention of receptor-mediated attachment was performed on monolayers of RD cells pretreated for 1 h with anti-DAF antibody at 1:500 dilution (rabbit; in-house; sourced from D. Evans33) followed by infection and plaque formation.
Fractionation analysis of UP
For the analysis of overexpressed UP, electroporation of HeLa cells was performed in full media at 240 V and 975 µF using a Bio-Rad Gene Pulser. At 20 h post-electroporation, cells were washed with PBS and fractionated using a subcellular protein fractionation kit for cultured cells (Thermo Scientific) according to the manufacturer’s instructions. Equal aliquots of whole cell lysate, cytoplasmic and membrane fractions were analysed by western blotting using the indicated virus- or cellular target-specific antibodies.
Immunofluorescence microscopy
Differentiated human intestinal organoid monolayers were grown on IBIDI 8-well chamber slides and infected with EV7 or mutants (m.o.i. of 10) five days post differentiation. For the analysis of overexpressed UP, the transfection of HeLa cells was performed using Lipofectamine 2000. For moderately expressed UP, a HeLa cell line stably expressing UP (HeLa-UP) was created using the pCAG-UP construct as previously described30. At 9 h.p.i. or 20 h.p.t., cells were fixed with 4% paraformaldehyde for 20 min at room temperature, followed by permeabilization with PBS containing 0.5% Triton X-100 (for infected organoids), 0.1% Triton X-100 or 0.2% saponin (for transfected HeLa cells and the HeLa-UP cell line) for 10 min. Cells were blocked in 5% goat serum and incubated sequentially with primary (Enterovirus pan monoclonal antibody, J2 anti-dsRNA IgG2a monoclonal antibody (Scicons) or anti-calnexin antibody) and secondary (Alexa Fluor 488- or Alexa Fluor 594-conjugated goat anti-mouse or goat anti-rabbit; Thermo Fisher) antibodies. Nuclei were counter-stained with Hoechst (Thermo Scientific). The images are a projection of a z-stack (Supplementary Fig. 9) or single plane image (Fig. 6b) taken with a Leica SP5 Confocal Microscope using a water-immersion ×63 objective.
Ribosome profiling
RD cells were grown on 150-mm dishes to 90% confluency. Following previous optimization of ribosome profiling with other viruses, we infected cells at a m.o.i. of 20 with EV7, PV1 or EV-A71 virus stocks. At the indicated times post-infection, cells were treated with 3 mM cycloheximide for 3 min, flash frozen in a dry ice/ethanol bath and lysed in the presence of 0.36 mM cycloheximide. The cell lysates were subjected to Ribo-Seq based on previously reported protocols11,34, except that a Ribo-Zero Gold rRNA removal kit (Illumina), rather than DSN, was used to deplete ribosomal RNA. Amplicon libraries were deep sequenced using an Illumina NextSeq platform.
Computational analysis of Ribo-Seq data
Ribo-Seq analysis was performed as described previously11. Adaptor sequences were trimmed using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit) and trimmed reads shorter than 25 nt were discarded. Reads were mapped to host (Homo sapiens) and virus RNA using bowtie version 1 (ref. 35), with parameters -v 2 --best (that is, a maximum of two mismatches, reporting the best match). Mapping was performed in the following order: host rRNA, virus RNA, host RefSeq mRNA, host non-coding RNA and host genome.
To normalize for library size, RPM values were calculated using the sum of total virus RNA plus host RefSeq mRNA reads (positive-sense reads only) as the denominator. A +12 nt offset was applied to the RPF 5′ end positions to give the approximate ribosomal P-site positions. To calculate the phasing and length distributions of host and virus RPFs, only RPFs whose 5′ end (+12 nt offset) mapped between the 16th nucleotide 3′ of the initiation codon and the 16th nucleotide 5′ of the termination codon of the coding sequences (ppORF for viruses; RefSeq mRNA coding regions for host) were counted, thus avoiding RPFs of initiating or terminating ribosomes. Histograms of host RPF positions (5′ end +12 nt offset) relative to initiation and termination codons were derived from reads mapping to RefSeq mRNAs with annotated coding regions ≥450 nt in length and with annotated 5′ and 3′ UTRs ≥ 60 nt in length.
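The offset, normalization and filtering steps in this paragraph can be sketched as below. This is a minimal illustration, not the pipeline used in the study. The coordinate conventions (1-based positions, inclusive 16-nt boundaries measured from the last nucleotide of the initiation codon and the first nucleotide of the termination codon) are assumptions of the sketch, as are all function names.

```python
P_SITE_OFFSET = 12  # nt added to the RPF 5' end position, as stated in the text
MARGIN = 16         # nt excluded next to the initiation and termination codons

def p_site(five_prime_end):
    """Approximate ribosomal P-site position for an RPF 5' end coordinate."""
    return five_prime_end + P_SITE_OFFSET

def rpm(read_count, virus_reads, host_refseq_mrna_reads):
    """Reads per million, with virus RNA plus host RefSeq mRNA reads
    (positive-sense only) as the library-size denominator."""
    return read_count * 1_000_000 / (virus_reads + host_refseq_mrna_reads)

def is_elongating(five_prime_end, start_codon_last_nt, stop_codon_first_nt):
    """True if the offset position lies inside the coding sequence, away from
    initiating and terminating ribosomes (boundary handling is an assumption)."""
    pos = p_site(five_prime_end)
    return start_codon_last_nt + MARGIN <= pos <= stop_codon_first_nt - MARGIN

def phase(five_prime_end, cds_first_nt):
    """Reading-frame phase (0, 1 or 2) of the offset position relative to the
    first nucleotide of the coding sequence."""
    return (p_site(five_prime_end) - cds_first_nt) % 3
```

Only RPFs passing `is_elongating` would contribute to the phasing and length distributions under these assumptions.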
Virus uORF and ppORF expression levels (RPKM) were calculated by counting RPFs whose 5′ end (+12 nt offset) mapped within the respective coding region. The region of overlap between the uORF and ppORF was excluded. To mitigate the effect of RPFs potentially deriving from translation of very short overlapping ORFs (Supplementary Figs. 4d, 5d and 6d) and given the high degree of triplet phasing in the data (Supplementary Fig. 4c, 5c and 6c), we only counted RPFs mapping in phase 0 with respect to the uORF or ppORF, as appropriate; these values were then scaled by the ratio of total polyprotein-mapping RPFs to phase-0 polyprotein-mapping RPFs (a value in the range of 1.24–1.39, depending on the library). Due to variability in RPF density as a result of variable codon dwell-times in addition to biases introduced during library preparation, the short length of the uORF and the possibility of non-specific initiation in other very short ORFs between the uORF AUG and the ppORF AUG, it was not possible to precisely calculate the relative translation efficiencies of uORF and ppORF from the Ribo-Seq data.
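The phase-0 counting and rescaling described above can be illustrated with the following sketch. It assumes that P-site positions (5′ end plus the 12 nt offset) have already been computed and that coordinates are 1-based and inclusive; the function names are hypothetical, and the RPKM denominator is passed in explicitly because the text does not restate it here.

```python
def phase0_count(p_sites, orf_start, orf_end):
    """Number of P-site positions inside [orf_start, orf_end] that fall in
    phase 0 relative to the first nucleotide of the ORF."""
    return sum(1 for p in p_sites
               if orf_start <= p <= orf_end and (p - orf_start) % 3 == 0)

def phase_scale_factor(p_sites, pp_start, pp_end):
    """Ratio of all polyprotein-mapping RPFs to phase-0 polyprotein-mapping
    RPFs (reported in the text as 1.24-1.39, depending on the library)."""
    total = sum(1 for p in p_sites if pp_start <= p <= pp_end)
    return total / phase0_count(p_sites, pp_start, pp_end)

def rpkm(scaled_count, region_length_nt, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return scaled_count * 1e9 / (region_length_nt * total_mapped_reads)
```

The scaled count for an ORF would then be `phase0_count(...) * phase_scale_factor(...)`, with the uORF/ppORF overlap region excluded by the choice of `orf_start` and `orf_end`.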
Comparative genomic analysis
Genus Enterovirus nucleotide sequences were downloaded from the National Center for Biotechnology Information (NCBI) on 2 July 2017. The bona fide polyprotein AUG initiation site was identified in each sequence by alignment to NCBI genus Enterovirus RefSeqs. Sequences that contained the complete ppORF and at least 160 nt upstream were identified and used for further analysis. Patent sequence records, sequences with NCBI keywords ‘UNVERIFIED’, ‘STANDARD_DRAFT’, ‘VIRUS_LOW_COVERAGE’ or ‘VIRUS_AMBIGUITY’ and sequences with >10 ambiguous nucleotide codes (for example ‘N’s) indicative of low quality or incomplete sequencing, were removed, leaving 3,136 sequences.
To define enterovirus clades, the following International Committee on Taxonomy of Viruses (ICTV) type sequences for 13 genus Enterovirus species were used as reference sequences: Enterovirus A, AY421760; Enterovirus B, M88483; Enterovirus C, V01149; Enterovirus D, AY426531; Enterovirus E, D00214; Enterovirus F, DQ092770; Enterovirus G, AF363453; Enterovirus H, AF326759; Enterovirus I, KP345887; Enterovirus J, AF326766; Rhinovirus A, FJ445111; Rhinovirus B, DQ473485 and Rhinovirus C, EF077279. The 3,136 sequences were grouped into clades according to the reference sequence with which they shared the greatest polyprotein amino acid identity. Only three sequences (KU587555, KX156158 and KX156159) had <65% amino acid identity to any of the 13 reference sequences and these sequences were left unclustered (Fig. 1b). For the sake of simplicity, recombination, a fairly common occurrence within enterovirus species36, was ignored. The phylogenetic tree (Fig. 1b) was constructed using polyprotein amino acid sequences aligned with MUSCLE37 and processed with Gblocks38 using default parameters to remove poorly aligned regions (resulting in a reduction from 2,461 alignment columns to 1,693 alignment columns). The phylogenetic tree was estimated using the Bayesian Markov chain Monte Carlo method implemented in MrBayes version 3.2.3 (ref. 39), sampling across the default set of fixed amino acid rate matrices, with 100,000 generations, discarding the first 25% as burn-in (other parameters were left at defaults). The tree was visualized with FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
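The clade assignment rule (greatest polyprotein amino acid identity to a reference, with sequences below 65% identity to every reference left unclustered) can be sketched as follows. The identity values are assumed to have been computed beforehand by pairwise alignment; the function name and input layout are hypothetical.

```python
def assign_clade(identity_to_reference, threshold=65.0):
    """Return the reference clade with the greatest amino acid identity,
    or None if the best identity is below the threshold.

    identity_to_reference maps a clade name (for example 'Enterovirus A')
    to the percentage polyprotein amino acid identity with its reference.
    """
    best_clade = max(identity_to_reference, key=identity_to_reference.get)
    if identity_to_reference[best_clade] < threshold:
        return None  # left unclustered
    return best_clade
```

As in the text, recombination is ignored in this sketch: each sequence receives a single clade label based on the whole polyprotein.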
In each of the 3,136 sequences, the AUG codon in dVI of the IRES was identified based on the conserved sequences surrounding it (typically UU AUG GU(C/G)ACA, or slight variations thereof, in which the spaced AUG triplet is the dVI AUG). Sequences were defined as having the uORF if the ORF beginning with this AUG codon and including the first in-frame stop codon: (1) overlapped the ppORF by at least 1 nt, (2) was not in-frame with the ppORF and (3) contained at least 150 nt upstream of the polyprotein AUG codon.
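The three criteria can be expressed as a short check, sketched below under stated assumptions: the positions of the dVI AUG and the polyprotein AUG are already known (0-based index of the A in each), the sequence is given as RNA, and overlap is measured from the first nucleotide of the polyprotein AUG. The function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def has_uorf(genome, dvi_aug, pp_aug):
    """Apply the three uORF criteria to one genome sequence (RNA alphabet).

    dvi_aug and pp_aug are 0-based indices of the A of the dVI AUG and of
    the polyprotein AUG, respectively (assumed to be identified already).
    """
    # Find the first in-frame stop codon downstream of the dVI AUG.
    orf_end = None
    for i in range(dvi_aug, len(genome) - 2, 3):
        if genome[i:i + 3] in STOP_CODONS:
            orf_end = i + 3  # exclusive end; the ORF includes the stop codon
            break
    if orf_end is None:
        return False
    overlaps_pporf = orf_end > pp_aug              # (1) at least 1 nt overlap
    out_of_frame = (pp_aug - dvi_aug) % 3 != 0     # (2) not in-frame with ppORF
    long_upstream = (pp_aug - dvi_aug) >= 150      # (3) >= 150 nt upstream
    return overlaps_pporf and out_of_frame and long_upstream
```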
The dN/dS ratios were estimated using the codeml program in the PAML package10. To do this in an acceptable computational time, the alignments were reduced to fewer sequences by applying BLASTCLUST (a single-linkage BLAST-based clustering algorithm)40. First, within each clade, for those sequences containing a uORF according to the above definition, the uORF nucleotide sequences (3′-truncated, after a whole number of codons, to exclude the part overlapping the ppORF) were extracted, clustered with BLASTCLUST (-p F -L 0.95 -b T -S 95; that is, 95% coverage, >95% nucleotide identity threshold) and, within each BLASTCLUST cluster, a single representative sequence was retained. To mitigate the effect of potential sequencing errors, in each cluster the representative sequence was chosen to be the sequence with the most identical copies (with ties broken arbitrarily) or, if there were no duplicated uORF sequences, the sequence closest to the centroid (minimum summed pairwise nucleotide distances from sequence i to all other sequences j within the cluster). This reduced the uORF sequence sets for enterovirus clades A, B, C, E, F and G from 1,182, 357, 345, 9, 11 and 16 to 53, 177, 81, 8, 10 and 13 sequences, respectively. In each clade, the remaining nucleotide sequences were translated, aligned as amino acids with MUSCLE and the amino acid alignment used to guide a codon-based nucleotide alignment (EMBOSS tranalign)41. Alignment columns with gap characters in any sequence were removed, which resulted in a reduction from 53, 52, 54, 51, 51 and 69 to 50, 50, 50, 50, 51 and 44 codon positions in enterovirus clades A, B, C, E, F and G, respectively. PhyML42 was used to produce a nucleotide phylogenetic tree for each of these sequence alignments. Using these tree topologies, dN/dS was calculated for each alignment with codeml. 
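The rule for choosing one representative per BLASTCLUST cluster can be illustrated as follows. This is a sketch only: it uses a simple Hamming distance on equal-length sequences as a stand-in for the pairwise nucleotide distance, which the text does not define further, and the function names are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences
    (a stand-in distance; an assumption of this sketch)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def pick_representative(cluster):
    """Choose the sequence with the most identical copies; if no sequence is
    duplicated, choose the one closest to the centroid (minimum summed
    pairwise distance to all other sequences in the cluster)."""
    counts = Counter(cluster)
    sequence, copies = counts.most_common(1)[0]  # ties resolved arbitrarily
    if copies > 1:
        return sequence
    return min(cluster, key=lambda s: sum(hamming(s, t) for t in cluster))
```

Preferring duplicated sequences reflects the stated aim of mitigating sequencing errors, since an error is unlikely to recur identically in independent records.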
Standard deviations for the codeml dN/dS values were estimated using a bootstrapping procedure, in which codon columns of the alignment were randomly resampled (with replacement); for each clade, 100 randomized alignments were generated and their dN/dS values calculated with codeml.
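The bootstrapping procedure can be sketched as below. Codon columns are resampled with replacement so that every sequence receives the same columns, and a statistic is recomputed on each randomized alignment. The `statistic` argument is a hypothetical placeholder for a codeml run on the resampled alignment; the sketch does not call codeml itself.

```python
import random
from statistics import stdev

def resample_codon_columns(alignment, rng):
    """Resample codon columns with replacement.

    alignment is a list of equal-length, gap-free nucleotide strings whose
    length is a multiple of 3; the same columns are drawn for every sequence.
    """
    n_codons = len(alignment[0]) // 3
    columns = [rng.randrange(n_codons) for _ in range(n_codons)]
    return ["".join(seq[3 * c:3 * c + 3] for c in columns) for seq in alignment]

def bootstrap_sd(alignment, statistic, n_replicates=100, seed=0):
    """Standard deviation of a statistic over randomized alignments.

    statistic is any callable taking an alignment and returning a number
    (here a placeholder for the dN/dS estimate from codeml).
    """
    rng = random.Random(seed)
    values = [statistic(resample_codon_columns(alignment, rng))
              for _ in range(n_replicates)]
    return stdev(values)
```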
For sequences containing the uORF, the coding potential within each reading frame was analysed using MLOGD9. First, within each clade, the polyprotein amino acid sequences were determined for the sequences containing a uORF according to the above definition, clustered with BLASTCLUST (-p T -L 0.95 -b T -S 99; that is, 95% coverage, >99% amino acid identity threshold) and, within each BLASTCLUST cluster, a single representative sequence was retained using the same procedure as described above for uORF nucleotide sequences but using the polyprotein amino acid sequences. The ICTV reference sequences (as per Fig. 1b) were also retained as reference sequences for the Enterovirus E, F and G clades, whereas EV-A71, EV7 and PV1 were appended and used as the reference sequences for the Enterovirus A, B and C clades, respectively. This reduced the sequence sets for enterovirus clades A, B, C, E, F and G to 89, 220, 101, 8, 10 and 15 sequences, respectively. For each clade, the remaining polyprotein amino acid sequences were aligned with MUSCLE, processed with Gblocks as described above and analysed with PhyML to produce the tree topology for the MLOGD analysis. Then, for each clade, each individual genome sequence was aligned to the reference sequence using code2aln version 1.2 (ref. 43) and mapped to reference sequence coordinates by removing alignment positions that contained a gap character in the reference sequence. These pairwise alignments were combined to give whole-clade alignments, which were analysed with MLOGD using a 40-codon sliding window and a one-codon step size. For each of the three reading frames, within each window the null model is that the sequence is non-coding whereas the alternative model is that the sequence is coding in the given reading frame. Positive/negative values indicate that the sequences in the alignment are likely/unlikely to be coding in the given reading frame (Fig. 1d and Supplementary Fig. 1b).
For the analysis of non-Enterovirus taxa within the Picornaviridae family (Supplementary Figs. 14a–23a), the coding potential within each reading frame was analysed using MLOGD9 and synonymous site conservation was analysed with SYNPLOT2 (ref. 44). For these analyses we generated codon-respecting alignments of full-genome sequences using a procedure described previously44. In brief, each individual genome sequence was aligned to a reference sequence using code2aln version 1.2 (ref. 43). Genomes were then mapped to reference sequence coordinates by removing alignment positions that contained a gap character in the reference sequence and these pairwise alignments were combined to give the multiple sequence alignment. These were analysed with MLOGD (see above) using a 40-codon sliding window and a 5-codon step size. To assess conservation at synonymous sites, the polyprotein coding region and any non-overlapping portion of the additional ORF sequence were extracted from the alignment, the polyprotein and additional ORF sequences were concatenated in-frame (where relevant) and the alignment analysed with SYNPLOT2 using a 25-codon sliding window. Amino acid alignments of the complete putative new proteins (Supplementary Fig. 14b–23b) were performed with MUSCLE37.
Transmembrane domains were predicted with Phobius (EMBL-EBI)45.
All studies were conducted with informed patient and/or carer consent as appropriate and with full ethical approval; ethical approval was obtained from the NHS Research Ethics Service Committee East of England, Hertfordshire (REC-12/EE/0482). Informed consent was obtained from all patients/parents before participation, in accordance with approved study protocols.
Statistics and reproducibility
All t-tests are two-tailed and assume separate variances for the two populations being compared. Raw data for the organoid experiments and details of the t-tests performed are reported in Supplementary Table 3. Raw data for the dual luciferase assays are reported in Supplementary Table 4.
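A t-test that assumes separate variances corresponds to Welch's test. The following minimal sketch computes the test statistic and the Welch-Satterthwaite degrees of freedom from two samples; the function name is hypothetical, and the sketch stops short of the P value, which requires the t distribution (for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which reports a two-tailed P value by default).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples
    with possibly unequal variances (each sample needs >= 2 values)."""
    # Squared standard error contributed by each sample.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```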
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
We thank the Cambridge NIHR BRC Cell Phenotyping Hub for assistance with confocal microscopy. We thank T. Sweeney, I. Brierley and E. Jan for stimulating discussions. This work was supported by Wellcome Trust grant no. 106207 and European Research Council grant no. 646891 to A.E.F. and Wellcome Trust grant nos 097997/Z/11/Z and 207498/Z/17/Z to I.G.