Long-read assays shed new light on the transcriptome complexity of a viral pathogen

Characterization of global transcriptomes using conventional short-read sequencing is challenging due to the insensitivity of these platforms to transcripts isoforms, multigenic RNA molecules, and transcriptional overlaps. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. Employment of these technologies has led to the redefinition of transcriptional complexities in reported organisms. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the vaccinia virus (VACV) transcriptome. We performed cDNA and direct RNA sequencing analyses and revealed an extremely complex transcriptional landscape of this virus. In particular, VACV genes produce large numbers of transcript isoforms that vary in their start and termination sites. A significant fraction of VACV transcripts start or end within coding regions of neighbouring genes. This study provides new insights into the transcriptomic profile of this viral pathogen.

www.nature.com/scientificreports/ Many PR and a few E transcripts have 5′ poly(A) tails (PAT). PATs are supposedly generated by RNA polymerase slippage onto adjacent thymines on the DNA strand 22 . In addition to genome tiling analyses 23 , RNA-Seq 19 , ribosome profiling 12,24 , and microarrays 25,26 have also been used for VACV transcriptome profiling. Yang and colleagues mapped 118 E genes and 93 PR genes using a short-read sequencing (SRS) technique 16 , and in a later study, they distinguished 53 I and 38 L genes among the PR genes 17 . Ribosome profiling analysis also confirmed canonical translational initiation sites (TISs) and demonstrated that additional TISs occur mostly within the ORFs. These TISs were also detected in 5′-untranslated (UTRs) and intergenic regions, as well as on complementary DNA strands 12 . However, because most of these ORFs were short, the authors doubted their biological relevance. In another study by Yang and colleagues, transcript abundance was highly correlated with ribosome-protected reads, suggesting that translation is largely regulated by mRNA abundance 12 . The techniques used in the above mentioned studies cannot detect full-length RNA molecules, and hence fail to provide a comprehensive assessment of the viral transcriptome. Although deep SRS provides satisfactory sequencing depth and coverage for global transcriptome profiling, the resulting RNA assemblies are incomplete, leading to inadequate annotations.
The major limitations in VACV transcriptome profiling relate to multiple transcriptional read-throughs that generate a complex meshwork of overlapping RNAs and to the large variation in transcriptional end sites (TESs). Additionally, the extensive use of alternative promoters leads to a large diversity of transcription start sites (TSSs). Hence, conventional techniques that fail to read entire RNA molecules have considerably underestimated the complexity of the poxvirus transcriptome. Earlier, we used two long-read sequencing (LRS) platform Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] to profile the transcriptomes of herpesviruses, including pseudorabies virus 27,28 , herpes simplex virus type 1 29,30 , varicella-zoster virus 31 , and human cytomegalovirus 32,33 . These analyses identified polycistronic RNAs, transcript isoforms, and transcriptional overlaps, and facilitated the kinetic characterization of viral transcripts 34,35 .
Herein, we report the transcriptomic analyses of the Western Reserve (WR) strain of VACV using PacBio RSII and Sequel platforms, and ONT MinION platform for cDNA, direct RNA (dRNA), and Cap-sequencing (Cap-Seq).

Results
Long-read transcriptome sequencing of vaccinia virus. In this study, we performed polyadenylation sequencing techniques for the investigation of the VACV transcriptome. We used PacBio isoform sequencing (Iso-Seq) template preparation protocol for the RSII and Sequel platforms, and the ONT MinION device to sequence cDNAs and native RNAs. For ONT sequencing, we used the company's own library preparation approach (1D-Seq), or the Teloprime Cap-selection protocol from Lexogen, which was adapted for the MinION sequencer. The obtained datasets were then used to define and validate viral TSS and TES coordinates using the LoRTIA pipeline (https ://githu b.com/zsolt -balaz s/LoRTI A) that was developed by our research group 36 . The workflow for library preparation and analysis is shown in Fig. 1.
PCR amplification was used in all our sequencing techniques with the exception of dRNA sequencing where no amplification occurred. Direct RNA sequencing offers an alternative to cDNA sequencing because it lacks the recoding and amplification biases inherent in reverse transcription (RT) and PCR-based methods. We also used random primer-based RT for some ONT sequencing analyses. All methods generated high-coverage LRS data and determined both TSSs and TESs of viral transcripts with base-pair precision (see below). LRS can be used to detect individual RNA molecules and identify alternatively transcribed and processed RNAs, polycistronic transcripts, and transcriptional overlaps. Despite this straightforward utility, identification of transcript ends (especially of TSSs) was extremely challenging due to the tremendous complexity of the VACV transcriptome. The amplified SMRT Iso-Seq technique utilizes a switching mechanism at the 5′-end of the RNA template, thereby generating full-length cDNAs 37 . www.nature.com/scientificreports/ An advantageous feature of the PacBio technique is that any error that arises can be easily corrected due to its high consensus accuracy 38 . In this study, we used a workflow and pipeline for transcriptome profiling of long-read RNA sequencing data that was developed in our laboratory 30,32,39 . In total, about 1,115,000 reads of inserts (ROIs) were generated using the PacBio platform (Supplementary Table S1). In addition to PATs, ROIs can potentially be produced non-specifically from A-rich regions of the RNA molecules 36 . Thus, we excluded these products from further analysis using the LoRTIA toolkit. Non-specific binding of the adapter and PCR primer sequences to cDNAs can also lead to false positives and we excluded these artefacts using bioinformatic filtering. We also used the LoRTIA toolkit to confirm that the 5′-and 3′-termini of the sequencing reads represented real TSSs and TESs, respectively. When two or more sequencing reads with the same TSS contained different TESs, they were considered independent ROIs. Similarly, ROIs with different PAT lengths were regarded as independent. We also accepted those TSSs that have been described by others 19,[40][41][42][43][44][45][46] .
The advantages of the ONT MinION sequencing technique over the PacBio platform include cost-effectiveness, higher read output, and the ability to read sequences within the range of 200 to 800bps, in which PacBio and the SRS techniques are less effective 47 . Although this method is hampered by high error rates, sequencing accuracy is not particularly detrimental to transcriptome analyses of well-annotated genomes, such as that of VACV 48 . In addition to the oligo(dT)-based 1D protocol from ONT, we prepared a random hexanucleotide-primed RT to detect the non-polyA(+) RNA fractions and to validate TSSs. A Cap-Seq approach was used to identify TSS positions. Together, the nanopore sequencing methods yielded approximately 535,000 viral sequencing reads.
VACV ORFs are relatively short (Supplementary Table S2) and many of them overlap with each other. Because the PacBio MagBead loading protocol selectively removes DNA fragments shorter than 1kb 49 , potential monocistronic transcripts with a short ORFs are underrepresented or missing in these datasets. Nonetheless, in analyses of very long polycistronic and complex transcripts, PacBio has better sequencing precision than ONT. Thus, the combined use of these LRS methods eliminates the shortcomings of both. Native RNA sequencing can circumvent spurious transcription reads that are generated in PCR and RT reactions by false priming, template switching, or second-strand synthesis. The major disadvantage of dRNA sequencing is that a few bases are always missing from the 5′-ends of transcripts, and in many cases also from the 3′-ends 50 . Under these sequencing conditions, it is assumed that the lacking 5′-end nucleotides are the consequence of perturbed base calling due to premature release of the mRNA from the motor protein, which hasten the progress of RNA molecules across pores. The frequent absence of PATs in sequencing reads is explained by the miscalling of adapter nucleotides that are ligated downstream of PATs on the RNA molecule, thus muddling of the raw signal of the downstream ' A' homopolymer. The present dRNA-Seq analysis returned a total of approximately 195,000 reads.
This study demonstrates an extremely complex transcription pattern along the entire viral genome. Our sequencing approaches detected transcriptional activity at every nucleotide of the VACV genome. The number of annotated ORFs vary between 201 and 263 3,51 depending on the strain and study. We identified 218 ORFs in the WR strain (GenBank accession number is LT966077.1) using in silico methods. 184 out of 218 ORFs were expressed either as exclusively monocistronic or as both mono-and polycistronic RNA molecules. We detected 8,191 unique putative transcripts using the LoRTIA pipeline (Supplementary Table S3a). When we applied a criterion of accepting only those transcripts which were identified by two different techniques and/or in three independent experiments, this number decreased to 1480 transcripts (Supplementary Table S3b Table S3d), but no full-length reads were detected for about 25 ORFs. These regions contain several transcriptional reads without exact TSS and/or TES positions.
We introduce the concepts of 'regular' and 'chaotic' genomic regions. At the 'regular' genomic segments viral genes produce transcript with relatively exact TSSs and TESs ( Supplementary Fig. S1). However, at the 'chaotic' genomic loci the transcripts exhibit high heterogeneity with respect to the exact positions of TSSs and TESs (Fig. 2). Transcripts with highly diverse transcription initiation and termination sites are encoded by the I and L genes (such as O1L, O2L, I1L, I2L, I3L, A10L, A11R, A12L, A18R genes) at particularly the late time points of the infection. This phenomenon creates considerable difficulty in identifying individual transcripts. putative protein-coding genes. Herein, we identified (a) genes in new genomic positions, (b) embedded genes (5′-truncted in-frame ORFs within larger canonical ORFs), and (c) short upstream ORFs (uORFs, preceding the main ORFs). We considered the most abundant transcript isoform specified by a given gene as the canonical (main) transcript. (Note S1). The novel ORFs and transcripts were named according to the common (Copenhagen) HindIII fragment letter/number-based nomenclature 45 . All VACV transcripts (identified by the LoRTIA pipeline) are presented in a Geneious file available at FigShare (Supplementary Note S2).
Genes at novel genomic positions. We detected two novel putative mRNAs with short ORFs (both 51bps) that are located in the intergenic genomic regions (B25.5R/C19.5L, C9.5L) of two divergently oriented genes. The TSS positions of these transcripts have previously been reported 14,19 (Supplementary Table S4). Our analysis showed that these ORFs are expressed both alone in separate transcripts and as upstream ORFs (uORFs) located 5′ of canonical ORFs in longer transcripts.
Embedded genes. A characteristic feature of the VACV genome is that it contains several short in-frame ORFs, called embedded genes, many of which are translated 12,40 . Using full-length RNA sequencing we identified 49 novel putative embedded genes with 5′-truncated in-frame ORFs, which were shorter than the in silico annotated canonical ORFs (Fig. 3). This study demonstrated that these putative genes specify more than 300 tran-Scientific RepoRtS | (2020) 10:13822 | https://doi.org/10.1038/s41598-020-70794-5 www.nature.com/scientificreports/ www.nature.com/scientificreports/ script isoforms. We initially assumed that the TISs of these internal ORFs were the closest in-frame ATGs to canonical host ORFs. However, this is not necessarily the case, a large number of TISs were detected by others using ribosome profiling 12 . We identified 22 unique promoters associated with these TISs using our in-house scripts but we detected only six transcripts that contain in-frame ORFs (Supplementary Table S5).
Upstream ORFs. Upstream ORF-containing transcripts are considered to play a regulatory role in eukaryotic gene expression 52 . Translation of uORFs typically inhibit downstream expression of canonical ORFs. The 5′-UTR of uORF-containing transcripts are reportedly ≥ 75 bp in length 52 . Several in silico and experimental studies have shown that 40-50% of human and mouse mRNAs contain at least one uORF 53,54 . We applied strict criteria for filtering data obtained by the LoRTIA toolkit and thereby revealed that 25 previously annotated VACV genes express longer TSS variants than the canonical transcripts. On average, the 5′-UTRs of the main transcript isoforms were 101 bp ( Supplementary Fig. S2), whereas the longer variants were 258 bp long (Supplementary Table S6). Most of these transcript isoforms (64%) contain at least one ORF preceding the main ORF with an average 5′-UTR length of 331 nts. We set the minimal size limit of these uORFs to 9 nts comprising at least one triplet between the putative start and stop codons, as described by Scholz and colleagues 55 . We also set a maximal ORF length of 90 nts and considered transcripts with longer upstream ORFs as bicistronic. Eleven VACV transcripts with 5′-UTR variants were found to contain multiple uORFs. G5.5R was the only gene in which the longer transcript isoform contained a single uORF. The VACV transcriptome is distinguished by a high number of TSS and TES positions and.
Antisense RNAs. With few exceptions, we detected transcriptional activity from both DNA strands along the entire viral genome. The ratio between the mRNAs and antisense RNAs (asRNAs) varied between genomic locations. After applying the strict criteria to the LoRTIA tool, exactly 100 antisense transcripts were located at the genomic loci A5R (a single transcript), A11R (89 transcripts), and A18R (10 transcripts) (Fig. 4). The A11R antisense transcripts are various combinations of six TSS and 40 TES positions, whereas antisense A18R transcripts have the same TSSs but vary in their termination sites. Some of these transcripts contain small ORFs, which might indicate coding potentials (Note S2). In most cases, the level of asRNA expression was relative low, that is why the LoRTIA tool was unable to identify these molecules.
Intragenic non-coding transcripts. After applying our strict filtering criteria, we identified 260 3′-truncated transcripts from the LoRTIA dataset. We considered these as non-coding transcripts because they lacked inframe stop codons. However, we cannot exclude the possibility that they utilize an out-of-frame ORF for protein coding. These transcripts were detected to be expressed from 41 viral genes (36.15% of them are located within the O1L region). The A37R (8.85%) and A10L (5.38%) regions and the F12L-F13L gene cluster (11.15%) also expressed a substantial number of 3′-truncated RNAs. Altogether 62% of the 3′-truncated ncRNAs are located within five VACV ORFs (Supplementary Table S3b). www.nature.com/scientificreports/ Replication-associated transcripts. The VACV genome contains multiple replication origins (Oris) 56 located near the genomic termini. Oris are overlapped by many low-abundance ncRNAs; however, due to their large variation in length, their TSSs and TESs were not identified by either LoRTIA or other methods. We demonstrate that these RNAs are expressed from 4 h post infection (p.i.), and they may play similar roles as the replicationassociated RNAs of other herpesviruses 57 .
Novel mono-, bi-and polycistronic RNAs. In 'regular' genomic regions we detected 135 monocistronic coding transcripts of previously annotated ORFs using the LoRTIA software suit, and 12 additional mRNAs that did not meet the LoRTIA criteria due to their low abundance (Supplementary Table S7). Similar to other viruses and prokaryotes, but unlike the eukaryotic organisms, VACV reportedly express bi-and polycistronic transcripts [58][59][60] Table S3b, c, d, S8, Note S2) encoded by nine genomic segments. Six of these are TES isoforms of the A12L-A11R complex transcript, five cxRNAs are the combinations of TSSs and TESs within the A14L-A13L-A12L-A11R region, and four transcripts are alternative TES variants of VACVWR_161-A38L RNA. Three additional complex transcripts were excluded by LoRTIA analyses due to the low coverage at this genomic region. Since most VACV genes stand in tandem orientation with each other, and only a few convergently and divergently positioned gene pairs are present, the number cxRNAs is low compared to herpesviruses. Two cxRNAs (c-A21L-A20R, c-G3L-G2R) were considered non-coding because their upstream genes stand in an antisense orientation on the transcript. The remaining complex transcripts were categorized as mRNAs, because their first ORFs stand in the sense direction. We detected cxRNAs with convergently-oriented genes at E10R-E11L and A8R-A9L genomic regions. Fully overlapping convergent cxRNAs were also detected in the E10R-E11L region (Supplementary Fig. S3). www.nature.com/scientificreports/ LRS reveals highly complex isoform heterogeneity. Transcript isoforms include 5′-and 3′-UTR length variants of RNA molecules. Using tag-based RNA-Seq methods with SOLiD and Illumina sequencing platforms, Yang and co-workers determined 5′-and 3′-ends of VACV E mRNAs with high precision 14,19 and also mapped terminal sequences of I and L transcripts 40 . These authors also described the expression profiles of I and L RNA-products 16 . These studies identified several hundreds of TSSs and PASs; but they were unable to ascertain the precise 3′-ends of RNAs due to the high transcriptional complexity 14 .
Transcription start sites. We annotated 1073 TSSs of VACV transcripts, 987 of which have not previously been detected (Supplementary Table S9). Eighteen percent of TSSs matched with base-pair precision to those described by Yang and colleagues 14,15,19,40 . This value increased to 70% when we allowed a ± 10-nt, and to 93% once we allowed a ± 30-nt precision interval for the location of TSSs (Fig. 5a). Altogether 898 TSSs were detected in 'regular' regions. The previously published TSS positions are depicted in FigShare (Supplementary Note S2).
Transcription end sites. The TESs of early transcripts were increasingly heterogenic at late time points of infection, potentially because transcription termination signal recognition depends on different factors in early-and PR phases 12,14,15,19 . Due to the extremely high TES diversity ( Supplementary Fig. S4), we analysed their positions within a ± 30-nt interval. We found that 24% of our annotated TESs matched the positions of the polyadenylation sites described by Yang et al. 15 (Fig. 5b). Moreover, most genes express multiple transcript isoforms (median = 4, mean = 6.95), up to 30 in extreme cases, such as in regions.N1L, O2L, I1L, and A12L (Supplementary Table S3b; Supplementary Fig. S5). Moreover, no TSS positions belonging to 30 annotated ORFs, particularly in the A8R-A17L region, have been annotated in our and others' studies (Note S3).
cis-regulatory elements. Yang and colleagues described a 15-nucleotide consensus promoter core motif upstream of E genes, but these sequences were also present at other genomic locations. We identified this motif at nine novel positions upstream of (C9.5L, F4L, F7L, J3R, A15R, VACWR_161, and A51R) and inside (A36R and VACWR_169) ORFs. Cis-regulatory elements have not previously been annotated within these regions (Note S2). However, these core motifs are not present in the chaotic genomic regions. Additionally, most TESs of novel transcripts are located about 50 nucleotides downstream of UUUUUNU termination signals, as described by Yang et al. 19 . Those TESs which were not preceded by this motif might use an alternative mechanism for polyadenylation-site selection. Matching TSSs with TESs can only be carried out using LRS techniques, which are therefore crucial for the transcriptome analysis, especially in VACV which exhibits a very complex gene expression profile. We found that in certain DNA segments of 'chaotic' regions almost all base positions can function as TESs. Transcription initiation was also found to be stochastic at these regions. Accordingly, our analysis revealed that VACV genes express transcripts with multiple shared TSSs (Fig. 2a-c, Supplementary Fig. S4a) and TES positions (Fig. 2c, d, Supplementary Fig. S3b). Distributions of TSSs and TESs along the VACV genome are presented in Supplementary Fig. S6. We identified TSSs for 90% of the promoters described by Yang et al., (2010), with the exception of G2R, J6R, A18R, A20R, and five hypothetical genes located in the repeat region of the VACV genome. We used in silico promoter predictions to detect potential TATA-and CAAT-box consensus sequences of the present TSSs. We analysed 146 TSSs without previously annotated promoters and found TATA-boxes in 88 (Fig. 6, Supplementary Table S10), and CAAT-boxes in 5 of them (Supplementary Table S10). The median distance between a TSS and a putative TATA-box was 54 nts. Yet, no significant differences in promoter-TSS distances were identified between 5′-truncated and canonical transcript isoforms, suggesting that these cis-regulatory structures were similar in these two types of VACV genes (Fig. 6).
transcriptional overlaps. Using multiplatform analyses, we revealed an extremely complex meshwork of transcriptional overlaps, which were expressed by all viral genes. Cumulatively, 154 tandem, 32 divergent, and 32 convergent gene pairs were found in the VACV genome, with 21 overlapping ORFs between tandem genes, 13 between convergent gene pairs, and 5 between divergent gene pairs. Moreover, closely-spaced tandem genes can form partial and full parallel overlaps. We also detected 106 parallel transcriptional overlaps between tandem genes (Supplementary Table S11). Seventeen of the 32 convergent gene pairs formed UTR-UTR overlaps (Sup-   Table S12).

Discussion
Short-read sequencing has become a standard technique for the characterization of transcriptomes 61,62 . Yet comprehensive annotation of transcripts from SRS data remains challenging 63 . Auxiliary methods, such as Northern-blotting and rapid amplification of cDNA ends analyses are too laborious for investigations of large transcriptomes. Long-read sequencing allows the determination of full-length transcripts in single reads without computational inference, and thus distinguishes between transcript isoforms, polycistronic RNAs, and transcript overlaps with ease 64 . Previous studies using Illumina sequencing have detected large numbers of TSSs, TESs, and TISs of VAVC transcripts 12,15,19 . However, these analyses have not identified transcripts because SRS techniques are unable to match TSSs and TESs.
In this study, we employed two LRS platforms (PacBio and ONT) and sequenced cDNAs and native RNAs using various library preparation methods for surveying the temporal transcriptional activities of VACV genes. We developed pipelines for profiling long-read RNA sequencing data. In actuality, RNA molecules expressed by the virus may be much more numerous than we annotated. This may be due to a high heterogeneity of transcript ends, especially in 'chaotic' regions in the viral genome, which are difficult to analyse by bioinformatic techniques. The intermediate and late transcripts are particularly polymorphic in length, but the variation in TSS and TES positions of early transcripts is also increased at late stages of infection. Generally, canonical transcripts are significantly overrepresented among isoforms encoded in 'regular' genomic regions. Apart from canonical transcripts, VACV genes typically encompass shorter genes with N-terminal truncated ORFs, which may encode shorter polypeptides, possibly with altered effector functions. Moreover, longer transcript variants often incorporate uORFs, which may regulate the expression of downstream genes 32 .
The genomic organization of poxviruses differs from that of herpesviruses in many ways. Poxvirus E genes are gathered at the genome termini, whereas I and L genes are located in central parts of the DNA molecule. In addition, unlike herpesviruses, VACV DNA encodes numerous TSS and TES isoforms. While prototype herpesvirus transcripts are organized into tandem gene clusters generating overlapping transcripts with co-terminal RNA molecules, adjacent pox genes produce large numbers of alternative TESs. In VACV, the heterogeneity of TESs exceeds that of TSSs, whereas the opposite is the case in herpesviruses. The general presence of within-ORF TSSs is also unique to VACV. Yang and colleagues, using ribosome profiling assays, have also identified several downstream methionine residues and mapped these within main ORFs 12 .
We detected numerous TSSs upstream of TISs, thereby providing transcriptional evidence for the existence of potential truncated proteins. We also detected long 5′-UTR isoforms containing uORFs in most VACV genes, providing a potential source of micro peptides with regulatory functions, as demonstrated previously in Kaposi's sarcoma virus 65 . The observed transcript diversity of VACV may, however, represent transcriptional noise from cryptic promoters and/or error-prone transcriptional machinery. Moreover, the majority of transcripts contain multiple translationally active ORFs, and most isoforms contain unique combinations of ORFs and uORFs, suggesting that this transcriptional diversity is functional. Hence, further studies are needed to demonstrate the biological significance of these RNA molecules.
We provide evidence that poly(A) signals are frequently transmitted by RNA polymerase without downstream cleavage of transcripts. This process results in the production of polycistronic RNAs from tandem genes and asRNAs from convergent gene pairs. The ensuing stochastic transcription initiation and termination in poxviruses represents a unique gene expression system, which has not yet been described in other well-characterized organisms. No previous studies show translation of polycistronic RNA molecules from downstream genes in www.nature.com/scientificreports/ poxviruses. To explain this phenomenon and read-through transcription in herpesviruses 66,67 , we hypothesize that transcriptional overlaps are sites of transcriptional interference. These interactions have been suggested to be a part of a genome-wide regulatory system, coined transcriptional interference networks (TINs) 68 . We report sources of potential error in RT, PCR, and bioinformatic techniques, and propose strategies for addressing these.
In particular, we applied very strict criteria for accepting sequencing reads as transcripts. However, a large fraction of excluded reads could represent existing transcripts.

cells and viruses.
CV-1 African green monkey kidney fibroblast cells were obtained from the American Type Culture Collection. Cells were plated at a density of 2 × 10 6 cells per 150-cm 2 tissue culture flask and were cultured in RPMI 1640 medium (Sigma-Aldrich) supplemented with 10% foetal bovine serum and antibioticantimycotic solution (Sigma-Aldrich). CV-1 cells were incubated at 37 °C in a humidified atmosphere containing 5% CO 2 until confluent. Subsequently, about 2.6 × 10 7 cells were rinsed with serum free RPMI medium immediately prior to infection with the vaccinia virus WR strain diluted in serum free RPMI medium.
infection. Cells were infected with 3-ml aliquots of VACV at a multiplicity of infection of 10/cell, and were incubated at 37 °C in an atmosphere containing 5% CO 2 for 1 h with brief agitation at 10-min intervals to redistribute virions. Three-mL aliquots of complete growth medium (RPMI + 10% FBS) were then added to tissue culture flasks and the cells were incubated at 37 °C for 1, 2, 3, 4, 6, 8, 12, and 16 h in a humidified atmosphere containing 5% CO 2 . Following incubation, media were removed and cells were rinsed with serum free RPMI 1,640 medium and were subjected to three cycles of freeze-thawing. Cells were finally scraped into 2-ml aliquots of PBS and were stored at − 80 °C until use. Size selection: PCR products from mixed RNA samples were size-selected manually and were then run on Ultrapure Agarose gels (Thermo Fischer Scientific). Subsequently, fragments of greater than 500-bp were isolated using Zymoclean Large Fragment DNA Recovery Kits (Zymo Research).

RNA purification. Total RNA was extracted from infected cells at various stages of viral infection using
Barcoding: Individually sequenced cDNAs were barcoded using a combination of the following ONT protocols: the 1D protocol was used until the first end-preparation step, then we switched to the 1D PCR barcoding (96) genomic DNA (SQK-LSK108) protocol (version: PBGE96_9015_v108_revS_18Oct2016, updated  prediction of cis-regulatory sequences. Sequences at 100-nt upstream of a given TSS were extracted and an in-house script based on the GPMiner promoter prediction tool 70,71 was used for analyses of upstream cis-regulatory sequences of novel transcripts. The general settings of the algorithm were as follows: the eukaryotic promoter database of GPMiner was used to search exact matches without gaps or substitutions. The Vaccinia early promoter motif 14,15,19 and late promoter motif 41 were picked and visualized using the Geneious R10 Motif Finder tool, allowing one substitution in the string. Promoter statistics were calculated using Mann-Whitney U tests with two-tailed p-values. transcript naming scheme. Previously described (Copenhagen nomenclature 45 ) ORFs were not renamed.
Our scheme allowed for future additions of novel transcripts. Multicistronic transcripts were named for all contributed ORFs, the most upstream ORF was listed first. Non-coding transcripts are identified with the prefix "nc, " and complex transcripts carry the prefix "c. " When a non-coding transcript is antisense to the ORF for which it was named, an "as" prefix was added. Names of length isoforms, e.g., TSS and TES isoforms, end with "l" for long, "s" for short, or "AT" for alternative termination (Note S1). Isoform categories are presented in Supplementary Fig. S7.
The details for validation of specificity, quality and quantity of libraries are found in Supplementary Methods. The detailed protocols and the raw dataset were published 72