Structure mapping of dengue and Zika viruses reveals new functional long-range interactions

Dengue and Zika are clinically important members of the Flaviviridae family with an ~11 kb positive-strand RNA genome. While structures have been mapped primarily in the UTRs, much remains to be learnt about how the rest of the genome folds to enable function. Here, we performed secondary structure and pair-wise interaction mapping on four dengue serotypes and four Zika strains in their native virus particles and in infected cells. Comparative analysis of SHAPE reactivities across serotypes nominated potentially functional regions that are highly structured, show structure conservation and have low synonymous mutation rates, including a structure associated with ribosome pausing. Pair-wise interaction mapping by SPLASH further revealed new pair-wise interactions in addition to the known circularization sequence. Approximately 40% of pair-wise interactions form alternative structures, suggesting extensive structural heterogeneity. Analysis of pair-wise interactions shared between serotypes revealed a macro-organization in which interactions are preserved at their physical locations even when their sequences are not. In addition, structure mapping of virus genomes released in solution, as well as inside host cells, showed that other helicases, in addition to the ribosome, play a role in unwinding viral structures inside cells. Mutations that disrupt pair-wise interactions found in cells and in virions result in virus attenuation, demonstrating the importance of these interactions during the virus life cycle.


Introduction
DENV and ZIKV are members of the flavivirus genus of the Flaviviridae family of RNA viruses and are important human pathogens that impose a high economic and social burden worldwide 1 . DENV is estimated to infect 390 million people per year, causing dengue fever and, in severe cases, death 1 . The ZIKV outbreak in Brazil was declared a public health emergency of international concern by the World Health Organisation in 2016, and ZIKV infection has been associated with Guillain-Barré syndrome and with microcephaly in infants 2 .
The genome of DENV and ZIKV consists of a ~11 kb positive-strand RNA that encodes a single polyprotein, which is post-translationally cleaved into ten mature viral proteins: three structural proteins (C, prM and E) and seven non-structural (NS) proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) 3 . Beyond the primary sequence of these genomes, understanding how the genome is organized is important for understanding virus function 4 . Highly structured elements and long-range interactions in the 5' and 3' terminal regions of flaviviral genomes, including the capsid region, have been shown to be essential for both translation and replication of these viruses [5][6][7][8] . Additional local structures throughout the genome have been computationally predicted by Proutski et al.; however, these predictions and their potential functional relevance to the viral life cycle have not been experimentally assessed 9 .
Here, we performed genome-wide RNA secondary structure and interactome mapping on all four serotypes of dengue (dengue 1-4) and four geographically distinct Zika viruses (African, Brazil, French Polynesia and Singapore), representing the known genetic diversity of this emergent virus. To circumvent the limitations of probing virus RNA structure in vitro, such as changes in solvent conditions, altered RNA-protein interactions and the absence of the virus envelope 10,11 , we performed structure probing of dengue and Zika inside their native virus particles and in infected cells (Figure 1a). We observed that these genomes are indeed highly structured and identified new conserved structures across these viruses. Further, we show that many additional long-range interactions exist besides the circularization signal, and that they are essential for the virus life cycle. Finally, comparison of in cell and in virion RNA interactions shows that many interactions are disrupted in vivo, suggesting that they are actively unwound by helicases in the cell.

NAI-MaP of Dengue and Zika genomes inside virus particles
To map secondary structures across the viral genome, we treated intact virus particles with the SHAPE-like chemical 2-methylnicotinic acid imidazolide (NAI) 12 , which has been shown to efficiently modify single-stranded regions of RNA in vivo. We then extracted the RNA genomes from the virus particles and identified the modification sites by mutational profiling (MaP) (Figure 1a) 12 . We first validated the accuracy of NAI-MaP signals in vivo using the 28S rRNA of HeLa cells (Extended data 1a).
We performed at least two biological replicates of NAI-MaP on each of the four dengue and four Zika viruses, and sequenced >120 million reads per sample (Supplementary Table 1).
This resulted in >100 reads mapped per base on 99.99% of all bases across the eight viruses, yielding structure information for nearly every base along each genome (Supplementary Tables 2,3). NAI-MaP reactivities from two biological replicates were well correlated with each other (Pearson correlation coefficient r > 0.81), indicating that the structure probing was reproducible. Gel electrophoresis of extracted dengue and Zika RNA showed that the majority of the genomic RNA is intact (Extended data 2a,b), indicating that structure probing was performed while the RNA was in its full-length context. NAI-MaP signals on renatured dengue 1 RNA are also largely congruent with the structure signals from Parallel Analysis of RNA Structure (PARS), which uses enzymatic footprinting coupled with deep sequencing 13 (Extended data 2c).

Dengue and Zika genomes contain extensive secondary structures in the coding regions
In addition to the known elements in the 5' and 3'UTRs 6 , NAI-MaP of the dengue and Zika genomes revealed many previously unidentified structured regions across the entire genome (Figure 1b, top). We also calculated Shannon entropies 12 , which indicate how likely a region is to form a single well-defined structure, along the entire length of the dengue and Zika genomes to identify regions that are likely to form unique structures (Figure 1b, bottom). NAI-MaP reactivities mapped onto known structures in the 5' and 3'UTRs of dengue virus showed that we could accurately detect single- and double-stranded regions within the dengue 1 and dengue 2 genomes, as expected (Figure 1c,d, Extended data 2d).
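Per-nucleotide Shannon entropy is usually computed from base-pair probabilities, for example those of a SHAPE-restrained partition function. The minimal sketch below assumes the common convention H_i = -sum_j P_ij log10(P_ij), where the sum over partners j also includes the probability that nucleotide i is unpaired. The function name and input format (a dictionary of pair probabilities) are illustrative assumptions, not the pipeline used in this study.

```python
import math


def shannon_entropy(pair_probs, n):
    """Per-nucleotide Shannon entropy (log10) from base-pair probabilities.

    pair_probs: dict mapping (i, j), i < j, 0-based, to the probability that
    nucleotide i pairs with nucleotide j (hypothetical input format).
    n: sequence length.
    """
    # Collect, for every nucleotide, the probabilities of all of its partners.
    partners = [[] for _ in range(n)]
    for (i, j), p in pair_probs.items():
        partners[i].append(p)
        partners[j].append(p)

    entropies = []
    for probs in partners:
        # The unpaired state is one more outcome in the distribution.
        unpaired = max(0.0, 1.0 - sum(probs))
        states = probs + [unpaired]
        # 0 * log(0) is taken as 0, so zero-probability states are skipped.
        entropies.append(-sum(p * math.log10(p) for p in states if p > 0))
    return entropies
```

Low values mark nucleotides dominated by a single pairing state, which is why low-entropy regions are read as likely to form a unique structure.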
Structure probing of multiple viruses allowed us to perform comparative structure analysis across the 8 viruses to identify shared structures. As the four dengue serotypes share ~60-70% sequence identity with each other and ~58% sequence identity with the Zika strains (Extended data 3a), this sequence divergence allows us to analyse the relationship between structure conservation and sequence identity. To simplify comparison of structural information across multiple genomes, we discretized the continuous NAI-MaP reactivities into a binary score indicating whether a base is reactive (unstructured, reactivity >=0.5) or unreactive (structured, reactivity <0.5), using a cut-off benchmarked on the reactivities of the 28S rRNA (Extended data 3b). We observed that unpaired bases shared across the viruses tend to have higher sequence conservation (Extended data 3c), suggesting that these regions might contain important sequence information for gene regulation, such as binding sites for RNA binding proteins. As expected, regions that share similar structure patterns between the 8 viruses generally share higher sequence identity, consistent with sequence being a major determinant of RNA structure (Figure 1e, Extended data 3d). However, we also observed genomic regions, such as the NS2A region, that consistently show higher structural similarity than sequence similarity, suggesting that specific regions of the genomes are under evolutionary pressure for structure conservation (Figure 1e, starred).
To further identify potentially important structures along the genome, we searched for regions that are structurally similar across viruses, highly structured and have low synonymous mutation rates.
Synonymous mutation rates were calculated across hundreds to thousands of dengue sequences and hundreds of Zika sequences within each serotype to identify mutation cold spots within the genome (Extended data 4a,b, Methods). We nominated sixteen dengue and twelve Zika consensus RNA regions that fulfilled at least two of three criteria, namely that they: (i) are highly structured; (ii) share similar NAI reactivity patterns; and (iii) show low mutation rates within and across serotypes (Figure 2a, Extended data 4c, in purple, Methods). These newly identified genomic regions have significantly lower Shannon entropies than average, indicating that they are likely to form unique structures (Extended data 4d). They also overlap with the regions that show evolutionary pressure at the structural level (above), further supporting the importance of these regions. SHAPE reactivities have previously been incorporated as pseudo-energies to generate accurate structure models for long RNAs using programs such as RNAstructure 14 . To model the RNA structures within the dengue and Zika genomes, we incorporated our NAI reactivities as pseudo-energies into RNAstructure. Using the 16S and 23S rRNAs, we confirmed that incorporating NAI-MaP reactivities significantly improved the accuracy of the structure models (Extended data 5a, Methods). Structure modelling showed that the dengue and Zika genomes are highly structured, with numerous structures across the entire genome (Figure 2b).
Predicted structures are significantly enriched for covarying base pairs, supporting the quality of the structure models (Extended data 2b). Interestingly, similar to other members of the Flaviviridae such as HCV 15 , all eight dengue and Zika genomes maintain a short median helix length of 4 consecutive canonical base pairs, with >90% of base pairs occurring in helices of 7 or fewer consecutive base pairs, which may help the genomes evade immune surveillance inside cells (Extended data 3c,d, e).
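The helix-length statistics above can be sketched from a dot-bracket structure model. This minimal example assumes that a helix is a maximal run of stacked pairs (i, j), (i+1, j-1), and so on, so any bulge or internal loop ends the helix. It only reads "(" and ")" (pseudoknot brackets are ignored), and it assumes every pair in the model is canonical, which holds for RNAstructure models that contain only Watson-Crick and GU pairs. All function names are hypothetical.

```python
from statistics import median


def parse_dot_bracket(structure):
    """Return sorted (i, j) base pairs (0-based, i < j) from dot-bracket."""
    stack, pairs = [], []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            pairs.append((stack.pop(), k))
    return sorted(pairs)


def helix_lengths(structure):
    """Lengths of maximal runs of stacked pairs (i, j), (i+1, j-1), ..."""
    pairs = parse_dot_bracket(structure)
    pair_set = set(pairs)
    lengths = []
    for i, j in pairs:
        # Start counting only at the outermost pair of a stack.
        if (i - 1, j + 1) in pair_set:
            continue
        length = 1
        while (i + length, j - length) in pair_set:
            length += 1
        lengths.append(length)
    return lengths


def helix_summary(structure, max_len=7):
    """Median helix length and fraction of base pairs in helices <= max_len."""
    lengths = helix_lengths(structure)
    short_pairs = sum(n for n in lengths if n <= max_len)
    return median(lengths), short_pairs / sum(lengths)
```

For example, `helix_summary("((((...))))(((((((((...)))))))))")` gives a median of 6.5 and 4/13 of pairs in short helices, because the second hairpin is a single 9-bp helix.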
To determine whether any of our structures in dengue and Zika could be associated with cellular processes such as translation, we performed ribosome profiling on human liver cells (Huh7) infected with dengue 1 virus 16 . Ribosome profiling showed the classic three-nucleotide periodicity of translating ribosomes on the genome (Extended data 7a), with a large pileup of footprints at the start of the dengue 1 polyprotein, indicating that the ribosome profiling data are of good quality (Extended data 7b). Globally, we did not observe a correlation between ribosome pausing and increased structure along the dengue genome (Extended data 7c), although we did observe a strong ribosome pause site in the middle of the coding region for the viral membrane protein NS4B (Figure 2c). Interestingly, the ribosome pause site resides in one of our dengue consensus structures that has low Shannon entropy (segment 12, Figure 2b, Extended data 7c). Structure modelling of this region indicates that the ribosome pause site is located near the base of a long stem, in a highly structured environment (Figure 2b,c, starred), suggesting that structure could play a role in translation pausing. To rule out that the strong ribosome pausing is due to codon usage, we calculated codon frequencies around the ribosome pause site. We observed neither a significant difference in codon frequencies nor any rare codons around the pause site, suggesting that codon usage is unlikely to be the main cause of the strong ribosome pause (Extended data 7d). Further work is needed to understand the role of RNA structure in regulating translation in this region.
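The three-nucleotide periodicity check mentioned above amounts to asking what fraction of footprints falls in each reading frame of the coding sequence. This minimal sketch assumes that P-site offsets have already been applied to the read 5' ends, and that P-site positions and CDS boundaries share one coordinate system with inclusive ends. The function name is hypothetical.

```python
from collections import Counter


def frame_fractions(p_sites, cds_start, cds_end):
    """Fraction of ribosome P-sites in frames 0, 1 and 2 of a CDS.

    p_sites: iterable of P-site positions (offset already applied).
    Positions outside [cds_start, cds_end] are ignored.
    """
    counts = Counter((p - cds_start) % 3 for p in p_sites
                     if cds_start <= p <= cds_end)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))
```

A strong enrichment in frame 0, such as `frame_fractions([95, 98, 101, 102], 95, 200)` returning `(0.75, 0.25, 0.0)`, is the periodicity signature expected from translating ribosomes.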

Long-range pair-wise interactions are abundant in Dengue and Zika genomes
Beyond local RNA base pairing, long-range RNA-RNA interactions have been shown to play important roles in viral replication by enabling the genome to circularize, which positions the RNA-dependent RNA polymerase close to the transcription start site 5,7,17 . The three principal RNA sequences of the dengue genome responsible for the long-range interaction of the 5' and 3' ends are: (i) the circularization sequence (CS); (ii) the upstream AUG region (UAR); and (iii) the downstream AUG region (DAR) 17 . A fourth interaction, between a region located at 150 bases (the C1 structure in the capsid coding region) and the dumbbell of the 3'UTR, has also been found to be involved in genome circularization 7 . To comprehensively capture long-range pair-wise interactions spanning more than 500 bases in the dengue and Zika genomes, we performed two biological replicates of sequencing of psoralen crosslinked, ligated, and selected hybrids (SPLASH), which uses biotinylated psoralen, proximity ligation and deep sequencing, on each of the eight viral genomes inside their viral particles (Figure 1a) 18 . We further enriched our data for real interactions by filtering against random interactions that could arise by permutation (Extended data 8a).
The resulting SPLASH data revealed thousands of new intramolecular long-range RNA interactions inside virus particles (median distance of interaction = 3.6 kb), greatly expanding the list of long-range interactions known for any family of RNA viruses (Supplementary Tables 4,5). Interactions arising from the known circularization sequences in both the dengue and Zika viruses were among our top interactions (Figure 3a), indicating that genome circularization, which is important in vivo, is also maintained in virions. To increase the resolution of interaction positions, we performed peak calling of the mapped reads on each end of the interaction (Methods). Peak calling revealed clear peaks around 50 bases wide, centred at the known 5' and 3' UAR and DAR interaction sites (Figure 3b). Hybridization of the 5' and 3' peak regions using the program RNAcofold revealed the interaction sites of the UAR, DAR and CS regions in the dengue genomes (Figure 3b) 19 , demonstrating the precision of the SPLASH data.

Dengue and Zika genomes form extensive alternative structures
Previous studies in HCV suggest that its genome can fold into alternative conformations 20 .
To determine the extent of structural heterogeneity in the dengue and Zika genomes, we counted the number of RNA regions that are observed to interact with two or more other regions along the genome. We observed that ~40% of the genome folds into alternative structures, indicating a large amount of structural heterogeneity within the genomes (Figure 3c, Extended data 8b). Interestingly, the alternative interaction sites are enriched in regions with significantly higher Shannon entropies, consistent with these regions adopting more than one structure (Figure 3d). To test whether the different alternative structures are generally functional or whether only the most stable interaction is important to the virus, we mutated a region of the Zika French Polynesia genome that can pair with three different regions (1786:5290, 1784:7034 and 1784:8462), disrupting the interactions that bring the nucleotide sequences encoding the envelope protein into close proximity with those of NS3, NS4B and NS5, respectively (Figure 3e, Extended data 8c). Because we chose mutations that disrupt pair-wise interactions while preserving the amino acid sequence and codon frequencies, we were unable to design compensatory mutations (Figure 3f). Mutating each individual strand in the different alternative structures decreased the amount of virus produced both in the supernatant and inside infected Huh7 liver cells (Figure 3g). Interestingly, we observed a greater decrease in virus production in the supernatant than inside cells, suggesting that the mutations might affect genome packaging.

SPLASH reveals two distinct modes of conserved long-range interactions
Interactome mapping across the four dengue and four Zika strains allowed us to identify long-range interactions shared across different viruses. In addition to the known circularization signal, we also observed macro-patterns of conservation whereby genomic sequences from similar spatial locations consistently interact with each other in two or more different viruses (Figure 4a,b, Supplementary Tables 6,7). We observed two distinct types of conserved pair-wise interactions. The first type preserves both the sequence and the spatial locations of the interactions across serotypes. This is exemplified by the discovery of a conserved long-range interaction between E and NS4B (bases 1200:7000) in dengue and Zika, whereby different viruses use homologous sequences to form the long-range interaction (Figure 4c). The second type preserves the spatial location, but not the sequence, of the interaction between serotypes, bringing two distant regions of the genome into close proximity. This latter type is exemplified by the pairings between the C and NS5 regions and between the NS2A and NS5 regions in both dengue and Zika genomes ( are used to hybridize with bases 9629-9648 instead. This second mode of genome interaction, in which different sequences can be used for genome organization, could hypothetically enable RNA viruses to tolerate mutations, as long as the overall architecture of the genome is not affected, and could also reflect convergent evolution in different viruses towards similar genomic folds.

Longer-range interactions are actively disrupted inside cells
While structure probing of virus genomes inside their virions provides useful information on their packaging, virus genomes exist in very different states inside their host cells. To understand how dengue and Zika genomes fold inside infected cells, we performed pair-wise interactome mapping of dengue 1 and the four Zika viruses in human liver and neuronal precursor cells, respectively (Extended data 10a, Supplementary Tables 8,9). We observed that the circularization signal is present inside cells, in agreement with its importance in genome replication 5 (Figure 5a, Extended data 10b). Interestingly, the viruses form much shorter pair-wise interactions inside cells than inside virion particles (Figure 5b), suggesting that the genomes are less structured inside cells. We hypothesized that the genomes are more structured inside virions because of spatial constraints imposed by the virus envelope, and/or that the genomes are actively unwound in vivo. To test this, we performed interactome mapping on virus genomes released from the virions by proteinase K treatment of the virus particles (Figure 5c). Although we observed genome rearrangements when the genome was released in solution, in particular the disruption of longer-range interactions and the formation of new shorter-range interactions, the genome remained highly structured in solution (Figure 5c,d). This suggests that the less structured state we observe inside cells results from active unwinding by enzymes in vivo 21 .
As the ribosome is a major helicase that disrupts RNA structures in vivo, and previous reports have shown that translation preferentially disrupts longer-range interactions 22 , we further tested the effect of translation inhibition on structure by performing SPLASH on dengue-infected cells treated with harringtonine, which stalls ribosomes immediately after initiation 23 .
As expected, inhibiting translation partially restored long pair-wise RNA interactions in human mRNAs but not in human non-coding RNAs 22 . Surprisingly, we observed a negligible effect on RNA pairing in the dengue genome with and without translation inhibition (Figure 5e Disruption of structures that are present in high abundance either inside virion particles or inside cells resulted in virus attenuation. For two out of four interactions, we observed a larger decrease in virus production in the supernatant than inside cells, suggesting that genome packaging is a major stage of the virus life cycle in which these interactions are involved (Figure 5f,i). Interestingly, mutating two interactions that are abundantly present both in cells and in virus particles resulted in a severe phenotype, with a >100-fold decrease in virus production (Figure 5g,h). While it is extremely difficult to design compensatory mutations that rescue pair-wise interactions in the virus coding region, owing to amino acid and codon frequency constraints, we managed to design point mutations to rescue one of the key interactions, between NS2A and NS3. Mutations in the NS3 region (5723-5750) resulted in severe virus attenuation, and this phenotype was restored by compensatory mutations in the NS2A region (3871-3901) (Figure 5h).
These results confirm that the newly identified pair-wise interactions play important roles during the lifecycle of the viruses.

Discussion
Elucidating the molecular underpinnings of RNA viruses is key to understanding their pathogenesis 4 . Here, we integrated high-throughput secondary structure mapping, pair-wise interaction mapping, ribosome profiling and evolutionary conservation information for four dengue and four Zika viruses to study how the structural organization of the genome impacts virus function 16,18 . Importantly, the structure mapping was performed in native virus environments, including inside virion particles and host cells, enabling us to identify biologically relevant structures. We discovered (i) highly structured and conserved elements across the dengue and Zika genomes, (ii) a new structural element associated with strong ribosomal pausing in the coding region of NS4B, (iii) alternative and conserved pair-wise interactions across serotypes, and (iv) that longer-range pair-wise interactions are actively disrupted inside host cells. Many of the shared pair-wise interactions preserve sequence locations but not sequence identities, suggesting an alternative mechanism by which viruses could use diverse sequences to achieve the overall goal of genome packaging. Surprisingly, while the ribosome has been shown to be a major helicase to unwind RNA structures during translation 21 In summary, our data expand upon the known structures in dengue and Zika and reveal a network of new interactions within the dengue and Zika genomes that likely work together to support virus fitness. This comprehensive resource of dengue and Zika genome organization provides a first foray into the higher-order genome organization of large RNA viruses and could aid the design of broad-based RNA therapeutics that target the structures of these pathogenic viruses.

We performed gel electrophoresis of the virus RNA after each extraction on a 0.6% agarose gel, using an ssRNA ladder (NEB), to ensure that the virus RNA was intact.
We typically observed a single band above the 9 kb band of the RNA ladder, indicating the presence of intact full-length dengue and Zika RNA genomes. We then performed library preparation following the SHAPE-MaP protocol to generate cDNA libraries compatible with Illumina sequencing.

NAI-MaP structure probing of HeLa cells
HeLa cells were grown to 70-80% confluency in DMEM high glucose media (Thermo Fisher Scientific) supplemented with 10% FBS and 1% Pen-Strep. Cells from one 10 cm plate were trypsinized, washed once with PBS and resuspended in 475 µl of PBS. 25 µl of 1 M NAI was added to the cells, which were incubated at 37°C for 15 min. Total cellular RNA was extracted using TRIzol, followed by purification on an RNeasy column (Qiagen). We prepared structure libraries for sequencing following the SHAPE-MaP protocol.

Pair-wise interactome mapping inside virions
Freshly collected virus was treated with a 1:500 volume of 20 mM biotinylated psoralen (2 µl of biotinylated psoralen in 1 ml of virus in cellular media) and incubated at 37°C for 10 min.
The treated viruses were then spread onto a 10 cm dish and UV irradiated at 365 nm for 20 min on ice to crosslink the interacting regions. The crosslinked virus genomes were then extracted using TRIzol LS reagent (Thermo Fisher Scientific) following the manufacturer's instructions. We prepared SPLASH libraries following the published protocol in Aw et al. 18,25 .

Pair-wise interactome mapping of dengue and Zika viruses inside human cells
Huh7 and human neuronal precursor cells were infected with dengue 1 and Zika viruses, respectively, at an  25 , and total RNA was extracted using TRIzol. We prepared SPLASH libraries following the published protocol in Aw et al. 18,25 .

Ribosome profiling of dengue 1 virus in Huh7 cells
Huh7 cells were seeded in DMEM high glucose media (Thermo Fisher Scientific) supplemented with 10% FBS and 1% Pen-Strep, and grown to 50-70% confluency. The cells were infected with dengue 1 virus (MOI=1) in serum-free media for 1 hour and then grown in DMEM high glucose media (10% FBS, 1% Pen-Strep) for 20 hours. To harvest polysomes, we added a 1:1000 volume of 100 µg/ml cycloheximide to the cell media at 37°C for 10 min. We then trypsinized the cells (trypsin containing 0.1 µg/ml cycloheximide), washed them with PBS (containing 0.1 µg/ml cycloheximide) and lysed them using the mammalian lysis buffer from the TruSeq Ribo Profile kit (Illumina). We prepared the ribosome profiling library using the TruSeq Ribo Profile kit (Illumina) following the manufacturer's instructions.

Functional assay to test for virus fitness
The entire genomes of dengue1 and Zika French Polynesia viruses were cloned into five

SHAPE-MaP analysis
After quality filtering with a Phred cutoff of 25, reads were aligned against the published reference genomes (Table S1) for DENV serotypes 1-4 and the four ZIKV strains using Bowtie 2 at its most sensitive preset 26 . These alignments were performed separately for the data sets from the DMSO-treated, NAI-treated and denatured experiments. Mutations were counted separately in each set and expressed as a SHAPE score at each position, $(M_{\mathrm{NAI}} - M_{\mathrm{DMSO}})/M_{\mathrm{denat}}$, where $M$ is the mutation rate in the respective data set, as outlined by Weeks and co-workers 12 . The SHAPE scores were then discretized at each nucleotide position by classifying every score below 0.5 as low reactivity and every score at or above 0.5 as high reactivity.
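The reactivity formula and the 0.5 discretization above can be sketched as follows. This is a minimal illustration under stated assumptions: mutation counts and read depths per position are already tabulated, positions with zero depth or no mutations in the denatured sample are treated as missing data (NaN), and any further normalization of the SHAPE scores is not shown. Function names are hypothetical.

```python
import math


def shape_reactivity(mut_nai, depth_nai, mut_dmso, depth_dmso,
                     mut_denat, depth_denat):
    """Per-position score (M_NAI - M_DMSO) / M_denat from counts.

    Each M is a mutation rate (mutation count / read depth). Positions that
    cannot be scored (zero depth, or zero denatured rate) are NaN.
    """
    scores = []
    for m_n, d_n, m_d, d_d, m_u, d_u in zip(mut_nai, depth_nai,
                                            mut_dmso, depth_dmso,
                                            mut_denat, depth_denat):
        if min(d_n, d_d, d_u) == 0 or m_u == 0:
            scores.append(math.nan)
            continue
        scores.append((m_n / d_n - m_d / d_d) / (m_u / d_u))
    return scores


def discretize(scores, cutoff=0.5):
    """'H' if score >= cutoff, 'L' if below, None for missing data."""
    return [None if math.isnan(s) else ("H" if s >= cutoff else "L")
            for s in scores]
```

For instance, 10/100 mutations with NAI, 2/100 with DMSO and 20/100 under denaturing conditions give a score of 0.4, which is classified as low reactivity (L).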
Discretization allowed us to easily compare structured and unstructured stretches across different viruses and to characterize the length and distribution of structured and unstructured motifs along the genomes. We calculated the distribution of continuous segment lengths, and of segment lengths with two transitions, as would be expected for hairpins or helices containing a bulge. We compared these metrics with those of other types of RNA, such as Xist RNA, 28S rRNA and random controls 27 . The significance of differences between distributions was calculated using the Wilcoxon rank-sum test.
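The continuous segment lengths described above are run lengths over the discretized profile. The sketch below covers only the simple case of uninterrupted runs (not segments with two transitions) and assumes that missing positions (None) break a run. The function name is hypothetical. The resulting length lists could then be compared between RNAs with a rank-sum test such as `scipy.stats.ranksums`.

```python
from itertools import groupby


def segment_lengths(classes, state="L"):
    """Lengths of maximal runs of one class in a discretized profile.

    classes: sequence of 'H', 'L' or None (missing data).
    state: 'L' for low-reactivity (structured) runs, 'H' for unstructured.
    """
    return [len(list(run)) for key, run in groupby(classes) if key == state]
```

For example, the profile `LLLHHLHLLLL` contains structured segments of lengths 3, 1 and 4.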

Sequence analysis of different DENV and ZIKV genomes
We performed multiple sequence alignment of the reference genomes using the program MAFFT with the --genafpair option 28 .

Analysis of sequence conservation at structurally similar or different bases in DENV and ZIKV genomes
Based on SHAPE reactivity, bases were classified as high (H) or low (L) using a cut-off of 0.5.
The cut-off for NAI-SHAPE-MaP was optimized to give the best discriminative power for the known 28S rRNA structure. To assess whether structures are conserved across DENV and ZIKV, majority votes were applied at each position of the reference alignment. Positions at which 3/4 or 4/4 of the DENV serotypes (or of the ZIKV strains) share the same shape class (e.g. HHHL is classed as H and LLHL as L) were classified as structurally similar. Positions without a majority (e.g. HLHL or HHLL) were labelled indeterminate and excluded from subsequent analysis. We then divided the genome into positions that have identical majorities for DENV and ZIKV (same, S) and positions where the majority structure differs between DENV and ZIKV (different, D), and calculated the distribution of multiple sequence alignment scores within each class.
The significance of differences between distributions was again assessed using the Wilcoxon rank-sum test.
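The majority-vote rule above can be written down directly. This sketch assumes that each alignment column is given as four shape classes per virus group, with gapped or missing positions as None; such entries simply do not count towards a majority. Function names are hypothetical.

```python
def majority_class(classes):
    """'H' or 'L' if at least 3 of the 4 strains agree, else None.

    classes: four entries, each 'H', 'L' or None (gap / no data).
    Indeterminate columns such as HLHL or HHLL return None.
    """
    for c in ("H", "L"):
        if classes.count(c) >= 3:
            return c
    return None


def compare_groups(denv_columns, zikv_columns):
    """Label aligned columns 'S' (same majority), 'D' (different) or None."""
    labels = []
    for denv, zikv in zip(denv_columns, zikv_columns):
        a, b = majority_class(denv), majority_class(zikv)
        if a is None or b is None:
            labels.append(None)  # excluded from the S/D comparison
        else:
            labels.append("S" if a == b else "D")
    return labels
```

Columns labelled None are dropped before the S and D distributions of alignment scores are compared.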

Structure modelling of DENV and ZIKV genomes
Using the SHAPE-MaP data in conjunction with RNA structure prediction algorithms allowed us to create structural models for the full-length dengue and Zika genomes. We used the program RNAstructure 29 and incorporated the SHAPE-MaP reactivities as pseudo-energy restraints with a slope of 0.5 kcal/mol and an intercept of 0.0 kcal/mol. These values were obtained by optimizing structure prediction for the human 28S rRNA using NAI-SHAPE-MaP data; all other parameters were left at their default settings.
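The slope and intercept above enter the widely used SHAPE pseudo-free-energy term, dG_SHAPE(i) = m ln(R_i + 1) + b, which RNAstructure adds for nucleotides in base-pair stacks. The sketch below only evaluates that term with the values used here (m = 0.5, b = 0.0 kcal/mol). Clipping negative reactivities to zero and returning 0 for missing data are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
import math


def shape_pseudo_energy(reactivity, slope=0.5, intercept=0.0):
    """Pseudo-free-energy term (kcal/mol) for a paired nucleotide.

    dG = slope * ln(reactivity + 1) + intercept.
    Assumptions: missing data (None/NaN) contributes 0, and negative
    reactivities are clipped to 0 before the logarithm.
    """
    if reactivity is None or math.isnan(reactivity):
        return 0.0
    return slope * math.log(max(reactivity, 0.0) + 1.0) + intercept
```

A reactivity of e - 1 (about 1.72) therefore adds a 0.5 kcal/mol penalty against pairing that nucleotide. Unreactive nucleotides receive no penalty because the intercept is zero.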

SPLASH long-range interactions
SPLASH reads were aligned to their respective reference genomes using Bowtie 2 in paired-read mode 26 . A matrix containing counts of all pairwise T positions observed in the reads was compiled. To filter random pairs, we constructed a baseline matrix by randomly shuffling all read pairs 100 times and subtracted this baseline matrix from the raw read matrix. We then considered all sites exceeding a threshold of 10% of the maximal observed value as valid SPLASH interactions. The shuffling procedure and threshold values were optimized on the known human and E. coli ribosome structures and markedly improved the specificity and positive predictive value of SPLASH interactions. Long-range interactions were considered conserved if they occurred in at least two strains. Plotting the precise read depth across the genome allowed identification of distinct peaks, mostly within <50 nt segments, and thus localization of the interaction sites. We analyzed the promiscuity of these long-range interactions by counting the number of alternative interaction partners at each site that forms long-range interactions. We identified several sites that show multiple coinciding interactions for the same sequence stretch. Competing interactions were modelled using RNAcofold from the ViennaRNA package and visualized together with the SHAPE-MaP and covariation data 19 .
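The shuffle-and-subtract filter and the partner counting described above can be sketched as follows. Assumptions, labelled here because the text does not specify them: the baseline is the mean count over the shuffles, shuffling randomly re-pairs the left and right ends of the chimeric reads, and positions may already be binned. Both function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def filter_interactions(pairs, n_shuffles=100, frac=0.10, seed=0):
    """Keep (left, right) pairs whose count exceeds a shuffled baseline.

    pairs: list of (left_pos, right_pos), one entry per chimeric read.
    Baseline (assumption): mean counts after randomly re-pairing left and
    right ends n_shuffles times. Pairs whose baseline-corrected count is at
    least frac * (maximal corrected count) are returned.
    """
    rng = random.Random(seed)
    observed = Counter(pairs)
    lefts = [left for left, _ in pairs]
    rights = [right for _, right in pairs]
    baseline = Counter()
    for _ in range(n_shuffles):
        shuffled = rights[:]
        rng.shuffle(shuffled)
        baseline.update(zip(lefts, shuffled))
    corrected = {k: v - baseline[k] / n_shuffles for k, v in observed.items()}
    top = max(corrected.values(), default=0)
    if top <= 0:
        return {}
    return {k: v for k, v in corrected.items() if v >= frac * top}


def partner_counts(interactions):
    """Number of distinct partners per site; >= 2 marks alternative pairing."""
    partners = defaultdict(set)
    for left, right in interactions:
        partners[left].add(right)
        partners[right].add(left)
    return {site: len(p) for site, p in partners.items()}
```

With 30 reads supporting one interaction and 30 unrelated singleton chimeras, only the recurrent pair survives. Its corrected count stays near 15, while each singleton can contribute at most 1, which is below the 10% threshold.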

Peak calling of SPLASH interaction sites
The genome of each of the eight viruses was divided into non-overlapping 100-base bins.
Each chimeric read was split, and its two arms were mapped to a pair of bins. Bin pairs containing at least 20 chimeric reads were then selected for peak calling. Density curves of read coverage were fitted separately for the left and right bins using the "density" function of R. The interaction peak was defined as the position with the maximum coverage density.
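The binning and peak-calling steps above can be approximated in Python. This is a sketch under stated assumptions, not the R code used here. Each chimera is reduced to one position per arm for binning. The bandwidth reproduces R's default `bw.nrd0` rule (0.9 x min(sd, IQR/1.34) x n^-1/5, with R's type-7 quantiles). The Gaussian density is evaluated at every integer position of the bin rather than on R's 512-point grid. At least two coverage positions are required. All function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def bin_pairs(chimeras, bin_size=100, min_reads=20):
    """Group (left_pos, right_pos) chimeras into paired bins with >= min_reads."""
    groups = defaultdict(list)
    for left, right in chimeras:
        groups[(left // bin_size, right // bin_size)].append((left, right))
    return {k: v for k, v in groups.items() if len(v) >= min_reads}


def nrd0_bandwidth(x):
    """Silverman's rule of thumb, mirroring R's default bw.nrd0."""
    sd = statistics.stdev(x)
    q = statistics.quantiles(x, n=4, method="inclusive")  # R type-7 quartiles
    lo = min(sd, (q[2] - q[0]) / 1.34)
    if lo == 0:
        lo = sd or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** -0.2


def peak_position(positions, start, end):
    """Integer position in [start, end] with the highest Gaussian density.

    positions: covered nucleotide positions from the reads in one bin arm.
    The normalizing constant is omitted since only the argmax matters.
    """
    bw = nrd0_bandwidth(positions)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bw) ** 2) for p in positions)

    return max(range(start, end + 1), key=density)
```

The density maximum is robust to a few outlying coverage positions. For example, coverage concentrated at 10 to 12 with isolated reads at 40 and 80 still peaks at 11.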

Identifying regions of functional interest
Regions of functional interest were identified if they fulfilled at least 2 of the following 3 criteria: (i) structural similarity is in the top 30% of all regions; (ii) structured stretch length is in the top 30% of all regions; and (iii) the synonymous mutation rate is in the bottom 30% of all regions, indicating selection pressure on the RNA itself, for both dengue and Zika.
Structural similarity was assessed by calculating the local identity of the discretized shape scores. The length of structured stretches was calculated from the same data set. The synonymous mutation rate was calculated from an alignment of all dengue and Zika full genome sequences available at the time of writing. Structure models for all identified sequences including available covariation data are presented in the manuscript.
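The "at least 2 of 3 criteria" rule can be sketched as rank-based flags. Assumptions for illustration: each region has one value per metric, "top 30%" means the round(0.3 x n) highest-ranked regions (lowest for the synonymous mutation rate), and ties are broken by region order. Function names are hypothetical.

```python
def top_fraction_flags(values, frac=0.30, highest=True):
    """Flag the round(frac * n) best-ranked regions (at least one)."""
    n = len(values)
    k = max(1, round(n * frac))
    # sorted() is stable, so ties keep their original region order.
    order = sorted(range(n), key=lambda i: values[i], reverse=highest)
    flagged = set(order[:k])
    return [i in flagged for i in range(n)]


def regions_of_interest(similarity, stretch_length, syn_rate, frac=0.30):
    """Indices of regions meeting at least two of the three criteria."""
    crit1 = top_fraction_flags(similarity, frac, highest=True)
    crit2 = top_fraction_flags(stretch_length, frac, highest=True)
    crit3 = top_fraction_flags(syn_rate, frac, highest=False)  # bottom 30%
    return [i for i in range(len(similarity))
            if crit1[i] + crit2[i] + crit3[i] >= 2]
```

In a ten-region example, a region that is among the three most similar and has one of the three lowest synonymous mutation rates qualifies even if its structured stretch is short.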