The architecture of the SARS-CoV-2 RNA genome inside virion

SARS-CoV-2 carries the largest single-stranded RNA genome and is the causal pathogen of the ongoing COVID-19 pandemic. How the SARS-CoV-2 RNA genome is folded in the virion remains unknown. To fill the knowledge gap and facilitate structure-based drug development, we develop a virion RNA in situ conformation sequencing technology, named vRIC-seq, for probing viral RNA genome structure unbiasedly. Using vRIC-seq data, we reconstruct the tertiary structure of the SARS-CoV-2 genome and reveal a surprisingly “unentangled globule” conformation. We uncover many long-range duplexes and higher-order junctions, both of which are under purifying selections and contribute to the sequential package of the SARS-CoV-2 genome. Unexpectedly, the D614G and the other two accompanying mutations may remodel duplexes into more stable forms. Lastly, the structure-guided design of potent small interfering RNAs can obliterate the SARS-CoV-2 in Vero cells. Overall, our work provides a framework for studying the genome structure, function, and dynamics of emerging deadly RNA viruses.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causal pathogen of Coronavirus Disease 2019 (COVID-19) 1-3 . As a single-stranded and positive-sense RNA virus, SARS-CoV-2, together with SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), all belong to the Coronaviridae family 4 . SARS-CoV-2 carries one of the largest RNA genomes (~30 kilobases, kb) among all RNA virus families and encodes about 29 proteins 1,3,5-7 . Since its outbreak in late December 2019, SARS-CoV-2 has infected tens of millions of peoples and caused over one million deaths worldwide (https://covid19.who.int/). Although global efforts and resources have been redirected to ghting against SARS-CoV-2, there are no effective antiviral medicines or approved vaccines available yet. Considering the RNA nature of SARS-CoV-2, RNA-based therapeutics such as small interference RNAs (siRNAs) or antisense oligos (ASOs) are emerging as potent agents to cleave the viral RNA genome in infected host cells. Because RNA structure can signi cantly in uence the e cacy of siRNAs and ASOs 8,9 , deciphering the 3D structure of SARS-CoV-2 becomes an urgent need prior to RNA-based drug development.
RNA structures are widely recognized as critical modulators in regulating transcription, translation, and replications of coronavirus and other RNA viruses [10][11][12][13][14][15][16][17] . At this frontier, many efforts have been devoted to study the structure of SARS-CoV-2. Even though Cryo-electron microscopy and 3D electron tomography are powerful in delineating the global architectures of SARS-CoV-2, the entire RNA genome inside virions remains unrevealed [18][19][20] . Besides these physical approaches, several chemical-based high-throughput sequencing methods, such as icSHAPE and DMS-MaPseq, have been recently applied to probe singlestranded regions of SARS-CoV-2 in infected cells or in vitro [21][22][23][24] . The mapped single-stranded information could be further used as restraints to predict the base-paired regions within 500 nt 25 . Yet the current secondary structural model of SARS-CoV-2 might be incomplete since it missed information of longrange duplexes, which are prevalent and vital for completing the life cycles of positive-strand RNA viruses 26,27 . As a signi cant advance, a psoralen-based method called COMRADES recently identi ed many long-range RNA duplexes of SARS-CoV-2 inside cells, further highlighting the importance of RNA duplexes in maintaining SARS-CoV-2 tness 28,29 . Signi cantly lagging behind those in-cell structural studies, how the 30 kb genome of SARS-CoV-2 is organized and arranged in virions remains unclear.
We recently developed an RNA in situ conformation sequencing technology, named RIC-seq, for unbiased mapping of RNA-RNA spatial interactions in living cells 30 . RIC-seq utilizes a pCp-biotin to label proximally interacting chimeric RNAs and high-throughput sequencing to retrieve their spatial proximity information.
We demonstrated that RIC-seq could successfully detect short-and long-range duplexes, multiple-way junctions, and loop-loop contacts without base pairing potentials. These merits make RIC-seq more competent to decipher the 3D structure of the SARS-CoV-2 RNA genome. But SARS-CoV-2 virions are typically 80 nanometers in diameter and can't be pelleted down as human cells by standard centrifugation 1 . This feature makes virions not compliant with our current RIC-seq protocol that includes extensive washing and standard centrifugation at every enzymatic step 30 . To overcome the major challenges, we design a way to hold SARS-CoV-2 virions on Concanavalin A (ConA) beads that bind speci cally to glycoproteins present on the surface of virions. By optimizing virion capture, crosslinking, and enzymatic conditions, we further developed a virion RNA in situ conformation sequencing technology, named vRIC-seq, for global mapping of viral RNA genome structures in intact virions.
In this study, we successfully applied the vRIC-seq technology to probe the SARS-CoV-2 RNA genome structure in intact virions. We reconstructed a model of the surprisingly compact yet unentangled tertiary structure of the SARS-CoV-2 RNA genome. At the secondary structure level, we discovered that the invirion structure of SARS-CoV-2 keeps mostly unchanged in the cell except for many long-range RNA duplexes. We noted that many long-range RNA-RNA interactions are under purifying selection and might contribute to the 3D package of the SARS-CoV-2 RNA genome. Finally, we uncovered several highly accessible single-stranded regions in SARS-CoV-2 for e cient viral RNA cleavage in Vero cells by using siRNAs.

Overview of vRIC-seq technology
Like other coronaviruses, the envelope of SARS-CoV-2 contains two major glycoproteins spike (S) and membrane (M) 31,32 . According to this feature, we used magnetic beads coated with ConA, a kind of plant lectin that can speci cally bind glycoproteins, to capture the SARS-CoV-2 virions prepared from the supernatants of infected Vero cells (see Methods). After capturing the virions on beads, formaldehyde was further applied to crosslink the nucleocapsid (N) protein-mediated proximal RNA-RNA interactions, as well as ConA and the surface glycoproteins (Fig. 1a). Next, the virions were permeabilized and treated with micrococcal nuclease (MNase) to digest away free RNAs that were not close to each other.
Subsequently, all the proximal RNAs were 3' end-labeled with pCp-biotin and ligated together by T4 RNA ligase. Lastly, the resulting chimeric RNA marked with pCp at the juncture were enriched and converted into libraries for paired-end sequencing (about 260 bp, Fig. 1a and Supplementary Fig. 1a).
We obtained approximately 24.6 million unique reads for each replicate and ~3.4 million chimeric reads resulting from different RNA fragments. Using the RIC-seq data analysis work ow established earlier 30 , we found that 97.3% of the chimeric reads could be aligned to the SARS-CoV-2 genome, and 2.3% were mapped to host RNA (Fig. 1b). The trace amounts of host RNA reads might be derived from detached Vero cells at the virion collection stage. Notably, approximately 0.4% of the chimeric reads were virus-tohost RNA interactions (Fig. 1b), which may re ect the random ligation rates between virions and Vero cells, therefore, further highlighting the speci city of vRIC-seq technology. Besides, the virus-to-host interactions also raised the possibility that some host RNAs might be packaged into the virion, just like tRNA in the HIV virions for its replication 33 . However, such a possibility was excluded because we observed a random virus-to-host RNA interaction pattern across all the green monkey chromosomes ( Supplementary Fig. 1b).
The pCp-biotin labelling and selection were successful because approximately 85% of the additional nucleotides at chimeric junctions were cytosine ( Supplementary Fig. 1c). vRIC-seq is highly reproducible on chimeric reads coverage (R = 0.999) and interaction strength (R = 0.995) of pairwise sites ( Fig. 1c and Supplementary Fig. 1d). Moreover, we noted that 99.7% of the SARS-CoV-2 genome was covered at least 2500× by chimeric reads ( Supplementary Fig. 1e), and the remaining 0.3% covered by less than 2500× was mainly located at the stem-loop 1 of 5' UTR and the poly (A) region at the 3' terminal. Next, we aligned the pairwise interacting RNA fragments to the SARS-CoV-2 genome, and such analysis revealed many regions that are preferably cut by MNase and subsequently labelled with pCp-biotin (Fig. 1d). We also noticed that the rst 3,500 nucleotides (nt) of ORF1b had a relative lower vRIC-seq coverage, and nucleotide content didn't contribute to this difference (Fig. 1d).

Recapitulate known structures of the SARS-CoV-2 genome
We rst determined to validate the viral RNA spatial interactions revealed by vRIC-seq. For this purpose, we deduced several conserved RNA duplexes in coronavirus using vRIC-seq data and compared it with recently proposed SARS-CoV-2 secondary structure models 13,[21][22][23][24][34][35][36] . As expected, vRIC-seq faithfully recapitulated all of the stem-loops in the 5' UTR (1-395 nt) of SARS-CoV-2 except stem-loop 1 (SL1), which is approximately 40 nt in length and can't be puri ed by AMPure XP beads (Fig. 1e). Moreover, the 3' UTR of SARS-CoV-2 contains several conserved structural elements known to functionally impact viral RNA synthesis and translation, including the stem-loop II-like motif (s2m), hyper-variable region (HVR), and mutually exclusive bulged stem-loop (BSL) or pseudoknot (PK) 36 . We found that vRIC-seq successfully captured the canonical s2m and HVR structures (Fig. 1f). However, in contrast to the previous theoretical model 36 , our data preferentially supported a double hairpin conformation for BSL and P2 stem, rather than the pseudoknot conformation (Fig. 1f). These results demonstrate that vRIC-seq can faithfully probe RNA spatial interactions in the virion, and support that this proximity information can be used for structure modelling.
Topological organization of the SARS-CoV-2 genome After validating the vRIC-seq data, we next investigated the features of SARS-CoV-2 genome organization in the virion. To this end, we rst calculated the spanning distance of pairwise interacting RNA fragments. Approximately 90.6% of the pairwise interactions happened within 100 nt, whereas approximately 6.3% of the interactions spanned more than 600 nt, and three sharp peaks with 810, 1360, and 2090 nt separately, were observed (Fig. 2a). Of note, those long-distance interactions were not caused by the discontinuous transcription of SARS-CoV-2 during negative-strand synthesis 6 , because we observed a clear enrichment of pCp at the chimeric junctions (see red lines, Supplementary Fig. 2a,b).
Following our previous approach, we divided the SARS-CoV-2 genome into 10-nt windows and constructed an RNA interaction map (Fig. 2b). Using the map, we identi ed 254 clustered interactions positioned perpendicular to the diagonal, suggesting the widespread occurrence of local duplexes in the SARS-CoV-2 genome (Fig. 2b). Surprisingly, we also noticed 77 long-range interactions that were sequentially distributed and covered almost the entire genome (Fig. 2b). Similar to the organization of human primary transcripts 30 Table 1). The largest domain is located before the 3' UTR and contains sequences encoding ORF7a, ORF7b, ORF8, and the N protein.
Theoretically, vRIC-seq can capture base-paired RNA duplexes as well as protein-mediated indirect RNA contacts. To examine the base-pairing probabilities for the observed long-range interactions within 2.5 kb, we calculated the minimum free energy (MFE) for pairwise interacting fragments that spanned over 400 nt. We observed signi cantly lower MFE values than randomly shu ed sequences with the same nucleotide content (P = 1.62e-10, Supplementary Fig. 2d), indicating that those long-range RNA-RNA interactions may be directly base-paired. Notably, we also uncovered 62 pairwise interacting RNA fragments that could span over 2.5 kb ( Supplementary Fig. 2e). However, those individual RNA fragments showed 12-fold stronger interactions with its local partners rather than distal partners ( Supplementary   Fig. 2f), as exampled for long-range interactions between 1,180-1,190 nt and 29,343-29,354 nt, which both showed stronger local interactions (Supplementary Fig. 2g-i). These data suggest that some alternative topology of the SARS-CoV-2 genome might be present in the virion.

3D globule con guration of the SARS-CoV-2 genome in virions
Previous microscopy studies revealed a spherical shape with a mean diameter of ~80 nm for CoV and SARS-CoV-2 viral particles 1,37,38 . To explore how the 30 kb RNA genome of SARS-CoV-2 is folded in the tiny virion, we utilized the contact frequencies data of different RNA fragments as constraints to model the global conformation of the SARS-CoV-2 genome. Pursuing this, we adopted a widely used miniMDS (multidimensional scaling) approach to infer the 3D structure of SARS-CoV-2 39 . Our modeling revealed a 3D globule conformation of the RNA genome ( Fig. 2c and Supplementary Video 1). Notably, the threadlike genome was unentangled, appearing to have a mildly helical conformation. Moreover, different segments of the viral genome seemed to occupy separate territories (see different colors, Fig. 2c), forming an organization apparently similar to known structures of mammalian genomes 40 . Of note, this "globule" and knot-free con guration might re ect the arrangement of the N protein, which has been shown to bind the CoV genome and interacted with M protein via its C-terminal domain in the interior of the lipid membrane of virions 18 .

Secondary structure model of the SARS-CoV-2 genome
Based on the 3D RNA interaction map, we further developed an adaptive strategy to model the secondary structure of the entire SARS-CoV-2 genome. Brie y, we rst predicted local duplexes, which were positioned in our SARS-CoV-2 RNA interaction map as perpendicular signals to the diagonal, and then used these duplexes as constraints to predict long-range duplexes ( Supplementary Fig. 3a). To evaluate the performance of this strategy, we rst predicted the secondary structure of 28S rRNA using our previously published RIC-seq data in HeLa cells 30 . The structural model achieved a sensitivity of 83.0% and a positive predictive value (PPV) of 78.3%, and both criteria were signi cantly higher than structures merely based on the minimal free energy values provided by several computational tools ( Supplementary  Fig. 3b). Importantly, our algorithm showed higher PPV and sensitivity if RIC-seq detected duplexes spanning over 600 nt were counted ( Supplementary Fig. 3c). Together, these data demonstrate the accuracy of our adaptive strategy in deducing RNA structures.
Having validated the prediction algorithm, we next applied it to reconstruct the secondary structure of the whole SARS-CoV-2 genome (Fig. 3). Our structural model is highly favoured by the vRIC-seq data, as base-paired regions showed stronger interaction strength than size-matched unpaired control sequences (P < 2.2e-16, Supplementary Fig. 3d). In our structural model, the median and mean distances between two paired regions in SARS-CoV-2 were 28 and 111 nt, respectively. The maximal spanning distance (2,145 nt) was observed for a duplex formed between two fragments: 27,357-27,396 nt and 29,465-29,502 nt (Green dash line boxed region, Fig. 3). We found that about 63.8% (19,064 nt) of the SARS-CoV-2 genome were base-paired, and our model precisely recapitulated the stem-loop structures (SL1-SL7) in the 5' UTR, as well as 3' UTR structures including the s2m and the HVR (Fig. 3). Notably, our vRIC-seq data strongly supported an extended duplex SL8, rather than the two previously proposed separate duplexes which were theoretically predicted using the coding sequence of nsp1 (410-470 nt) 36 (see blue dash line marked region, Fig. 3). Besides identifying many duplexes, our structural model unexpectedly revealed 167 multi-way junctions, including 57 three-way junctions and three 12-way junctions; these junctions seemed to organize the SARS-CoV-2 RNA genome into many petaloid structures ( Fig. 3 and Supplementary Fig. 3e).
Next, we examined whether there are any correlations between structural elements and viral RNA abundance. During negative-strand RNA synthesis, SARS-CoV-2 produces nine subgenomic RNAs (sgRNAs) via the template-switching activity of RNA-dependent RNA polymerase (RdRP) between the 5' leader sequence and the transcription-regulatory sequence in the body (TRS-B) 6 . Interestingly, the TRS-B of ORF7a was located in a long-range duplex spanning over 2,000 nt, while the other eight TRS-B were located in local stem-loops (cyan boxed regions, Fig. 3 and Supplementary Fig. 3f). Moreover, we found that the number of single-stranded nucleotides within TRS-B was negatively correlated to the abundance of the corresponding sgRNA (R = -0.52, Supplementary Fig. 3g). By contrast, the number of singlestranded nucleotides in regions adjacent to the TRS-B showed positive correlations with sgRNA levels ( Supplementary Fig. 3h). It will be interesting in the future to examine how these structural features determine viral RNA transcriptions.
The in-cell and in-virion structural dynamics SARS-CoV-2 can hijack host cells by engaging its spike proteins with the host receptor angiotensinconverting enzyme 2 (ACE2) 7 . After entering into the host cell, the SARS-CoV-2 genome is released from the densely coated N proteins for translating nonstructural proteins to replicate its genome, as well as to produce structural proteins to assemble new virions (Fig. 4a). Whether there is any genome structure remodeling before and/or after the infection is still unclear. By analyzing the in vivo SHAPE-MaP data 24 , we found that the single-stranded regions revealed by vRIC-seq tend to have stronger SHAPE signals than base-paired regions (P < 2.2e-16, Fig. 4b). Unexpectedly, 76% of the duplexes were observed in both the host cell and the virion, indicating that the SARS-CoV-2 genome structure was largely unchanged before and after infection (Fig. 4c). Additionally, the duplexes spanning longer distances tended to be changed in cells (Fig. 4d).
Moreover, the duplexes in the 5' UTR regions barely changed during the life cycle of SARS-CoV-2 (Fig. 4e). One of the most dynamic regions was 12.6 kb -13.6 kb, which covered the frame-shifting element (FSE) and resided at the boundary of ORF1a and ORF1b. The in-cell structure reported a pseudoknot conformation, but the in-virion structure suggested that the Stem 1 of pseudoknot (PK) in FSE preferably formed an alternative duplex with the upstream fragment ( Fig. 4f and Supplementary Fig. 4a-c).
Consistent with our observation, another study indirectly predicted this alternative duplex conformation in infected cells by using DMS-MaPseq revealed single-stranded information 23 . Although both conformations might be dynamically present in virions, we indeed observed 2.9-fold more vRIC-seq signals to support the alternative duplex than the PK (79.95 vs. 27.91, Supplementary Fig. 4b,c). It will be interesting to explore the functionality of these structural changes in the future. We also examined another well-known PK in the 3' UTR of betacoronavirus 13 . Our vRIC-seq data barely support the presence of the 3' UTR PK (Supplementary Fig. 4d-f), and both the in-virion and in-cell structures predict an extended BSL rather than a pseudoknot (Fig. 3, Fig. 4g, and Supplementary Fig. 4d-f). Together, these results demonstrate the accuracy of our secondary structure model of the SARS-CoV-2 genome in virions and highlight the potentially impactful structural dynamics during infection.
Co-variant in duplexed regions of SARS-CoV-2 RNA duplexes usually are under the selection pressure to maintain the viral genome's conformation to facilitate its replication 17 . To uncover duplexes under purifying selection, we rst conducted an analysis to examine the co-variant base pairs in 429 non-redundant coronavirus genomes. This comparison revealed eight co-variant base pairs for SL5-6 in the 5' UTR and 15 co-variant base pairs for the HVR in the 3' UTR ( Supplementary Fig. 5a,b and Supplementary Table 2). Importantly, we detected 13 duplexes that spanned more than 600 nt and showed focused co-variations among multiple strains, highlighting the potential functionality of these long-range duplexes revealed by vRIC-seq ( Supplementary Fig. 5c-e).
Next, we investigated the co-variant base pairs among 50,300 SARS-CoV-2 strains. This analysis identi ed 74 co-variant events across the whole SARS-CoV-2 genome (see red arc lines, Fig. 5a and Supplementary Table 3), of which six co-variants were located in SL1 and four in the s2m (see red shadowed regions in 5' UTR and 3' UTR, respectively. Fig. 5b,c). Moreover, we found ten co-variant events in a three-way junction located in ORF7a (Supplementary Fig. 5f), indicating that this junction might be important, and maintaining such a junction structure seemed critical to viral infection. Future investigations that experimentally determine the function of each three sequences should help elucidate the practical impact(s) of this structurally fascinating junction.
D614G and accompanying mutations on structure remodeling SARS-CoV-2 genome is frequently mutated during evolution due to lacking proofreading activity of RdRP. Some mutations are bene cial to SARS-CoV-2 and emerge as dominant strains in the global pandemic, such as D614G and three accompanying mutations 41 . We found that these four mutations were located in local duplexed regions, and no interactions could be observed between them (Fig. 6a). The most prevalent D614G mutant caused by an A-to-G nucleotide transition at position 23,403, was located in the single-nucleotide bulge of a stem-loop (Fig. 6b). Interestingly, we found that the A23403G mutation netuned the two local bulge structures into a thermodynamically more favorable six-nucleotide bulge structure (-10.3 kcal/mol vs. -13.7 kcal/mol) (Fig. 6b). Agreeing well with our model, the three uracil (U) in the bulge also showed strong SHAPE-MaP signals in the SARS-CoV-2 infected cells 24 .
Additionally, the D614G accompanying mutations at 241 nt, 3,037 nt, and 14,408 nt all reside in the single-stranded internal or apical loops (Fig. 6c-e). These loops could also be partially supported by the SHAPE-MaP signals revealed in the host cells 24 . However, the C14408U mutation in RdRP (P323L) might introduce a novel structure with a smaller apical loop (14,380-14,441 nt) (Fig. 6e). It might be worthwhile to study further how these mutations are introduced during infection and how they enable viral tness advantage. In addition, we systematically analyzed all the observed mutations in different SARS-CoV-2 strains stains. We found that the base-paired nucleotides are less mutated and have slightly lower entropy scores ( Supplementary Fig. 6), further highlighting the functional importance of maintaining duplexes during evolution.
Structure-guided cleavage of SARS-CoV-2 RNA RNA cleavage mediated by small interfering RNA (siRNA), antisense oligonucleotides, and RNA-targeting CRISPR, has emerged as a therapeutic modality to restrain viral gene expression and inhibit viruses in host cells 42 . However, it bears emphasis that target RNA cleavage e ciency is strongly affected by duplex structures 8,9 . In order to screen for cleavage-vulnerable regions of the SARS-CoV-2 RNA genome to develop anti-viral drugs, we systematically selected 130 single-stranded regions that i) have at least 10 consecutive nucleotides and ii) contain strong conservation among the different SARS-CoV-2 strains we examined. Next, we synthesized six siRNAs speci cally targeting single-stranded regions and three siRNAs targeting duplex regions to cleave the SARS-CoV-2 RNA genome, and conducted infectivity assays using Vero cells (Fig. 7a and Supplementary Table 4). qPCR analysis of cells at 24 hours post viral exposure showed that, compared to the non-targeting siRNA control (denoted as NC), four out of the six tested siRNAs targeting single-stranded regions could completely prevent SARS-CoV-2 ampli cation in Vero cells (see si-2, si-3, si-4, and si-5; Fig. 7b); accordingly, we observed no viral infection on cells treated with these four siRNAs (Fig. 7c). Agreeing well with the qPCR results, we found that the Vero cells treated with these four siRNAs looked healthier than those with the non-targeting siRNA control (Supplementary Fig. 7a). In sharp contrast, none of the siRNAs targeting duplexes could completely prevent infection or viral ampli cation (Fig. 7b,c). As each of the four effective siRNAs target sequences common to more than 90% of the 50,300 known SARS-CoV-2 strains, combining two or more of these siRNAs is expected to result in cleavage for nearly all known SARS-CoV-2 strains.
Importantly, we found that the structure of siRNA target regions revealed in virion is largely consistent with the structures uncovered in host cells (Fig. 7d). Furthermore, the siRNA targeting region designed by vRIC-seq showed more single-stranded base numbers than those from targeting regions selected by in silico methods (Supplementary Fig. 7b). This data highlight the SARS-CoV-2 RNA structure's values in guiding the design of potent siRNAs. Together, these results validated the accuracy of vRIC-seq deduced structures and showcased its practical utility for supporting drug development to combat SARS-CoV-2 during the ongoing COVID-19 pandemic.

Discussion
The emerging deadly RNA viruses have caused severe global epidemics and pandemics. RNA viruses usually form intricate and dynamic structures to orchestrate their translation, replication, and packaging 13,43 . Seeking to understand the in situ structure of viral RNA genome in virions, we developed vRIC-seq technology to map proximal contacts between different RNA fragments of the SARS-CoV-2. This global proximity information enabled us to construct a high-con dence RNA interaction map and build a 3D structure model of the SARS-CoV-2 genome, which is organized into an unentangled, globule, and seemingly spiral overall architecture. This knot-free globule conformation may allow for rapid unfolding after SARS-CoV-2 enters into the host cells and support maximally dense packaging of its RNA genome inside the viral particles. We also uncovered many short-and long-range duplexes, pseudoknots, and multiway junctions in SARS-CoV-2 RNA. Unexpectedly, these long-range duplexes could further isolate the SARS-CoV-2 genome into RNA topological domains. However, they are generally disrupted in the infected cells, indicating an active unwinding process that might be regulated by some RNA helicases.
We also developed an adaptive algorithm to reconstruct a complete SARS-CoV-2 secondary structure model by integrating the 3D pairwise interactions of vRIC-seq data. This approach is signi cantly different from other widely used strategies that indirectly predict viral RNA duplex structures based on the nucleotide exibility information 14,15,17,44,45 . Considering the duplex length restraint during prediction, the previously modelled structures of SARS-CoV-2 in the host cells might be incomplete because many functionally relevant long-range duplexes spanning over 600 nt were omitted [21][22][23][24] .
The formation and maintenance of RNA duplexes are known to strongly in uence viral tness 46 . Thus, understanding single-and double-stranded regions in the viral genome is informative for the e cient development of RNA-targeted drugs to ght SARS-CoV-2 infections. The structural model we proposed here lays the foundation for developing and deploying highly potent siRNAs, single guide RNAs, and antisense oligos. Moreover, our model provides novel insights on how some pathogenic SARS-CoV-2 variants, such as D614G, enable viral tness advantage at the structure level. However, it should be noted that the current structural model did not include binding information for the nucleocapsid (N) protein, which densely packages the SARS-CoV-2 genome into the ribonucleoprotein core in the interior of the virion 47 . Therefore, identi cation of the N protein's binding sites along the genome may further narrow down the targeting regions to develop agents for e cient RNA cleavage.
In summary, we developed a high-throughput approach for probing the in situ genome structure of any RNA viruses theoretically. We further used SARS-CoV-2 as a model to illustrate the power of vRIC-seq in delineating its general architecture and sophisticated long-range RNA-RNA interactions, such as duplexes and multi-way junctions. The vRIC-seq approach is also easily adaptable to probe viral RNA structurome and interactome in host cells in the future. More importantly, we could also apply the vRIC-seq technology in pro ling RNA spatial interactions with animal cells and plant cells. The inclusion of the ConA beads capture step may signi cantly reduce the cell numbers used in our original RIC-seq protocol, thus enable RNA spatial interaction mapping started with a small number of cells. were added to the tube and incubated for 10 min at room temperature (RT). After washing three times with binding buffer and once with PBST buffer (1 × PBS, 0.1% Tween 20), the beads were resuspended in 1 ml of PBST buffer. To x nucleocapsid (N) protein-mediated RNA-RNA interactions, 27 μl of 37% formaldehyde was applied and incubated for 10 min at RT with rotation at 20 rpm. The crosslinking was stopped by adding glycine to a nal concentration of 0.125 M and incubated at RT for 5 min.

RNA puri cation and library construction
At the next day, the ligation was stopped by washing three times with 1× PNK buffer. To extract viral RNAs, the beads were resuspended and incubated with 200 μl of proteinase K buffer (10 mM Tris-HCl pH 7.5, 10 mM EDTA, 0.5% SDS) and 50 μl of proteinase K (Takara, 9034) at 37 °C for 60 min and 56 °C for transferred to a new tube. The viral RNA was extracted from the supernatant with 750 μl of TRIzol LS (Thermo Fisher, 10296028) and 220 μl of chloroform according to the manufacturer's instruction. After centrifugation at 13,000 rpm for 15 min at 4 °C, the supernatant was transferred to a 1.5 ml Eppendorf tube and mixed with 500 μl of isopropanol and 1 μl of glycoblue (15 mg/ml, Thermo Fisher, AM9515).
The RNA was precipitated overnight at -20 °C and pelleted at 13,000 rpm for 20 min at 4 °C. After washing twice with 75% ethanol, the RNA pellet was resuspended in 15 μl of nuclease-free water. The subsequent steps of vRIC-seq were performed exactly as we previously described 30 Supplementary Table 4. To quantify viral RNA levels inside the cell, we rst extracted total RNAs from the infected Vero cells using TRIzol Reagent (Invitrogen, 15596026). For RT-qPCR, 1 μg of total RNA was treated with RQ1 RNase-free DNase (Promega, M6101) and converted into cDNA using MMLV reverse transcriptase (Promega, M1701) with oligo dT (20) primer. qPCR was performed with Hieff qPCR SYBR Green Master Mix (YEASEN, 11203ES08) on a StepOnePlus real-time PCR machine (Applied Biosystems). The primers targeting the RdRp gene of SARS-CoV-2 were used for qPCR.

Processing of vRIC-seq data
After removing adapters, PCR duplicates, and low-quality sequences, we rst aligned the unique vRIC-seq reads to the human 45S pre-rRNA (Refseq accession number NR_046235.3). For all the unmapped reads, we further mapped them to a pan-genome consisting of the Chlorocebus sabaeus reference genome (genome assembly version: chlSab2) and the SARS-CoV-2 reference genome (Refseq accession number NC_045512.2) using a STAR software (v020201) 50 . Chimeric reads were identi ed using the previously described RIC-seq analysis pipeline 30 . Chimeric reads coverage along the genome was visualized by the Circos suite (v0.   51 .
The RNA interaction map of SARS-CoV-2 The chimeric reads mapped to the SARS-CoV-2 reference genome were collected and used for generating an RNA interaction map. Of note, we removed spliced reads containing gaps resulted from discontinuous transcriptions 6 . As described previously 30 , we identi ed the pairwise junction sites in the chimeric reads and used them for building a two-dimensional RNA interaction matrix. This matrix could be converted into .hic format and visualized by the Juicebox tool (v1.11.08) 52 . The Knight-Ruiz algorithm 53 was used to balance this interaction matrix to eliminate sequencing bias along the virus genome.
Predict local structures of SARS-CoV-2 for validating vRIC-seq data We used the extended 5' UTR (1-480 nt) and 3' UTR (29,546-29,870 nt) sequences to generate candidate secondary structures by the Fold program from RNAstructure software suite (v6.2) 54 . As a general approach, the maximum distance between two paired nucleotides was allowed within 250 nt. The structure that agreed well with the maximized RIC-seq signals between pairwise interacting RNA fragments was selected and visualized by the StructureEditor program (v6.0). The RNAComposer software was applied to deduce the three-way junction's 3D structure 55 , and the con guration agreed with vRIC-seq data was chosen.

RNA topological domain in SARS-CoV-2 genome
The SARS-CoV-2 genome was separated into isolated topological domains using our previously published iterative algorithm with minor revisions 30 . Brie y, we iteratively chose the optimal boundary that minimized the inter-domain's vRIC-seq density as a new candidate domain boundary. To avoid tiny domains resulting from over division, we stopped the iteration once more than 35% of the pairwise 10-nt windows (connection score greater than 0.01) were classi ed as inter-domain. Lastly, adjacent domains were merged if both did not contain interactions between their 5' and 3' boundary.

MFE analysis of the duplexes revealed by vRIC-seq
To explore whether the pixels with a strong vRIC-seq signal in the RNA interaction map could form longrange duplexes, we rst divided the SARS-CoV-2 genome into non-overlapping 10-nt windows. The pairwise 10-nt windows with a connection score greater than 0.01 were used for downstream analyses 30 .
Next, we clustered pairwise 10-nt windows adjoining or overlapping at both ends as one interaction. The lowest hybrid free energy was then computed for the possible hybrids formed between these pairwise RNA stretches using the bifold function in the RNAstructure suite (v6.2) with default parameters 54 . Lastly, arti cial sequences with the same nucleotide content as real interactions were generated ten times. The lowest hybrid free energy for those shu ed sequences was also calculated.
3D structural simulation of SARS-CoV-2 genome Based on the RNA interaction map, we used the miniMDS program to model the spatial conformation of the SARS-CoV-2 genome using the following parameters 39 : minimds.py -l 10 -m 0.01 -p 0.01. The spatial coordinates reported by miniMDS were smoothed with the LOWESS (locally weighted scatterplot smoothing) algorithm and then visualized 56 .
SARS-CoV-2 RNA secondary structure modeling The secondary structure of SARS-CoV-2 genomic RNA was constructed in silico based solely on the vRICseq data by an adaptively optimized algorithm we developed in this study. We rst split the SARS-CoV-2 genome into shorter segmental domains by maximizing the ratio between intra-domain and interdomain's vRIC-seq signals. Notably, the domains smaller than 4 kb will not be further split to avoid the potential loss of long-range duplexes over the domains' boundaries. Like a previously described approach 15 , we determined the secondary structure for each domain independently. To this end, we systematically screened pairwise 5-nt windows with connection scores higher than 0.03, and the windows adjoining or overlapping at both ends were further clustered as high-con dence interactions. For each interaction spanned region within a domain, we used the Fold program in the RNAstructure software suite (v6.2) to perform structure prediction 54 . The maximum distance between any two paired positions was allowed within 2500 nt. From the structural candidates reported by the Fold program, we selected the one that matched best with vRIC-seq data and forced it as a constraint in the subsequent prediction. Of note, we generated duplexes for short local interactions rst and then used them as restrains to perform prediction for long-range interactions spanned regions. Moreover, interactions having stronger vRIC-seq signals were processed with priority. Finally, by restraining duplexes generated in the former stage, we folded each domain's entire sequence, including regions not covered by the high-con dence interactions. The structure agreed best with vRIC-seq signals were selected. The nal secondary structure model of the viral genome RNA in SARS-CoV-2 was visualized by the VARNA program (v3-93) 57 and the Integrative Genomics Viewer (IGV) visualization tool (v.2.3.92) 58 .
Evaluate the accuracy of the algorithm To evaluate the adaptive algorithm's accuracy, we rst used our previously published rRNA+ RIC-seq data in HeLa cells to predict the secondary structure of 28S rRNA 30 . We also used four widely used computational algorithms to predict the secondary structure based on the minimum free energy, including RNAstructure (v6.2) 54 , Mfold (v3.6) 59 , RNAfold 60 , and LinearFold 61 . All parameters were set as default except for the maximum pairing distance within 1600 nt was allowed.
We used two criteria to evaluate a predicted secondary structure's accuracy: sensitivity and PPV (positive predictive value). Sensitivity was de ned as the fraction of base pairs that were correctly predicted. PPV was the fraction of base pairs in the predicted structure that were correct. We used a relaxed structure comparison mode. A base pair i/j was considered correctly predicted if any of the following pairs exist in