Date: 2024-01-18

Development of KARR-seq

Effective capture of spatially proximal RNAs has been challenging, primarily owing to the modest RNA crosslinking efficiency of the few available RNA crosslinkers16. Click chemistry reactions proceed rapidly and quantitatively in cells under ambient conditions, enabling the detection of the binding landscapes of many biomacromolecules17. However, these reactions have not been applied in RNA proximity studies owing to a lack of ‘clickable’ functional groups on cellular RNA. In KARR-seq, we applied N3-kethoxal, a cell-permeable and nucleus-permeable small molecule that efficiently functionalizes RNA with an azide tag14,18. We also decorated commercially available dendrimers with multiple DBCO and biotin moieties (Supplementary Fig. 1a,b): DBCO reacts with proximal N3-kethoxal-modified RNA via ‘click’ reactions, and biotin enables enrichment of the crosslinked products.

We first labeled fixed and permeabilized murine embryonic stem cells (mESCs) with N3-kethoxal and then allowed modified G3 dendrimers to diffuse into the cells at 37 °C to initiate the ‘click’ reaction (Fig. 1a). Gel electrophoresis (Supplementary Fig. 1c) and dot blot (Supplementary Fig. 1d) of the purified RNA confirmed successful RNA crosslinking. Control experiments performed in the absence of N3-kethoxal or dendrimer showed very weak or no signal in dot blots (Supplementary Fig. 1d). RNA from crosslinked cells was then fragmented and subjected to pull-down using streptavidin-coated beads. On-bead end repair and proximity ligation were subsequently performed, and the post-ligation RNA was amplified for paired-end sequencing at roughly 100 million reads per sample (Fig. 1a).

Fig. 1: KARR-seq maps higher-order RNA structures. a, KARR-seq workflow. Cells are first treated with N3-kethoxal to modify RNAs with azide tags (red), which enables crosslinking of the tagged RNA molecules by DBCO-decorated dendrimers (blue). Biotin modifications (pink) on dendrimers facilitate the enrichment of crosslinking products, followed by proximity ligation, RNA library construction and sequencing. Chimeric sequencing reads are aligned to identify RNA–RNA interactions. b, Physical distances between interacting fragments of TERC in K562 cells, measured by KARR-seq data generated using G1 and G7 dendrimers, respectively. The physical distances were measured using the cryo-EM structure of TERC. The actual physical distance distribution in the cryo-EM structure is shown in blue for comparison. c, Illustration of loop and stripe structures detected by KARR-seq. In arc groups, loops, left stripes and right stripes are denoted in blue, yellow and pink, respectively. Corresponding KARR-seq chimeric reads are displayed below. d, The KARR-seq interaction maps and arc groups for the Eef1g (EEF1G) transcript in mESCs (left) and HepG2 cells (right). e, The simulated physical distance map of the human EEF1G transcript. For b–d, KARR-seq was performed in two biological replicates.

De-duplicated KARR-seq chimeric reads constituted around 6% of all reads and recapitulated ribosomal RNA (rRNA) higher-order structures (Supplementary Fig. 1e). In negative controls performed in the absence of N3-kethoxal, chimeric reads constituted only 0.26% of sequencing reads. RNA contact maps generated from individual KARR-seq replicates were highly correlated (Supplementary Fig. 1f). Because N3-kethoxal specifically reacts with guanines14, we evaluated the effect of nucleotide content on the frequency of KARR-seq chimeric reads. Using data produced by G1 dendrimers, we analyzed transcripts with more than 250 chimeric reads and grouped all base positions according to their chimeric read coverage. Guanine was modestly enriched among bases with high chimeric read coverage (Supplementary Fig. 1g). We then performed KARR-seq with a 1:1 mixture of human (K562) and Drosophila (S2) cells. In this case, chimeric reads constituted 8.4% of total reads when mapped to the reference genomes. Interspecies chimeric reads accounted for 7.2% of chimeric reads (Supplementary Fig. 2a) and 0.61% of all sequencing reads, with no significant interaction detected between human and Drosophila RNAs (Supplementary Fig. 2b). The percentage of interspecies chimeric reads for KARR-seq (0.61%) is similar to that in RIC-seq (0.6%)12. Note that, in RIC-seq, proximity ligation is performed in fixed cells rather than in free solution, so its interspecies ligation frequency is expected to be low; the comparable value obtained with on-bead ligation in KARR-seq therefore indicates a similarly low background of spurious ligation.
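The mixed-species numbers above are linked by simple arithmetic. The sketch below converts the interspecies share of chimeric reads into a share of all reads and, for context, computes the interspecies share expected under a hypothetical null model in which the two arms of every chimera are drawn independently from a pool containing a fraction `p_human` of human RNA. The null model, the function names and the assumption that the 1:1 cell mixture contributes equal amounts of RNA are illustrative assumptions, not part of the published analysis.

```python
def interspecies_fraction_of_all(chimeric_frac, interspecies_frac_of_chimeric):
    """Convert 'fraction of chimeric reads' into 'fraction of all reads'."""
    return chimeric_frac * interspecies_frac_of_chimeric


def random_pairing_interspecies_fraction(p_human):
    """Hypothetical null: both arms drawn independently; probability that they differ in species."""
    return 2 * p_human * (1 - p_human)


# Values reported for the K562 + S2 mixing experiment.
observed = interspecies_fraction_of_all(0.084, 0.072)
null = random_pairing_interspecies_fraction(0.5)  # assumes equal human and fly RNA input

print(f"observed interspecies reads / all reads: {observed:.3%}")  # 0.605%
print(f"random-pairing expectation among chimeras: {null:.0%}")    # 50%
```

The product (about 0.60%) agrees with the reported 0.61% within the rounding of the input percentages, and the observed 7.2% of chimeras is far below the roughly 50% that random pairing of arms would produce under the stated assumption.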

We next analyzed KARR-seq data produced by dendrimers with different diameters, namely G1 (22 Å), G3 (36 Å), G5 (54 Å) and G7 (81 Å), in mESCs and K562 cells. We projected KARR-seq chimeric read positions onto the 3D cryogenic electron microscopy (cryo-EM) structures of TERC and U1 and calculated the spatial distance between the two RNA fragments of each chimeric read. For both transcripts, G7 captures a larger median distance than G1 does (Fig. 1b and Supplementary Fig. 3a). Dendrimers of similar size detect similar transcriptome-wide RNA–RNA interaction landscapes (Supplementary Fig. 3b,c). The choice of dendrimer did not affect ligation efficiency (Supplementary Fig. 3d), but G1 and G3 captured twice as many transcripts with valid interactions as G7 did (Supplementary Fig. 3e), potentially because G1 and G3 more readily access compact ribonucleoprotein (RNP) complexes19. We therefore used G1 for KARR-seq experiments unless otherwise noted.
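Projecting chimeric reads onto a 3D structure can be sketched as follows, assuming one representative coordinate per nucleotide (for example the C1′ atom parsed from the PDB entry) and defining the fragment-to-fragment distance as the distance between the centroids of the two read arms. Both choices, and the helper names, are assumptions for illustration; the published analysis may use a different atom or a minimum-distance definition.

```python
import math
from statistics import median


def fragment_centroid(coords, start, end):
    """Mean x, y, z over nucleotides start..end (inclusive) that are resolved in the structure."""
    pts = [coords[i] for i in range(start, end + 1) if i in coords]
    if not pts:
        raise ValueError(f"no resolved nucleotides in {start}-{end}")
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n, sum(p[2] for p in pts) / n)


def chimera_distance(coords, arm1, arm2):
    """Euclidean distance (units of coords, e.g. Å) between the centroids of two read arms."""
    return math.dist(fragment_centroid(coords, *arm1), fragment_centroid(coords, *arm2))


# Toy "structure": 20 nucleotides on a straight line spaced 3 Å apart (made-up coordinates).
toy_coords = {i: (3.0 * i, 0.0, 0.0) for i in range(1, 21)}
reads = [((1, 3), (11, 13)), ((1, 3), (4, 6)), ((5, 7), (18, 20))]
distances = [chimera_distance(toy_coords, a, b) for a, b in reads]
print(distances, median(distances))  # [30.0, 9.0, 39.0] 30.0
```

Comparing the median of such per-read distances between G1 and G7 libraries is the kind of summary shown in Fig. 1b.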

KARR-seq maps higher-order RNA structures

KARR-seq chimeric reads reveal both intramolecular and intermolecular RNA–RNA interactions (Supplementary Fig. 3c). We first evaluated the performance of KARR-seq in depicting higher-order RNA structures in HepG2 cells, K562 cells and mESCs. We plotted the KARR-seq interaction map of human 18S rRNA from K562 cells and found that KARR-seq contact frequency recapitulates the main features of the 18S rRNA physical distance map revealed by cryo-EM20 (Supplementary Fig. 4a). The distribution of physical distances revealed by KARR-seq overlaps well with the actual distribution revealed by cryo-EM (Supplementary Fig. 4b). In comparison, RIC-seq and PARIS enrich interactions within short physical distances (Supplementary Fig. 4b), suggesting that KARR-seq may capture RNA proximity over a broader range of spatial distances.

Compared with rRNA, mapping mRNA tertiary structures is particularly challenging owing to the dynamic nature and relatively low abundance of mRNAs. KARR-seq detects mRNA loops and stripes, in which loops represent relatively stable duplex interactions and stripes represent more dynamic contacts or RNA–RNA proximity without direct pairing (Fig. 1c and Methods). We simulated RNA tertiary structures based on the freely jointed chain (FJC) model. Benchmarking against the cryo-EM structure of RPPH1 yielded a Pearson correlation coefficient of 0.696 between the actual and simulated physical distance maps (Supplementary Fig. 4c), indicating reasonable accuracy of the simulation. KARR-seq interaction frequency maps of mRNAs match the corresponding simulated physical distance maps (Fig. 1d,e), demonstrating the capability of KARR-seq to depict mRNA tertiary structures. Interaction maps for homologous transcripts in HepG2 cells and mESCs reveal similar topologies (Fig. 1d), suggesting that RNA tertiary structures are conserved across species.
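A minimal sketch of the two ingredients named above: an unconstrained freely jointed chain, averaged over many conformations to give a mean distance map, and a Pearson correlation between two distance maps computed over their upper triangles. It assumes one bond per nucleotide with a fixed bond length (hypothetical parameters), so it captures only the generic dependence of distance on sequence separation and omits whatever transcript-specific information the authors' simulation uses (see Methods). Function names are illustrative.

```python
import math
import random


def simulate_fjc(n_bonds, bond_length=1.0, rng=None):
    """One freely jointed chain: every bond points in an independent, uniformly random 3D direction."""
    rng = rng or random.Random()
    positions = [(0.0, 0.0, 0.0)]
    for _ in range(n_bonds):
        # Uniform direction on the unit sphere: z ~ U(-1, 1), azimuth ~ U(0, 2*pi).
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = positions[-1]
        positions.append((x0 + bond_length * r * math.cos(phi),
                          y0 + bond_length * r * math.sin(phi),
                          z0 + bond_length * z))
    return positions


def mean_distance_map(n_bonds, n_samples, bond_length=1.0, seed=0):
    """Average pairwise distance matrix over n_samples independent FJC conformations."""
    rng = random.Random(seed)
    n = n_bonds + 1
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):
        pos = simulate_fjc(n_bonds, bond_length, rng)
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(pos[i], pos[j])
                acc[i][j] += d
                acc[j][i] += d
    return [[v / n_samples for v in row] for row in acc]


def pearson_upper_triangle(a, b):
    """Pearson r between the off-diagonal upper triangles of two equally sized square maps."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


sim_a = mean_distance_map(n_bonds=30, n_samples=200, seed=1)
sim_b = mean_distance_map(n_bonds=30, n_samples=200, seed=2)
# Independent runs of the same unconstrained model should agree closely (r near 1).
print(round(pearson_upper_triangle(sim_a, sim_b), 3))
```

In a real benchmark, one of the two maps would instead come from a cryo-EM structure, as in the RPPH1 comparison described above.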

Benchmarking KARR-seq with RIC-seq and PARIS

We systematically compared KARR-seq data (HEK293T, K562 and HepG2 cells) with results from PARIS (HEK293T cells)4 and RIC-seq (HeLa cells)12; PARIS and RIC-seq represent the state-of-the-art methods for mapping RNA duplexes and protein-mediated RNA proximity, respectively. KARR-seq exhibits a stronger correlation with RIC-seq than with PARIS (Fig. 2a), likely because both KARR-seq and RIC-seq detect spatially proximal RNA, whereas psoralen, the crosslinker used in PARIS, primarily reacts with duplexes. We next analyzed the minimal free energy (MFE) of the intramolecular RNA–RNA interactions detected by KARR-seq, PARIS and RIC-seq on TERC and U1 and divided all interactions into three categories depending on whether they match known secondary structures or cryo-EM structures. Interactions detected by PARIS mostly correspond to known secondary structures and tend to have a low mean MFE, whereas interactions detected by KARR-seq and RIC-seq include more spatially proximal non-duplex contacts (denoted as tertiary; Fig. 2b) and interactions not revealed by cryo-EM (denoted as novel; Fig. 2b). All three methods share a similar ratio of chimeric reads (5–7%) when data are mapped to the human genome (Supplementary Fig. 5a). However, the percentage of chimeric reads dropped to only around 1% when RIC-seq data were mapped to the transcriptome (Supplementary Fig. 5a), indicating that RIC-seq chimeric reads are enriched in pre-mRNA. Transcripts detected by RIC-seq exhibit lower expression levels (Supplementary Fig. 5b) and are longer (Supplementary Fig. 5c) than those detected by KARR-seq and PARIS. Concordantly, KARR-seq and PARIS share a similar RNA–RNA interaction landscape between different RNA categories, whereas the majority (56%) of interactions identified by RIC-seq are intron mediated (Fig. 2c), further suggesting that RIC-seq enriches interactions in the cell nucleus.
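The three-way grouping used in Fig. 2b can be sketched as a simple decision rule, assuming that a read is ‘secondary’ if any annotated base pair connects its two arms, ‘tertiary’ if the arms are otherwise spatially close in the 3D structure, and ‘novel’ if neither holds. The rule order, the proximity cutoff and the function name are hypothetical choices for illustration; the exact criteria are described in Methods.

```python
def classify_interaction(arm1, arm2, base_pairs, distance, proximity_cutoff=20.0):
    """Label an intramolecular chimeric read as 'secondary', 'tertiary' or 'novel'.

    arm1, arm2: (start, end) nucleotide ranges, inclusive.
    base_pairs: set of (i, j) pairs from a secondary-structure annotation.
    distance: callable (i, j) -> distance between nucleotides i and j in a 3D structure.
    proximity_cutoff: hypothetical threshold, in the units returned by `distance`.
    """
    r1 = range(arm1[0], arm1[1] + 1)
    r2 = range(arm2[0], arm2[1] + 1)
    # 1) Does any annotated base pair connect the two arms?
    for i, j in base_pairs:
        if (i in r1 and j in r2) or (i in r2 and j in r1):
            return "secondary"
    # 2) Otherwise, are the arms spatially close in the 3D structure?
    closest = min(distance(i, j) for i in r1 for j in r2)
    if closest <= proximity_cutoff:
        return "tertiary"
    # 3) Supported by neither the secondary structure nor the 3D structure.
    return "novel"


# Toy example: nucleotides 3 Å apart along a line, and a single annotated pair (2, 40).
def line_distance(i, j):
    return 3.0 * abs(i - j)


pairs = {(2, 40)}
for read in [((1, 5), (38, 42)), ((1, 5), (8, 10)), ((1, 5), (60, 62))]:
    print(read, classify_interaction(*read, pairs, line_distance))
```

The MFE of each read's arms would then be summarized per category, as in Fig. 2b.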

Fig. 2: Benchmarking KARR-seq, RIC-seq and PARIS. a, Average Pearson correlation between the interaction maps of KARR-seq (K562 and HEK293T cells), RIC-seq (HeLa cells) and PARIS (HEK293T cells). b, MFE for RNA–RNA interactions detected using KARR-seq, PARIS and RIC-seq within TERC and U1 transcripts, respectively. Interactions were grouped into ‘secondary’, ‘tertiary’ and ‘novel’. ‘Secondary’ refers to the interactions that match secondary structure prediction. ‘Tertiary’ refers to spatially proximal RNA regions revealed by the cryo-EM structure that do not correspond to secondary structures. ‘Novel’ refers to interactions that are not supported by secondary structures or cryo-EM structures. c, Circos plots showing the RNA–RNA interaction landscape revealed by KARR-seq, PARIS and RIC-seq. The width of the link between two RNA categories indicates the relative abundance of chimeric reads taken by interactions between these two categories. d, Left, the physical distance map of TERC revealed by the cryo-EM structure of TERC. Right, higher-order structures of TERC detected by KARR-seq, PARIS and RIC-seq under the same sequencing depth. The blue dots denote base-pairing secondary structures acquired from the Rfam annotations (RF00024). e, The ROC–AUC curves for KARR-seq, RIC-seq and PARIS for detecting higher-order structures of TERC, 18S, 28S and U3. The dashed lines denote random classifiers. RIC-seq and PARIS data were acquired from the Gene Expression Omnibus (RIC-seq: GSE127188; PARIS: GSE74353). Cryo-EM structures were acquired from the Protein Data Bank (accession codes: 7QXB for TERC, 6QX9 for U3 and 4V6X for 18S and 28S). KARR-seq was performed in two biological replicates.

We then performed differential analysis of RNA–RNA interactions detected by KARR-seq and RIC-seq. Interactions uniquely detected by KARR-seq are mostly stripes, whereas RIC-seq-specific interactions are mostly loops (Supplementary Fig. 5d), suggesting that KARR-seq could detect more transient and dynamic contacts. Around half of KARR-seq-specific intramolecular interactions on mRNAs are located at coding sequences (CDS), whereas 84% of RIC-seq-specific intramolecular interactions on mRNAs are at the 3′ untranslated region (UTR) (Supplementary Fig. 5e), which could be due to the binding preference of specific RBPs21.

We next benchmarked KARR-seq, PARIS and RIC-seq using transcripts with published cryo-EM structures, including 18S, 28S, U3 and TERC. KARR-seq detects pervasive intramolecular RNA–RNA interactions and recapitulates the physical distance maps (Fig. 2d and Supplementary Fig. 6a). Quantitatively, receiver operating characteristic (ROC) analysis showed that KARR-seq achieves area under the curve (AUC) values higher than or similar to those of the other two methods (Fig. 2e). Note that the KARR-seq, PARIS and RIC-seq datasets were generated from different cell lines and sequenced to different depths, which could complicate direct comparisons.
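The AUC used in such a ROC analysis equals the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counted as one half). The sketch assumes that each nucleotide-pair bin is scored by its KARR-seq contact count and labeled positive when the cryo-EM distance is below some cutoff; that labeling scheme and the toy values are hypothetical. For large contact maps, a rank-based implementation such as scikit-learn's `roc_auc_score` is more efficient than this O(n·m) loop.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy bins: contact counts, and whether the bin is spatially proximal in the (hypothetical) structure.
counts = [12, 9, 7, 3, 2, 0]
proximal = [True, True, False, True, False, False]
print(round(roc_auc(counts, proximal), 3))  # 0.889
```

An AUC of 0.5 corresponds to the random classifiers drawn as dashed lines in Fig. 2e.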

Because psoralen crosslinks double-stranded RNA, PARIS data show segment-like patterns on the contact maps that represent RNA duplexes. In comparison, kethoxal predominantly reacts with single-stranded regions; therefore, KARR-seq data present triangular, domain-like structures (Fig. 2d and Supplementary Fig. 6b,c). Notably, ‘KARR-seq domains’ cover nucleotide regions similar to those covered by ‘PARIS segments’ (Fig. 2d and Supplementary Fig. 6b,c), suggesting that PARIS and KARR-seq reveal different possible conformations of the same RNA regions and thus complement each other.

Distinct mRNA higher-order structures in the cell nucleus

RNA secondary structures are dynamic as RNAs transit from the nucleus to the cytoplasm22. However, higher-order RNA structures in different cellular compartments have not been characterized. We purified K562 nuclei (Supplementary Fig. 7a) for KARR-seq, with dot blot showing comparable RNA-labeling efficiency in purified nuclei and intact K562 cells (Supplementary Fig. 7b). KARR-seq data from the nuclear fraction differ markedly from those obtained from intact cells (Supplementary Fig. 7c). Transcripts with valid higher-order structures detected in the nucleus are longer and show lower expression levels than those detected in intact cells (Supplementary Fig. 7d,e). Notably, transcripts detected by KARR-seq in cell nuclei and transcripts detected by RIC-seq in intact cells display similar lengths and expression levels (Supplementary Fig. 7d,e). These results suggest distinct higher-order RNA structure landscapes in the nucleus and cytoplasm and corroborate that RIC-seq enriches nuclear RNA–RNA interactions.

We further investigated the higher-order structure differences between nuclear and cytoplasmic RNA in K562 cells. We also performed KARR-seq in vitro using purified and refolded K562 total RNA, which reveals the intrinsic ability of the RNA polynucleotide chain to fold in the absence of cellular factors. RNA contact frequency decreases log-linearly with coordinate distance in all tested conditions. The slopes of the trend lines, defined as beta coefficients (β), are −2.06 and −1.63 for mRNA in intact cells and in nuclei, respectively, with more long-range contacts detected in the nuclear fraction (Supplementary Fig. 8a). We then calculated the beta coefficient for each individual transcript. RNAs within nuclei tend to have higher (less negative) beta coefficients than the same transcripts from intact cells, both in the transcriptome-wide distribution (Supplementary Fig. 8b) and at the level of individual transcripts (Supplementary Fig. 8c–e).
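A beta coefficient of this kind can be sketched as the least-squares slope of contact frequency against coordinate distance on log-transformed axes. The sketch assumes, as is common for Hi-C-style contact-decay curves, that both axes are log10-transformed and that chimeric-read spans are first binned into log-spaced bins normalized per nucleotide; the binning scheme and function names are assumptions, not the authors' code.

```python
import math


def span_histogram(spans, edges):
    """Bin chimeric-read spans (nt); return (bin centres, contact frequency per nt).

    Frequency = fraction of reads in the bin divided by the bin width, so that
    wide log-spaced bins are comparable with narrow ones.
    """
    total = len(spans)
    centres, freqs = [], []
    for lo, hi in zip(edges, edges[1:]):
        count = sum(1 for s in spans if lo <= s < hi)
        centres.append(math.sqrt(lo * hi))  # geometric centre of a log-spaced bin
        freqs.append(count / total / (hi - lo))
    return centres, freqs


def beta_coefficient(distances, frequencies):
    """OLS slope of log10(frequency) on log10(distance); empty bins are skipped."""
    pts = [(math.log10(d), math.log10(f)) for d, f in zip(distances, frequencies) if d > 0 and f > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


# An ideal power law f = d^-2 gives beta = -2; a higher (less negative) beta means more long-range contacts.
d = [10, 100, 1000, 10000]
print(beta_coefficient(d, [x ** -2 for x in d]))
```

Under this convention, the nuclear value (−1.63) being higher than the whole-cell value (−2.06) corresponds to a slower decay of contacts with distance.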

In the KARR-seq protocol, cells are pre-fixed using 1% formaldehyde before N3-kethoxal treatment. We assayed the effect of formaldehyde crosslinking by performing KARR-seq using weakly fixed (0.1% formaldehyde) and unfixed K562 cells. Under 1% and 0.1% formaldehyde conditions, RNA–RNA interactions detected by KARR-seq are largely similar both at the transcript level (Supplementary Fig. 9a,b) and transcriptome wide (Supplementary Fig. 9c,d). However, interaction maps generated from the unfixed cells resemble those for refolded RNA (Supplementary Fig. 9a,b). Quantitatively, we observed higher beta coefficients across the transcriptome along with more long-range interactions using unfixed cells (Supplementary Fig. 9c,d). We speculate that weakly bound RBPs and ribosomes could fall off from RNA during the labeling steps in the absence of formaldehyde, which could lead to partial RNA refolding. Formaldehyde crosslinking is likely necessary to capture bona fide cellular RNA conformations using KARR-seq.

To quantify the extent of RNA folding across transcripts with varying lengths and abundance, we devised the folding index, an exponential decay transformation of the genomic distance between the two arms of each chimeric read23 (Methods). The folding index describes the relative genomic distance between two interacting RNA fragments: a high folding index indicates that RNA–RNA interaction and/or proximity is detected between distant RNA fragments, that is, that the RNA is more extensively folded (Supplementary Fig. 10a,b). To calculate the folding index of a given transcript (or region), we computed the mean of the folding indexes of all chimeric reads mapped to it as the transcript-level (or region-level) folding index. For transcriptome-wide comparisons, we plotted the distribution of folding indexes for all chimeric reads or the distribution of all transcript-level folding indexes. Differences in sequencing depth and changes in the abundance of certain transcripts have minimal effects on folding index comparisons between conditions (Supplementary Fig. 10c–e). The median folding index for nuclear mRNA is 0.51, substantially higher than that for total cellular mRNA (median: 0.37; Supplementary Fig. 10f), confirming more extensive folding of nuclear RNA.
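The aggregation step described above (per-read index, then the mean over reads for a transcript or region) can be sketched as follows. The per-read transformation shown, 1 − exp(−d/λ) with d the distance between the midpoints of the two arms, and the decay length λ = 1,000 nt are hypothetical stand-ins chosen only because they increase with distance and stay between 0 and 1; the published transformation is defined in Methods and may differ.

```python
import math
from statistics import mean

SCALE_NT = 1000.0  # hypothetical decay length in nt; the published transformation is defined in Methods


def read_folding_index(arm1, arm2, scale=SCALE_NT):
    """Assumed form 1 - exp(-d / scale), d = distance (nt) between the midpoints of the two arms.

    Ranges from 0 (arms at the same position) towards 1 (arms far apart).
    """
    mid1 = (arm1[0] + arm1[1]) / 2
    mid2 = (arm2[0] + arm2[1]) / 2
    return 1.0 - math.exp(-abs(mid2 - mid1) / scale)


def region_folding_index(reads, scale=SCALE_NT):
    """Transcript- or region-level folding index: mean over all chimeric reads assigned to it."""
    return mean(read_folding_index(a, b, scale) for a, b in reads)


reads = [((1, 50), (1001, 1050)), ((1, 50), (51, 100)), ((200, 240), (3200, 3240))]
print([round(read_folding_index(a, b), 3) for a, b in reads])  # [0.632, 0.049, 0.95]
print(round(region_folding_index(reads), 3))                   # 0.544
```

Transcriptome-wide comparisons, such as the nuclear versus whole-cell medians quoted above, would then summarize either the per-read values or the per-transcript means.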

Certain RBPs are associated with RNA–RNA interactions

RNA secondary structures have been shown to drive RBP binding24, but the relationship between higher-order RNA structures and RBPs has yet to be fully explored. We first examined the association between RBPs and RNA–RNA interactions by measuring RBP density at interaction regions using large-scale eCLIP data from ENCODE25. More RBP eCLIP peaks were observed on RNA–RNA interaction regions (identified using refolded RNA) than on shuffled regions in HepG2 and K562 cells (Supplementary Fig. 11a,b). We next quantified the association between individual RBPs and RNA–RNA interactions by applying multiple linear regression to correlate the eCLIP read density of each RBP with region-level folding index values throughout the K562 transcriptome. A small set of RBPs, including LIN28B, SRSF1, FXR1, FXR2, FMR1, SND1, METAP2, BUD13, ZNF622, UPF1 and YBX3, was positively correlated with RNA–RNA interactions (Supplementary Fig. 11c). The low correlation coefficients suggest a weak association, consistent with each RBP binding only a small portion of RNA–RNA interaction regions.
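The regression setup can be sketched with regions as rows, per-RBP eCLIP read densities as predictors and the region-level folding index as the response, so that each fitted coefficient summarizes one RBP's association. This layout, the toy numbers and the helper names are assumptions; exactly how the reported correlation coefficients are derived is given in Methods. The normal equations are solved in pure Python for self-containment; in practice `numpy.linalg.lstsq` or a statistics package would be used.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(X, y):
    """Multiple linear regression with intercept via normal equations; returns [intercept, coef_1, ...]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Toy regions: eCLIP densities for two hypothetical RBPs (columns) and a response built as
# y = 1 + 2 * rbp_a - 3 * rbp_b, so the fit should recover [1, 2, -3].
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
print([round(c, 6) for c in fit_ols(X, y)])
```

A positive coefficient for an RBP would correspond to the positive associations listed above; with real data, small coefficients relative to the spread of the folding index would reflect the weak association noted in the text.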

To validate this observation, we performed KARR-seq in K562 cells after YBX3 or SRSF1 knockdown (Supplementary Fig. 11d,e). Knockdown of YBX3 or SRSF1 did not lead to obvious higher-order structure changes on their target RNAs (Supplementary Fig. 11f,g) or to transcriptome-wide changes in the folding index (Supplementary Fig. 11h). Therefore, each RBP seems to be associated with, or to regulate, only a small number of RNA–RNA interactions, potentially owing to its binding preferences for specific sequence and secondary structure motifs.

Translation suppresses mRNA long-range interactions

We next reasoned that ribosome translocation during translation could remodel higher-order RNA structures by resolving RNA duplexes. To test this hypothesis, we calculated the folding index difference between in vitro and in vivo conditions for each transcript. We observed a positive correlation between the folding index difference and ribosome occupancy density, suggesting that higher translation efficiency could lead to a larger difference in RNA folding between in vitro and cellular conditions (Fig. 3a). We then performed KARR-seq after treating HepG2 cells with the translation inhibitors harringtonine and cycloheximide. KARR-seq arc groups and interaction maps showed more intramolecular contacts upon translation inhibition (Fig. 3b and Supplementary Fig. 12a). Most harringtonine-induced and cycloheximide-induced interactions are located in the CDS of mRNAs (Fig. 3c), the region where ribosomes translocate during translation.
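The per-transcript correlation above can be sketched as a rank correlation between the folding index difference (in vitro minus in vivo) and ribosome occupancy density. The text does not state which correlation measure was used; Spearman's rank correlation, chosen here because it is robust to the skewed distributions typical of ribosome profiling data, is an assumption, as are the variable names. `scipy.stats.spearmanr` provides the same statistic with a P value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-transcript values: folding index (in vitro - in vivo) and ribosome occupancy density.
fi_difference = [0.02, 0.05, 0.04, 0.11, 0.09]
ribosome_density = [0.8, 1.9, 1.1, 4.2, 3.5]
print(round(spearman(fi_difference, ribosome_density), 3))
```

A positive rho, as for these made-up values, corresponds to the trend in Fig. 3a.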

Fig. 3: Translation suppresses mRNA higher-order structures under native and stress conditions. a, The effect of ribosome binding on RNA–RNA interactions in HepG2 cells. The x axis denotes ribosome binding strength, and the y axis shows the folding index difference between in vitro and in vivo. b, KARR-seq arc groups for the NCL transcript in control and harringtonine-treated HepG2 cells. Folding index: 0.246 for control and 0.290 for harringtonine. c, Metagene plot showing the relative abundance of intermolecular mRNA interactions under denoted conditions. CHX, cycloheximide; HT, harringtonine. d, The transcriptome-wide distribution of beta coefficients under denoted conditions. *** indicates P < 0.001. e, Folding index for mRNA and lncRNA in control and harringtonine-treated HepG2 cells. f, The length of transcripts that exhibit upregulated and downregulated intramolecular interactions after arsenite treatment in K562 cells. g, The 5′ UTR, CDS and 3′ UTR length for mRNAs that exhibit upregulated and downregulated intramolecular interactions after arsenite treatment in K562 cells. h, The translation efficiency under the normal condition for mRNAs that exhibit upregulated and downregulated intramolecular interactions after arsenite treatment in K562 cells. In f–h, for the analysis of all transcripts, n = 104 transcripts for the downregulated group and n = 73 transcripts for the upregulated group. For the analysis of mRNAs, n = 102 transcripts for the downregulated group and n = 68 transcripts for the upregulated group. i, mRNA folding index in control, arsenite-treated and harringtonine-treated K562 cells and purified K562 nuclei. n refers to the number of chimeric-read-level folding index values. n = 440,484 for whole cell control, n = 242,268 for whole cell arsenite, n = 251,601 for whole cell HT, n = 154,797 for nuclear control and n = 162,507 for nuclear arsenite. j, mRNA folding index for SG-localized transcripts and other (non-SG) transcripts in control and arsenite-treated K562 cells. n = 161 transcripts in the non-SG group and n = 215 transcripts in the SG group. For f–h, P values were calculated by the one-sided Mann–Whitney test. For e,i,j, P values were calculated by the two-sided Mann–Whitney test. In box plots shown in e–h, the lower and the upper bounds denote 25th and 75th percentiles, respectively. The minima denote the lower bound −1.5× IQR. The maxima denote the upper bound +1.5× IQR. KARR-seq was performed in two biological replicates. IQR, interquartile range.

Consistently, inhibitor treatments right-shifted the distribution of RNA beta coefficients (Fig. 3d) and increased the average mRNA folding index (Fig. 3e). Meanwhile, translation inhibition did not alter the folding index of long non-coding RNAs (lncRNAs) (Fig. 3e), in line with the absence of translation machinery acting on these transcripts. These results collectively suggest that translation could resolve mRNA intramolecular interactions, leading to more stretched mRNA conformations. To validate this effect, we used fluorescence in situ hybridization (FISH) imaging to measure the spatial distance between mRNA 5′ and 3′ ends under control and translation inhibition conditions. The 5′–3′ distances were around 100 nm for all three tested transcripts (Supplementary Fig. 12c), consistent with findings from previous single-molecule studies26. Translation inhibition resulted in shorter 5′–3′ distances in all three cases (Supplementary Fig. 12c), supporting the idea that translation contributes to less extensive mRNA folding.

mRNA was proposed to form a ‘closed-loop’ structure mediated by translation initiation and termination complexes27. However, recent studies have suggested a revised model of mRNA 5′–3′ communication26,28,29. Using KARR-seq data, we detected modest numbers of chimeric reads between mRNA 5′ and 3′ ends (Supplementary Fig. 12d). Intriguingly, 5′–3′ interactions were not more prevalent than cis-mRNA interactions at other regions (Supplementary Fig. 12e), and translation inhibition increased the proportion of 5′–3′ chimeric reads among all chimeric reads (Supplementary Fig. 12d). Consistent with recent reports26,28, our data suggest that mRNA loops are not completely closed and that translation inhibition may bring mRNA ends into closer proximity. Because mRNA loops are proposed to be mediated by large protein complexes, it is also possible that the dendrimers are too small to capture all of these interactions.
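The proportion of 5′–3′ chimeric reads can be sketched by flagging reads in which one arm overlaps the first `window` nucleotides of a transcript and the other overlaps the last `window` nucleotides. The window size of 200 nt, the 1-based inclusive coordinates and the function names are hypothetical; the text does not specify how mRNA ends were defined (for example, by UTR boundaries instead of a fixed window).

```python
def is_end_to_end(arm1, arm2, tx_length, window=200):
    """True if one arm overlaps the first `window` nt and the other overlaps the last `window` nt.

    Arms are (start, end), 1-based and inclusive. For transcripts shorter than 2 * window the two
    end windows overlap, so a dedicated definition would be needed there.
    """
    def touches_5p(arm):
        return arm[0] <= window

    def touches_3p(arm):
        return arm[1] > tx_length - window

    return (touches_5p(arm1) and touches_3p(arm2)) or (touches_5p(arm2) and touches_3p(arm1))


def end_to_end_fraction(reads, tx_length, window=200):
    """Fraction of intramolecular chimeric reads on one transcript that link its 5′ and 3′ ends."""
    if not reads:
        return 0.0
    return sum(is_end_to_end(a, b, tx_length, window) for a, b in reads) / len(reads)


# Toy reads on a 2,000-nt transcript.
reads = [((10, 40), (1950, 1990)), ((1950, 1990), (10, 40)), ((10, 40), (500, 540)), ((300, 340), (1900, 1940))]
print(end_to_end_fraction(reads, tx_length=2000))  # 0.5
```

Comparing this fraction between control and inhibitor-treated samples is the kind of summary reported in Supplementary Fig. 12d.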

Remodeling of higher-order RNA structures under arsenite stress

Extracellular stresses can remodel RNA homeostasis and induce stress granules (SGs)30; nonetheless, how stresses affect higher-order RNA structures and how RNA interactomes contribute to SG assembly remain to be investigated31. We conducted KARR-seq using K562 cells treated with sodium arsenite, which induces oxidative stress and SG formation. We performed differential RNA–RNA interaction analysis between normal and arsenite conditions (Supplementary Data 1). After arsenite treatment, transcripts bearing upregulated intramolecular interactions are much longer than those bearing downregulated intramolecular interactions (Fig. 3f). mRNAs bearing upregulated intramolecular interactions after arsenite treatment possess longer 5′ UTRs, CDSs and 3′ UTRs (Fig. 3g). These mRNAs also show higher translation efficiency under the normal condition (Fig. 3h). Collectively, these results suggest that RNA length and mRNA translation efficiency are important in remodeling the higher-order RNA structurome under stress conditions.
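The group comparisons in Fig. 3f–h use a one-sided Mann–Whitney test. The sketch below computes the U statistic directly (pairs where the first group is larger, ties counted as one half) and a P value from the standard normal approximation without tie or continuity correction, so it is only approximate for small or heavily tied samples. The transcript lengths are made up; `scipy.stats.mannwhitneyu(x, y, alternative="greater")` offers exact and tie-corrected versions.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided Mann–Whitney U test (H1: values in x tend to be larger than in y).

    Returns (U, p) using the normal approximation, without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal
    return u, p


# Hypothetical transcript lengths (nt) for the upregulated and downregulated groups.
up = [4200, 5100, 3900, 6100, 4800]
down = [2100, 2600, 3300, 1800, 2900]
u, p = mann_whitney_greater(up, down)
print(u, round(p, 4))
```

A two-sided test, as used for Fig. 3e,i,j, would double the smaller tail probability instead.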

Transcriptome-wide analysis revealed that arsenite treatment increased the mRNA folding index to a level similar to that in harringtonine-treated cells (Fig. 3i), indicative of more long-range RNA–RNA interactions and more extensive mRNA folding. Because arsenite is known to repress translation, this observation corroborates the suppressive effect of translation on higher-order RNA structures in a physiologically relevant context. Interestingly, arsenite treatment decreased the mRNA folding index in the nucleus (Fig. 3i), which might be attributed to changes in nuclear RBP composition or other secondary effects.

We next categorized mRNAs into two groups based on their localization, as revealed by published SG transcriptomics data24,32. SG-localized transcripts show a lower average folding index than those that are not localized to SGs (referred to as non-SG; Fig. 3j) under both normal and arsenite conditions. Non-SG transcripts show an increased folding index after arsenite treatment, whereas the folding index of SG-localized transcripts remains unchanged (Fig. 3j). We speculate that less extensive mRNA folding enhances the accessibility of mRNAs to RBPs and other RNA molecules, which is crucial to the assembly of multi-component messenger RNP (mRNP) complexes within SGs. Indeed, the binding targets of the SG marker proteins G3BP1 and TIA1 (refs. 33,34,35) showed slightly lower folding indexes under both conditions (Supplementary Fig. 12f). However, the small difference in folding indexes between G3BP1 targets and other transcripts suggests that additional factors determine the RNA composition of SGs.

KARR-seq identifies functional intermolecular RNA–RNA interactions

We next analyzed the intermolecular chimeric reads to identify RNA–RNA interactions between different RNA molecules. KARR-seq detects intermolecular interaction between various RNA categories in both intact cells and cell nuclei, such as lncRNA–mRNA, snoRNA–mRNA, snoRNA–rRNA and mRNA–mRNA interactions, with mRNA-mediated interactions taking the largest portion (Fig. 4a). KARR-seq and PARIS data reveal similar intermolecular interaction landscapes (Supplementary Fig. 13a,b), whereas snRNA, snoRNA and some other non-coding RNA species are depleted in RIC-seq data (Supplementary Fig. 13c).

Fig. 4: KARR-seq identifies functional RNA–RNA interactions between diverse RNA categories. a, The landscape of intermolecular RNA–RNA interactions revealed by KARR-seq in K562 cells (left) and K562 nucleus (right). The width of the link between two RNA categories denotes the relative abundance of chimeric reads taken by interactions between these two categories. mRNA–rRNA interactions, which are primarily a result of translation, were excluded from the plots. b, Interactions between C/D box snoRNA and 18S in K562 cells. Previously identified interaction sites are shown in pink. Interaction sites identified by KARR-seq are shown in green. c, Snapshots of KARR-seq data revealing SNORD25–18S and SNORD65–18S interactions. Regions colored in green denote identified interaction regions. The dashed lines denote previously known 2′ OMe modification sites. d, Scheme showing the organization and processing of human pre-rRNA 5′ ETS. e, Top, KARR-seq reads density for interactions between U3 and 5′ ETS in K562 cells (human) and mESCs (mouse). Bottom, KARR-seq interaction maps showing the higher-order structures of 5′ ETS in the corresponding cell lines. Stem loops are enclosed in black squares. f, Relative pre-rRNA levels in K562 cells treated by ASO that blocks a U3 interaction site at the 5′ ETS. Two sets of primers amplifying A′ and A0-proximal regions were applied for qPCR, respectively. Data are mean ± s.d. P values were calculated by Student’s t-test. n = 3 biological replicates. KARR-seq was performed in two biological replicates.

To assess the sensitivity of KARR-seq in detecting functional intermolecular interactions, we enumerated interactions between C/D box snoRNAs and rRNA in HepG2 cells and overlapped these interactions with previously identified rRNA 2′-O-methylation (2′-OMe) sites from the snoRNA database (snoRNA-LBME-db)36. snoRNA–rRNA interactions detected by KARR-seq overlap with 80% (12/15) of known modification sites on 18S (Fig. 4b,c) and 85% (34/40) on 28S (Supplementary Fig. 13d,e), suggesting high sensitivity. We also identified eight uncharacterized snoRNA–rRNA interactions on 18S (Fig. 4b,c) and 74 on 28S (Supplementary Fig. 13d,e). We experimentally tested these snoRNA–18S interactions by using biotinylated antisense oligos (ASOs) that target regions adjacent to the identified interaction sites to pull down specific 18S rRNA fragments, with a random biotinylated ASO as a control. The input and pull-down samples were then subjected to reverse transcription–quantitative polymerase chain reaction (RT–qPCR) to amplify the corresponding snoRNAs. For four of the six tested interactions, snoRNAs were enriched in the 18S pull-down samples compared with the control pull-down, supporting the validity of these interactions (Supplementary Fig. 13f). These interactions may correspond to potential 2′-OMe sites that are not yet documented or may suggest snoRNA functions beyond 2′-OMe deposition. The other two interactions, SNORD65–18S and SNORD12C–18S, could be false positives (Supplementary Fig. 13f).
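The sensitivity figures above are the fraction of known 2′-OMe sites that fall within at least one detected snoRNA–rRNA interaction region. The sketch assumes that "overlap" means a site position lies inside a region with inclusive bounds; the toy coordinates are made up and are not real modification positions, and in practice the site list would come from snoRNA-LBME-db.

```python
def covered_sites(sites, regions):
    """Known modification positions lying inside at least one interaction region (inclusive bounds)."""
    return {s for s in sites if any(start <= s <= end for start, end in regions)}


def sensitivity(sites, regions):
    """Fraction of known sites recovered by the detected interaction regions."""
    return len(covered_sites(sites, regions)) / len(sites)


# Toy coordinates (made up for illustration, not real 2′-OMe positions).
known_sites = [10, 50, 90, 130, 170]
snorna_regions = [(5, 15), (45, 60), (160, 180)]
print(sorted(covered_sites(known_sites, snorna_regions)), sensitivity(known_sites, snorna_regions))
# [10, 50, 170] 0.6

# The reported recoveries correspond to 12 / 15 = 0.8 (18S) and 34 / 40 = 0.85 (28S).
print(12 / 15, 34 / 40)
```

Interaction regions that cover no known site are the candidates for the uncharacterized snoRNA–rRNA interactions discussed above.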

KARR-seq detects RNA–RNA interactions that affect pre-rRNA processing

rRNA maturation involves stepwise spacer cleavage from the polycistronic 45S pre-rRNA, which relies on extensive interactions between pre-rRNA and U3 RNA37. Several U3 binding sites on the 5′ external transcribed spacer (ETS) have been identified biochemically in lower eukaryotes38,39,40,41, but the landscape of U3–pre-rRNA interactions within mammalian cells and how these interactions influence the 5′ ETS structure are unclear.

The 5′ ETS of mammalian pre-rRNA harbors three closely located cleavage sites, namely A′, A0 and 1 (Fig. 4d)37. KARR-seq revealed extensive interactions between U3 and the A′-A0 region in both K562 cells and mESCs, whereas minimal interactions were detected at the A0-1 region (Fig. 4e, top). Meanwhile, the A′-A0 and A0-1 regions show distinct higher-order structure features: A′-A0 forms highly dynamic stripe and domain structures, whereas A0-1 includes an array of stable stems (Fig. 4e, bottom). The strength of U3–rRNA interactions tends to decrease as intramolecular 5′ ETS interactions become pronounced (Fig. 4e). We therefore propose that U3 RNA may regulate 5′ ETS processing by maintaining the correct conformation of A′-A0 through direct U3–pre-rRNA interactions, whereas the A0-1 region is less involved in U3-mediated interactions because its stem structures are relatively stable and less susceptible to conformational changes. An ASO (Methods) that blocks U3–ETS interactions at the A′-A0 region increased pre-rRNA levels in HepG2 cells compared with the control ASO (Fig. 4f), suggesting the importance of U3–ETS interactions in regulating rRNA biogenesis in mammalian cells.
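Relative pre-rRNA levels from qPCR, as in Fig. 4f, are commonly computed with the 2^−ΔΔCt method: each sample's target Ct is normalized to a reference RNA, and the treated sample is compared with the control. The use of ΔΔCt, the choice of reference RNA and all Ct values below are assumptions for illustration; this section does not state how the qPCR data were normalized.

```python
from statistics import mean, stdev


def relative_level_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Fold change of a target amplicon in treated vs control samples by the 2^-ΔΔCt method."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# One replicate: the target amplifies one cycle earlier after treatment while the reference is unchanged.
print(relative_level_ddct(19.0, 15.0, 20.0, 15.0))  # 2.0

# Summarize three hypothetical biological replicates as mean ± s.d.
fold_changes = [relative_level_ddct(t, r, tc, rc) for t, r, tc, rc in [
    (19.0, 15.0, 20.0, 15.0), (19.5, 15.2, 20.1, 15.1), (18.8, 14.9, 20.0, 15.0)]]
print(f"{mean(fold_changes):.2f} ± {stdev(fold_changes):.2f}")
```

A fold change above 1 for the ETS-blocking ASO relative to the control ASO corresponds to the accumulation of pre-rRNA described above.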

KARR-seq detects RNA–RNA interactions in virus-infected cells

Many viruses use RNA to store their genetic information. These viruses have evolved extensively to regulate their life cycles through RNA structure-based mechanisms and can efficiently harness host cellular machinery42,43. However, the higher-order structures of most viral genomes and the RNA–RNA interactions between virus and host remain largely unexplored. We therefore applied KARR-seq to A549 cells infected with human respiratory syncytial virus (RSV) or vesicular stomatitis virus (VSV). RSV is a prominent cause of respiratory tract infection in infants, children, the elderly and immunocompromised individuals44, whereas VSV has been used for decades as a model system for negative-sense RNA viruses45.

KARR-seq coverage on VSV RNA is roughly five times that on RSV RNA. KARR-seq data revealed three layers of information: (1) higher-order structures of RSV and VSV RNAs; (2) effects of virus infection on the higher-order structures of the host transcriptome; and (3) RNA–RNA interactions between viral and host RNAs. Both RSV and VSV are non-segmented negative-sense RNA viruses with similar genome organizations. However, a unique feature of the RSV RNA genome is its G/C-rich G gene. KARR-seq detected higher-order structures in both RSV and VSV RNAs. Interestingly, interactions identified within the RSV RNA cluster around the G gene and are mostly short-range (<2 kb) (Fig. 5a, upper panel). In contrast, the VSV RNA contains substantially more long-range stem–loop interactions (Fig. 5a, lower panel).
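The short-range versus long-range distinction above amounts to thresholding the distance between the two arms of each interaction. A minimal sketch, assuming each interaction is summarized by the midpoint coordinates of its two arms on the viral RNA and using the 2 kb cutoff quoted in the text; the `Interaction` type, the midpoint convention and the toy coordinates are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One detected RNA-RNA interaction on a viral RNA (hypothetical representation)."""
    arm1: int  # midpoint of the first interacting arm, in nt
    arm2: int  # midpoint of the second interacting arm, in nt

    @property
    def span(self) -> int:
        return abs(self.arm2 - self.arm1)


def split_by_range(
    interactions: list[Interaction], cutoff_nt: int = 2000
) -> tuple[list[Interaction], list[Interaction]]:
    """Partition into short-range (span < cutoff) and long-range (span >= cutoff)."""
    short = [x for x in interactions if x.span < cutoff_nt]
    long_range = [x for x in interactions if x.span >= cutoff_nt]
    return short, long_range


# Toy interactions with made-up coordinates.
toy = [Interaction(100, 500), Interaction(5600, 4700), Interaction(1000, 9000)]
short, long_range = split_by_range(toy)
print(len(short), len(long_range))  # 2 1
```

Whether an interaction spanning exactly 2 kb counts as short or long is a convention choice; here it is treated as long-range.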

Fig. 5: KARR-seq reveals viral RNA structures and virus–host RNA–RNA interactions. a, Loop and stripe structures across the RSV (top) and VSV (bottom) RNAs in infected A549 cells. b,c, KARR-seq arc groups for the NUCB1 (b) and EWSR1 (c) transcripts in control and RSV-infected A549 cells. Folding index: 0.415 for NUCB1 after RSV infection, 0.527 for NUCB1 without infection, 0.411 for EWSR1 after RSV infection and 0.532 for EWSR1 without infection. d, RNA folding index in control, RSV-infected and VSV-infected A549 cells. n denotes the number of chimeric-read-level folding index values: n = 1,772,734 for no infection, n = 596,451 for RSV and n = 159,725 for VSV. The lower and upper bounds denote the 25th and 75th percentiles, respectively. The minima denote the lower bound −1.5× IQR. The maxima denote the upper bound +1.5× IQR. P values were calculated by the two-sided Mann–Whitney test. e,f, The number of host RNAs from each RNA category that interact with RSV (e) and VSV (f) RNAs. g, Fluorescence imaging of GFP-tagged RSV and GFP-tagged VSV after cells were transfected with the indicated LNA ASOs. These ASOs target mRNA transcripts at positions that interact with RSV RNA. Scale bar, 100 µm. h,i, The percentage of RSV-positive (h) and VSV-positive (i) cells quantified by flow cytometry after cells were treated with the indicated LNA ASOs. Data are mean ± s.d. n = 3 biological replicates. P values were calculated by two-tailed Student’s t-test. IQR, interquartile range.

We next analyzed how RSV and VSV infection affects the higher-order structures of the host transcriptome. RSV infection reduced intramolecular interactions of host mRNAs in A549 cells (Fig. 5b–d), whereas this effect was not observed in VSV-infected cells. Because translation could suppress global mRNA compaction, the difference in mRNA intramolecular interactions between RSV-infected and VSV-infected cells is potentially related to the rapid shutdown of host translation upon VSV, but not RSV, infection46,47.

Both RSV and VSV RNAs interact with host mRNAs and ncRNAs (Fig. 5e,f). Host microRNAs and small nuclear RNAs (snRNAs) have been shown to bind viral genomes to regulate virus life cycles and host RNA metabolism7,48. However, interactions between viral RNAs and host mRNAs have not been well documented. KARR-seq detected 111 and 664 host mRNA transcripts that interact with RSV and VSV RNAs, respectively (Supplementary Fig. 14a,b). Unlike severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA, which preferentially interacts with the CDS of host mRNAs49, both RSV-mediated and VSV-mediated interactions are enriched in the 3′ UTR of host mRNAs (Supplementary Fig. 14c). RSV and VSV RNAs interact with host RNAs predominantly through their N, G and L genes (Supplementary Fig. 14d,e). Although RSV and VSV infections activate similar pathways in A549 cells (Supplementary Fig. 14f,g), the mRNA transcripts that interact with RSV and VSV RNAs are enriched for distinct functions: RSV-interacting transcripts are involved in response to cytokines and regulation of apoptosis (Supplementary Fig. 14h), whereas VSV-interacting transcripts regulate RNA processing, translation, decay and protein targeting (Supplementary Fig. 14i).
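Assigning each host-side interaction site to the 5′ UTR, CDS or 3′ UTR is the basic step behind a regional enrichment comparison. A minimal sketch, assuming 1-based transcript coordinates with an inclusive CDS interval; normalizing to sites per kb of each region is one plausible way to compare regions of different size and is an assumption, not necessarily the normalization behind Supplementary Fig. 14c. Function names and the toy transcript are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def assign_region(pos: int, cds_start: int, cds_end: int) -> str:
    """Classify a 1-based transcript position relative to an inclusive CDS interval."""
    if pos < cds_start:
        return "5'UTR"
    if pos <= cds_end:
        return "CDS"
    return "3'UTR"


def region_counts(positions: list[int], cds_start: int, cds_end: int) -> Counter:
    return Counter(assign_region(p, cds_start, cds_end) for p in positions)


def region_density_per_kb(
    positions: list[int], cds_start: int, cds_end: int, tx_len: int
) -> dict[str, float]:
    """Interaction sites per kb of each region, so long regions are not favored by length alone."""
    lengths = {
        "5'UTR": cds_start - 1,
        "CDS": cds_end - cds_start + 1,
        "3'UTR": tx_len - cds_end,
    }
    counts = region_counts(positions, cds_start, cds_end)
    return {
        region: counts[region] * 1000 / length if length > 0 else 0.0
        for region, length in lengths.items()
    }


# Toy 3,000-nt transcript with CDS at 201-1400 (made-up numbers).
sites = [50, 300, 1500, 2000, 2500, 2900]
print(region_counts(sites, 201, 1400))
print(region_density_per_kb(sites, 201, 1400, 3000))
```

In this toy transcript the 3′ UTR holds most sites in absolute terms, while the short 5′ UTR has the highest per-kb density, which is why the choice of normalization matters when claiming regional enrichment.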

We next investigated the potential functional relevance of host mRNA–RSV interactions. We focused on mRNAs related to cytokine-mediated signaling and apoptosis and designed ASOs containing locked nucleic acids (LNA ASOs) that target these mRNAs in A549 cells to block specific RNA–RNA interactions. We infected the cells with GFP-tagged RSV or VSV and assayed virus replication by measuring GFP signals in ASO-treated cells. As shown by fluorescence imaging and flow cytometry quantification, LNA ASOs targeting KANK2 and CD44 mRNAs repressed RSV replication by more than 60% (Fig. 5g,h). In contrast, these ASOs had minor effects on VSV replication (Fig. 5g,i), supporting the conclusion that the repression is RSV specific and likely results from blocking the corresponding RNA–RNA interactions. These results demonstrate the capability of KARR-seq to identify functional intermolecular RNA–RNA interactions in infectious disease models. Rather than passively binding the most abundant host transcripts, different viral RNAs interact with host mRNAs of diverse functions, suggesting roles for virus-specific RNA–RNA interactions in regulating virus propagation. Future systematic studies are required to reveal the exact molecular mechanisms.
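The reported repression of more than 60% and the two-tailed Student's t-test in Fig. 5h,i can be expressed as a relative reduction in the mean percentage of GFP-positive cells plus a pooled-variance t statistic. The sketch below assumes the reduction is computed relative to the control-ASO mean, which the text does not spell out, and the triplicate values are invented for illustration. The standard library has no t distribution, so the P value is left to `scipy.stats.ttest_ind(a, b)`, whose default (`equal_var=True`) is the same pooled Student's test.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def percent_reduction(control: list[float], treated: list[float]) -> float:
    """Reduction of the treated mean relative to the control mean, in percent."""
    return 100.0 * (1.0 - mean(treated) / mean(control))


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Invented % RSV-positive cells, n = 3 per condition (not data from the study).
control_aso = [30.0, 32.0, 28.0]
targeting_aso = [10.0, 12.0, 11.0]
print(round(percent_reduction(control_aso, targeting_aso), 1))  # 63.3
print(student_t(control_aso, targeting_aso))
```

With n = 3 per group the test has only 4 degrees of freedom, so the P value is sensitive to replicate variance; reporting mean ± s.d., as in the figure, makes that spread visible.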