RIP-PEN-seq identifies a class of kink-turn RNAs as splicing regulators

Li, Bin; Liu, Shurong; Zheng, Wujian; Liu, Anrui; Yu, Peng; Wu, Di; Zhou, Jie; Zhang, Ping; Liu, Chang; Lin, Qiao; Ye, Jiayi; He, Simeng; Huang, Qiaojuan; Zhou, Hui; Chen, Jianjun; Qu, Lianghu; Yang, Jianhua

doi:10.1038/s41587-023-01749-0

Published: 10 April 2023

Bin Li ORCID: orcid.org/0000-0002-5717-1551^1,na1,
Shurong Liu^1,na1,
Wujian Zheng^1,2,na1,
Anrui Liu^1,2,na1,
Peng Yu^1,na1,
Di Wu^1,2,
Jie Zhou^1,2,
Ping Zhang^1,
Chang Liu^1,
Qiao Lin^1,2,
Jiayi Ye^1,2,
Simeng He^2,
Qiaojuan Huang^1,
Hui Zhou^1,
Jianjun Chen ORCID: orcid.org/0000-0003-3749-2902^3,
Lianghu Qu ORCID: orcid.org/0000-0003-3657-2863^1 &
…
Jianhua Yang ORCID: orcid.org/0000-0003-3863-2786^1,2

Nature Biotechnology volume 42, pages 119–131 (2024)

Abstract

A kink-turn (K-turn) is a three-dimensional RNA structure that exists in all three primary phylogenetic domains. In this study, we developed the RIP-PEN-seq method to identify the full-length sequences of RNAs bound by the K-turn binding protein 15.5K and discovered a previously uncharacterized class of RNAs with backward K-turn motifs (bktRNAs) in humans and mice. All bktRNAs share two consensus sequence motifs at fixed terminal positions and display complex folding properties, expression patterns and evolutionary patterns. We found that a highly conserved bktRNA, bktRNA1, guides the methyltransferase fibrillarin to install 2′-O-methylation on U12 small nuclear RNA in humans. Depletion of bktRNA1 causes global splicing dysregulation of U12-type introns by impairing the recruitment of ZCRB1 to the minor spliceosome. Most bktRNAs regulate the splicing of local introns by interacting with the 15.5K protein. Taken together, our findings characterize a class of small RNAs and uncover another layer of gene expression regulation that involves crosstalk among bktRNAs, RNA splicing and RNA methylation.

Main

RNA structural motifs diversify RNA architecture to regulate various biological processes¹. The kink-turn (K-turn) is the most prevalent three-dimensional (3D) RNA structural motif in messenger RNAs (mRNAs) and non-coding RNAs (ncRNAs)^2,3,4. A K-turn typically consists of a canonical stem (C-stem) followed by an asymmetric 3-nucleotide (3-nt) bulge and a non-canonical stem (NC-stem) on the 3′ side that contains G•A and A•G base pairs^2,3. Naturally occurring K-turns often serve as specific binding sites for RNA-binding proteins (RBPs), including L7Ae⁵ in archaea and its homologues YbxF, YlxQ⁶ and the 15.5-kDa protein (15.5K)⁷. The distinctive conformation of the K-turn, together with its binding proteins, enables K-turns to play pivotal roles in maintaining the proper structure and biological function of various types of RNAs^2,3,4.

K-turn structural motifs recur in many ncRNA families and play important roles in RNA metabolism^2,3,4. In eukaryotes, the best-known K-turn ncRNAs are the box C/D small nucleolar RNAs (snoRNAs) and the U4/U4atac small nuclear RNAs (snRNAs). In box C/D RNAs, the C and D boxes are brought together in the terminal stem, where they form the terminal K-turn motif⁸. Recognition of this K-turn by the 15.5K protein initiates small nucleolar ribonucleoprotein (snoRNP) assembly; the resulting snoRNPs carry out site-specific 2′-O-methylation of ribosomal RNAs (rRNAs) and snRNAs⁹, which is essential for the functional fidelity of ribosomes and gene expression^10,11. The 15.5K protein also binds equivalent K-turn motifs in the 5′ stem-loops of U4 and U4atac, promoting the assembly of the major and minor spliceosomes^12,13. Mutations in the U4atac K-turn motif have been shown to impair 15.5K binding¹⁴ and to cause various diseases, such as Taybi–Linder syndrome (TALS/MOPD1)^15,16, Roifman syndrome (RFMN)¹⁷ and Lowry–Wood syndrome (LWS)¹⁸. However, the prevalence, mechanism and function of K-turn structures across the transcriptome remain largely unknown.

In this study, we developed a method for RNA immunoprecipitation coupled with sequencing of paired ends of ncRNAs (RIP-PEN-seq) to identify the full-length sequences of ncRNAs bound by the 15.5K protein and discovered a class of backward K-turn RNAs (bktRNAs; Fig. 1a) with two consensus motifs. In all well-known human K-turn RNAs (ktRNAs), the internal loop is located at the 5′ end of the RNA (forward ktRNAs (fktRNAs); Fig. 1a); in contrast, the internal loops of the ktRNAs identified in this study are located at the 3′ terminus, so we named them bktRNAs (Fig. 1a). These bktRNAs have a strong preference for a 5′ motif (CUGA) and a 3′ motif (UGAUG) at fixed positions near the RNA termini. Moreover, we identified a functional bktRNA, bktRNA1, that interacts with the 15.5K and fibrillarin (FBL) proteins to regulate 2′-O-methylation of U12 snRNA. Loss of bktRNA1 results in aberrant splicing of hundreds of U12-type introns. In addition, we show that bktRNA1 and 80% (8/10) of randomly selected bktRNAs are involved in the local regulation of intron splicing. Together, our results reveal that bktRNAs form a previously undescribed class of post-transcriptional regulators.

**Fig. 1: A class of ncRNAs with a backward K-turn structure.**

Results

bktRNAs with consensus motifs identified by RIP-PEN-seq

To identify ncRNAs (20–500 nt) with K-turn structural motifs, we developed an efficient RNA cloning scheme called RIP-PEN-seq, which combines dual RNA adapters and size selection with several additional experimental strategies to capture both ends of any ncRNA bound by the 15.5K protein (Fig. 1b, Extended Data Fig. 1a and Methods). We applied RIP-PEN-seq to HEK293T cells expressing FLAG-tagged 15.5K (Extended Data Fig. 1b,c) and obtained approximately 520 million paired-end reads, of which approximately 282 million mapped uniquely to the human genome. More than 45% of the uniquely aligned reads mapped to previously defined fktRNAs, and the start and end sites of these known fktRNAs could be identified at single-nucleotide resolution (Extended Data Fig. 1d,e). For example, RIP-PEN-seq identified the precise 5′ and 3′ ends of 10 known C/D box fktRNAs with K-turn motifs located within the introns of the GAS5 gene (Extended Data Fig. 1f). Together, these results suggest that RIP-PEN-seq enriches ktRNAs with high specificity and accuracy and also captures their full-length sequences.

Although almost all known human ktRNAs are fktRNAs, the 15.5K protein shows no preference for binding forward or backward K-turn structural motifs^2,19; we therefore hypothesized that the 15.5K RIP-PEN-seq data might contain hidden bktRNAs. We developed a computational tool, kturnSeeker, to identify potential ktRNAs from the RIP-PEN-seq data on the basis of sequence and secondary structure information (Extended Data Fig. 1g,h). With it, we identified 605 previously undescribed fktRNAs as well as 118 canonical fktRNAs (Extended Data Fig. 1i and Supplementary Table 1); most of the previously undescribed fktRNAs are located within introns (Extended Data Fig. 1i and Supplementary Table 1). Furthermore, after removing fktRNAs and known annotations and keeping only ktRNAs present in at least two sequencing libraries, we discovered 84 ktRNAs (mapping to 93 genomic loci) with backward K-turn structural motifs, which we named bktRNA1 to bktRNA84 (Supplementary Table 1).
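The final filtering step described above (drop forward K-turns, drop annotated loci, require detection in at least two libraries) can be sketched as follows. This is not kturnSeeker itself: the `Candidate` record, its field names and the library IDs are hypothetical, and the structure-prediction step that would assign `orientation` is assumed to have run already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical ktRNA candidate record (not kturnSeeker's actual data model)."""
    name: str
    orientation: str        # "forward" or "backward" K-turn, assumed precomputed
    annotated: bool         # True if the locus overlaps a known ncRNA annotation
    libraries: frozenset    # IDs of sequencing libraries in which it was detected


def select_bktrna_candidates(candidates, min_libraries=2):
    """Keep unannotated backward K-turn candidates seen in >= min_libraries libraries."""
    return [
        c for c in candidates
        if c.orientation == "backward"
        and not c.annotated
        and len(c.libraries) >= min_libraries
    ]


# Usage with made-up candidates.
candidates = [
    Candidate("cand_A", "backward", False, frozenset({"lib1", "lib2"})),
    Candidate("cand_B", "backward", False, frozenset({"lib1"})),         # one library only
    Candidate("cand_C", "forward", False, frozenset({"lib1", "lib2"})),  # forward K-turn
    Candidate("cand_D", "backward", True, frozenset({"lib1", "lib3"})),  # already annotated
]
kept = select_bktrna_candidates(candidates)
print([c.name for c in kept])  # ['cand_A']
```

Keeping the library threshold as a parameter makes it easy to see how much a stricter reproducibility requirement would shrink the candidate list.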

Intriguingly, the K-turn structural motifs of most bktRNAs are located 4 nt from the RNA 5′ end and 2 nt from the 3′ end (Fig. 1c,d). Consistently, de novo motif enrichment analysis revealed strong enrichment of a CUGA motif typically 4 nt downstream of the 5′ end and a UGAUG motif 2 nt upstream of the 3′ end (Fig. 1e and Extended Data Fig. 2a). All of these enriched consensus motifs were located within the K-turn structural motifs of bktRNAs (Fig. 1f–i and Extended Data Fig. 2b).
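The positional rule can be made concrete with a simple exact-match check. This reads "4 nt downstream of the 5′ end" as four nucleotides preceding CUGA and "2 nt upstream of the 3′ end" as two nucleotides following UGAUG; that reading, the function name and the exact-match criterion are assumptions for illustration. The paper's de novo analysis is a statistical enrichment, not this test.

```python
def has_backward_kturn_motifs(seq, five_motif="CUGA", three_motif="UGAUG",
                              five_offset=4, three_offset=2):
    """True if five_motif starts after `five_offset` nt from the 5' end and
    three_motif ends `three_offset` nt before the 3' end (exact matches only)."""
    seq = seq.upper().replace("T", "U")        # accept DNA or RNA letters
    five_end = five_offset + len(five_motif)
    three_stop = len(seq) - three_offset       # exclusive end of the 3' motif
    three_start = three_stop - len(three_motif)
    if three_start < five_end:                 # too short for non-overlapping motifs
        return False
    return (seq[five_offset:five_end] == five_motif
            and seq[three_start:three_stop] == three_motif)


# Invented toy sequence: 4 nt, CUGA, spacer, UGAUG, 2 nt.
toy = "GGAU" + "CUGA" + "A" * 12 + "UGAUG" + "CA"
print(has_backward_kturn_motifs(toy))        # True
print(has_backward_kturn_motifs("G" + toy))  # False: CUGA is shifted by 1 nt
```

Because the offsets are parameters, the same check can be rerun with relaxed positions to see how sharply the motifs are anchored in a given dataset.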

To test whether these two conserved motifs adopt the backward K-turn structure in vivo, we developed 15.5K RIP-PEN-SHAPE-MaP, an approach that combines SHAPE-MaP^20,21 with our RIP-PEN-seq experiments to probe the intact structures of bktRNAs (Extended Data Fig. 2c and Methods). We first computed reactivity profiles for the known fktRNAs and found that paired nucleotides in the NC-stem and C-stem exhibited lower SHAPE reactivity, whereas unpaired nucleotides in the loop region (Loop) had relatively high SHAPE reactivity (Extended Data Fig. 2d,e and Supplementary Table 2), supporting the accuracy of the 15.5K RIP-PEN-SHAPE-MaP method. For example, the SHAPE reactivity signal agreed well with the structure of the fktRNA SNORD102 (Extended Data Fig. 2e). We then mapped the SHAPE reactivity signals onto the predicted secondary structures of bktRNAs and found that paired nucleotides in backward K-turn structures had lower reactivity than unpaired nucleotides (Fig. 1g–i and Supplementary Table 2). These data therefore support the conclusion that the 5′ and 3′ motifs identified in bktRNAs are likely to adopt the backward K-turn structure in vivo.
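The paired-versus-unpaired comparison above amounts to splitting a reactivity profile by the structure's base-pairing state. A minimal sketch, assuming the structure is given in dot-bracket notation; note that predicted structures often draw non-canonical G•A pairs as '.', so what counts as "paired" depends on the notation used. The function name and the toy profile are invented, not RIP-PEN-SHAPE-MaP data.

```python
from statistics import mean


def mean_reactivity_by_pairing(dot_bracket, reactivities):
    """Mean SHAPE reactivity of paired vs unpaired positions.

    dot_bracket: secondary structure, '(' or ')' = paired, '.' = unpaired.
    reactivities: per-nucleotide values; None marks positions without data.
    """
    if len(dot_bracket) != len(reactivities):
        raise ValueError("structure and reactivity profile differ in length")
    paired, unpaired = [], []
    for symbol, value in zip(dot_bracket, reactivities):
        if value is None:
            continue  # skip nucleotides with no SHAPE measurement
        (paired if symbol in "()" else unpaired).append(value)
    return mean(paired), mean(unpaired)


# Toy hairpin with invented reactivities.
structure = "((((....))))"
profile = [0.1, 0.2, 0.1, None, 0.9, 0.7, 0.8, 0.6, 0.1, 0.2, 0.1, 0.1]
paired_mean, unpaired_mean = mean_reactivity_by_pairing(structure, profile)
print(round(paired_mean, 3), round(unpaired_mean, 3))
```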

We next applied RIP-PEN-seq to mouse Hepa1-6 cells stably expressing FLAG-tagged 15.5K (Supplementary Fig. 1a,b), and it also detected mouse fktRNAs with high specificity and accuracy (Supplementary Fig. 1c–e). We identified 43 bktRNAs in mice that shared common genomic and motif characteristics with human bktRNAs (Supplementary Fig. 1f–k and Supplementary Table 3). Overall, by developing the RIP-PEN-seq approach and the kturnSeeker algorithm, we uncovered a class of ktRNAs with a backward K-turn structural motif composed of two consensus sequence elements in fixed positions of RNAs in humans and mice.

bktRNAs have complex folding, expression and evolution patterns

To identify bktRNAs across human tissues and cells, we further applied kturnSeeker to 28 small RNA sequencing (sRNA-seq) datasets generated by our PEN-seq method (Methods) and 234 public sRNA-seq datasets produced by the ENCODE consortium. This analysis revealed 379 high-confidence candidate bktRNAs, 72 of which overlapped with the abovementioned 15.5K-bound bktRNAs (~86%, 72/84; Fig. 2a). By integrating the RIP-PEN-seq and sRNA-seq datasets, we identified 391 bktRNAs (407 genomic loci; Fig. 2b and Supplementary Table 4). As observed for the 15.5K-bound bktRNAs, these bktRNAs showed substantial enrichment of CUGA motifs 4 nt downstream of the 5′ end and UGAUG motifs 2 nt upstream of the 3′ end (Fig. 2c,d).
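The counts above are consistent with a simple set union: 84 + 379 − 72 = 391. Assuming the 391 bktRNAs are exactly the union of the two sets (the text says only that the datasets were "integrated"), the bookkeeping looks like this; all IDs are placeholders sized to match the reported counts.

```python
def integrate_sets(rip_ids, srna_ids):
    """Union, intersection, and share of RIP-identified IDs also found by sRNA-seq."""
    union = rip_ids | srna_ids
    shared = rip_ids & srna_ids
    return union, shared, len(shared) / len(rip_ids)


# Placeholder IDs sized to match the counts in the text (84, 379, 72 shared).
shared_ids = {f"shared_{i}" for i in range(72)}
rip_ids = shared_ids | {f"rip_only_{i}" for i in range(84 - 72)}
srna_ids = shared_ids | {f"srna_only_{i}" for i in range(379 - 72)}

union, shared, frac = integrate_sets(rip_ids, srna_ids)
print(len(union), len(shared), round(100 * frac))  # 391 72 86
```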

**Fig. 2: Complex expression patterns of bktRNAs in tissues, cell lines and subcellular compartments and their evolution patterns.**

To determine the folding characteristics^22,23 of these K-turn sequences, we first devised a nomenclature for the nucleotide positions in the backward K-turn structure based on the nomenclature for the forward K-turn structure^23,24 (Extended Data Fig. 3a). We then examined the distribution of 3b:3n and −1b:−1n sequences in bktRNAs (Extended Data Fig. 3b). Almost all bktRNAs have 3b:3n = U:U (99.5%, 405/407 loci), which leads to an N3 conformation and weak folding in response to metal ions^22,23,24,25. Approximately 69% (281/407) of bktRNAs have −1b:−1n = C:G or A:U, which leads to good folding in response to metal ions^22,23,24,25 (Extended Data Fig. 3c). Mouse bktRNAs showed similar folding properties (Extended Data Fig. 3d,e). Notably, the predominance of 3b:3n = U:U in bktRNAs differs considerably from the distributions observed for ribosomal Kt-7, the SAM-I riboswitch and U4 snRNAs^22,23,24,25.
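The two sequence rules stated above (3b:3n = U:U; −1b:−1n = C:G or A:U) can be tallied per locus as below. This is a hypothetical classifier that only encodes the rules as written; representing each locus as a (3b, 3n, −1b, −1n) tuple is an assumption, and the synthetic loci are chosen solely to reproduce the reported totals (405/407 and 281/407), not taken from the data.

```python
from collections import Counter

# Rules as stated in the text; the tuple layout below is a hypothetical choice.
GOOD_FOLDING_MINUS1 = {("C", "G"), ("A", "U")}


def classify_kturn(b3, n3, b_minus1, n_minus1):
    """Flag the 3b:3n and -1b:-1n identities discussed for backward K-turns."""
    return {
        "3b:3n=U:U": (b3, n3) == ("U", "U"),
        "-1b:-1n good folder": (b_minus1, n_minus1) in GOOD_FOLDING_MINUS1,
    }


def summarize(loci):
    """Percentage of loci carrying each flag, rounded to one decimal."""
    counts = Counter()
    for locus in loci:
        for flag, value in classify_kturn(*locus).items():
            counts[flag] += value  # True counts as 1, False as 0
    return {flag: round(100 * n / len(loci), 1) for flag, n in counts.items()}


# Synthetic loci chosen only to reproduce the reported totals (405/407 and 281/407).
loci = ([("U", "U", "C", "G")] * 281
        + [("U", "U", "G", "C")] * 124
        + [("C", "C", "G", "C")] * 2)
print(summarize(loci))  # {'3b:3n=U:U': 99.5, '-1b:-1n good folder': 69.0}
```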

We next explored m⁶A modification of bktRNAs^26,27 by intersecting our bktRNAs with annotated m⁶A sites from RMBase²⁸ and found that 12 human bktRNAs and 1 mouse bktRNA contained m⁶A modifications (Extended Data Fig. 3f). Neither human nor mouse bktRNAs contained m⁶A at the 1n position (Extended Data Fig. 3g). Together, these findings suggest that the sequence composition of the backward K-turn structure, as well as interactions with RBPs, might contribute to the folding of bktRNAs.

We further characterized bktRNAs and found that they ranged from 20 nt to ~400 nt in length, with a median of 40 nt (Fig. 2e). More than 84% of bktRNA loci (342/407) originated from introns (Fig. 2f). Intriguingly, the host genes of bktRNAs were significantly enriched for various Gene Ontology terms, such as 'actin filament-based process' and 'regulation of cell adhesion' (Extended Data Fig. 3h).

To explore the expression patterns of bktRNAs, we analyzed bktRNA expression across cells and tissues as well as their subcellular localization. Numerous bktRNAs appeared to be specifically expressed in particular human tissues or cells (Fig. 2g,h and Extended Data Fig. 3i). For example, bktRNA198 and bktRNA126 were highly expressed in lung tissue (Extended Data Fig. 3i). Moreover, we applied our PEN-seq method to subcellular RNA fractions (sub-PEN-seq) and found that most bktRNAs exhibited nucleus-specific localization patterns (Fig. 2i,j and Supplementary Table 4).

We next reconstructed homologous families based on sequence similarity and inferred a stringent minimum evolutionary age for bktRNAs. The largest group of bktRNA families (145/391, 37.1%) was primate specific, whereas 16 (4.1%) families likely originated more than 90 million years (Myr) ago and one (0.26%) family likely originated more than 450 Myr ago (Fig. 2k). Interestingly, bktRNA1, which is located in an intron of CWF19L1, was the most conserved of all bktRNAs (Fig. 2k) and is an endogenous chimeric ncRNA composed of a backward K-turn domain and an H/ACA domain (SNORA12) (Extended Data Fig. 4a–c, Supplementary Figs. 2 and 3 and Supplementary Note 1). The SHAPE reactivity signals from RIP-PEN-SHAPE-MaP experiments also supported this conserved secondary structure of bktRNA1 (Extended Data Fig. 4b,c). Moreover, we discovered a K-turn-like structure in bktRNA1 that forms a helix–internal-loop–helix but lacks G•A and A•G base pairs (Extended Data Fig. 4b,c and Supplementary Note 1). In summary, we identified a large number of bktRNAs with complex folding, expression patterns and sequence conservation.

U12 snRNA is the direct target of bktRNA1

To identify direct targets of bktRNAs, we performed ultraviolet cross-linking, ligation and sequencing of hybrids (CLASH) experiments for 15.5K (Fig. 3a). Analysis of chimeric reads in the 15.5K CLASH data (Supplementary Fig. 4a) showed that bktRNA1 interacted with U12 snRNA, forming a duplex of 14 perfectly complementary base pairs (Fig. 3b,c and Supplementary Fig. 4a,b). PARIS data from HEK293T cells²⁹ also showed that bktRNA1 and U12 form an RNA duplex in vivo (Fig. 3d and Supplementary Fig. 5a,b). The pairing region between bktRNA1 and U12 is highly conserved across 100 vertebrates (Fig. 3e,f and Extended Data Fig. 5a). In addition, fluorescence in situ hybridization (FISH) and immunofluorescence (IF) experiments confirmed that bktRNA1 and U12 snRNA co-localized with the 15.5K protein (Extended Data Fig. 5b,c). These results indicate that U12 snRNA is a direct target of 15.5K-bound bktRNA1.
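Finding a perfectly paired stretch between a guide RNA and its target, as in the 14-bp bktRNA1:U12 duplex, can be framed as a longest-common-substring search between the target and the reverse complement of the guide. The sketch below is a toy complementarity scan, not the CLASH analysis pipeline: it counts Watson–Crick pairs only (no G•U wobble), which is our reading of "perfect", and the sequences are invented rather than the real bktRNA1 or U12.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_perfect_duplex(guide, target):
    """Length and 0-based target start of the longest antiparallel, perfectly
    Watson-Crick-paired stretch between guide and target.

    Implemented as a longest-common-substring search between the target and the
    reverse complement of the guide (dynamic programming, O(len(guide) * len(target))).
    """
    rc = reverse_complement(guide)
    best_len, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(rc) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if rc[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], j
        prev = cur
    return best_len, best_end - best_len


# Invented sequences: the target carries a 14-nt site complementary to the guide.
site = "ACGUUAGCCAUGGA"
guide = "CC" + reverse_complement(site) + "CC"
target = "UU" + site + "UU"
print(longest_perfect_duplex(guide, target))  # (14, 2)
```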

**Fig. 3: bktRNA1 guides the FBL protein to introduce 2′-O-methylation in U12 snRNA.**

bktRNA1 is indispensable for 2′-O-methylation of U12 snRNA

As a known 2′-O-methylation site³⁰ is located within the complementary region between U12 snRNA and bktRNA1, we hypothesized that bktRNA1 guides the 2′-O-methyltransferase FBL to introduce 2′-O-methylation of U12 snRNA. To test this hypothesis, we first developed an infrared primer extension (irPE) method (Methods) and confirmed that the A8 residue in U12 snRNA is 2′-O-methylated in human cells (Supplementary Fig. 6a–c). The known 2′-O-methylated G22 site and two new sites (Um2 and Am21) were also identified by the irPE method (Supplementary Fig. 6b,c). In addition, we identified 34 bktRNA1:U12 chimeras in the FBL CLASH sequencing data (Fig. 3g and Supplementary Fig. 7a,b). IF and FISH experiments further confirmed that bktRNA1 co-localized with FBL (Supplementary Fig. 8a,b). Moreover, knockdown of FBL in HCT116 cells decreased the level of 2′-O-methylation of A8 (Am8) in U12 snRNA as well as the level of 2′-O-methylation of G22, which is installed by FBL³⁰ (Supplementary Fig. 8c–g). These results suggest that bktRNA1 might guide FBL to introduce a conserved 2′-O-methylation of the A8 residue in U12 snRNA.

To investigate whether bktRNA1 participates in the 2′-O-methylation of U12 snRNA, we knocked out bktRNA1 in human HCT116 cells via CRISPR–Cas9. We generated four knockout (KO) clones: KO-1 and KO-2, with deletion of full-length bktRNA1, and KO-3 and KO-4, with deletion of the guide region and K-turn domain at the 3′ end of bktRNA1 (Fig. 3h and Supplementary Fig. 9a–c). Quantitative polymerase chain reaction (qPCR) and northern blotting failed to detect bktRNA1 in any of the four KO cell lines, and U12 snRNA expression was not affected (Fig. 3i,j). Likewise, correct processing and efficient accumulation of CWF19L1 mRNA in bktRNA1-KO cells were verified by RT–PCR and Sanger sequencing (Supplementary Fig. 9d,e). The 2′-O-methylation state of U12 at A8 was monitored by the irPE method using RNA extracted from wild-type (WT) and KO cells (Fig. 3k). Compared with WT HCT116 cells, the primer extension stop signal at A8 was absent in KO cells, indicating that 2′-O-methylation of A8 was abolished in the absence of bktRNA1 (Fig. 3k).

We further restored bktRNA1 expression in KO-4 cells (Supplementary Fig. 9f,g) and found that stable expression of full-length bktRNA1, but not of partial bktRNA1 constructs (Fig. 3l), largely restored the primer extension stop signal 1 nt downstream of A8 (Fig. 3m), indicating that ectopically expressed bktRNA1 rescued A8 methylation of U12 snRNA. Additionally, bktRNA1 mutants carrying mutations or truncations of the backward K-turn domain, or a substitution of the residue that directly interacts with A8 of U12 snRNA, abolished or significantly reduced A8 methylation (Supplementary Figs. 10 and 11 and Supplementary Note 2). Moreover, artificial guide RNAs containing the guide sequences of bktRNA1 could induce 2′-O-methylation of A8 in U12 snRNA (Supplementary Fig. 12 and Supplementary Note 2). Collectively, these results demonstrate that bktRNA1 is required for efficient 2′-O-methylation of U12 snRNA.

Depletion of bktRNA1 causes dysregulation of U12-type introns

Our finding that bktRNA1 is responsible for A8 2′-O-methylation of U12 snRNA raised the question of whether bktRNA1 and this modification are required for the splicing of U12-type introns in human cells. To address this, we performed strand-specific mRNA-seq in all four bktRNA1-KO cell lines (Extended Data Fig. 6a). Global analysis of intron retention revealed substantially greater retention of U12-type introns in bktRNA1^−/− cells than in WT cells (Fig. 4a–d and Supplementary Table 5). Depletion of bktRNA1 affected more than 75% of U12-type introns (Fig. 4e), and at least 37% of the retained introns changed significantly (P < 0.05; Extended Data Fig. 6b). In contrast, depletion of bktRNA1 had a negligible effect on U2-type introns (Extended Data Fig. 6b). Compared with U2-type introns, U12-type introns showed global and considerable increases in inclusion level in all four bktRNA1^−/− cell lines (Fig. 4f–i), and the retained U12-type introns overlapped strikingly across the four KO lines (Fig. 4j). Intriguingly, differential expression analysis revealed that the aberrant splicing of U12-type introns did not affect the expression levels of the corresponding mature RNAs, consistent with a previous study³¹ (Extended Data Fig. 6c). These results suggest that loss of bktRNA1 has a global effect on the splicing of U12-type introns.
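The retention and inclusion metrics used in the paper are defined in its Methods, not here. As an assumption, the sketch below uses a common simplified intron retention ratio (intronic reads divided by intronic plus spliced-junction reads, in the spirit of tools such as IRFinder) and reports the KO-minus-WT change per intron. Intron IDs and read counts are invented, and the statistical test behind P < 0.05 is not shown.

```python
def ir_ratio(intronic_reads, spliced_reads):
    """Intron retention ratio: intronic / (intronic + spliced-junction) reads."""
    total = intronic_reads + spliced_reads
    return intronic_reads / total if total else float("nan")


def delta_ir(wt_counts, ko_counts):
    """KO minus WT IR ratio for introns present in both samples.

    Each argument maps an intron ID to (intronic_reads, spliced_reads).
    """
    shared = sorted(wt_counts.keys() & ko_counts.keys())
    return {i: ir_ratio(*ko_counts[i]) - ir_ratio(*wt_counts[i]) for i in shared}


# Invented counts for one U12-type and one U2-type intron.
wt = {"U12_intron_x": (10, 90), "U2_intron_y": (5, 95)}
ko = {"U12_intron_x": (40, 60), "U2_intron_y": (5, 95)}
changes = delta_ir(wt, ko)
print({k: round(v, 2) for k, v in changes.items()})
```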

**Fig. 4: Dysregulation of U12-type intron splicing in bktRNA1-depleted cells.**

To validate the U12-type intron retention identified by RNA-seq, we used qPCR and RT–PCR to measure splice products in bktRNA1-KO and control cells. As expected, the abundance of all 17 randomly selected U12-type introns was increased in bktRNA1^−/− cells relative to WT cells and relative to the U2-type intron of GAPDH (Fig. 4k–m and Extended Data Fig. 6d). Re-expression of bktRNA1 or artificial guide RNAs (ART7 and ART10) in KO cells restored correct U12-type splicing (Extended Data Fig. 6e,f). In contrast, restoring the H/ACA domain (SNORA12) of bktRNA1 did not rescue the minor splicing defect (Extended Data Fig. 6g). These data strongly suggest that bktRNA1 and Am8 in U12 snRNA are important for the fidelity of U12-type splicing in human cells.
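The text does not say how the qPCR signals were normalized. Assuming the widely used 2^−ΔΔCt method, with the GAPDH U2-type intron as the reference amplicon and roughly 100% amplification efficiency for both amplicons, the KO-versus-WT fold change of a U12-type intron would be computed as below. The Ct values are invented.

```python
def fold_change_ddct(ct_target_ko, ct_ref_ko, ct_target_wt, ct_ref_wt):
    """Relative abundance (KO vs WT) by the 2^-ddCt method.

    Assumes equal, ~100% amplification efficiency for target and reference.
    """
    delta_ko = ct_target_ko - ct_ref_ko   # normalize target to reference in KO
    delta_wt = ct_target_wt - ct_ref_wt   # same normalization in WT
    return 2 ** -(delta_ko - delta_wt)


# Invented Ct values: the retained intron amplifies 2 cycles earlier in KO cells.
print(fold_change_ddct(26.0, 20.0, 28.0, 20.0))  # 4.0
```

If amplification efficiencies differ between amplicons, an efficiency-corrected model would be needed instead of the plain power of two.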

bktRNA1 modulates the recruitment of ZCRB1 to the minor spliceosome

To examine whether loss of bktRNA1 interferes with the assembly of minor spliceosome components, we performed RIP assays in which RNA–protein complexes were immunoprecipitated using antibodies against U12-associated proteins (ZCRB1, PDCD7, RNPC3 and SNRNP48)³² (Fig. 5a and Extended Data Fig. 7a). The amounts of minor spliceosome snRNAs (U11, U12, U4atac, U5 and U6atac) in the co-precipitates were then measured by qPCR and northern blotting in WT and bktRNA1-KO cells. Compared with WT cells, the ZCRB1-bound complex in KO cells contained markedly lower amounts of U11 and U12 snRNAs (Fig. 5b and Extended Data Fig. 7b,c). These differences did not reflect changes in ZCRB1 abundance, because ZCRB1 protein levels were unchanged upon knockout of bktRNA1 (Fig. 5c). Overexpression of bktRNA1 in KO cells substantially increased the enrichment of U11 and U12 snRNAs in the ZCRB1 complex (Fig. 5d,e and Extended Data Fig. 7d), indicating that ectopically expressed bktRNA1 restored the interaction of ZCRB1 with U11 and U12. We next performed U12 chromatin isolation by RNA purification (ChIRP) assays followed by western blotting to verify the interaction of ZCRB1 with U12 in vivo (Fig. 5f and Supplementary Fig. 13a,b). In ChIRP analyses, U12 probes retrieved more ZCRB1 from bktRNA1 WT cells than from KO cells (Fig. 5f, upper panel). Conversely, re-expression of bktRNA1 in KO cells markedly increased the amount of ZCRB1 retrieved by U12-specific probes (Fig. 5f, lower panel).

**Fig. 5: Depletion of bktRNA1 affects U12 interactions with ZCRB1.**

We further performed RNA affinity chromatography followed by immunoblotting to identify proteins that interact with methylated U12 probes. ZCRB1 bound selectively to 2′-O-methylated U12 RNA baits compared with the unmethylated control (Fig. 5g). Like the endogenous protein, purified recombinant ZCRB1 (Supplementary Fig. 13c,d) preferentially bound the 2′-O-methylated U12 probe (Methyl-U12, K_d = 340.3 ± 48.9 nM) over the unmethylated one (Unmethyl-U12, K_d = 2,293.0 ± 358.4 nM) in vitro (Fig. 5h). We further performed RNA EMSA with mutant U12 probes carrying an A-to-G change (UUGA) in the consensus sequence recognized by ZCRB1 (refs. 33,34) and found that recombinant ZCRB1 bound the mutated U12 probe carrying a methylated guanosine (Methyl-mutU12, K_d = 1,787.0 ± 211.4 nM) with higher affinity than the unmethylated mutant probe (Unmethyl-mutU12, K_d = 13,029.0 ± 4,687.0 nM) (Supplementary Fig. 13e).
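To put the reported K_d values in perspective: under a simple single-site binding model with protein in excess over the probe (an assumption; the fitting model is not stated here), the bound fraction is [P]/(K_d + [P]), and the ratio of K_d values gives the fold difference in affinity. The sketch uses the mean K_d values only and ignores their reported uncertainties; the 1,000 nM protein concentration is hypothetical.

```python
# Reported apparent K_d values (nM) from the EMSA experiments described above.
KD_NM = {
    "Methyl-U12": 340.3,
    "Unmethyl-U12": 2293.0,
    "Methyl-mutU12": 1787.0,
    "Unmethyl-mutU12": 13029.0,
}


def fraction_bound(protein_nm, kd_nm):
    """Single-site binding, protein in large excess over the RNA probe."""
    return protein_nm / (kd_nm + protein_nm)


def affinity_fold(kd_weaker_nm, kd_stronger_nm):
    """How many times tighter the stronger binder is (ratio of K_d values)."""
    return kd_weaker_nm / kd_stronger_nm


wt_fold = affinity_fold(KD_NM["Unmethyl-U12"], KD_NM["Methyl-U12"])
mut_fold = affinity_fold(KD_NM["Unmethyl-mutU12"], KD_NM["Methyl-mutU12"])
print(round(wt_fold, 1), round(mut_fold, 1))  # 6.7 7.3

# Predicted bound fraction at a hypothetical 1,000 nM ZCRB1.
for probe, kd in KD_NM.items():
    print(probe, round(fraction_bound(1000.0, kd), 2))
```

On these numbers, 2′-O-methylation strengthens ZCRB1 binding by a similar factor (roughly 7-fold) for both the WT and the mutant probe.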

We then examined minor splicing dysregulation and phenotypic changes upon ZCRB1 knockdown in HCT116 cells (Extended Data Fig. 8a). The similar minor intron retention and functional phenotypes of bktRNA1-KO and ZCRB1-knockdown cells further indicate a close connection between bktRNA1 and ZCRB1 (Extended Data Fig. 8b–m, Supplementary Table 6 and Supplementary Note 3). Together, these results suggest that the bktRNA1-guided 2′-O-methylation at A8 of U12 snRNA is crucial for the recruitment of ZCRB1 to the U11–U12 di-snRNP complex and for the splicing of U12-type introns (Fig. 5i).

bktRNAs regulate the splicing of local introns

Given that bktRNAs are predominantly located within introns and that their binding partner 15.5K can promote the assembly of the minor and major spliceosomes⁷, we hypothesized that bktRNAs might also be involved in regulating intron processing and RNA splicing. To test this hypothesis, we performed PEN-seq and mRNA-seq in 15.5K-knockdown cells (Supplementary Fig. 14a,b). As expected, bktRNA expression was markedly decreased in 15.5K-knockdown cells compared with control cells (Supplementary Fig. 14c,d and Supplementary Table 7). Intron retention analysis of the mRNA-seq datasets showed that silencing of 15.5K globally altered intron splicing (Supplementary Fig. 14e,f and Supplementary Table 8). Moreover, upon 15.5K knockdown, the inclusion level of bktRNA-containing introns differed considerably from that of the remaining introns (Supplementary Fig. 14g,h).

We further measured the splicing efficiency of local introns containing bktRNA1 in WT and KO cells by qPCR and found that depletion of bktRNA1 reduced splicing of the local introns but did not affect splicing of distant introns (Fig. 6a). Moreover, the impaired splicing of local introns could not be rescued by re-expression of WT bktRNA1 in either bktRNA1 KO-1 or KO-4 cells (Fig. 6b), implying that bktRNA1 might regulate local intron splicing through its sequence elements rather than through the mature bktRNA1 transcript, similar to some long non-coding RNAs (lncRNAs)³⁵.

**Fig. 6: Local regulation of intron splicing by bktRNAs.**

To test the role of bktRNAs in the local regulation of intron splicing, we constructed GFP reporters harboring bktRNAs with either WT or mutant backward K-turn motifs (CUGA mutated to CUAG, Mut) (Fig. 6c). Consistent with the local regulation of intron splicing by bktRNA1, WT bktRNA1 in the GFP reporter intron promoted splicing of the local intron, whereas mutation of the 5′ motif (bktRNA1-MM) or deletion of the backward K-turn motif significantly decreased local intron splicing (Fig. 6d). Applying this GFP reporter to ten randomly selected bktRNAs, we found that eight of them regulate the splicing of local introns (Fig. 6d). For example, WT bktRNA168 enhanced local splicing compared with its mutant (Fig. 6d), whereas mutating the backward K-turn motif of bktRNA2 increased the splicing of the local intron (Fig. 6d). These results suggest that the backward K-turn motif of bktRNAs is important for the splicing of local introns.

To confirm the local regulation of intron splicing by bktRNAs in vivo, we used prime editing³⁶ to mutate the 5′ motif (from CTGA to CTAG) of the backward K-turn structure in bktRNA161 (Supplementary Fig. 15a). We obtained seven homozygous mutants (mut161-1 to mut161-7), as confirmed by Sanger sequencing (Supplementary Fig. 15b). In line with the GFP reporter assays (Fig. 6d), splicing of the local intron containing bktRNA161, relative to a distant intron, was enhanced in all mutants (Fig. 6e). Moreover, binding of the 15.5K protein to the bktRNA161-containing intron was substantially reduced in bktRNA161-mutant HEK293T cells (Fig. 6f and Supplementary Fig. 15c). Together, these results strongly support a model in which bktRNAs regulate local intron splicing in a backward K-turn motif-dependent manner (Fig. 6g).

Discussion

In this study, by developing the RIP-PEN-seq method, we discovered a class of bktRNAs that act as binding partners for the K-turn binding protein 15.5K, and we revealed that bktRNA1 acts as a post-transcriptional regulator of minor splicing and that bktRNAs can regulate the splicing of local introns. Although U12-type introns constitute ~0.5% of introns in the human genome, they are present in approximately 700–800 genes involved in various cellular processes^37,38. Mutations in genes encoding components of the U12 spliceosome, including both snRNAs and interacting proteins, have been shown to cause a series of developmental disorders in humans^39,40,41,42,43. For example, mutation of U12 is associated with cerebellar ataxia⁴⁴. These findings raise the question of whether aberrant expression of bktRNA1 could contribute to some of these developmental disorders. Notably, mutation of CWF19L1, the host gene of bktRNA1, has been reported as a cause of autosomal recessive cerebellar ataxia (ARCA)^45,46,47, a phenotype consistent with that caused by mutation of U12 snRNA⁴⁴. Given that loss of bktRNA1 leads to global splicing dysregulation of U12-type introns, we speculate that mutations in CWF19L1 might affect the expression and processing of bktRNA1 and thereby cause ARCA. Future work should address whether bktRNA1 is associated with ARCA.

Although backward K-turn structural motifs have been identified in a few ncRNAs (for example, archaeal 23S rRNA)², these motifs lack consensus sequences and occur at different positions within the ncRNAs. In contrast, the bktRNAs we identified in mammals all have two consensus sequence elements within their backward K-turn motifs, and these elements are located at fixed positions in bktRNAs (Fig. 1c–e). Moreover, we found that the backward K-turn structural motifs of bktRNAs are required for protection from exonucleolytic degradation, for the processing and maturation of bktRNAs (Supplementary Fig. 10 and Supplementary Note 2) and for the local regulation of intron splicing (Fig. 6).

We used the bktRNA1 gene as a model to explore the regulatory functions of bktRNAs (Fig. 5i). The consensus structural motifs of bktRNAs and their dynamic, cell-specific and tissue-specific expression patterns point to roles in gene expression regulation. We found that bktRNA1 and other bktRNAs are involved in the local regulation of intron splicing (Fig. 6g) and that the backward K-turn structural motifs are indispensable for this regulation (Fig. 6). Interestingly, bktRNAs can act as either positive or negative regulators of local intron splicing (Fig. 6g), suggesting that they may interact with different factors. It will therefore be important to identify cofactors (for example, splicing factors) that mediate the local regulation of intron splicing by bktRNAs.

In summary, our data strongly suggest that bktRNAs form a class of post-transcriptional regulators that probably help RBPs recognize substrates through base-pairing interactions (for example, bktRNA1) and generally function in modulating their binding partner 15.5K to regulate the splicing of local introns. Our studies reveal that a single methylation alteration caused by the loss of bktRNA1 contributes to global splicing dysregulation and that bktRNAs can act as local regulators in the splicing of introns. Overall, these discoveries add another layer of complexity to the control of gene expression that involves crosstalk among RNA splicing, RNA methylation and bktRNAs.

Methods

Construction of RIP-PEN-seq and PEN-seq libraries

In conventional RIP-seq, immunoprecipitated RNAs (especially those ≥50 nt long) are often fragmented and then subjected to RNA-seq library construction with random primer-based reverse transcription⁴⁹. As a result, conventional RIP-seq cannot recover the full-length sequences of immunoprecipitated RNAs and therefore cannot determine the precise positions of motifs within them. In this study, we used dual RNA adapters to capture both ends of ncRNAs. This and the strategies described below allowed us to identify the full-length sequences of ncRNAs (20–500 nt) and facilitated the discovery of bktRNAs, whose characteristic motifs and structures are located at the RNA ends.
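Why capturing both ends matters can be illustrated with a minimal sketch, assuming reads in which a fragment shorter than the read runs into the 3′ adapter. The adapter string below is a placeholder, not the sequence used in this study (those are listed in Supplementary Table 9), and `full_length_insert` is a hypothetical helper: a read reports the complete RNA only when the 3′ adapter is detected within it.

```python
def full_length_insert(read: str, adapter3: str, min_match: int = 8):
    """Return the RNA insert if the 3' adapter is found in the read, else None.

    Only the first `min_match` adapter bases are searched, so an adapter cut
    to fewer bases at the very end of the read is not detected (dedicated
    trimmers such as Cutadapt also handle partial matches and errors).
    """
    probe = adapter3[:min_match]
    idx = read.find(probe)
    if idx == -1:
        # Adapter never reached: the insert is at least as long as the read,
        # so this read alone cannot define the RNA's 3' end.
        return None
    return read[:idx]


ADAPTER_3P = "TGGAATTCTCGG"  # placeholder 3' adapter, for illustration only

print(full_length_insert("ACGTACGTAC" + ADAPTER_3P + "AAAA", ADAPTER_3P))  # ACGTACGTAC
print(full_length_insert("ACGTACGTACGTACGTACGT", ADAPTER_3P))  # None
```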

RIP-PEN-seq and PEN-seq (Fig. 1b and Extended Data Fig. 1a) combine several strategies to achieve high-performance full-length ncRNA sequencing. First, to achieve high ligation efficiency and a quantitative full-length library, 5′ and 3′ adapters containing randomized barcodes (4N, where N represents A, C, G or T/U) were synthesized and ligated directly to RNAs. Second, to increase the diversity of RNA species and detect low-abundance RNAs, we used RNase H-based or Ribo-Zero strategies to remove high-abundance rRNAs and/or snRNAs^50,51. Notably, the Illumina Ribo-Zero kit has been discontinued, and nuclease-mediated depletion introduces substantial biases into sRNA-seq libraries prepared with the traditional procedure⁵². We therefore performed rRNA/snRNA depletion after adapter ligation and used the single-stranded DNA (ssDNA) exonuclease RecJf rather than DNase I to remove the probes. Third, to avoid widespread mispriming artifacts⁵³ and obtain full-length cDNAs, we used a nested reverse transcription primer for reverse transcription together with a full-length reverse PCR primer for amplification. Finally, to overcome the obstacles that RNA modifications and stable RNA secondary structures pose to cDNA synthesis, we used SuperScript IV Reverse Transcriptase, which is highly processive and thermostable, to generate cDNAs.
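The randomized 4N barcodes can also serve as unique molecular identifiers. A minimal sketch, assuming the fixed adapter sequences have already been trimmed so that each read is laid out as 5′ barcode + insert + 3′ barcode (the helper names are hypothetical): reads that share both the combined 8-nt barcode and the insert are treated as PCR copies of a single ligation event.

```python
from collections import Counter

BARCODE_LEN = 4  # 4N randomized barcode carried by each adapter


def split_barcodes(trimmed_read: str, n: int = BARCODE_LEN) -> tuple[str, str]:
    """Split a read laid out as NNNN + insert + NNNN into (UMI, insert).

    Assumes fixed adapter sequences were already trimmed, leaving the
    5' adapter's 4N at the start and the 3' adapter's 4N at the end.
    """
    if len(trimmed_read) <= 2 * n:
        raise ValueError("read too short to hold two barcodes and an insert")
    umi = trimmed_read[:n] + trimmed_read[-n:]
    return umi, trimmed_read[n:-n]


def count_molecules(trimmed_reads) -> Counter:
    """Count distinct ligation events (UMI, insert) per insert sequence."""
    unique_pairs = {split_barcodes(r) for r in trimmed_reads}
    return Counter(insert for _, insert in unique_pairs)


reads = [
    "ACGT" + "GGGCCCATAT" + "TTAA",  # molecule 1
    "ACGT" + "GGGCCCATAT" + "TTAA",  # PCR duplicate of molecule 1
    "CCCA" + "GGGCCCATAT" + "TTAA",  # same RNA, independent ligation
]
print(count_molecules(reads))  # Counter({'GGGCCCATAT': 2})
```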

RIP experiments were performed using the protocol outlined in a previous report^54,55. In brief, for each immunoprecipitate, 1 × 10⁷ cells stably expressing FLAG-15.5K were pelleted; resuspended in an equal volume of ice-cold polysomal lysis buffer (10 mM HEPES pH 7.0, 100 mM KCl, 5 mM MgCl₂, 0.5% NP-40) supplemented with 1 mM DTT, 100 U ml⁻¹ RNase inhibitor (Takara), 1× protease inhibitor cocktail (Roche) and 0.4 mM RVC (New England Biolabs (NEB)); incubated on ice for 15 minutes; snap frozen in liquid nitrogen; and stored at −80 °C. After the lysate was thawed on ice and centrifuged at 15,000g for 15 minutes, the supernatant was pre-cleared with Dynabeads Protein G (Invitrogen) at 4 °C for 1 hour. Ten microliters of the supernatant was saved as input and mixed with 1 ml of RNAzol. Subsequently, a 100-μl aliquot of the cell extract was diluted with 900 μl of NT2 buffer (50 mM Tris pH 7.4, 150 mM NaCl, 1 mM MgCl₂, 0.05% NP-40) supplemented with 1 mM DTT, 100 U ml⁻¹ RNase inhibitor (Takara), 1× protease inhibitor cocktail (Roche) and 20 mM EDTA. The extracts were mixed with 5 μg of an anti-FLAG antibody (Proteintech) or a normal IgG antibody (Millipore) and incubated overnight at 4 °C with end-over-end rotation. The next day, the RNP/antibody complex was precipitated by incubation with Dynabeads Protein G at 4 °C for 3 hours. The beads were collected and washed five times in NT2 buffer. After the final wash, the beads were resuspended in 1 ml of RNAzol and saved as RIP RNA or IgG RNA.

For PEN-seq, total RNA was isolated from cell lines with RNAzol, treated with RQ1 DNase (Promega) and purified with RNA Clean & Concentrator-5 (Zymo Research). The samples were then ligated to a 3′ randomized RNA adaptor with T4 RNA ligase 2 truncated KQ (NEB) in 1× T4 RNA ligase reaction buffer supplemented with 12.5% PEG 8000 at 16 °C for 18 hours. Excess adaptors were digested with 100 U of 5′ deadenylase (NEB) at 30 °C for 1 hour, incubated with 2 μg of Escherichia coli single-strand DNA-binding protein (Promega) on ice for 30 minutes and subjected to ssDNA digestion with 60 U of RecJf (NEB) at 37 °C for another hour. The 3′-ligated RNA was then ligated to a 5′ randomized RNA adaptor using T4 RNA Ligase 1 (NEB) in 1× T4 RNA ligase reaction buffer supplemented with 1 mM ATP at 16 °C for 18 hours. The ligated RNA was column-purified with RNA Clean & Concentrator-5 (Zymo Research) and subjected to RNase H-based ribosomal RNA depletion according to a standard protocol with some modifications⁵¹. For rRNA/snRNA probe digestion, DNase I was replaced by RecJf exonuclease to prevent digestion of the 3′ randomized RNA adaptor. After column-based purification, the rRNA/snRNA-depleted RNA was reverse transcribed using SuperScript IV Reverse Transcriptase (Invitrogen) in 1× reverse transcription buffer with a truncated reverse transcription primer to eliminate mispriming artifacts. cDNA was amplified using Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) in 1× Phusion HF buffer (Thermo Fisher Scientific) with the RP1 forward primer and an RPI-X reverse primer. The PCR-amplified cDNA was gel purified on a 4% NuSieve 3:1 Agarose gel (Lonza) to remove adapter dimers and sequenced on an Illumina HiSeq X Ten platform with paired-end reads (150 bp or 125 bp) at Annoroad Gene Technology Company. The adapters and primers are listed in Supplementary Table 9.

RIP-PEN-SHAPE-MaP library construction

RIP-PEN-SHAPE-MaP was performed on HEK293T-F-15.5K cells as previously described²¹, with some modifications. In brief, RIP experiments were carried out as described above until the last NT2 buffer wash. The beads were then incubated with SHAPE modification buffer (100 mM HEPES pH 8.0, 6 mM MgCl₂, 150 mM NaCl, 50 mM NAI-N₃ (MedChemExpress, HY-103006))⁵⁶ at 37 °C for 12 minutes on a thermomixer at 1,000 r.p.m. The control group was prepared in the same way using DMSO. RNA was extracted using RNAzol, treated with DNase I and purified with RNA Clean & Concentrator-5 (Zymo Research) in accordance with the manufacturer's protocol. Then, approximately 300 ng of NAI-N₃-treated or DMSO-treated RNA was subjected to the PEN-seq procedure with the following modifications. (1) The 3′ and 5′ randomized RNA adaptors were replaced with randomized adaptors containing six degenerate nucleotides. (2) To increase read coverage on bktRNAs, we designed probes⁵⁷ for RNase H-based subtraction of high-abundance snoRNAs in addition to rRNA and snRNA depletion; high-abundance snoRNAs were defined on the basis of the top 1,000 reads in the 15.5K RIP-PEN-seq datasets. (3) Reverse transcription was performed using SuperScript II Reverse Transcriptase (Thermo Fisher Scientific) in MnCl₂-containing 1× reverse transcription buffer (50 mM Tris-HCl pH 8.0, 75 mM KCl, 6 mM MnCl₂, 10 mM DTT)²⁰ at 42 °C for 3 hours. (4) For cDNA purification, Exo I (NEB) was used to digest excess reverse transcription primers at 37 °C for 15 minutes. Then, 7 μl of 1 M NaOH and 5 μl of 0.5 M EDTA were added per 20-μl reaction to remove the RNA templates at 70 °C for 12 minutes, and the cDNA was cleaned up with Oligo Clean & Concentrator (Zymo Research).
(5) Library amplification was performed using NEBNext Ultra II Q5 Master Mix (NEB) with the following program: stage I, 98 °C for 30 s; stage II, 16–20 cycles of 98 °C for 10 s and 65 °C for 75 s; stage III, 65 °C for 5 minutes. The adapters, primers and probes are listed in Supplementary Table 9.
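Per-nucleotide signal from such libraries is commonly summarized as a SHAPE reactivity in the manner of SHAPE-MaP²⁰: the mutation rate in the NAI-N₃-treated sample minus that in the DMSO control. A minimal sketch, assuming mutation counts and read depths per position have already been tabulated; the `min_depth` cutoff is a hypothetical choice, and normalization and denatured-control correction are omitted.

```python
def shape_reactivity(mod_mut, mod_depth, ctrl_mut, ctrl_depth, min_depth=1000):
    """Per-position reactivity: NAI-N3 mutation rate minus DMSO mutation rate.

    Positions covered by fewer than `min_depth` reads in either sample are
    reported as None rather than as a noisy estimate.
    """
    reactivities = []
    for mm, md, cm, cd in zip(mod_mut, mod_depth, ctrl_mut, ctrl_depth):
        if md < min_depth or cd < min_depth:
            reactivities.append(None)
        else:
            reactivities.append(mm / md - cm / cd)
    return reactivities


# Three positions: reactive, unreactive, and under-covered in the treated sample.
profile = shape_reactivity([50, 5, 9], [1000, 1000, 200], [10, 5, 1], [1000, 1000, 1000])
print(profile)
```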

Mapping of 2′-O-methylation residues

The positions of 2′-O-methylated nucleotides were identified by dNTP concentration-dependent primer extension (PE) assays. The primer used for U12 primer extension (irPE-U12) was synthesized by Sangon. To label the azide-modified oligonucleotides with IR dye, 0.4 nmol of oligonucleotide was mixed with 10 nmol of IRDye 680RD DBCO (LI-COR Biosciences) in 1× PBS (Thermo Fisher Scientific) at 25 °C overnight. The IR dye-labeled oligonucleotides were then purified with Oligo Clean & Concentrator kits (Zymo Research). To map modifications in U12 snRNA, the 5′-end-labeled primers were extended on 30 μg of total RNA by SuperScript III Reverse Transcriptase (Invitrogen) in the presence of decreasing dNTP concentrations (0.5 mM, 0.05 mM and 0.01 mM). Sequencing ladders were generated by the dideoxy method with a Sequenase Version 2.0 DNA Sequencing Kit (USB). The products were separated on a 10% urea-polyacrylamide gel and visualized on a LI-COR Odyssey Infrared Imager with Odyssey software version 3.0.
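The assay relies on reverse transcriptase pausing at 2′-O-methylated nucleotides when dNTPs are limiting: a genuine modification gives a stop band that strengthens as the dNTP concentration falls, and the stop is conventionally read as lying one nucleotide 3′ of the modified residue. A minimal sketch of that reading, assuming band intensities have already been quantified per RNA position (5′→3′ index of the last nucleotide copied) at each dNTP concentration; the function name and two-fold cutoff are hypothetical.

```python
def dntp_sensitive_stops(stops_by_conc, min_fold=2.0):
    """Return RNA positions whose RT stop signal rises as dNTPs are lowered.

    stops_by_conc maps dNTP concentration (mM) to a list of band
    intensities, one per RNA position (position = last nucleotide copied).
    """
    concs = sorted(stops_by_conc, reverse=True)  # high -> low dNTP
    n_positions = len(stops_by_conc[concs[0]])
    hits = []
    for pos in range(n_positions):
        series = [stops_by_conc[c][pos] for c in concs]
        rises = all(a <= b for a, b in zip(series, series[1:]))
        fold = series[-1] / max(series[0], 1e-9)  # lowest vs highest dNTP
        if rises and fold >= min_fold:
            hits.append(pos)
    return hits


bands = {
    0.5: [10, 10, 10],
    0.05: [10, 30, 12],
    0.01: [11, 80, 9],
}
stops = dntp_sensitive_stops(bands)
# The candidate 2'-O-methylated residue lies one nucleotide 5' of each stop.
candidates = [pos - 1 for pos in stops]
print(stops, candidates)  # [1] [0]
```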

ChIRP

ChIRP was performed according to previous studies^58,59,60 with some modifications. HCT116 cells were crosslinked with 3% formaldehyde at room temperature for 30 minutes, followed by quenching with 125 mM glycine at room temperature for 5 minutes. Cells were centrifuged, and the pellet was washed with ice-cold PBS, snap frozen in liquid nitrogen and stored at −80 °C. Crosslinked cells (~2 × 10⁷) were resuspended in 1 ml of ChIRP pre-digestion buffer (20 mM Tris-HCl pH 7.5, 5 mM MgCl₂, 0.5 mM CaCl₂, 0.5% Triton X-100) supplemented with 1× protease inhibitor cocktail (Roche), 1 mM DTT, 100 U ml⁻¹ RNase inhibitor, 10 mM RVC (NEB) and 12 U of DNase I (Promega) and rotated at 37 °C for 10 minutes. The reaction was stopped with 20 mM EDTA, and cells were pelleted, washed once with 1× PBS and resuspended in ChIRP cell lysis buffer (50 mM Tris-HCl pH 7.0, 10 mM EDTA, 1% SDS) with 1× protease inhibitor cocktail (Roche), 1 mM DTT and 100 U ml⁻¹ RNase inhibitor. Cells were sonicated using a Sonic Vibra-Cell VCX130 for 20 cycles (30% amplitude, 10 s on, 10 s off). After centrifugation at 16,000g for 10 minutes at 4 °C, the supernatant was collected and pre-cleared with Dynabeads MyOne Streptavidin C1 beads (Thermo Fisher Scientific) for 30 minutes at 37 °C. For hybridization, the pre-cleared cell lysates were diluted in two volumes of ChIRP hybridization buffer (50 mM Tris-HCl pH 7.0, 750 mM NaCl, 1 mM EDTA, 1% SDS, 15% formamide) with 1× protease inhibitor cocktail (Roche), 1 mM DTT and 100 U ml⁻¹ RNase inhibitor and incubated with 100 pmol of U12-targeted biotinylated probe mix (ChIRP-U12-1, 2 and 3) or negative control probe (ChIRP-NC) at 37 °C overnight. The next day, 100 μl of pre-washed Dynabeads MyOne Streptavidin C1 beads was added, and the mixture was rotated for an additional 30 minutes. The beads were washed five times with ChIRP washing buffer (2× SSC, 0.5% SDS) at 37 °C.
A quarter of the ChIRP products was used for RNA isolation, and the remainder was subjected to protein elution. For RNA isolation, input and ChIRP samples were resuspended in ChIRP RNA-isolation buffer (10 mM Tris-HCl pH 7.0, 100 mM NaCl, 1 mM EDTA, 0.5% SDS) supplemented with 1,000 mg ml⁻¹ proteinase K and incubated at 50 °C for 45 minutes, followed by 95 °C for 15 minutes and RNAzol extraction. The eluted RNA was analyzed by ChIRP-qPCR to detect the relevant transcripts. For protein extraction, input and ChIRP samples were resuspended in ChIRP biotin elution buffer (7.5 mM HEPES pH 7.5, 12.5 mM D-biotin, 1.5 mM EDTA, 75 mM NaCl, 0.15% SDS, 0.075% Sarkosyl, 0.02% Na-deoxycholate) and mixed at 25 °C for 20 minutes and then at 65 °C for 10 minutes. After the eluate was collected, the beads were eluted a second time, and the two eluates were pooled and precipitated with 20 μg ml⁻¹ protein carrier (BSA) and four volumes of ice-cold acetone at −20 °C overnight. The next day, proteins were pelleted, dissolved in 1× SDS loading buffer (Beyotime) and boiled at 95 °C for 30 minutes. Final protein samples were subjected to SDS-PAGE and western blotting. The probes used for ChIRP are listed in Supplementary Table 9.
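ChIRP-qPCR signals are commonly expressed as percent input and compared between target and negative-control probes. A minimal sketch of that arithmetic, assuming amplification efficiencies close to 2 and an input sample that represents a known fraction of the material used for ChIRP; that fraction is not stated above, so `input_fraction` is a placeholder, and both function names are hypothetical.

```python
import math


def percent_input(ct_chirp: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input recovered, assuming ~100% PCR efficiency (2x per cycle).

    The input Ct is adjusted to represent 100% of the starting material,
    which would reach threshold log2(1 / input_fraction) cycles earlier.
    """
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_chirp)


def fold_over_control(ct_target_probe: float, ct_control_probe: float) -> float:
    """Enrichment of target probes (e.g. U12) over the control probe, same input."""
    return 2 ** (ct_control_probe - ct_target_probe)


print(percent_input(ct_chirp=18.0, ct_input=20.0, input_fraction=0.25))  # 100.0
print(fold_over_control(ct_target_probe=20.0, ct_control_probe=24.0))   # 16.0
```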

Expression and purification of ZCRB1 protein

We used an E. coli expression system to produce and purify ZCRB1 protein with the HaloTag Protein Purification System (Promega), with some modifications. In brief, pH6HTN-ZCRB1 vectors expressing HaloTag-fused ZCRB1 protein were transformed into E. coli Transetta (DE3) chemically competent cells (TransGen Biotech). A fresh colony was inoculated into 2YT medium containing the antibiotic and grown at 37 °C overnight. On the second day, the starter culture was diluted 1:100 in fresh 2YT medium with antibiotic and cultured at 37 °C for about 6 hours until it reached the logarithmic growth phase. Then, 1 ml of culture was seeded into 100 ml of 2YT medium with IPTG at 0.4 nM final concentration and grown for 4 hours at 25 °C before harvesting. According to the HaloTag Protein Purification System instructions, cells from 100 ml of medium were harvested by centrifugation at 4,000g for 10 minutes at 4 °C, and the cell pellets were resuspended in 10 ml of Halo purification buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM EDTA, 0.005% NP-40) supplemented with 1 mM DTT, complete protease inhibitor cocktail (Roche), 1 mg ml⁻¹ lysozyme and 0.01 U of DNase I and lysed by sonication (Sonic Vibra-Cell VCX130; 36 cycles of 5 s on and 5 s off at 35% amplitude). After centrifugation at 10,000g for 15 minutes at 4 °C, the supernatant was transferred to HaloTag Resin that had been pre-washed with HaloTag purification buffer. After binding of the HaloTag–ZCRB1 fusion protein and centrifugation, the supernatant (flowthrough) was discarded. The resin was washed three times with HaloTag purification buffer, and ZCRB1 was released by cleavage with TEV protease cleavage solution (60 μl of TEV protease in 1 ml of HaloTag purification buffer); the TEV protease was then removed from the ZCRB1 eluate with HisLink Resin.
Finally, 2 ml of eluted ZCRB1 was concentrated to a volume of 100 μl with an Amicon Ultra-4 Centrifugal Filter device (10 kDa cutoff) for 30 minutes. The final ZCRB1 protein and the samples retained from each purification step were analyzed by Coomassie blue staining and western blotting.

RNA electrophoretic mobility shift assays

Cy5-labeled RNA oligonucleotides were synthesized by RiboBio. The REMSA was carried out according to the instructions of the LightShift Chemiluminescent RNA EMSA Kit (Thermo Fisher Scientific) with some modifications. In brief, RNA probes in annealing buffer (20 mM Tris pH 7.5, 200 mM KCl) were denatured at 95 °C for 4 minutes and then annealed by slow cooling to 25 °C at 0.05 °C per second. Recombinant ZCRB1 was diluted in Halo purification buffer to a concentration series of 0 nM, 50 nM, 100 nM, 200 nM and 300 nM. For each reaction, 1 μl of RNA probe (6.25 nM final concentration) and 1 μl of ZCRB1 protein (0 nM, 50 nM, 100 nM, 200 nM or 300 nM) were incubated with 2 μl of 10× REMSA Binding Buffer (supplied with the RNA EMSA Kit), 5% glycerol and 2 μg of transfer RNA (tRNA) at room temperature for 30 minutes. Then, 5 μl of 5× loading buffer (50 mM HEPES pH 7.5, 80% glycerol, 0.25% bromophenol blue) was added, and the complexes were separated by native PAGE. The fluorescence signal was visualized with an Odyssey Imaging System and quantified with Image Studio. The dissociation constant (K_d) was then calculated by nonlinear curve fitting (one-site specific binding) in GraphPad Prism using Y = B_max × X / (K_d + X), where Y is the fraction of bound RNA, [RNA–protein] / ([free RNA] + [RNA–protein]); X is the input protein concentration; and B_max is fixed at 1.
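With B_max set to 1, the one-site model has a single free parameter, so K_d can be estimated by minimizing the squared residuals over K_d alone. A minimal sketch, assuming fraction-bound values have already been computed from band intensities; it uses a golden-section search on log10(K_d) as a stand-in for Prism's nonlinear regression, and the search bounds and function names are hypothetical.

```python
import math


def fraction_bound(bound_signal: float, free_signal: float) -> float:
    """Y = [RNA-protein] / ([free RNA] + [RNA-protein]) from band intensities."""
    return bound_signal / (free_signal + bound_signal)


def one_site(x: float, kd: float, bmax: float = 1.0) -> float:
    """One-site specific binding: Y = Bmax * X / (Kd + X)."""
    return bmax * x / (kd + x)


def fit_kd(conc_nm, y_obs, lo_nm=1e-3, hi_nm=1e5, tol=1e-8):
    """Least-squares Kd (same units as conc_nm) with Bmax fixed at 1."""

    def sse(log_kd):
        kd = 10 ** log_kd
        return sum((y - one_site(x, kd)) ** 2 for x, y in zip(conc_nm, y_obs))

    # Golden-section search over log10(Kd), assuming one minimum in the bracket.
    a, b = math.log10(lo_nm), math.log10(hi_nm)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if sse(c) < sse(d):
            b, d = d, c  # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]
            d = a + g * (b - a)
    return 10 ** ((a + b) / 2)


concs = [0, 50, 100, 200, 300]  # nM, the series used above
simulated = [one_site(x, kd=120.0) for x in concs]  # synthetic, noise-free data
print(round(fit_kd(concs, simulated), 2))
```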

Identification of ktRNAs from RIP-PEN-seq or PEN-seq data

We clustered overlapping paired-end reads that mapped to the genome. For each cluster, we defined the start and end sites with the highest abundance as the transcription start site (TSS) and transcription termination site (TTS), respectively. Sequences between TSSs and TTSs that co-occurred within 500 bp were extracted as candidate transcripts. To discover forward K-turn RNAs (fktRNAs) and bktRNAs, we developed a new software program, kturnSeeker, and used each candidate transcript with a precise TSS and TTS as its input. kturnSeeker searches candidate sequences step by step for A•G, G•A and G•C base pairs (the NC-stem), the 3-nt bulge and a section with regular base pairing (the C-stem), and scores the C-stem with a strict scoring scheme (+3 for a G•C pair, +2 for an A•U pair and −5 for a mismatch). Only candidates with a C-stem score greater than 7 were retained as candidate ktRNAs. By also searching for the K-turn structure in reverse, kturnSeeker can, for the first time, screen for bktRNAs as well as fktRNAs. To identify novel ktRNAs, ktRNAs overlapping known annotations were discarded. Moreover, all novel ktRNAs were required to be present in at least two sequencing libraries.
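Two of these steps can be sketched in a few lines. This is not kturnSeeker (which is written in C++) but a hypothetical illustration, assuming plus-strand start/end coordinates for the read pairs of one cluster and a C-stem given as its two strands, each written 5′→3′. It picks the most abundant start and end as TSS and TTS and applies the stated C-stem scheme; treating G•U wobble pairs as mismatches is an assumption, because the scheme lists only G•C and A•U pairs.

```python
from collections import Counter

# C-stem scoring scheme stated in the text; all other pairings count as mismatches.
PAIR_SCORES = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}
MISMATCH_SCORE = -5
MIN_C_STEM_SCORE = 7  # candidates must score strictly greater than this
MAX_TRANSCRIPT_LEN = 500


def call_ends(fragments):
    """Return (TSS, TTS) as the most abundant start and end of a read cluster."""
    tss = Counter(start for start, _ in fragments).most_common(1)[0][0]
    tts = Counter(end for _, end in fragments).most_common(1)[0][0]
    if tts <= tss or tts - tss > MAX_TRANSCRIPT_LEN:
        return None  # ends do not co-occur within 500 bp
    return tss, tts


def score_c_stem(strand_5p: str, strand_3p: str) -> int:
    """Score a C-stem whose two strands are both written 5'->3'.

    The i-th base of the first strand pairs with the i-th base counted
    from the 3' end of the second strand.
    """
    a = strand_5p.upper().replace("T", "U")
    b = strand_3p.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("C-stem strands must have equal length")
    return sum(PAIR_SCORES.get(pair, MISMATCH_SCORE) for pair in zip(a, reversed(b)))


def passes_c_stem(strand_5p: str, strand_3p: str) -> bool:
    return score_c_stem(strand_5p, strand_3p) > MIN_C_STEM_SCORE


print(call_ends([(100, 180), (100, 180), (100, 175), (102, 180)]))  # (100, 180)
print(score_c_stem("GGA", "UCC"), passes_c_stem("GGA", "UCC"))      # 8 True
```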

Annotation of ktRNAs

The human genome sequence (hg38) was downloaded from the UCSC Genome Browser site⁶¹. Human gene annotations were obtained from GENCODE⁶² in August 2018. RepeatMasker repeat elements were downloaded from the UCSC Genome Browser site⁶¹. Sequences and annotation data for canonical human and mouse box C/D snoRNAs were downloaded from snoRNA-LBME-db⁶³, deepBase⁶⁴, GENCODE⁶², snoRNAome^65,66 and RefSeq⁶⁷ in August 2018. All C/D RNAs were intersected with canonical C/D box snoRNAs using BEDTools⁶⁸, and only C/D box RNAs that did not overlap canonical C/D box snoRNAs were identified as novel ktRNAs. All novel ktRNAs were further intersected with canonical gene annotations using BEDTools⁶⁸.
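The novelty filter amounts to keeping only candidates that overlap no annotated feature, which BEDTools performs with `bedtools intersect -v`. A minimal pure-Python sketch of the same logic, assuming BED-style half-open coordinates; whether strand was required to match (`-s` in BEDTools) is not stated above, so it is a parameter here, and the names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    strand: str
    name: str = ""


def overlaps(a: Interval, b: Interval, same_strand: bool = True) -> bool:
    """True if two half-open intervals share at least one base."""
    if a.chrom != b.chrom or (same_strand and a.strand != b.strand):
        return False
    return a.start < b.end and b.start < a.end


def novel_candidates(candidates, annotations, same_strand=True):
    """Keep candidates overlapping no annotation (like `bedtools intersect -v`).

    A simple all-pairs scan; fine for illustration, slow for genome-wide sets.
    """
    return [
        c for c in candidates
        if not any(overlaps(c, ann, same_strand) for ann in annotations)
    ]


cands = [
    Interval("chr1", 100, 200, "+", "kt1"),
    Interval("chr1", 300, 400, "+", "kt2"),
    Interval("chr1", 500, 600, "-", "kt3"),
]
snornas = [
    Interval("chr1", 150, 250, "+", "snoA"),
    Interval("chr1", 550, 650, "+", "snoB"),
]
print([c.name for c in novel_candidates(cands, snornas)])         # ['kt2', 'kt3']
print([c.name for c in novel_candidates(cands, snornas, False)])  # ['kt2']
```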

Identification of homologous bktRNA families in species from different evolutionary clades

We constructed homologous bktRNA families on the basis of DNA sequence similarity. To assess the evolutionary conservation of human bktRNA families, we used the UCSC liftOver tool (version 396)⁶¹ and BLAST+ (version 2.11.0)⁶⁹ to obtain the regions and sequences homologous to human bktRNAs in species from different evolutionary clades. These homologous sequences were then input into kturnSeeker to identify candidate bktRNAs. The genomes included the Pan troglodytes reference genome (chimpanzee, panTro6), the Pan paniscus reference genome (bonobo, panPan3), the Gorilla gorilla gorilla reference genome (gorilla, gorGor6), the Pongo pygmaeus abelii reference genome (orangutan, ponAbe3), the Macaca mulatta reference genome (rhesus, rheMac10), the Mus musculus reference genome (mouse, mm10), the Monodelphis domestica reference genome (opossum, monDom5), the Gallus gallus reference genome (chicken, galGal6) and the Danio rerio reference genome (zebrafish, danRer10).

Processing of 15.5K and FBL CLASH sequencing data

First, Cutadapt (version 2.8)⁷⁰ was used to trim adapters from the paired-end reads. The adapter-trimmed read pairs were then merged using FLASH (version 1.2.11)⁷¹ with the parameters -z -m 10 -M 135 -x 0.25. PCR duplicates were removed by collapsing reads with fastx_collapser using default parameters: reads with fully identical sequences, including the same barcode and insert fragment, were considered PCR duplicates, and only one copy was retained for subsequent analysis. Reads shorter than 30 nt were removed.
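The duplicate rule above (identical barcode plus insert means one molecule) and the length filter can be sketched as follows. This mimics, but is not, fastx_collapser, which labels each collapsed sequence with its rank and copy number; applying the 30-nt cutoff to the collapsed sequences is an assumption about step order, and `collapse_reads` is a hypothetical name.

```python
from collections import Counter


def collapse_reads(merged_reads, min_len=30):
    """Collapse identical reads and drop those shorter than min_len.

    Identical strings (same barcode and same insert) are treated as PCR
    duplicates of one molecule; each survivor is emitted once as a FASTA
    record whose header carries its rank and original copy number.
    """
    counts = Counter(merged_reads)
    kept = [(seq, n) for seq, n in counts.most_common() if len(seq) >= min_len]
    return [f">{rank}-{n}\n{seq}" for rank, (seq, n) in enumerate(kept, start=1)]


reads = ["A" * 30] * 3 + ["C" * 35] + ["G" * 10] * 5
for record in collapse_reads(reads):
    print(record)
```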

Differential splicing analysis

Adaptor sequences were trimmed from raw RNA-seq data using Cutadapt (version 2.8)⁷⁰. The clean reads were mapped to the reference genome (hg38) using STAR (2.7.1a)⁷² with a genome index built from the GENCODE version 32 (ref. ⁷³) annotation and the additional parameters --alignEndsType EndToEnd --outSAMstrandField intronMotif --outFilterMismatchNmax 5. The genomic coordinates of introns were extracted from the GENCODE version 32 (ref. ⁷³) annotation, and duplicate introns were removed. For each intron, the numbers of reads supporting either the spliced isoform or the retained isoform were counted. Reads that skipped the intron and extended at least 10 bp into each flanking exon were counted as supporting the spliced isoform, and reads that included the intron, with at least 10 bp on both the exon side and the intron side of a boundary, were counted as supporting the retained isoform. The read counts were normalized by effective length, defined as the number of possible positions for supporting reads, to estimate the abundances of the spliced and retained isoforms. The inclusion level was then calculated as the abundance of the retained isoform divided by the summed abundance of both isoforms. The statistical method and code from rMATS⁷⁴ were used to identify introns with significant differences in inclusion level between WT and KO cells. For each pair of compared groups, introns supported by fewer than 20 reads in total (spliced plus retained isoforms) in either sample were removed before statistical testing to reduce potential false positives. The annotation of minor introns was downloaded from MIDB³⁸. P values on the cumulative plots of inclusion level differences were calculated using two-sided Mann–Whitney–Wilcoxon tests.
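The inclusion-level calculation can be sketched as below. The effective-length helper is an assumption, not taken from the text: with read length L and the 10-bp anchor requirement, a junction or an exon-intron boundary can be spanned from L − 2 × 10 + 1 start positions, the retained isoform offers two such boundaries and the spliced isoform one junction. The 20-read filter follows the description above; the function names are hypothetical.

```python
def effective_lengths(read_len: int, anchor: int = 10) -> tuple[int, int]:
    """(retained, spliced) effective lengths under the assumed anchor rule."""
    per_junction = read_len - 2 * anchor + 1
    if per_junction <= 0:
        raise ValueError("reads too short for the anchor requirement")
    return 2 * per_junction, per_junction


def inclusion_level(retained_reads, spliced_reads, retained_len, spliced_len, min_total=20):
    """Retained-isoform abundance over total abundance, or None if too few reads."""
    total = retained_reads + spliced_reads
    if total == 0 or total < min_total:
        return None
    retained = retained_reads / retained_len
    spliced = spliced_reads / spliced_len
    return retained / (retained + spliced)


r_len, s_len = effective_lengths(150)        # (262, 131)
print(inclusion_level(40, 60, r_len, s_len))
print(inclusion_level(5, 10, r_len, s_len))   # None
```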

Statistics and reproducibility

Data are presented as mean ± s.e.m. Paired Student's t-tests were used for comparisons between two experimental groups. All statistical analyses were performed using GraphPad Prism 6 or R (3.6.3). The numbers of biological replicates are indicated in the figure legends. At least four independent RIP-PEN-seq experiments were performed for each of HEK293T-FLAG-15.5K and Hepa1-6-FLAG-15.5K cells. Four independent RIP-PEN-SHAPE-MaP experiments were performed with HEK293T-FLAG-15.5K cells. PEN-seq of cellular fractions was performed four times for each of HEK293T and HCT116 cells. Two or three independent sets of HEK293T, HCT116, U-87 MG, HepG2, HeLa and K562 RNA samples were used for PEN-seq analysis. All images of polyacrylamide gels, agarose gels and northern blots are representative of at least two biological replicates. No statistical method was used to predetermine sample size.
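For reference, the summary statistic and test named above reduce to the following arithmetic. A minimal sketch using only the Python standard library; it returns the paired t statistic and its degrees of freedom, and a P value would additionally require the t distribution (for example, via `scipy.stats.ttest_rel`), which is not reimplemented here.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_t(group_a, group_b):
    """Paired Student's t statistic and degrees of freedom for matched replicates.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(group_a) != len(group_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(group_a, group_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


m, sem = mean_sem([1.0, 2.0, 3.0])
t, df = paired_t([5.0, 6.0, 7.0], [1.0, 3.0, 3.0])
print(m, sem, t, df)
```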

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All sequencing data that support the findings of this study have been deposited in the National Center for Biotechnology Information's Gene Expression Omnibus with the following accession numbers: GSE160970 for all HEK293T 15.5K RIP-PEN-seq; GSE182757 for all Hepa1-6 RIP-PEN-seq; GSE160636 for FBL and 15.5K CLASH-seq in HEK293T cells; GSE160887 for PEN-seq in HCT116, U-87 MG, HeLa, HEK293T, HepG2 and K562 cells; GSE186849 for PEN-seq in 15.5K knockdown HEK293T cells; GSE182843 for PEN-seq in HEK293T and HCT116 cell fractions; GSE160515 for RNA-seq in bktRNA1 KO HCT116 cells; GSE182830 for RNA-seq in ZCRB1 knockdown HCT116 cells; GSE182759 for RNA-seq in 15.5K knockdown HEK293T cells; and GSE220470 for RIP-PEN-SHAPE-MaP in HEK293T cells. All data are available in the manuscript and in the Supplementary Information and Source data files. There are no restrictions on data availability. Source data are provided with this paper.

Code availability

The program kturnSeeker was written in the C++ programming language and is available from GitHub with no restrictions or conditions on access: https://github.com/sysu-software/kturnSeeker.

References

Butcher, S. E. & Pyle, A. M. The molecular interactions that stabilize RNA tertiary structure: RNA motifs, patterns, and networks. Acc. Chem. Res. 44, 1302–1311 (2011).
Klein, D. J., Schmeing, T. M., Moore, P. B. & Steitz, T. A. The kink-turn: a new RNA secondary structure motif. EMBO J. 20, 4214–4221 (2001).
Lilley, D. M. The K-turn motif in riboswitches and other RNA species. Biochim. Biophys. Acta 1839, 995–1004 (2014).
Schroeder, K. T., McPhee, S. A., Ouellet, J. & Lilley, D. M. A structural database for k-turn motifs in RNA. RNA 16, 1463–1468 (2010).
Rozhdestvensky, T. S. et al. Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res. 31, 869–877 (2003).
Baird, N. J., Zhang, J., Hamma, T. & Ferré-D’Amaré, A. R. YbxF and YlxQ are bacterial homologs of L7Ae and bind K-turns but not K-loops. RNA 18, 759–770 (2012).
Nottrott, S. et al. Functional interaction of a novel 15.5kD [U4/U6.U5] tri-snRNP protein with the 5′ stem-loop of U4 snRNA. EMBO J. 18, 6119–6133 (1999).
Szewczak, L. B. W., DeGregorio, S. J., Strobel, S. A. & Steitz, J. A. Exclusive interaction of the 15.5 kD protein with the terminal box C/D motif of a methylation guide snoRNP. Chem. Biol. 9, 1095–1107 (2002).
Szewczak, L. B. W., Gabrielsen, J. S., Degregorio, S. J., Strobel, S. A. & Steitz, J. A. Molecular basis for RNA kink-turn recognition by the h15.5K small RNP protein. RNA 11, 1407–1419 (2005).
Chawla, M., Oliva, R., Bujnicki, J. M. & Cavallo, L. An atlas of RNA base pairs involving modified nucleobases with optimal geometries and accurate energies. Nucleic Acids Res. 43, 6714–6729 (2015).
Polikanov, Y. S., Melnikov, S. V., Söll, D. & Steitz, T. A. Structural insights into the role of rRNA modifications in protein synthesis and ribosome assembly. Nat. Struct. Mol. Biol. 22, 342–344 (2015).
Liu, S., Ghalei, H., Lührmann, R. & Wahl, M. C. Structural basis for the dual U4 and U4atac snRNA-binding specificity of spliceosomal protein hPrp31. RNA 17, 1655–1663 (2011).
Vidovic, I., Nottrott, S., Hartmuth, K., Lührmann, R. & Ficner, R. Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell 6, 1331–1342 (2000).
Jafarifar, F., Dietrich, R. C., Hiznay, J. M. & Padgett, R. A. Biochemical defects in minor spliceosome function in the developmental disorder MOPD I. RNA 20, 1078–1089 (2014).
Edery, P. et al. Association of TALS developmental disorder with defect in minor splicing component U4atac snRNA. Science 332, 240–243 (2011).
He, H. et al. Mutations in U4atac snRNA, a component of the minor spliceosome, in the developmental disorder MOPD I. Science 332, 238–240 (2011).
Merico, D. et al. Compound heterozygous mutations in the noncoding RNU4ATAC cause Roifman Syndrome by disrupting minor intron splicing. Nat. Commun. 6, 8718 (2015).
Farach, L. S. et al. The expanding phenotype of RNU4ATAC pathogenic variants to Lowry Wood syndrome. Am. J. Med. Genet. A 176, 465–469 (2018).
Lapinaite, A. et al. The structure of the box C/D enzyme reveals regulation of RNA methylation. Nature 502, 519–523 (2013).
Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. E. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
Luo, Q.-J. et al. RNA structure probing reveals the structural basis of Dicer binding and cleavage. Nat. Commun. 12, 3397 (2021).
McPhee, S. A., Huang, L. & Lilley, D. M. A critical base pair in k-turns that confers folding characteristics and correlates with biological function. Nat. Commun. 5, 5127 (2014).
Huang, L. et al. Structure and folding of four putative kink turns identified in structured RNA species in a test of structural prediction rules. Nucleic Acids Res. 49, 5916–5924 (2021).
Liu, J. & Lilley, D. M. The role of specific 2′-hydroxyl groups in the stabilization of the folded conformation of kink-turn RNA. RNA 13, 200–210 (2007).
Huang, L., Wang, J. & Lilley, D. M. A critical base pair in k-turns determines the conformational class adopted, and correlates with biological function. Nucleic Acids Res. 44, 5390–5398 (2016).
Ashraf, S., Huang, L. & Lilley, D. M. J. Effect of methylation of adenine N⁶ on kink turn structure depends on location. RNA Biol. 16, 1377–1385 (2019).
Huang, L., Ashraf, S., Wang, J. & Lilley, D. M. Control of box C/D snoRNP assembly by N⁶-methylation of adenine. EMBO Rep. 18, 1631–1645 (2017).
Xuan, J. J. et al. RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res. 46, D327–D334 (2018).
Lu, Z. et al. RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165, 1267–1279 (2016).
Tycowski, K. T., Aab, A. & Steitz, J. A. Guide RNAs with 5′ caps and novel box C/D snoRNA-like domains for modification of snRNAs in metazoa. Curr. Biol. 14, 1985–1995 (2004).
Cologne, A. et al. New insights into minor splicing-a transcriptomic analysis of cells derived from TALS patients. RNA 25, 1130–1149 (2019).
Will, C. L. et al. The human 18S U11/U12 snRNP contains a set of novel proteins not found in the U2-dependent spliceosome. RNA 10, 929–941 (2004).
Dominguez, D. et al. Sequence, structure, and context preferences of human RNA binding proteins. Mol. Cell 70, 854–867.e9 (2018).
Jolma, A. et al. Binding specificities of human RNA-binding proteins toward structured and linear RNA sequences. Genome Res. 30, 962–973 (2020).
Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016).
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Alioto, T. S. U12DB: a database of orthologous U12-type spliceosomal introns. Nucleic Acids Res. 35, D110–D115 (2007).
Olthof, A. M., Hyatt, K. C. & Kanadia, R. N. Minor intron splicing revisited: identification of new minor intron-containing genes and tissue-dependent retention and alternative splicing of minor introns. BMC Genomics 20, 686 (2019).
Madan, V. et al. Aberrant splicing of U12-type introns is the hallmark of ZRSR2 mutant myelodysplastic syndrome. Nat. Commun. 6, 6042 (2015).
Reber, S. et al. Minor intron splicing is regulated by FUS and affected by ALS-associated FUS mutants. EMBO J. 35, 1504–1521 (2016).
Verberne, E. A., Faries, S., Mannens, M., Postma, A. V. & van Haelst, M. M. Expanding the phenotype of biallelic RNPC3 variants associated with growth hormone deficiency. Am. J. Med. Genet. A 182, 1952–1956 (2020).
Argente, J. et al. Defective minor spliceosome mRNA processing results in isolated familial growth hormone deficiency. EMBO Mol. Med. 6, 299–306 (2014).
Martos-Moreno, G. et al. Response to growth hormone in patients with RNPC3 mutations. EMBO Mol. Med. 10, e9143 (2018).
Elsaid, M. F. et al. Mutation in noncoding RNA RNU12 causes early onset cerebellar ataxia. Ann. Neurol. 81, 68–78 (2017).
Burns, R. et al. Homozygous splice mutation in CWF19L1 in a Turkish family with recessive ataxia syndrome. Neurology 83, 2175–2182 (2014).
Evers, C. et al. Exome sequencing reveals a novel CWF19L1 mutation associated with intellectual disability and cerebellar atrophy. Am. J. Med. Genet. A 170, 1502–1509 (2016).
Nguyen, M. et al. Pathogenic CWF19L1 variants as a novel cause of autosomal recessive cerebellar ataxia and atrophy. Eur. J. Hum. Genet. 24, 619–622 (2016).
Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
Kramer, K. et al. Photo-cross-linking and high-resolution mass spectrometry for assignment of RNA-binding sites in RNA-binding proteins. Nat. Methods 11, 1064–1070 (2014).
Morlan, J. D., Qu, K. & Sinicropi, D. V. Selective depletion of rRNA enables whole transcriptome profiling of archival fixed tissue. PLoS ONE 7, e42882 (2012).
Adiconis, X. et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat. Methods 10, 623–629 (2013).
Zinshteyn, B., Wangen, J. R., Hua, B. & Green, R. Nuclease-mediated depletion biases in ribosome footprint profiling libraries. RNA 26, 1481–1488 (2020).
Gillen, A. E., Yamamoto, T. M., Kline, E., Hesselberth, J. R. & Kabos, P. Improvements to the HITS-CLIP protocol eliminate widespread mispriming artifacts. BMC Genomics 17, 338 (2016).
Keene, J. D., Komisarow, J. M. & Friedersdorf, M. B. RIP-Chip: the isolation and identification of mRNAs, microRNAs and protein components of ribonucleoprotein complexes from cell extracts. Nat. Protoc. 1, 302–307 (2006).
Deng, B. et al. An LTR retrotransposon-derived lncRNA interacts with RNF169 to promote homologous recombination. EMBO Rep. 20, e47650 (2019).
Spitale, R. C. et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490 (2015).
Phelps, W. A., Carlson, A. E. & Lee, M. T. Optimized design of antisense oligomers for targeted rRNA depletion. Nucleic Acids Res. 49, e5 (2021).
Chu, C., Qu, K., Zhong, F. L., Artandi, S. E. & Chang, H. Y. Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell 44, 667–678 (2011).
Percharde, M. et al. A LINE1-nucleolin partnership regulates early development and ESC identity. Cell 174, 391–405 (2018).
Chu, C. et al. Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416 (2015).
Haeussler, M. et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res. 47, D853–D858 (2019).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Lestrade, L. & Weber, M. J. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 34, D158–D162 (2006).
Yang, J. H., Shao, P., Zhou, H., Chen, Y. Q. & Qu, L. H. deepBase: a database for deeply annotating and mining deep sequencing data. Nucleic Acids Res. 38, D123–D130 (2010).
Jorjani, H. et al. An updated human snoRNAome. Nucleic Acids Res. 44, 5068–5082 (2016).
Kishore, S. et al. Insights into snoRNA biogenesis and processing from PAR-CLIP of snoRNA core proteins and small RNA sequencing. Genome Biol. 14, R45 (2013).
Pruitt, K. D. et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42, D756–D763 (2014).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Magoc, T. & Salzberg, S. L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).

Acknowledgements

We thank L. Huang from Sun Yat-sen Memorial Hospital for valuable suggestions on this manuscript. We thank Y. Zhang for sharing the Sleeping Beauty transposon system. We thank all the staff from Sun Yat-sen University for their support and hard work during the COVID-19 pandemic. This work was supported, in part, by the National Key R&D Program of China (2019YFA0802202 (to J.Y.) and 2022YFA1303300 (to J.Y.)); the National Natural Science Foundation of China (32225011 (to J.Y.), 91940304 (to J.Y.), 31971228 (to J.Y.), 31770879 (to J.Y.), 31970604 (to L.Q.), 31900903 (to B.L.) and 32100467 (to S.L.)); the Youth Science and Technology Innovation Talent of Guangdong TeZhi Plan (2019TQ05Y181 (to J.Y.)); funds from Guangzhou City (202002030351 (to J.Y.)); and Fundamental Research Funds for the Central Universities, Sun Yat-sen University (20lgpy112 (to B.L.) and 2021qntd26 (to B.L.)).

Author information

These authors contributed equally: Bin Li, Shurong Liu, Wujian Zheng, Anrui Liu, Peng Yu.

Authors and Affiliations

MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Bin Li, Shurong Liu, Wujian Zheng, Anrui Liu, Peng Yu, Di Wu, Jie Zhou, Ping Zhang, Chang Liu, Qiao Lin, Jiayi Ye, Qiaojuan Huang, Hui Zhou, Lianghu Qu & Jianhua Yang
The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
Wujian Zheng, Anrui Liu, Di Wu, Jie Zhou, Qiao Lin, Jiayi Ye, Simeng He & Jianhua Yang
Department of Systems Biology, Beckman Research Institute of City of Hope, Monrovia, CA, USA
Jianjun Chen

Contributions

J.Y., B.L. and L.Q. conceived and designed the entire project. J.Y. and L.Q. designed and supervised the research. B.L., S.L., W.Z., A.L., P.Y., D.W., J.Z., P.Z., C.L., Q.L., J.Y., S.H., Q.H., H.Z. and J.Y. performed the experiments and/or data analyses. J.Y., B.L. and A.L. performed the genome-wide or transcriptome-wide data analyses. J.Y., B.L., S.L. and L.Q. contributed reagents/analytic tools and/or grant support. J.C. provided helpful discussions. J.Y., L.Q., B.L., S.L. and A.L. wrote and revised the paper. All authors discussed the results and commented on the manuscript.

Corresponding authors

Correspondence to Lianghu Qu or Jianhua Yang.

Ethics declarations

Competing interests

J.C. is a scientific advisory board member of Race Oncology. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Characterization of the RIP-PEN-seq technique.

a, Diagram of RNase H-based rRNA depletion for the construction of RIP-PEN-seq libraries. b, Western blotting analysis of 15.5K-FLAG overexpression in HEK293T cells. GAPDH serves as the loading control. c, RIP from HEK293T cells was performed using anti-FLAG antibody or control IgG. d, e, Meta-analyses of the RIP-PEN-seq results for the start (d) and end (e) sites of the forward K-turn RNAs (box C/D ncRNAs) in HEK293T cells. f, Genome-browser plot of RIP-PEN-seq (coverage, green; 5’-start, red; 3’-end, purple) for representative forward K-turn RNAs (box C/D snoRNAs) in the introns of GAS5. g, Computational workflow for analysis of the RIP-PEN-seq sequencing data and identification of candidate transcripts. h, KturnSeeker core algorithm workflow. KturnSeeker was developed to identify and quantify forward ktRNAs (fktRNAs) and backward ktRNAs (bktRNAs) from RIP-PEN-seq data; it screens for bktRNAs as well as fktRNAs by reverse searching the K-turn structure. i, Gene type distribution of forward ktRNAs identified from RIP-PEN-seq. CD-type represents C/D box-containing snoRNAs and scaRNAs. ACA-type represents snoRNAs or scaRNAs that contain only H/ACA boxes.
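Panel f displays three tracks per locus (coverage, 5’-start counts and 3’-end counts), and panels d and e summarize read ends relative to annotated termini. If each RIP-PEN-seq read is treated as one interval spanning the captured RNA (an assumption of this sketch), all of these follow from simple counting. The sketch assumes plus-strand reads given as 0-based, half-open (start, end) intervals within one region; the names `build_tracks` and `start_offsets` are hypothetical, and this is not the authors' pipeline or KturnSeeker code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Read = Tuple[int, int]  # hypothetical format: 0-based, half-open (start, end)

def build_tracks(reads: Iterable[Read], length: int) -> Dict[str, List[int]]:
    """Per-position coverage, 5'-start and 3'-end counts for plus-strand reads."""
    coverage = [0] * length
    starts = [0] * length
    ends = [0] * length
    for start, end in reads:
        if not 0 <= start < end <= length:
            raise ValueError(f"read ({start}, {end}) lies outside the region")
        for pos in range(start, end):
            coverage[pos] += 1
        starts[start] += 1   # first nucleotide of the read (5' end)
        ends[end - 1] += 1   # last nucleotide of the read (3' end)
    return {"coverage": coverage, "start": starts, "end": ends}

def start_offsets(reads: Iterable[Read], annotated_start: int) -> Counter:
    """Histogram of read 5'-start positions relative to an annotated 5' end."""
    return Counter(start - annotated_start for start, _ in reads)
```

A sharp peak at offset 0 in `start_offsets` (and the analogous count for 3' ends) is what a meta-analysis like panels d and e would look for when reads capture intact RNA termini.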

Extended Data Fig. 2 Structural characterization of bktRNAs.

a, The position and significance of motifs located within bktRNAs. The enriched motifs were identified by MEME software. The K-turn structural motif of bktRNAs is composed of two conserved sequence motifs: a CUGA motif often 4 nt downstream of the 5’ end and a UGAUG motif often 2 nt upstream of the 3’ end. The position p-value is defined as the probability that a random sequence would have a motif match score greater than or equal to that of the sequence under test. b, Twelve additional novel bktRNAs discovered from RIP-PEN-seq data. All novel bktRNAs had a CUGA motif often 4 nt downstream of the 5’ end and a UGAUG motif often 2 nt upstream of the 3’ end; both sequence motifs lie within the K-turn structural motifs of bktRNAs. The 5’ motif (CUGA) and 3’ motif (UGAUG) are marked with black rectangles. The non-canonical A•G and G•A base pairs and the mismatch in the backward K-turn structure are also marked. c, Schematic overview of 15.5K RIP-PEN-SHAPE-MaP. d, Secondary structure of consensus fktRNA (left panel, n = 98). Violin plots displaying the SHAPE reactivity across the forward K-turn structure (5’ C stem, internal loop, 5’ NC stem, loop, 3’ NC stem and 3’ C stem), averaged across all known box C/D snoRNAs (right panel). The boxplots indicate the median and the upper and lower quartiles. e, The predicted secondary structure (upper panel) and SHAPE reactivity signal (lower panel) of fktRNA SNORD102 (also known as U102). The forward K-turn structure is indicated in the structure figure. The NC stem and C stem are marked with black and red underlines in the bar plot, respectively. The SHAPE reactivity signal was determined by RIP-PEN-SHAPE-MaP in this study.
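The positional rule in panels a and b (CUGA often 4 nt downstream of the 5’ end, UGAUG often 2 nt upstream of the 3’ end) can be checked on a candidate sequence with a plain string scan. This is a minimal sketch, not MEME and not KturnSeeker's structure-based search. It assumes "4 nt downstream" means CUGA begins at the fifth nucleotide (0-based index 4) and "2 nt upstream" means exactly two nucleotides follow UGAUG; the function names and the `max_shift` tolerance are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed canonical offsets, one interpretation of the caption (hypothetical):
CUGA_START = 4          # CUGA begins after 4 nt, i.e. at 0-based index 4
UGAUG_TRAILING_NT = 2   # exactly 2 nt follow UGAUG at the 3' end

def motif_positions(seq: str, motif: str) -> List[int]:
    """All 0-based start positions of motif in seq (overlapping hits allowed)."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

def check_bkt_motifs(seq: str, max_shift: int = 0) -> Tuple[Optional[int], Optional[int]]:
    """Return (CUGA start, UGAUG start); None where no hit lies within
    max_shift nt of its assumed canonical position. DNA input is accepted."""
    rna = seq.upper().replace("T", "U")
    expected_5p = CUGA_START
    expected_3p = len(rna) - UGAUG_TRAILING_NT - len("UGAUG")
    hit_5p = next((p for p in motif_positions(rna, "CUGA")
                   if abs(p - expected_5p) <= max_shift), None)
    hit_3p = next((p for p in motif_positions(rna, "UGAUG")
                   if abs(p - expected_3p) <= max_shift), None)
    return hit_5p, hit_3p
```

Because the caption says "often", a nonzero `max_shift` is probably needed in practice; the appropriate tolerance is not stated in the text.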

Extended Data Fig. 3 The backward K-turn sequence composition, predicted functions and tissue-specific expression profile of bktRNAs.

a, The secondary structure of consensus forward K-turn RNA (fktRNA) and backward K-turn RNA (bktRNA). The nucleotide positions in the K-turn structure are named according to the nomenclature rules for the forward K-turn structure. b, Matrix plot showing the number of human bktRNAs with the indicated nucleotide in the 3b:3n sequences. c, Number of human bktRNAs with each of the four possible Watson-Crick base pairs at the -1b:-1n position. d, Matrix plot showing the number of mouse bktRNAs with the indicated nucleotide in the 3b:3n sequences. e, Number of mouse bktRNAs with each of the four possible Watson-Crick base pairs at the -1b:-1n position. f, Number of bktRNAs with or without m⁶A modification in humans and mice. g, Number of bktRNAs with or without m⁶A modification at the 1n position in humans and mice. h, Enrichment analysis of the bktRNA host protein-coding genes by Metascape software. i, Tissue-specific expression profiles of bktRNAs. Rows represent bktRNAs and columns represent tissues; rows and columns are sorted by k-means clustering of bktRNA genes. The colour intensity represents the tissue-specificity score (JS score) calculated for each bktRNA using the csSpecificity function. Representative bktRNAs are indicated in the right panel.
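The caption does not give the formula behind the JS score. The sketch below follows the commonly used Jensen-Shannon specificity: normalize an RNA's expression across tissues into a distribution p, then score each tissue t as 1 - sqrt(JSD(p, e_t)), where e_t puts all expression in tissue t. The csSpecificity function may preprocess values (for example, a log transform with a pseudocount); here that step is exposed as an optional, assumed parameter, so scores may not match the figure exactly. Function names are hypothetical.

```python
import math
from typing import List, Sequence

def _entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2

def js_specificity(expr: Sequence[float], log_transform: bool = False,
                   pseudocount: float = 1.0) -> List[float]:
    """Specificity of one RNA for each tissue: 1 - sqrt(JSD(p, e_t)).

    log_transform/pseudocount are an assumed, optional preprocessing step.
    """
    values = [math.log10(x + pseudocount) if log_transform else x for x in expr]
    total = sum(values)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    p = [v / total for v in values]
    scores = []
    for t in range(len(p)):
        e_t = [1.0 if i == t else 0.0 for i in range(len(p))]
        jsd = max(js_divergence(p, e_t), 0.0)  # clamp tiny negative rounding error
        scores.append(1 - math.sqrt(jsd))
    return scores
```

An RNA expressed in a single tissue scores 1 for that tissue and 0 elsewhere; an RNA expressed evenly across two tissues scores about 0.44 for each.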

Extended Data Fig. 4 Genomic characterization, expression, conservation, and secondary structure of bktRNA1.

a, Genome-browser plot of RIP-PEN-seq (coverage, blue; 5’-start, red; 3’-end, yellow) for bktRNA1, as well as the evolutionary conservation across 100 vertebrates (green). b, Secondary structure of bktRNA1 in the human genome was predicted by R-scape software. The SHAPE reactivities for each nucleotide were mapped to secondary structures using R2R software. The box H/ACA domain is indicated with a black dashed box, and the backward K-turn structure and the potential K-turn-like structure are marked with green dashed boxes. The NC stem and C stem are indicated with black lines. The blue boxes show the representative motifs. CAB, Cajal body box. c, The SHAPE reactivity signal on bktRNA1. The representative motifs are underlined in the bar plot. The SHAPE reactivity signal was determined by RIP-PEN-SHAPE-MaP in this study.

Extended Data Fig. 5 Secondary structures and subcellular localization of bktRNA1 and its interacting partner U12 snRNA.

a, Predicted conserved RNA structure of bktRNA1 determined by measuring pairwise covariations with R-scape software. The H/ACA domain is indicated with a black dashed box. The functional region paired with U12 snRNA is indicated with a blue dashed box. b, In situ co-localization of bktRNA1 with the 15.5K protein and U12 snRNA in HEK293T cells by fluorescence in situ hybridization (FISH) and immunofluorescence (IF) microscopy. White arrows indicate the signals detected by probes or antibody. c, In situ co-localization of bktRNA1 with the 15.5K protein and U12 snRNA in HCT116 cells by FISH and IF microscopy. White arrows indicate the signals detected by probes or antibody.

Extended Data Fig. 6 Splicing efficiency analysis for wild-type and knockout bktRNA1.

a, Workflow for intron retention analysis in HCT116 and KO-bktRNA1 cells. b, Proportion of aberrantly retained U12- and U2-type introns (filtered by p < 0.05) in bktRNA1-deficient cells. c, Proportion of statistically significant changes (filtered by p < 0.05) in U12- and U2-type genes in bktRNA1-deficient cells. d, The ratio of spliced mRNA to unspliced pre-mRNA for U12-type introns was determined by qPCR in bktRNA1-deficient cells. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. e, The ratio of spliced mRNA to unspliced pre-mRNA for U12-type and U2-type (GAPDH) introns was determined by qPCR in bktRNA1-rescued HCT116 KO-4 cells. f, The ratio of spliced mRNA to unspliced pre-mRNA for U12-type and U2-type (GAPDH) introns was determined by qPCR in SNORA12-rescued HCT116 KO-4 cells. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. ns, not significant. g, The ratio of spliced mRNA to unspliced pre-mRNA for U12-type and U2-type (GAPDH) introns was determined by qPCR in artificial scaRNA-overexpressing HCT116 KO-4 cells. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. ns, not significant.
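The qPCR panels report a spliced-to-unspliced ratio as mean +/− SEM over n = 3 biological replicates, compared with a two-tailed paired t-test. A minimal sketch of that arithmetic, assuming both amplicons amplify with 100% efficiency (a 2-fold increase per cycle), so the ratio is 2^(Ct_unspliced − Ct_spliced); the caption does not state the efficiency model. With exactly three pairs the t distribution has 2 degrees of freedom and a closed-form CDF, which lets this stdlib-only sketch return an exact two-tailed p-value. Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

def spliced_to_unspliced(ct_spliced: float, ct_unspliced: float) -> float:
    """Relative spliced/unspliced abundance, assuming 100% efficiency."""
    return 2.0 ** (ct_unspliced - ct_spliced)

def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def paired_t_two_tailed_n3(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Paired two-tailed t-test for exactly three pairs (df = 2).

    For df = 2 the Student t CDF is 1/2 + t / (2 * sqrt(2 + t^2)),
    so the two-tailed p-value is 1 - |t| / sqrt(2 + t^2).
    """
    if len(a) != 3 or len(b) != 3:
        raise ValueError("this closed form assumes n = 3 pairs (df = 2)")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(3))
    p = 1 - abs(t) / math.sqrt(2 + t * t)
    return t, p
```

For other sample sizes, `scipy.stats.ttest_rel(a, b)` performs the same two-sided paired test.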

Extended Data Fig. 7 Depletion of bktRNA1 affects the interaction between U12 and ZCRB1.

a, Western blots showing immunoprecipitation with each indicated antibody in wild-type (WT) and bktRNA1-deficient KO-4 (KO) cells. b, Native RIP was performed in wild-type (WT) and bktRNA1-deficient KO-4 (KO) cells using each indicated antibody or normal IgG, after which qPCR was performed with primers recognizing minor spliceosomal snRNAs (U11, U12, U4atac, U5, U6atac). The percentage of RIP-enriched snRNAs was calculated relative to the input RNA. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. ns, not significant. c, ZCRB1 RIP-enriched snRNAs were detected by Northern blotting in wild-type (WT) and bktRNA1-deficient KO-4 (KO) cells. U6 snRNA served as a negative control. d, ZCRB1 RIP-enriched snRNAs were detected by Northern blotting in bktRNA1-deficient and bktRNA1-rescued cells. U6 snRNA served as a negative control.
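Panel b expresses RIP-enriched snRNA as a percentage of input RNA. The caption does not give the calculation; a common percent-input approach first adjusts the input Ct for the fraction of lysate saved as input, then compares it with the IP Ct. The sketch below assumes that approach, equal amplification efficiency (2-fold per cycle) and a hypothetical `input_fraction` parameter (for example, 0.1 if 10% of the lysate was kept as input).

```python
import math

def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """RIP-qPCR signal as a percentage of total input RNA.

    input_fraction: fraction of the lysate kept as input (assumed parameter).
    Assumes equal, 100% amplification efficiency for IP and input reactions.
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Scale the input Ct up to the whole lysate: fewer cycles for more template.
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2.0 ** (ct_input_adjusted - ct_ip)
```

Computing the same value for the IgG control lets the specific antibody's signal be read against background, which is how the IgG lanes in panel b would typically be used.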

Extended Data Fig. 8 ZCRB1 knockdown affects U12-type intron splicing.

a, qPCR (upper panel) and western blotting analysis (lower panel) of Dox-inducible ZCRB1 knockdown in HCT116 cells. GAPDH served as the internal reference gene for qPCR and as the loading control for western blotting. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. b, c, Dot plots displaying intron retention levels in a representative pairwise comparison of ZCRB1 knockdown and negative control cells. Red dots represent U12-type introns, and blue dots represent U2-type introns. d, Proportion of aberrantly retained U2- and U12-type introns in ZCRB1 knockdown cells. Red boxes represent retained introns, and blue boxes represent unretained introns. e, f, Cumulative fraction of inclusion level differences (ZCRB1 knockdown versus negative control cells) for U12-type and U2-type introns. P values for the cumulative distributions were calculated using a two-sided Mann-Whitney-Wilcoxon test. g, Venn diagram showing the numbers of overlapping retained introns across four bktRNA1-deficient HCT116 cell lines and ZCRB1 knockdown cells. h, CCK-8 assay of HCT116 cells with bktRNA1 knockout. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. i, Colony formation assay of HCT116 cells with bktRNA1 knockout. j, Quantitative analysis of colony formation assay in the indicated lines. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. k, CCK-8 assay of HCT116 cells with ZCRB1 knockdown. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test. l, Colony formation assay of HCT116 cells with ZCRB1 knockdown. m, Quantitative analysis of colony formation assay in the indicated lines. Data are presented as mean values +/− SEM (n = 3, biological replicates), two-tailed, paired t-test.
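Panels e and f compare the distribution of inclusion level differences for U12-type introns with that for U2-type introns using a two-sided Mann-Whitney-Wilcoxon test. A minimal stdlib sketch of that test under the normal approximation, with the standard tie correction to the variance and an optional continuity correction. This is a swapped-in textbook form, not necessarily the implementation the authors used; `scipy.stats.mannwhitneyu` may use exact p-values for small samples, so results can differ there. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x: Sequence[float], y: Sequence[float],
                   continuity: bool = True) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values are tied; the test is undefined")
    diff = abs(u1 - mu)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / math.sqrt(var)
    return u1, math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

In this setting, `x` would hold the inclusion level differences of U12-type introns and `y` those of U2-type introns.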

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Supplementary Figs. 1–15, Supplementary Methods and captions for Supplementary Tables.

Reporting Summary

Supplementary Table

Supplementary Tables 1–9; combined tables are separated by tabs.

Supplementary Data

Statistical Source Data for Supplementary Figs. 8, 11, 13 and 14.

Source data

Source Data Figures

Statistical Source Data for Figs. 3–6.

Source Data Extended Data Figures

Statistical Source Data for Extended Data Figs. 6–8.

Source Data Fig. 3

Unprocessed northern blots and gels.

Source Data Fig. 4

Unprocessed gels.

Source Data Fig. 5

Unprocessed western blots and gels.

Source Data Extended Data Fig. 1

Unprocessed western blots.

Source Data Extended Data Fig. 7

Unprocessed western blots and northern blots.

Source Data Extended Data Fig. 8

Unprocessed western blots.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, B., Liu, S., Zheng, W. et al. RIP-PEN-seq identifies a class of kink-turn RNAs as splicing regulators. Nat Biotechnol 42, 119–131 (2024). https://doi.org/10.1038/s41587-023-01749-0

Received: 02 August 2022
Accepted: 13 March 2023
Published: 10 April 2023
Issue Date: January 2024
DOI: https://doi.org/10.1038/s41587-023-01749-0
