Determination of isoform-specific RNA structure with nanopore long reads

Aw, Jong Ghut Ashley; Lim, Shaun W.; Wang, Jia Xu; Lambert, Finnlay R. P.; Tan, Wen Ting; Shen, Yang; Zhang, Yu; Kaewsapsak, Pornchai; Li, Chenhao; Ng, Sarah B.; Vardy, Leah A.; Tan, Meng How; Nagarajan, Niranjan; Wan, Yue

doi:10.1038/s41587-020-0712-z

Article
Published: 26 October 2020


Nature Biotechnology volume 39, pages 336–346 (2021)



A Publisher Correction to this article was published on 23 March 2021

An Author Correction to this article was published on 12 November 2020

This article has been updated

Abstract

Current methods for determining RNA structure with short-read sequencing cannot capture most differences between distinct transcript isoforms. Here we present RNA structure analysis using nanopore sequencing (PORE-cupine), which combines structure probing using chemical modifications with direct long-read RNA sequencing and machine learning to detect secondary structures in cellular RNAs. PORE-cupine also captures global structural features, such as RNA-binding-protein binding sites and reactivity differences at single-nucleotide variants. We show that shared sequences in different transcript isoforms of the same gene can fold into different structures, highlighting the importance of long-read sequencing for obtaining phase information. We also demonstrate that structural differences between transcript isoforms of the same gene lead to differences in translation efficiency. By revealing isoform-specific RNA structure, PORE-cupine will deepen understanding of the role of structures in controlling gene regulation.


**Fig. 1: PORE-cupine leverages machine learning to profile RNA secondary structure.**

**Fig. 2: PORE-cupine performs accurately in vitro and in vivo and captures riboswitch structural dynamics.**

**Fig. 3: Genome-wide structure analysis in hESCs with PORE-cupine confirms known global structural features.**

**Fig. 4: PORE-cupine reveals structural differences in shared exons between alternative isoforms.**

**Fig. 5: Structure differences between isoforms are correlated with translation efficiency.**


Data availability

Raw sequencing data and reactivity profiles can be downloaded from the Gene Expression Omnibus under accession number GSE133361. Source data are provided with this paper.

Code availability

Source code for all scripts (R version 3.4.1) and commands used for analysis can be found at http://github.com/awjga/PORE-cupine.

Change history

12 November 2020
A Correction to this paper has been published: https://doi.org/10.1038/s41587-020-00755-w
23 March 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41587-021-00889-5


Acknowledgements

We thank members of the Wan and Tan labs and F. Yao, H. M. Loh, C. C. Khor and M. Sikic for helpful discussions. Y.W. is supported by funding from A*STAR (A*STAR investigatorship 1630700155), the National Research Foundation Singapore (NRF2019-NRF-ISF003-2970 and CRP21-2018-0101), the EMBO Young Investigatorship and the CIFAR global scholarship. F.R.P.L. is supported by a doctoral scholarship from the Warwick-A*STAR research attachment programme.

Author information

These authors contributed equally: Jong Ghut Ashley Aw, Shaun W. Lim, Jia Xu Wang.

Authors and Affiliations

Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
Jong Ghut Ashley Aw, Shaun W. Lim, Jia Xu Wang, Finnlay R. P. Lambert, Wen Ting Tan, Yu Zhang, Pornchai Kaewsapsak, Meng How Tan & Yue Wan
Division of Biomedical Sciences, Warwick Medical School, University of Warwick, Coventry, UK
Finnlay R. P. Lambert
Computational and Systems Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
Yang Shen, Chenhao Li & Niranjan Nagarajan
Genome Technologies Platform, Genome Institute of Singapore, A*STAR, Singapore, Singapore
Sarah B. Ng
Skin Research Institute of Singapore, A*STAR, Immunos, Singapore
Leah A. Vardy
School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
Meng How Tan
Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Niranjan Nagarajan & Yue Wan
School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
Yue Wan


Contributions

Y.W. conceived the project. Y.W., N.N., M.H.T., B.S.N. and L.V. designed the experiments and analysis. S.W.L. and J.X.W. performed the experiments with help from P.K. J.G.A.A. and Y.S. performed the computational analysis with help from C.L. and E.P.K. Y.W. organized and wrote the paper with J.G.A.A., S.W.L. and all other authors.

Corresponding authors

Correspondence to Niranjan Nagarajan or Yue Wan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Chemical structures of RNA structure probing compounds, associated reaction products, mapped length and statistics of error rates.

a, Chemical structures of RNA structure probing compounds. Side chains for the carbodiimide of CMCT are highlighted and abbreviated as R’ and R’ for part (b). b, RNA nucleotide triphosphates with chemical adducts formed from reaction with structure probing compounds. Adducts are highlighted in green. c, Median lengths of mapped nanopore reads for unmodified and modified Tetrahymena RNA with different structure probing compounds. d–f, Boxplots showing the mismatch (d), deletion (e) and insertion (f) rates for different structure probing chemicals on Tetrahymena RNA, as compared to unmodified RNA. P-values were calculated using the two-sided Wilcoxon rank-sum test. h–j, Boxplots showing the AUC-ROC performance of mismatch (h), deletion (i) and insertion (j) rates for the different compounds on the Tetrahymena RNA secondary structure. P-values were calculated using the two-sided Wilcoxon rank-sum test. c–j, 6,962–42,107 reads from different libraries were used for comparisons (Supplementary Table 1). The middle, lower and upper boundary lines in the boxplot correspond to the median, first and third quartiles. The upper whisker extends to the largest value no further than 1.5 × IQR from the hinge (where IQR is the inter-quartile range) and the lower whisker extends to the smallest value no further than 1.5 × IQR from the hinge.
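
The boxplot convention described in this legend (hinges at the first and third quartiles, whiskers reaching the most extreme values within 1.5 × IQR of the hinges) can be illustrated with a minimal Python sketch. The function name `boxplot_stats` and the example values are hypothetical, and the quartile interpolation method is an assumption; plotting packages may compute hinges slightly differently.

```python
import statistics

def boxplot_stats(values):
    """Return median, hinges, whisker ends and outliers (1.5 x IQR rule)."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical AUC-ROC values, for illustration only.
example = boxplot_stats([0.52, 0.55, 0.58, 0.60, 0.61, 0.63, 0.66, 0.95])
```

In this toy input the value 0.95 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 0.66.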


Extended Data Fig. 2 Distribution of mismatches, insertions, deletions along Tetrahymena RNA sequence.

Line plots of normalized number of mismatches (a), deletions (b) and insertions (c) caused by the different compounds and unmodified, along the length of the Tetrahymena RNA sequence. The red bars on top of the plots indicate the location of single-stranded bases in the secondary structure.

Extended Data Fig. 3 Error characterization of the modifications along the secondary structure of the Tetrahymena RNA.

a, Positions and intensity of mismatches (red), deletions (green) and insertions (purple) caused by the different chemical compounds are mapped along the secondary structure of the Tetrahymena RNA. b, Percentage of observed bases (upper number) and corresponding P-values were shown (lower number) for each observation. P-value was calculated using two-sided chi-square test for all modified versus unmodified comparisons.


Extended Data Fig. 4 Schematic of the bioinformatic workflow of PORE-cupine and characteristics of direct RNA sequencing signal.

a, Sequenced reads were basecalled using Albacore or Guppy and mapped to the reference sequences using GraphMap. Nanopolish was used to align the raw signals and to extract current features, which were used to train an SVM on unmodified data. For downstream analysis, we then filtered for reads longer than 50% of the annotated length and for transcripts with at least 100 reads in the modified library and 200 reads in the unmodified library. b, Normalized current dwell time for single-stranded regions on Tetrahymena RNA modified with NAI-N3. With footprinting gels as a guide, the top 10% of the single-stranded regions on Tetrahymena RNA were selected for these plots. c–e, Normalized current mean (c), standard deviation (d) and dwell time (e) distributions for all positions on unmodified Tetrahymena RNA and RNA modified with NAI-N3. f, Bioanalyzer traces of in vitro transcribed, full-length, unmodified and NAI-N3 modified (100 mM for 5 min or 25 min) Tetrahymena RNA. g, Mapping rates for modified versus unmodified Tetrahymena RNA. The total numbers of sequenced reads for unmodified, NAI-N3 (5 min) and NAI-N3 (25 min) libraries are 20,149, 51,760 and 22,155, respectively. The percentages of mapped reads for unmodified, NAI-N3 (5 min) and NAI-N3 (25 min) libraries are 75%, 81% and 17%, respectively. h, Density plots showing the distribution of lengths of sequenced unmodified and modified Tetrahymena RNA. Top: unmodified and NAI-N3 modified (100 mM, 5 min) RNA. Bottom: unmodified and NAI-N3 modified (100 mM, 25 min) RNA. i, Coverage of reads mapping to Tetrahymena RNA along its length, for unmodified (top), NAI-N3 modified (100 mM, 5 min) (middle) and extended NAI-N3 modified (100 mM, 25 min) RNA (bottom).
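
The read and transcript filter described in this legend can be sketched in Python as follows. This is not the authors' implementation (their scripts are in R); the function names, the input layout of `(transcript_id, mapped_length)` pairs and the toy data are hypothetical. The sketch also assumes that read counts are taken after the length filter, which the legend does not state explicitly.

```python
from collections import defaultdict

MIN_LENGTH_FRACTION = 0.5    # reads must be longer than 50% of the annotated length
MIN_MODIFIED_READS = 100     # per transcript, modified library
MIN_UNMODIFIED_READS = 200   # per transcript, unmodified library

def count_long_reads(reads, annotated_lengths):
    """Count, per transcript, the reads longer than 50% of its annotated length.

    `reads` is an iterable of (transcript_id, mapped_length) pairs (hypothetical layout).
    """
    counts = defaultdict(int)
    for transcript_id, mapped_length in reads:
        annotated = annotated_lengths.get(transcript_id)
        if annotated and mapped_length > MIN_LENGTH_FRACTION * annotated:
            counts[transcript_id] += 1
    return counts

def transcripts_passing_filter(modified_reads, unmodified_reads, annotated_lengths):
    """Return transcripts with enough long reads in both libraries."""
    mod = count_long_reads(modified_reads, annotated_lengths)
    unmod = count_long_reads(unmodified_reads, annotated_lengths)
    return sorted(
        t for t in annotated_lengths
        if mod[t] >= MIN_MODIFIED_READS and unmod[t] >= MIN_UNMODIFIED_READS
    )

# Toy data: tx_B has too few modified reads and is dropped.
lengths = {"tx_A": 1000, "tx_B": 1000}
modified = [("tx_A", 800)] * 120 + [("tx_B", 800)] * 50
unmodified = [("tx_A", 900)] * 250 + [("tx_A", 300)] * 40 + [("tx_B", 900)] * 300
kept = transcripts_passing_filter(modified, unmodified, lengths)
```

The 40 short `tx_A` reads in the unmodified library are discarded by the length filter, but the remaining 250 still satisfy the 200-read threshold.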

Extended Data Fig. 5 Optimization of PORE-cupine using 11 RNAs as training set.

a, Scatterplot showing the distribution of normalized base reactivity between N = 2 biological replicates of modified Tetrahymena RNA. R = 0.97, 95% CI = [0.97, 0.98] (Pearson correlation). P-value = 2.5 × 10⁻²⁶², two-tailed Student’s t-test. b, Distribution of current mean and standard deviation for a unimodal (left) and bimodal (right) position in two biological replicates. c, AUC-ROC performance of the correlation of NAI-N3 reactivities of the training set based on PORE-cupine versus footprinting from 11 transcripts. d,e, Comparison of PORE-cupine reactivity and traditional footprinting. Two replicates of gels are shown for Tetrahymena RNA (d, R = 0.80) and lysine riboswitch (e, R = 0.74). Lane 1 of the footprinting gels shows the A (left, Tetrahymena) or G (right, Tetrahymena and lysine) ladder. Lane 2 shows unmodified RNA, and lane 3 shows NAI-N3 modified RNA. Quantification of the bands on the gels was done using SAFA. Pearson correlation was used to compare SAFA and PORE-cupine signals. f, List of RNAs used for training and testing. g, Scatter plot of per-base reactivity in two biological replicates of the three test RNAs. P-value = 0 using two-tailed Student’s t-test. R = 0.877, 95% CI = [0.87, 0.89], by Pearson correlation. h–j, Line plots showing the per-base reactivity along the length of three test RNAs, for two biological replicates. R ≥ 0.89, using Pearson correlation. k, Boxplot showing the performance of the SVM parameters on the 3 test RNAs, based on training on the Tetrahymena RNA (left) or on 11 RNAs (right, footprinting gels). l, AUC-ROC performance of SVM parameters on 3 test RNAs (red, based on our current 11 training RNAs) versus on test RNAs after random selection of 11 of the 14 RNAs as the training set, repeated 20 times. m, Boxplot showing the performance of all, unimodal and bimodal positions on test RNAs using AUC-ROC based on footprinting gels from 3 transcripts.
In c and k–m, the middle, lower and upper boundary lines in the boxplot correspond to the median, first and third quartiles. The upper whisker extends to the largest value no further than 1.5 × IQR from the hinge (where IQR is the inter-quartile range) and the lower whisker extends to the smallest value no further than 1.5 × IQR from the hinge. Outliers are shown as dots.
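
Two statistics recur throughout these legends: the Pearson correlation between replicate reactivity profiles and the AUC-ROC of reactivity against a reference. A minimal pure-Python sketch of both follows. The function names and the toy values are hypothetical, and the labelling of reference positions as 1 (reactive in the reference, for example by footprinting) or 0 is an assumption about how such a comparison could be set up, not a description of the authors' scripts.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def auc_roc(scores, labels):
    """AUC as the probability that a positive position outscores a negative one.

    Ties contribute 0.5, which matches the area under the ROC curve.
    """
    positives = [s for s, l in zip(scores, labels) if l]
    negatives = [s for s, l in zip(scores, labels) if not l]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy replicate profiles: replicate 2 is exactly twice replicate 1.
r = pearson_r([0.1, 0.4, 0.8, 0.3], [0.2, 0.8, 1.6, 0.6])

# Toy reactivities scored against hypothetical reference labels.
auc = auc_roc([0.9, 0.2, 0.4, 0.5], [1, 0, 1, 0])
```

The pairwise AUC formulation is quadratic in the number of positions, which is acceptable for single transcripts; a rank-based formula would be preferable for transcriptome-scale inputs.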


Extended Data Fig. 6 Comparison between PORE-cupine and footprinting signals.

a, Bioanalyzer traces of unmodified and in vivo NAI-N3 modified (100 mM, 5 min) total B. subtilis RNA. b, Secondary structure model of B. subtilis 16S rRNA. The structure-probed regions are boxed in pink, green and blue. c–e, Comparisons between PORE-cupine and footprinting. Two replicates of gels are shown for each of the three regions along B. subtilis RNA. The gels show G ladder (lane 1), unmodified RNA (lane 2) and NAI-N3 modified RNA (lane 3); the correlation between replicate gels is R (Pearson) = 0.91 (c), 0.74 (d) and 0.24 (e). Quantification of the bands on the gels was done using SAFA. Comparison between SAFA quantification and PORE-cupine for each of the regions is shown as a line plot to the right of the gels. R = 0.52 (c), 0.76 (d), 0.62 (e) by Pearson correlation. f,g, Bioanalyzer traces of unmodified and in vitro NAI-N3 modified (100 mM, 5 min) RPS29 (f) and Adocbl riboswitch (g). h,i, Comparisons between PORE-cupine and footprinting. Two replicates of gels are shown for RPS29 and Adocbl riboswitch RNA; R (Pearson) = 0.93 (h) and 0.73 (i). The gels show G ladder (lane 1), unmodified RNA (lane 2) and NAI-N3 modified RNA (lane 3). Quantification of the bands on the gels was done using SAFA. Comparison between SAFA quantification and PORE-cupine for each of the regions is shown to the right of the gels. R (Pearson) = 0.68 (h) and 0.69 (i).


Extended Data Fig. 7 PORE-cupine reactivity signals on TPP.

Line plots showing two replicates of PORE-cupine reactivities along the TPP riboswitch in the presence of water (R = 0.86), 250 nM TPP (R = 0.87), 750 nM TPP (R = 0.83) and 10 μM TPP (R = 0.94). Pearson correlation was used to calculate the similarity between the reactivities of the replicates.


Extended Data Fig. 8 PORE-cupine results on the hESC transcriptome.

a,b, Bioanalyzer traces of unmodified and modified total (a) and polyA(+) selected (b) hESC RNA. c, Barplots showing the number of reads after basecalling (12,007,032 and 10,118,432) and mapping (86% and 60%) in unmodified and modified hESC samples, respectively. d, Histogram showing the distribution of reads with different amounts of modification in hESC. e, Boxplots showing the performance of reads with different amounts of modification, calculated using AUC-ROC on the test set of 3 RNAs, based on 10 footprinting regions. Reads were grouped into different classes: all reads (current, 147,670 reads), with only 1 modification for the strand (only 1, 15,461 reads), with 0–1% modification (68,025 reads), with 1–2% modification (9,771 reads), with 2–3% modification (803 reads) and with 3–4% modification (76 reads). P-value was calculated using the two-sided Wilcoxon rank-sum test. The middle, lower and upper boundaries of the boxplot correspond to the median, first and third quartiles, while the upper and lower whiskers extend from the hinges to the largest and smallest values within 1.5 × IQR of the hinge, respectively. f, Top: Fraction of modified reads mapped across exon-exon junctions (position 0) in hESC (black) and across artificial junctions with 50-base insertions (red). Bottom: Difference between mapping rates across normal versus artificial exon-exon junctions at each base; P-value was calculated using the two-tailed Wilcoxon rank-sum test. g, Line graph showing the percentage of bimodal positions observed (for both current standard deviation and mean) for all 1,024 kmers. The orange line indicates the top 1% of bimodal signals across all kmers and the identities of the corresponding kmers are labelled above. h–j, Base composition along positions 1, 2, 3, 4 and 5 of unimodal (left) and bimodal kmers (right). These include kmers that show bimodal current mean (h), bimodal current standard deviation (i), and both bimodal current mean and current standard deviation (j). k, Coverage of unmodified (left) and modified (right) reads along the hESC transcriptome using direct RNA sequencing.
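
The count of 1,024 kmers in panel g corresponds to all 4⁵ sequences of length 5, and panels h–j summarize base composition at each of the five positions. A short Python sketch of both ideas is given below. The function names are hypothetical, the RNA alphabet `ACGU` is an assumption (signal-level tools may report kmers with T instead of U), and the two example kmers are arbitrary.

```python
from collections import Counter
from itertools import product

def all_kmers(k=5, alphabet="ACGU"):
    """Enumerate every k-mer over the alphabet (4**5 = 1024 for k = 5)."""
    return ["".join(bases) for bases in product(alphabet, repeat=k)]

def base_composition(kmers, k=5, alphabet="ACGU"):
    """Fraction of each base at each position across a set of k-mers."""
    composition = []
    for position in range(k):
        counts = Counter(kmer[position] for kmer in kmers)
        total = sum(counts.values())
        composition.append({base: counts[base] / total for base in alphabet})
    return composition

kmers = all_kmers()

# Arbitrary example set, standing in for a list of bimodal kmers.
composition = base_composition(["AAAAA", "ACGUA"])
```

Applied to the full set returned by `all_kmers()`, every base has a fraction of 0.25 at every position, which is the background against which enrichment in a subset of kmers would be read.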


Extended Data Fig. 9 Structural properties of the hESC transcriptome.

a, Scatterplot showing the distribution of total reads per position for each hESC transcript, for N = 2 biologically independent replicates, of unmodified (left, 1,613 transcripts; P-value = 0 using two-tailed Student’s t-test; 95% CI = [0.98, 0.98]) and modified transcripts (right, 1,751 transcripts; P-value = 0 using two-tailed Student’s t-test; 95% CI = [0.97, 0.97]). R (Pearson) = 0.98 (left) and 0.97 (right). b, Barplot showing the number of transcripts left after the abundance and length filters. The number of transcripts in each group is shown above the plot. c, Boxplots showing the distribution of median mapped lengths of unmodified (left) and NAI-N3 modified (middle) hESC mRNAs (1,751 transcripts). Annotated refers to the distribution of expected lengths for each transcript based on the ENSEMBL GRCh38 annotation (right). d, Histogram showing the distribution of transcripts having different fractions of annotated length in unmodified and modified samples. e, Distribution of Pearson correlations between full-length (>99% of known length) and partial transcripts in hESC (83 transcripts from N = 2 biological replicates were used). The Y-axis shows the fraction of transcripts with a particular correlation. The X-axis depicts Pearson correlation coefficients. f, Boxplot showing PORE-cupine reactivity of different classes of transcripts. P-values were calculated using the two-sided Wilcoxon rank-sum test. 1,584 coding genes, 67 pseudogenes, 81 non-coding genes and 4 rRNAs were used. g, Top, Metagene analysis of PORE-cupine-derived mean reactivities aligned according to start (Upper) and stop (Lower) codons for all 559 transcripts. Bottom, Metagene autocorrelation function (ACF) plot for the 5’ UTR, CDS and 3’ UTR. In c and f, the middle, lower and upper boundaries of the boxplot correspond to the median, first and third quartiles. The upper and lower whiskers extend from the hinges to the largest and smallest values within 1.5 × IQR of the hinge, respectively. Outliers are shown as dots.
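
For readers unfamiliar with the autocorrelation function (ACF) mentioned in panel g, the sketch below computes the standard sample ACF of a reactivity profile in Python. The legend does not specify which estimator was used, so this is an assumption; the function name and the synthetic three-base periodic profile are hypothetical and chosen only to show how a repeating pattern appears as a peak at the matching lag.

```python
def autocorrelation(signal, max_lag):
    """Sample autocorrelation of a 1-D profile for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("constant signal has undefined autocorrelation")
    acf = []
    for lag in range(max_lag + 1):
        # Sum of products of the profile with a copy shifted by `lag`.
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        acf.append(cov / variance)
    return acf

# Synthetic profile that repeats every three positions (illustration only).
toy_profile = [1.0, 0.2, 0.2] * 30
acf = autocorrelation(toy_profile, max_lag=6)
```

With this toy input the ACF is 1 at lag 0, negative at lags 1 and 2, and close to 1 again at lags 3 and 6, reflecting the period built into the synthetic profile.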


Extended Data Fig. 10 RNA structures in gene-linked isoforms.

a, Upper, Transcript organization of different RPLP0 isoforms. Alternative exons seen in our structural data are highlighted in red. Lower, Normalized reactivity profiles for the different isoforms and their aggregate signal. b, Upper, Transcript organization of different RACK1 isoforms. The alternative exon is shown in red (also in inset). Lower, Line plots for the aggregate reactivity signal between the two isoforms are shown (Top). Middle, Line plots showing the expanded view of the reactivity difference between the isoforms. Bottom, Line plots showing the individual reactivity information for each isoform along its length. c, Number of transcripts with two structure-changing regions that are more than 100, 200, 300, 400 or 500 bases apart. The value for each group is shown above the bar. d, Schematic of the TrIP-seq workflow. e, Line plot showing the absorbance (A260) of each fraction (2–12) after polysome fractionation. f, Pairwise Spearman correlations of the read counts per transcript for each fraction between two biological replicates. Fractions and batches are denoted as F2–12 and B1–2, respectively. g, Distribution of read counts across different polysome fractions for two biological replicates of Actin B (left) and Activating transcription factor 4 (right).



About this article

Cite this article

Aw, J.G.A., Lim, S.W., Wang, J.X. et al. Determination of isoform-specific RNA structure with nanopore long reads. Nat Biotechnol 39, 336–346 (2021). https://doi.org/10.1038/s41587-020-0712-z


Received: 28 May 2019
Accepted: 18 September 2020
Published: 26 October 2020
Issue Date: March 2021
DOI: https://doi.org/10.1038/s41587-020-0712-z

