Direct identification of A-to-I editing sites with nanopore native RNA sequencing

Abstract

Inosine is a prevalent RNA modification in animals and is formed when an adenosine is deaminated by the ADAR family of enzymes. Traditionally, inosines are identified indirectly as variants from Illumina RNA-sequencing data because they are interpreted as guanosines by the cellular machinery. However, this indirect method performs poorly in protein-coding regions, where exons are typically short, in non-model organisms with sparsely annotated single-nucleotide polymorphisms, or in disease contexts where unknown DNA mutations are pervasive. Here, we show that Oxford Nanopore direct RNA sequencing can be used to identify inosine-containing sites in native transcriptomes with high accuracy. We trained convolutional neural network models to distinguish inosine from adenosine and guanosine, and to estimate the modification rate at each editing site. Furthermore, we demonstrate their utility on the transcriptomes of human, mouse and Xenopus. Our approach expands the toolkit for studying adenosine-to-inosine editing and can be further extended to investigate other RNA modifications.

Fig. 1: Dinopore exploits deviations in ionic current signal and base-calling errors to predict inosines in RNA transcripts directly.
Fig. 2: Development and benchmarking of Dinopore for inosine detection.
Fig. 3: Evaluation of Dinopore on previously unseen organisms and cell types.
Fig. 4: Multi-class predictions by Dinopore.
Fig. 5: Further evaluation of Dinopore.
Fig. 6: Estimation of editing levels with a regression model.

Data availability

Raw nanopore sequencing data have been deposited in the NCBI Sequence Read Archive under accession number SRP363295.

Genome references GRCh37, mm10 and xenLae2 are publicly available.

Code availability

The computational code used in all analyses is hosted on GitHub (https://github.com/darelab2014/Dinopore). A pre-built computing environment, together with the source code and source data, is also available in a Code Ocean capsule (https://doi.org/10.24433/CO.2180901.v1).

Acknowledgements

We thank members of the DaRE laboratory for helpful discussions. M.H.T. is supported by a National Research Foundation Singapore grant (NRF2017-NRF-ISF002–2673), an Open Fund - Individual Research Grant from the National Medical Research Council (NMRC/OFIRG/0017/2016), an EMBO Global Investigatorship, an ASPIRE League seed grant from Nanyang Technological University, core funds from the Genome Institute of Singapore, and funds for the Final Year Project (FYP) and the International Genetically Engineered Machine (iGEM) competition from the School of Chemical and Biomedical Engineering. J.W.J.H. is supported by a Ph.D. research scholarship from the School of Chemical and Biomedical Engineering. Y.S.H. is supported by core funds from the Bioprocessing Technology Institute. We also acknowledge funding support for this project from Nanyang Technological University under the URECA Undergraduate Research Programme.

Author information

Contributions

M.H.T. conceived the project and designed the study. T.A.N. led the computational analysis, with active participation from J.W.J.H., P.K. and M.H.T. E.P.L.K., D.S., J.G.A.A., M.S. and Y.W. contributed to the analysis. J.W.J.H. and P.K. performed the sequencing experiments, with help from H.L., A.C., A.P., Z.Y. and M.L. Y.Y.H., K.L.E.P. and Y.S.H. performed the mass spectrometry experiments. Y.M.W., Q.Z., J.H.-F., S.X., B.R. and C.W. provided samples. T.A.N. and M.H.T. organized and wrote the manuscript, with help from J.W.J.H. and P.K. All authors read and approved the paper.

Corresponding author

Correspondence to Meng How Tan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Extended data

Extended Data Fig. 1 Percentages of nanopore direct RNA sequencing reads that could be aligned to the reference synthetic sequences.

a, The library preparation protocol from Oxford Nanopore Technologies (ONT) includes an optional reverse transcription (RT) step that synthesizes a complementary cDNA strand, which is not sequenced but improves throughput. The extra RT step did not affect the mapping rate of reads containing only canonical nucleotides, and it appeared to improve the mappability of inosine-containing reads, although the difference did not reach statistical significance. P-values were calculated using a two-tailed Student’s t-test (N = 3 [no RT] or 4 [with RT]). b, Reaction of inosines with acrylonitrile (ACN) introduces a chemical adduct that blocks the progression of reverse transcriptase. The resulting base, N1-cyanoethylinosine, is bulkier and is predicted to perturb the ionic current more dramatically than inosine, which could make it easier to detect by direct RNA sequencing. However, ACN treatment greatly reduced throughput, as numerous strands appeared to be ejected from the pores, and the resulting reads were also significantly harder to align to the reference sequences than untreated inosine-containing reads. P-values were calculated using a two-tailed Student’s t-test (N = 6). All box plots: box, first to third quartiles; whiskers, 1.5× interquartile range; center line, median; points, outliers.
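The P-values in panels a and b come from a two-tailed Student’s t-test. As a minimal sketch (standard library only), the pooled-variance t statistic and its degrees of freedom can be computed as below. Converting t into a two-tailed p-value requires a t-distribution survival function, for example from SciPy, and that step is deliberately left out. The function name `students_t` is hypothetical, and this is not the authors' analysis code.

```python
import math
from statistics import mean, variance


def students_t(sample_a, sample_b):
    """Two-sample Student's t statistic under the equal-variance assumption.

    Returns (t, degrees_of_freedom). The two-tailed p-value would be
    2 * P(T_df > |t|), which needs a t-distribution survival function
    (for example from SciPy) and is not computed here.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (mean(sample_a) - mean(sample_b)) / se
    return t, df
```

For example, `students_t(no_rt_rates, with_rt_rates)` would compare per-replicate mapping rates for the two library preparations, where both arguments are hypothetical lists of percentages.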

Extended Data Fig. 2 Inosines in the H9 transcriptome.

a, Histogram showing the distribution of editing levels in H9 human embryonic stem cells (hESCs), as calculated from Illumina RNA-seq data. Although thousands of A-to-I editing events were detected, most occurred at low frequencies. b, Signal-level features of adenosine (A), inosine (I), and guanosine (G) in nanopore direct RNA sequencing data generated from H9 cells. c, Frequency of base-calling errors in nanopore data generated from H9 cells. The mismatch frequency was high at SNP positions because the reads were mapped against the reference genome. d, Base qualities of adenosine (A), inosine (I), and guanosine (G) in nanopore data generated from H9 cells. In b–d, N = 2410 [A], 5613 [I] or 1297 [G]. All box plots: box, first to third quartiles; whiskers, 1.5× interquartile range; center line, median; points, outliers.
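The box-plot convention stated in these captions (box from the first to the third quartile, median line, whiskers reaching the most extreme points within 1.5× the interquartile range, remaining points drawn as outliers) can be sketched with Python's `statistics.quantiles`. Using the `"inclusive"` (linear-interpolation) quantile method is an assumption; plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_plot_stats(values, whis=1.5):
    """Summary statistics for one box in a box plot.

    Box: first to third quartile; center line: median; whiskers: the most
    extreme data points within `whis` * IQR of the box; outliers: the rest.
    Needs at least two values.
    """
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,  # the middle cut point of the inclusive method is the median
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```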

Extended Data Fig. 3 Reproducibility of features in H9 nanopore data.

a, Scatterplots showing the reproducibility of event parameters (mean, standard deviation, and length) across replicates. The Pearson correlation coefficients (R) were all above 0.5. b, Scatterplots showing the reproducibility of base-calling errors (insertion, deletion, and mismatch) across replicates. Base-calling errors were more variable than the event parameters: the Pearson correlation coefficients were moderate for deletion and mismatch (0.4–0.5) but appreciably lower for insertion (below 0.3). c, Scatterplots showing the reproducibility of base quality across replicates. As with the event parameters, the Pearson correlation coefficients for base quality were all above 0.5.
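Reproducibility in these panels is summarized by the Pearson correlation coefficient between per-site feature values from two replicates. Here is a minimal sketch, assuming the two replicates have already been matched site by site into equal-length lists. `pearson_r` is a hypothetical helper, not part of the Dinopore code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired per-site feature values from two replicates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Covariance and standard deviations, all without the 1/(n-1) factor, which cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```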

Extended Data Fig. 4 Evaluation of different CNN architectures.

a, A plain architecture with no shortcut connections. b, Comparison of the performance of the plain architecture with that of the ResNet-based architecture shown in Fig. 2b, using the same training and test data generated from wild-type and ADAR1-null human H9 cells.
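The only structural difference between the two architectures compared here is the shortcut connection. The sketch below uses plain Python lists and shows just that difference: a plain block passes its input through two convolutions, while a residual block adds the input back to the convolution output before the final ReLU, in the style of ResNet. Several details are illustrative assumptions and do not reflect Dinopore's actual layer configuration: the 1D layout, the fixed kernel values, and the two-layer depth.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding so output length equals input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd-length kernel for 'same' padding")
    half = k // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(signal))]


def relu(values):
    return [max(0.0, v) for v in values]


def plain_block(x, k1, k2):
    # Two stacked convolutions with no shortcut connection.
    return relu(conv1d_same(relu(conv1d_same(x, k1)), k2))


def residual_block(x, k1, k2):
    # Same two convolutions, but the block input is added back (identity shortcut)
    # before the final ReLU, so the layers only need to learn a residual.
    f = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu([a + b for a, b in zip(f, x)])
```

With all-zero kernels, the plain block outputs zeros, while the residual block still passes its (non-negative) input through unchanged. This is the property that makes deeper residual stacks easier to train.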

Extended Data Fig. 5 De novo discovery of RNA editing sites in Xenopus embryos.

a–c, Stranded RNA-seq libraries were constructed from (a) Stage 1, (b) Stage 9, and (c) Stage 28 Xenopus laevis embryos and sequenced on the Illumina platform, with three biological replicates for each developmental stage. REDItools was then used to identify RNA editing sites in each sample. In every sample, A-to-G variants represented the dominant mismatch type, as expected. The specificity of detection was also higher in repetitive regions than in non-repetitive regions, as indicated by the higher percentages of A-to-G mismatches among repeat sites in all samples. d, Locations of A-to-I RNA editing sites in the Xenopus transcriptome. We examined the genomic locations of editing sites identified from Illumina RNA-seq data using GTF annotation files from NCBI. Consistent with previous studies in other vertebrates, only a small fraction of the Xenopus editing sites was found in protein-coding regions. The majority of the sites also appeared to be intergenic, possibly because the frog transcriptome is not fully annotated.
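The mismatch-type breakdown in panels a–c amounts to two steps: tallying reference-to-observed base changes per sample, then reporting each type as a percentage of all mismatches, separately for repeat and non-repeat sites. The sketch below assumes the variants have already been called (the caption uses REDItools for that step). It also assumes they have been reduced to hypothetical `(reference_base, observed_base[, region])` tuples on the transcribed strand. The sketch does not parse REDItools output.

```python
from collections import Counter


def mismatch_profile(variants):
    """Percentage of each mismatch type (e.g. 'AG' for A-to-G) among called variants.

    `variants` is a hypothetical list of (reference_base, observed_base) pairs
    on the transcribed strand.
    """
    counts = Counter(ref + alt for ref, alt in variants if ref != alt)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()} if total else {}


def a_to_g_percentage_by_region(variants_with_region):
    """A-to-G share of mismatches, split by a region label such as 'repeat' or 'non-repeat'."""
    by_region = {}
    for ref, alt, region in variants_with_region:
        by_region.setdefault(region, []).append((ref, alt))
    return {region: mismatch_profile(pairs).get("AG", 0.0) for region, pairs in by_region.items()}
```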

Extended Data Fig. 6 Reproducibility of features in Xenopus nanopore data.

a, Scatterplots showing the reproducibility of event parameters (mean, standard deviation, and length) across replicates. The Pearson correlation coefficients (R) were all above 0.5. b, Scatterplots showing the reproducibility of base-calling errors (insertion, deletion, and mismatch) across replicates. Base-calling errors were more variable than the event parameters: the Pearson correlation coefficients were moderate for deletion and mismatch (0.4–0.5) but appreciably lower for insertion (below 0.3). c, Scatterplots showing the reproducibility of base quality across replicates. As with the event parameters, the Pearson correlation coefficients for base quality were all above 0.5.

Extended Data Fig. 7 Classification of SNPs using a two-class model.

We tested how Dinopore, when trained on only two classes (A and I), would handle A/G SNPs. If it had labelled the SNPs primarily as unmodified, the two-class model would have been sufficient for inosine detection. However, when we evaluated the model on known A/G SNPs in human (H9 and HCT116), mouse, and Xenopus, it predicted most of the SNPs to be inosines, possibly because the genetic variants produced a high mismatch frequency. This result suggested that a three-class model was required to discriminate among the reference A, I (which Guppy base-called as a mixture of A and G), and A/G SNPs.
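The two-class versus three-class distinction can be illustrated with a softmax output layer. With three outputs, an A/G SNP gets its own label instead of being forced into either A or I. This sketch assumes the network emits one raw score (logit) per class, in the hypothetical order `A`, `I`, `A/G SNP`. It illustrates only the final decision step, not Dinopore's network.

```python
import math

CLASSES = ("A", "I", "A/G SNP")  # hypothetical label order


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict_site(logits, classes=CLASSES):
    """Return the most probable class and the full probability vector for one site."""
    if len(logits) != len(classes):
        raise ValueError("need exactly one logit per class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs
```

A two-class model would call `predict_site` with `classes=("A", "I")` and two logits, so a SNP with a high mismatch frequency has nowhere to go but A or I.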

Extended Data Fig. 8 Detection sensitivity of Dinopore.

a, We stratified the test sites by editing level and examined how accurately our method identified the sites in each bin, requiring a minimum coverage of 20 nanopore reads. Unsurprisingly, detection sensitivity was poorer for sites with low editing levels (0–10%) in all the biological systems studied. b, Sequence motif logos of A-to-I editing sites. We examined the upstream and downstream nucleotides surrounding each editing site in the test data from the various biological systems. In human and mouse, the motif resembled the known ADAR sequence preference, whereby guanosine is depleted 5’ of and enriched 3’ of the target adenosine. However, we did not observe as strong an enrichment of guanosine 3’ of the editing sites in Xenopus. c, Motifs obtained from the set of sites missed by Dinopore. There were very few false negatives in H9, so the leftmost motif is probably not meaningful. Interestingly, for Xenopus, our CNN model appeared more likely to miss bona fide editing sites with a downstream uracil, particularly sites in a UAU sequence context.
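Panel a stratifies bona fide editing sites by editing level and reports, for each bin, the fraction that was recovered. Below is a sketch of that bookkeeping. It applies the 20-read coverage filter from the caption and assumes 10-percentage-point bins, which is consistent with the 0–10% bin mentioned above, although the other bin edges are an assumption. The input tuple layout is hypothetical.

```python
def sensitivity_by_level(sites, bin_width=10, min_coverage=20):
    """Fraction of true editing sites detected, stratified by editing level.

    `sites` is a hypothetical list of (editing_level_percent, coverage, detected)
    tuples for bona fide editing sites. Sites with fewer than `min_coverage`
    reads are skipped. Returns {(low, high): sensitivity} for each non-empty bin.
    """
    hits, totals = {}, {}
    for level, coverage, detected in sites:
        if coverage < min_coverage:
            continue
        # Integer bin start; a level of exactly 100% is placed in the top bin.
        low = min(int(level // bin_width) * bin_width, 100 - bin_width)
        key = (low, low + bin_width)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + int(bool(detected))
    return {key: hits[key] / totals[key] for key in sorted(totals)}
```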

Extended Data Fig. 9 Performance of Dinopore in repetitive and non-repetitive regions.

ROC and PR curves for (a) H9, (b) Xenopus, (c) HCT116, and (d) mouse test data. For each biological system, the various CNN models were evaluated on all test sites (red curves), on only the sites in non-repetitive genomic regions (green curves), or on only the sites in repeats (blue curves). The training data used to develop the models were completely separate from the test datasets and were derived from H9 and Xenopus only. Strikingly, in HCT116 and mouse, which the models had not previously encountered, the test sites in repeat regions always yielded appreciably lower AUC values.
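The ROC AUC values compared here have an equivalent rank-based formulation: the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site, with ties counted as one half. The following is a minimal quadratic-time sketch. It could be applied in turn to all sites, to non-repetitive sites, and to repeat sites; libraries such as scikit-learn provide faster implementations. The `labels`/`scores` inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` are truthy for positive (edited) sites; `scores` are model outputs.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```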

Extended Data Fig. 10 Quantification of editing levels.

a, We asked whether A-to-I editing levels could be quantified on the ONT platform by cDNA-PCR sequencing. In this method, libraries are made by reverse transcription, strand switching, second-strand synthesis and PCR amplification before sequencing adapters are attached. We generated these libraries from H9 hESC RNA, sequenced them on the MinION device, quantified the editing frequencies of known sites and compared the values obtained from nanopore sequencing with those obtained from Illumina sequencing. Overall, editing levels correlated well between the two methods (R > 0.8), indicating that editing can be quantified on the ONT platform by cDNA-PCR sequencing. b, Architecture of the regression model used to predict editing levels. We used a CNN for regression analysis of our nanopore direct RNA sequencing data to estimate the modification rate of each inosine-containing site. As before, the input was a two-dimensional matrix with each row corresponding to a different 5-mer. The features included event parameters, base-calling errors, and base quality.
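In sequencing methods that read inosine as guanosine (Illumina, and the cDNA-PCR libraries in panel a), a site's editing level is commonly estimated as the fraction of A- or G-supporting reads that report G. The per-site values from two platforms can then be correlated. The sketch below assumes per-site base counts are already available as a dictionary, and `min_reads` is a hypothetical threshold. It does not reproduce the direct RNA regression model in panel b, which predicts editing levels from signal and base-calling features.

```python
def editing_level(base_counts, min_reads=1):
    """Fraction of A/G-supporting reads that report G at a candidate A-to-I site.

    `base_counts` maps base letters to read counts at the site, e.g. {"A": 70, "G": 30}.
    Returns None when fewer than `min_reads` reads support A or G.
    """
    a, g = base_counts.get("A", 0), base_counts.get("G", 0)
    total = a + g
    if total == 0 or total < min_reads:
        return None
    return g / total
```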

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Supplementary Tables 1–6

Reporting Summary

Peer Review File

Supplementary Data

Mass spectrometry analysis of inosine incorporation

About this article

Cite this article

Nguyen, T.A., Heng, J.W.J., Kaewsapsak, P. et al. Direct identification of A-to-I editing sites with nanopore native RNA sequencing. Nat Methods (2022). https://doi.org/10.1038/s41592-022-01513-3

  • DOI: https://doi.org/10.1038/s41592-022-01513-3
