High-throughput complementary DNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies. Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read-length quality. We combined these long nanopore reads with higher-accuracy short reads and annotated GM12878 promoter regions to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3′ poly(A) tail length, base modifications and transcript haplotypes.
Sequence data, including raw signal files (FAST5), event-level data (FAST5), base-calls (FASTQ) and alignments (BAM), are available as an Amazon Web Services Open Data set, for download from https://github.com/nanopore-wgs-consortium/NA12878. The scripts used for various analyses are also available from the same GitHub repository under nanopore-human-transcriptome/scripts.
General scripts are available at: https://github.com/nanopore-wgs-consortium/NA12878/tree/master/nanopore-human-transcriptome/scripts. The poly(A) caller (‘nanopolish-polya’) is available at https://github.com/jts/nanopolish and the isoform analysis code (FLAIR) is available at https://github.com/BrooksLabUCSC/flair.
The authors are grateful for support from the following individuals. L. Snell, B. Sipos and D. Turner (ONT) provided materials and advice relevant to the 3′ poly(A) standards used to test nanopolish-polya. D. Garalde (ONT) provided early advice on use of the MinION for RNA sequencing. N. Conrad gave insight into the correlation of intron retention and poly(A) tail length. M. Diekhans reviewed the isoform analysis. Z. M. Chrzanowska-Lightowlers, T. Suzuki and S. Okada commented on early drafts of the manuscript. A. Beggs, L. Tee and T. Nieto (University of Birmingham, UK) provided cell cultures used in the Birmingham sequencing runs. The project was supported by the following grants: NIH HG010053 (A.N.B., B.P. and M.A.), NIH 5T32HG008345 (A.D.T.), NIH HG010538 (W.T.), NIH U54HG007990 (B.P.), U01 HL137183-02 (B.P.), Oxford Nanopore Research Grant SC20130149 (M.A.), National Institutes of Health Research Surgical Reconstruction and Microbiology Research Centre (J.Q.), Medical Research Council CLIMB Fellowship (N.L.), Wellcome Trust 204843/Z/16/Z (M.L.), BBSRC BB/N017099/1 and BB/M020061/1 (M.L.), the Canada Research Chair in Biotechnology and Genomics-Neurobiology (T.P.S.), the Canadian Institutes of Health Research (no. 10677; T.P.S.), the Canadian Epigenetics, Environment and Health Research Consortium (T.P.S.), the Koerner Foundation (T.P.S.), Genome Canada (OGI-136, J.T.S.), the Ontario Institute for Cancer Research through funds provided by the Government of Ontario (J.T.S.), and the Pew Charitable Trust (A.N.B.).
M.A. holds options in Oxford Nanopore Technologies (ONT). M.A. is a paid consultant to ONT. R.E.W., W.T., T.G., J.R.T., J.Q., N.J.L., J.T.S., N.S., A.N.B., M.A., H.E.O., M.J. and M.L. received reimbursement for travel, accommodation and conference fees to speak at events organised by ONT. N.L. has received an honorarium to speak at an ONT company meeting. W.T. has two patents (8,748,091 and 8,394,584) licensed to ONT. M.A. is an inventor on 11 UC patents licensed to ONT (6,267,872, 6,465,193, 6,746,594, 6,936,433, 7,060,50, 8,500,982, 8,679,747, 9,481,908, 9,797,013, 10,059,988, and 10,081,835). J.T.S., M.L. and M.A. received research funding from ONT.
Peer review information Nicole Rusk was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
A correction to this article is available online at https://doi.org/10.1038/s41592-019-0697-z.
Integrated supplementary information
a, Observed versus expected k-mers (k = 5) for ~2.9 million native RNA reads. b, Observed versus expected k-mers (k = 5) for ~3.9 million cDNA reads.
Supplementary Figure 2 Nanopore RNA reads recapitulate known features of the human MT-transcriptome.
a, Nanopore poly(A) RNA read coverage is consistent with the tRNA punctuation model of mitochondrial-RNA processing. Dark gray pattern represents read coverage along the heavy strand of the human mitochondrial genome. Labelled colored bars represent protein-coding genes, including known UTRs, or ribosomal RNAs. Regular font text below the colored bars identifies each gene. Yellow bars and red arrows represent the position of tRNA genes along the MT H strand. These tRNA genes are denoted by italicized text. b–d, Evidence for 3′ UTRs in MT-CO1 (b), MT-CO2 (c) and MT-ND5 (d) based on nanopore native RNA sequencing coverage. Dark gray pattern represents base coverage along specific mitochondrial transcripts. In b–d, numbering is base position relative to the human mitochondrial chromosome (chrM) for the hg38 reference. Colored horizontal lines represent protein coding sequences as in Fig. 3a of the main text. Red lines represent known 3′ UTR¹. For MT-ND5, the 3′ UTR indicated by nanopore read coverage is 26 nt longer than documented¹. This extension is denoted by a dashed red line in d.
Supplementary Figure 3 Commonly observed bicistronic human MT-RNA transcripts documented by nanopore sequencing.
a, Nanopore read-coverage plot of the mitochondrial heavy strand (gray pattern). Dotted red lines mark the predicted limits of bicistronic transcripts that encode mitochondrial ATP synthase protein 8 plus ATP synthase protein 6 (MT-ATP8/MT-ATP6), and mitochondrial NADH-ubiquinone oxidoreductase chain 4L plus NADH-ubiquinone oxidoreductase chain 4 (MT-ND4L/MT-ND4). MT-ATP8/MT-ATP6 and MT-ND4L/MT-ND4 are 841 nt and 1,667 nt long, respectively. b, Detailed view of MT-ATP8/MT-ATP6 coverage. Blue lines represent nominal length of individual gene products MT-ATP8 and MT-ATP6 within contiguous transcripts. Neighboring genes for MT-tRNA lysine (MT-TK) and Cytochrome C Oxidase Subunit 3 (MT-CO3) are marked by yellow and beige lines, respectively. c, Detailed view of MT-ND4L/MT-ND4 coverage. Blue lines represent nominal length of individual gene products MT-ND4L and MT-ND4 within contiguous transcripts. Yellow lines are neighboring genes for MT-tRNA arginine (MT-TR) and MT-tRNA histidine (MT-TH).
Supplementary Figure 4 Nanopore read coverage proximal to polycistronic RNA19 (MT-RNR2+ MT-TL1+MT-ND1).
a, Nanopore poly(A) read coverage of the human MT-RNA genome H strand. Expanded section includes reads that align only to MT-RNR2 or MT-ND1, as well as a smaller number of reads that align to MT-RNR2+MT-TL1+MT-ND1. b, Twenty-five examples of near full length (>2,600 nt) RNA19 transcript reads (total in study = 508 reads). c, Examples of nanopore poly(A) reads that cover full length RNA19 and attached unprocessed MT tRNA at the 5′ end (MT-TV) and at the 3′ end (MT-TI) of RNA19. There were a total of ten reads of this sort in the complete poly(A) dataset. Blue represents base matches to reference; red represents base mismatches to reference; orange represents insertions relative to reference.
Supplementary Figure 5 Example polycistronic transcripts that are observed by nanopore sequencing, that are otherwise difficult to detect.
a, MT-CO1 transcripts bearing OriL nucleotides at their 5′ ends. Gray bars in the panel at top represent base coverage. Most coverage is for either MT-CO1 or for strands corresponding to OriL, which is approximately equivalent to the reverse complement of MT-TA, MT-TN, MT-TC and MT-TY, which are encoded on the mitochondrial L strand (yellow bars). We could not find documentation for the exact nt limits of OriL. A limited number of transcript reads bridged the gap between OriL and MT-CO1 (red arrow). The multi-colored lines correspond to individual nanopore reads that aligned to the entire length of OriL+MT-CO1. b, Nanopore read of an 8.8 kb polycistronic mitochondrial L strand transcript. The read extends from the 3′ end of MT-TC (position 5,891, MT-genome) to the boundary between MT-ND6 and MT-TE (position 14,673, MT-genome). An unprocessed tRNA (MT-TS1) is internal to the strand.
Supplementary Figure 6 Recovery of truncated ionic current signal from continuous fast5 files yields more alignable sequence for some RNA strands.
a, Ionic current signal for translocation of a MT-CO1 transcript. It is representative of traces where the read was artificially truncated by a signal anomaly. The red line represents the MinKNOW segmented read (positions 474–1,532 of the MT-CO1 gene), and the magenta line represents the manually segmented and rescued read (positions 27–1,532 of the MT-CO1 gene). The signal in blue was present in the MinKNOW output read fast5 file. Signal in gray was not present in the MinKNOW output read fast5 file, but could be extracted from the continuous fast5 file using BulkVis. The time bar is two seconds. b, Recovery of data at the 3′ end of a read (shaded) using BulkVis. c, Recovery of data at the 5′ end of a read (shaded) using BulkVis. d,e, Effect of additional ionic current data on the mapping coordinates (start and end positions for an alignment) relative to the reference transcript for all MT-CO1 reads in bulk files from Lab 1. A detailed summary of data shown in d and e can be found in Supplementary Table 7. The analysis was performed using 5 experiments that delivered 5 bulk files representing approximately 2 h of continuous data each.
The panel at left is from Lab 1 (12,565 reads) and is representative of results for Labs 2–5. The panel at right is from Lab 6 (17,859 reads). The intensity of blue shading represents the density of the data distribution. The discrete dots at the edge of the distributions represent regions where there is one or only a few data points.
Using FLAIR-correct, misaligned splice sites were corrected to splice sites supported by short-read sequencing. The x axis is the distance from the aligned splice site to the closest annotated splice site in GENCODE v27. The y axis is the number of aligned sites (log-scale) with raw alignment distance counts in blue and corrected counts in yellow.
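The x-axis quantity described above, the distance from each aligned splice site to the closest annotated one, can be illustrated with a minimal Python sketch. This is not the FLAIR-correct implementation: the function names, the toy coordinates and the use of an absolute (unsigned) distance on a single chromosome and strand are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import Counter

def distance_to_nearest(site, annotated_sorted):
    """Absolute distance from `site` to the closest value in a sorted, non-empty list."""
    i = bisect_left(annotated_sorted, site)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = annotated_sorted[max(i - 1, 0):i + 1]
    return min(abs(site - a) for a in candidates)

def distance_histogram(aligned_sites, annotated_sites):
    """Count how many aligned splice sites fall at each distance from annotation."""
    annotated_sorted = sorted(annotated_sites)
    return Counter(distance_to_nearest(s, annotated_sorted) for s in aligned_sites)

# Hypothetical coordinates, not taken from the study.
annotated = [100, 200, 300]
aligned = [100, 103, 198, 260]
hist = distance_histogram(aligned, annotated)
```

Computing the histogram once on raw alignments and once on corrected alignments would give the two series that such a plot compares; a real analysis would also need to keep chromosomes and strands separate.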
a, Two candidate isoforms assembled using FLAIR. Each block represents either a complete or a partial exon (numbers 1–4). b, Reads that align to a candidate isoform. Light gray bars represent 25 nt coverage into first and last exons. c, FLAIR-sensitive isoform set that passed criteria shown at arrow. d, FLAIR-stringent isoform set that passed criteria shown at arrow. Isoform 1 failed FLAIR-stringent isoform test (X); isoform 2 passed FLAIR-stringent isoform test.
The y axis is the number of isoforms detected in the FLAIR and GENCODE isoform sets; the x axis is the number of reads subsampled in 10% increments from a total of 8.17 million native RNA reads.
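A saturation curve of this kind can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration rather than the authors' procedure: it takes a list with one isoform assignment per read, draws nested subsamples from a single shuffle, and treats an isoform as detected once it has `min_reads` supporting reads (a hypothetical parameter; the actual FLAIR criteria are not reproduced here).

```python
import random
from collections import Counter

def isoform_saturation(read_isoforms, steps=10, min_reads=1, seed=0):
    """Return [(n_reads, n_isoforms_detected)] for subsamples of increasing size."""
    rng = random.Random(seed)
    shuffled = list(read_isoforms)
    rng.shuffle(shuffled)
    curve = []
    for step in range(1, steps + 1):
        # Integer arithmetic avoids floating-point rounding of the subsample size.
        n = len(shuffled) * step // steps
        support = Counter(shuffled[:n])
        detected = sum(1 for count in support.values() if count >= min_reads)
        curve.append((n, detected))
    return curve

# Hypothetical read-to-isoform assignments, not data from the study.
toy_assignments = ["iso1"] * 50 + ["iso2"] * 30 + ["iso3"] * 20
curve = isoform_saturation(toy_assignments)
```

Because each subsample is a prefix of the same shuffled list, the detected count can never decrease as reads are added. Drawing independent subsamples at each fraction is an equally plausible design and would give a noisier curve.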
Supplementary Figure 11 Allele-specific expression of XIST. IGV view of reads assigned to the paternal allele (top) or the maternal allele (middle).
Colored bars in the coverage plots indicate SNPs present in the paternal (top) or maternal (middle) allele relative to the GRCh38 XIST reference. Purple boxes (insets) highlight two of the numerous SNPs used to assign allele specificity. Lower panel shows a gene model for XIST.
Supplementary Figure 13 Kruskal–Wallis test for poly(A) tail length variance between isoforms of the same gene.
Only genes with at least 500 reads, and isoforms with at least 25 reads were considered. The 50 lowest statistically significant P values out of 215 total are shown. The P values were between 4.41 × 10⁻¹⁰⁶ and 4.17 × 10⁻²⁵.
Gene models for SNHG8 isoforms ENST00000602819 and ENST00000602520 identified in this study.
a, Genome browser view of a segment of the AHR gene in the GRCh38 reference. The top row shows nucleotide position and base sequence. Magenta squares (below the top row) represent putative inosine positions as characterized by RADAR¹. Blue lines denote read alignments for nanopore native RNA, and brown lines denote read alignments for nanopore cDNA. White letters are mismatches relative to the GRCh38 AHR reference. White spaces with connecting black lines represent deletions in the alignment. Base miscalls occur in native RNA data at or near putative A-to-I editing sites. G base variants occur at corresponding positions in cDNA data. b, Summary of alignment information using WebLogo 2 for native RNA and cDNA data. Top row is the same ten base motif of the AHR gene as in a, with asterisks denoting putative inosines. Letter size in the logo depicts relative frequency of occurrence.
Supplementary Figures 1–15, Supplementary Note, Supplementary Tables 1, 4, 8–10 and 13.
Native RNA reads by gene. 9.7 million individual pass native RNA reads were aligned to genes in GENCODE v27 using minimap2 (splice aware setting). 20,289 separate genes were identified in these alignments.
Native RNA reads by isoform assignment. 9.7 million individual pass native RNA reads were aligned to isoforms in GENCODE v27 using minimap2 (splice aware setting). 64,241 separate isoforms were identified in these alignments.
k-mer coverage for nanopore native RNA reads aligned to GENCODE isoforms. The read sequences were filtered by length, and only reads that covered 90% or more of the respective reference sequence were chosen. Expected k-mer counts were calculated from the set of reference sequences, and observed k-mer counts were calculated from the set of read sequences.
k-mer coverage for nanopore cDNA reads aligned to GENCODE isoforms. The read sequences were filtered by length and only reads that covered 90% or more of the respective reference sequence were chosen. Expected k-mer counts were calculated from the set of reference sequences and observed k-mer counts were calculated from the set of read sequences.
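The expected and observed k-mer counts described in the two table legends above can be illustrated with a short Python sketch. It assumes the reads have already been filtered to those covering 90% or more of their reference, and that all sequences use the DNA alphabet (U written as T); the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

def count_kmers(sequences, k=5):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts

def observed_vs_expected(references, reads, k=5):
    """Return {kmer: (expected, observed)} for all 4**k possible k-mers."""
    expected = count_kmers(references, k)  # from the reference sequences
    observed = count_kmers(reads, k)       # from the read sequences
    table = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        table[kmer] = (expected[kmer], observed[kmer])
    return table

# Hypothetical sequences, not data from the study.
references = ["ACGTACGTAC"]
reads = ["ACGTACGAAC"]
table = observed_vs_expected(references, reads, k=5)
```

Plotting the observed count against the expected count for each of the 1,024 possible 5-mers gives the kind of comparison shown in the observed-versus-expected k-mer panels; k-mers that fall well below the diagonal are under-represented in the reads.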
MT-CO1 reads for which signal was recovered from either the start or end of the original read file. Reads were mapped using minimap2 (standard parameters). Only the subset of reads for which read mappings were improved are shown.
Summary of allele-specificity data for reads containing at least 2 haplotype-informative variants.
Unique isoforms expressed from each of the parental alleles.
Statistics for poly(A) tail length of GENCODE-sensitive genes with greater than 500 reads.
Cite this article
Workman, R.E., Tang, A.D., Tang, P.S. et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat Methods 16, 1297–1305 (2019) doi:10.1038/s41592-019-0617-2