A single-molecule long-read survey of the human transcriptome

An Erratum to this article was published on 10 March 2014

This article has been updated

Abstract

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing cannot describe an entire RNA molecule from its 5′ to its 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules, more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify 14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.
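
Two measurements the abstract relies on, how many 5′ nucleotides of an annotated transcript a read misses and whether a read's intron structure is already annotated, can be illustrated with a small sketch. This is not the authors' pipeline. It assumes each aligned read and each annotated transcript is given as a list of half-open genomic exon blocks on one chromosome; the 50-nt minimum intron length, the function names and the category labels (`annotated`, `novel_combination`, `novel_intron`, `unspliced`) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# A half-open genomic interval [start, end) on one chromosome.
Block = Tuple[int, int]


def intron_chain(blocks: List[Block], min_intron: int = 50) -> Tuple[Block, ...]:
    """Introns implied by the gaps between consecutive aligned exon blocks.

    Gaps shorter than min_intron (a hypothetical threshold) are treated as
    alignment indels rather than splicing events.
    """
    ordered = sorted(blocks)
    chain = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end >= min_intron:
            chain.append((prev_end, next_start))
    return tuple(chain)


def missing_5prime_nt(tx_exons: List[Block], strand: str, read_blocks: List[Block]) -> int:
    """Annotated transcript nucleotides lying upstream of the read's 5' end."""
    if strand == "+":
        # On the plus strand the 5' end is the leftmost aligned coordinate.
        read_5p = min(start for start, _ in read_blocks)
        return sum(max(0, min(end, read_5p) - start) for start, end in tx_exons)
    # On the minus strand the 5' end is the rightmost aligned coordinate.
    read_5p = max(end for _, end in read_blocks)
    return sum(max(0, end - max(start, read_5p)) for start, end in tx_exons)


def _is_contiguous_subchain(sub: Tuple[Block, ...], full: Tuple[Block, ...]) -> bool:
    """True if sub occurs as an uninterrupted run of introns inside full."""
    n = len(sub)
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))


def classify_read(read_blocks: List[Block],
                  annotated_chains: Dict[str, Tuple[Block, ...]],
                  min_intron: int = 50) -> str:
    """Compare a read's intron chain with annotated transcript intron chains."""
    chain = intron_chain(read_blocks, min_intron)
    if not chain:
        return "unspliced"
    # A 5'-truncated read can still match: accept any contiguous run of an annotated chain.
    if any(_is_contiguous_subchain(chain, full) for full in annotated_chains.values()):
        return "annotated"
    known_introns = {intron for full in annotated_chains.values() for intron in full}
    if all(intron in known_introns for intron in chain):
        # Every intron is annotated somewhere, but never in this combination.
        return "novel_combination"
    return "novel_intron"
```

Matching against contiguous runs of an annotated intron chain, rather than requiring the full chain, is one way to keep reads that lost 5′ sequence from being counted as novel, which matters for the longer molecules the abstract describes as 5′-incomplete but intron-complete.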

Figure 1: Completeness of cDNA molecules.
Figure 2: Assessment of completeness of CCS reads in controlled environments.
Figure 3: Exon-intron structure of molecules.
Figure 4: Analysis of unannotated transcripts.

Accession codes

Primary accessions

European Nucleotide Archive

Change history

  • 25 November 2013

    In the version of this article initially published, the accession code for the data was omitted. The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

We thank J. Eid and L. Hickey at Pacific Biosciences for providing alignment statistics of reads to reference genomes. We thank J. Kelley, C. Araya, D. Phanstiel, S. Shringarpure and M. Sikora at Stanford as well as J. Korlach at Pacific Biosciences for comments on this manuscript. We also thank T. Daley and A. Smith at USC for advice on modeling library complexity. This work was supported by US National Institutes of Health grants 5P01GM099130-02, 5U54HG00699602-02 and 5U01HL107393-03, and by the US National Institutes of Health training grant 5 T32 HD07149.

Author information

Contributions

All authors proposed the project. D.S. devised and performed experiments and wrote the first version of the introduction. H.T. devised and performed analysis, prepared figures and wrote the first version of results and discussion. All authors discussed experiments and analysis and collaborated on the final version.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

M.S. is on the scientific advisory board of Personalis and GenapSys. All other authors declare no competing interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 (PDF 904 kb)

About this article

Cite this article

Sharon, D., Tilgner, H., Grubert, F. et al. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31, 1009–1014 (2013). https://doi.org/10.1038/nbt.2705

  • DOI: https://doi.org/10.1038/nbt.2705