A single-molecule long-read survey of the human transcriptome

Sharon, Donald; Tilgner, Hagen; Grubert, Fabian; Snyder, Michael

doi:10.1038/nbt.2705

Article
Published: 01 November 2013

Donald Sharon¹,² (equal contribution),
Hagen Tilgner¹ (equal contribution),
Fabian Grubert¹ &
…
Michael Snyder¹

Nature Biotechnology volume 31, pages 1009–1014 (2013)

An Erratum to this article was published on 10 March 2014

This article has been updated

Abstract

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from its 5′ to its 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules, more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes at the single-molecule level.
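The abstract's central comparison, whether a read's intron structure matches an existing annotation, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each spliced alignment has already been reduced to sorted exon blocks (for example, from the `N` operations of a SAM CIGAR string), and the function names `intron_chain`, `is_contiguous_subchain` and `classify`, the 1-based inclusive coordinate convention and the three category labels are all hypothetical. Because long reads often lose 5′ sequence, the sketch counts a read as matching an annotated transcript when the read's intron chain appears as a consecutive run inside that transcript's chain, not only on an exact full-length match.

```python
from collections.abc import Iterable

# Hypothetical representation (not from the paper): an intron is
# (chrom, strand, first_intronic_base, last_intronic_base), 1-based inclusive.
Intron = tuple[str, str, int, int]
Chain = tuple[Intron, ...]


def intron_chain(chrom: str, strand: str, exons: Iterable[tuple[int, int]]) -> Chain:
    """Turn the exon blocks of one spliced alignment into its ordered intron chain."""
    blocks = sorted(exons)
    # Each gap between consecutive exon blocks is one intron.
    return tuple(
        (chrom, strand, left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])
    )


def is_contiguous_subchain(read: Chain, ref: Chain) -> bool:
    """True if `read` occurs as a consecutive run of introns inside `ref`.

    A read truncated at either end (e.g. missing its 5' end) still matches
    the remaining part of the annotated chain."""
    n = len(read)
    return any(ref[i:i + n] == read for i in range(len(ref) - n + 1))


def classify(read: Chain, annotated: list[Chain]) -> str:
    """Label a read's intron chain relative to a set of annotated chains (hypothetical labels)."""
    if not read:
        return "unspliced"
    if any(is_contiguous_subchain(read, ref) for ref in annotated):
        return "annotated chain"
    known_introns = {intron for ref in annotated for intron in ref}
    if all(intron in known_introns for intron in read):
        return "novel combination of known introns"
    return "novel intron"


if __name__ == "__main__":
    # One hypothetical annotated transcript with exons 100-200, 300-400, 500-600, 700-800.
    annotation = [intron_chain("chr1", "+", [(100, 200), (300, 400), (500, 600), (700, 800)])]
    reads = {
        "5'-truncated": [(350, 400), (500, 600), (700, 780)],  # annotated chain
        "exon skip": [(150, 200), (500, 600)],  # novel intron
        "intron retention": [(150, 200), (300, 600), (700, 750)],  # novel combination of known introns
    }
    for name, exons in reads.items():
        print(name, "->", classify(intron_chain("chr1", "+", exons), annotation))
```

Checking individual introns separately from whole chains is one way to distinguish reads that reuse annotated splice sites in a new combination from reads that introduce new splice junctions. A real analysis would also need to tolerate small splice-site shifts caused by sequencing errors, which this sketch does not attempt.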

**Figure 1: Completeness of cDNA molecules.**

**Figure 2: Assessment of completeness of CCS reads in controlled environments.**

**Figure 3: Exon-intron structure of molecules.**

**Figure 4: Analysis of unannotated transcripts.**

Accession codes

Primary accessions

European Nucleotide Archive

PRJEB3969

Change history

25 November 2013
In the version of this article initially published, the accession code for data was left out. The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

We thank J. Eid and L. Hickey at Pacific Biosciences for providing alignment statistics of reads to reference genomes. We thank J. Kelley, C. Araya, D. Phanstiel, S. Shringarpure and M. Sikora at Stanford as well as J. Korlach at Pacific Biosciences for comments on this manuscript. We also thank T. Daley and A. Smith at USC for advice on modeling library complexity. This work was supported by US National Institutes of Health grants 5P01GM099130-02, 5U54HG00699602-02 and 5U01HL107393-03, and by the US National Institutes of Health training grant 5 T32 HD07149.

Author information

Donald Sharon and Hagen Tilgner: These authors contributed equally to this work.

Authors and Affiliations

Department of Genetics, Stanford University, Stanford, California, USA
Donald Sharon, Hagen Tilgner, Fabian Grubert & Michael Snyder
Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
Donald Sharon

Contributions

All authors proposed the project. D.S. devised and performed experiments and wrote the first version of the introduction. H.T. devised and performed analysis, prepared figures and wrote the first version of the results and discussion. All authors discussed experiments and analysis and collaborated on the final version.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

M.S. is on the scientific advisory board of Personalis and GenapSys. All other authors declare no competing interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 (PDF 904 kb)

About this article

Cite this article

Sharon, D., Tilgner, H., Grubert, F. et al. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31, 1009–1014 (2013). https://doi.org/10.1038/nbt.2705

Received: 01 March 2013
Accepted: 03 September 2013
Published: 01 November 2013
Issue Date: November 2013
DOI: https://doi.org/10.1038/nbt.2705
