Targeted RNA sequencing reveals the deep complexity of the human transcriptome

Mercer, Tim R; Gerhardt, Daniel J; Dinger, Marcel E; Crawford, Joanna; Trapnell, Cole; Jeddeloh, Jeffrey A; Mattick, John S; Rinn, John L

doi:10.1038/nbt.2024

Letter
Published: 13 November 2011

Tim R Mercer¹,
Daniel J Gerhardt²,
Marcel E Dinger¹,
Joanna Crawford¹,
Cole Trapnell³,
Jeffrey A Jeddeloh²,
John S Mattick¹ &
John L Rinn³

Nature Biotechnology volume 30, pages 99–104 (2012)

Abstract

Transcriptomic analyses have revealed an unexpected complexity to the human transcriptome, whose breadth and depth exceed current RNA sequencing capability^1,2,3,4. Using tiling arrays to target and sequence select portions of the transcriptome, we identify and characterize unannotated transcripts whose rare or transient expression is below the detection limits of conventional sequencing approaches. We use the unprecedented depth of coverage afforded by this technique to reach the deepest limits of the human transcriptome, exposing widespread, regulated and remarkably complex noncoding transcription in intergenic regions, as well as unannotated exons and splicing patterns in even intensively studied protein-coding loci such as p53 and HOX. The data also show that intermittent sequenced reads observed in conventional RNA sequencing data sets, previously dismissed as noise, are in fact indicative of unassembled rare transcripts. Collectively, these results reveal the range, depth and complexity of a human transcriptome that is far from fully characterized.

**Figure 1: Circle plots illustrating the prevalence and complexity of captured transcripts at genic (a) and intergenic (b) loci.**

**Figure 2: Resolution of unannotated p53 isoforms.**

**Figure 3: Identification of unannotated exon variants and rare intergenic noncoding RNAs by targeted RNA capture and sequencing.**

Accession codes

Gene Expression Omnibus: GSE29041

References

1. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
2. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
3. Katayama, S. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566 (2005).
4. van Bakel, H., Nislow, C., Blencowe, B.J. & Hughes, T.R. Response to “the reality of pervasive transcription”. PLoS Biol. 9, e1001102 (2011).
5. Clark, M.B. et al. The reality of pervasive transcription. PLoS Biol. 9, e1000625 (2011).
6. van Bakel, H., Nislow, C., Blencowe, B.J. & Hughes, T.R. Most “dark matter” transcripts are associated with known genes. PLoS Biol. 8, e1000371 (2010).
7. Levin, J.Z. et al. Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol. 10, R115 (2009).
8. Teer, J.K. et al. Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome Res. 20, 1420–1431 (2010).
9. Yehle, C.O. et al. A solution hybridization assay for ribosomal RNA from bacteria using biotinylated DNA probes and enzyme-labeled antibody to DNA:RNA. Mol. Cell. Probes 1, 177–193 (1987).
10. Crider-Miller, S.J. et al. Novel transcribed sequences within the BWS/WT2 region in 11p15.5: tissue-specific expression correlates with cancer type. Genomics 46, 355–363 (1997).
11. Rinn, J.L., Bondre, C., Gladstone, H.B., Brown, P.O. & Chang, H.Y. Anatomic demarcation by positional variation in fibroblast gene expression programs. PLoS Genet. 2, e119 (2006).
12. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
13. Rinn, J.L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
14. Li, Y. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat. Genet. 42, 969–972 (2010).
15. Ng, S.B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).
16. Kapranov, P. et al. Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res. 15, 987–997 (2005).
17. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
18. Khoury, M.P. & Bourdon, J.C. The isoforms of the p53 protein. Cold Spring Harb. Perspect. Biol. 2, a000927 (2010).
19. Olivares-Illana, V. & Fahraeus, R. p53 isoforms gain functions. Oncogene 29, 5113–5119 (2010).
20. Kloc, M., Foreman, V. & Reddy, S.A. Binary function of mRNA. Biochimie 93, 1955–1961 (2011).
21. Mercer, T.R., Dinger, M.E. & Mattick, J.S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 10, 155–159 (2009).
22. Tsai, M.C. et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693 (2010).
23. Hiller, M. & Platzer, M. Widespread and subtle: alternative splicing at short-distance tandem sites. Trends Genet. 24, 246–255 (2008).
24. Schatz, M.C., Delcher, A.L. & Salzberg, S.L. Assembly of large genomes using second-generation sequencing. Genome Res. 20, 1165–1173 (2010).
25. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
26. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
27. Carter, M.G. et al. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray. Genome Biol. 6, R61 (2005).
28. Hindorff, L.A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
29. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
30. Hah, N. et al. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 145, 622–634 (2011).
31. Morgulis, A., Gertz, E.M., Schaffer, A.A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006).
32. Fu, Y. et al. Repeat subtraction-mediated sequence capture from a complex genome. Plant J. 62, 898–909 (2010).
33. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
34. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
35. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
36. Kong, L. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345 (2007).
37. Lin, M.F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007).
38. Suzek, B.E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C.H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).

Acknowledgements

The authors would like to thank M. Garber for his design of the arrays and constructive contribution to the manuscript; M. Koziol and K. Thomas for preparing cDNA and sequencing libraries; and T. Albert, T. Arnold, J. Affourtit, B. Dessany, T. Jarvie, D. Green and T. Millard for providing sequencing support. The authors would also like to thank the following funding sources: Human Frontiers Science Program (to T.R.M.); Queensland Government Department of Employment, Economic Development and Innovation Smart Futures Fellowship (to M.E.D.); Australian Research Council/University of Queensland co-sponsored Federation Fellowship (FF0561986; to J.S.M.); Australian National Health and Medical Research Council Australia Fellowship (631668; to J.S.M.) and Career Development Award (CDA631542; to M.E.D.); Damon Runyon-Rachleff, Searle, Smith Family Foundation and Richard Merkin Foundation Scholar (to J.L.R.); and US National Institutes of Health (1DP2OD00667-01; to J.L.R. and C.T.).

Author information

Authors and Affiliations

Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
Tim R Mercer, Marcel E Dinger, Joanna Crawford & John S Mattick
Roche NimbleGen Inc., Research and Development, Madison, Wisconsin, USA
Daniel J Gerhardt & Jeffrey A Jeddeloh
Department of Stem Cell & Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
Cole Trapnell & John L Rinn

Contributions

T.R.M., J.A.J., J.S.M. and J.L.R. designed the experiments. D.J.G. performed array capture, quality assessments and supported the sequencing teams. J.C. performed RT-PCR. M.E.D., T.R.M. and C.T. performed alignment, transcript assembly and analysis. T.R.M., M.E.D., J.A.J., J.S.M. and J.L.R. wrote the manuscript.

Corresponding authors

Correspondence to Jeffrey A Jeddeloh, John S Mattick or John L Rinn.

Ethics declarations

Competing interests

T.R.M., M.E.D., C.T., J.C., J.S.M. and J.L.R. declare no competing financial interests. D.J.G. and J.A.J. are employees of Roche NimbleGen, Inc.

About this article

Cite this article

Mercer, T., Gerhardt, D., Dinger, M. et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 30, 99–104 (2012). https://doi.org/10.1038/nbt.2024

Received: 05 August 2011
Accepted: 04 October 2011
Published: 13 November 2011
Issue Date: January 2012
DOI: https://doi.org/10.1038/nbt.2024
