Letter

Targeted RNA sequencing reveals the deep complexity of the human transcriptome

Nature Biotechnology volume 30, pages 99–104 (2012)

Abstract

Transcriptomic analyses have revealed an unexpected complexity to the human transcriptome, whose breadth and depth exceed current RNA sequencing capability (refs. 1–4). Using tiling arrays to target and sequence selected portions of the transcriptome, we identify and characterize unannotated transcripts whose rare or transient expression is below the detection limits of conventional sequencing approaches. We use the unprecedented depth of coverage afforded by this technique to reach the deepest limits of the human transcriptome, exposing widespread, regulated and remarkably complex noncoding transcription in intergenic regions, as well as unannotated exons and splicing patterns in even intensively studied protein-coding loci such as p53 and HOX. The data also show that intermittent sequenced reads observed in conventional RNA sequencing data sets, previously dismissed as noise, are in fact indicative of unassembled rare transcripts. Collectively, these results reveal the range, depth and complexity of a human transcriptome that is far from fully characterized.
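The depth gain described in the abstract comes from concentrating sequencing reads on the probed regions. As a rough illustration only (this is not the authors' pipeline, and the abstract does not state how enrichment was calculated), the sketch below computes two common capture metrics: the fraction of reads whose alignment start falls inside a targeted interval, and the fold enrichment of that fraction over a uniform background in which reads land on targets in proportion to the targets' share of the sequence space. It assumes hypothetical inputs: merged, non-overlapping, 0-based half-open target intervals (BED-style), read start positions already extracted from alignments, and a user-chosen background size in base pairs. The names `build_index`, `on_target` and `capture_metrics` are illustrative.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical input types: a target is (chrom, start, end), 0-based and
# half-open like BED; a read is represented only by (chrom, start_position).
Target = Tuple[str, int, int]
Read = Tuple[str, int]


def build_index(targets: List[Target]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Group targets by chromosome into parallel sorted start/end lists.

    Assumes the intervals are already merged and non-overlapping.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def on_target(index, chrom: str, pos: int) -> bool:
    """Return True if pos lies inside a target interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def capture_metrics(reads: List[Read], targets: List[Target], background_bp: int):
    """Return (on-target fraction, fold enrichment over a uniform background)."""
    if not reads or background_bp <= 0:
        raise ValueError("need at least one read and a positive background size")
    index = build_index(targets)
    n_total = len(reads)
    n_on = sum(on_target(index, chrom, pos) for chrom, pos in reads)
    target_bp = sum(end - start for _, start, end in targets)
    if target_bp == 0:
        raise ValueError("targets cover zero bases")
    on_fraction = n_on / n_total
    # Observed on-target fraction divided by the expected fraction
    # (target_bp / background_bp), rearranged to divide only once.
    fold = (n_on * background_bp) / (n_total * target_bp)
    return on_fraction, fold


if __name__ == "__main__":
    targets = [("chr17", 100, 200), ("chr17", 500, 600)]  # 200 bp targeted
    reads = [("chr17", 150), ("chr17", 550), ("chr17", 599),
             ("chr17", 600), ("chr1", 150)]
    frac, fold = capture_metrics(reads, targets, background_bp=10_000)
    print(f"on-target fraction {frac:.2f}, fold enrichment {fold:.1f}")
    # prints "on-target fraction 0.60, fold enrichment 30.0"
```

Merging overlapping probe intervals first (for example with `bedtools merge`) matters here, because the binary-search lookup only checks the last interval that starts at or before a read; with overlapping intervals, an earlier and longer interval covering the read could be missed.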

Accessions

Gene Expression Omnibus

References

  1. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).

  2. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).

  3. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566 (2005).

  4. , , & Response to “the reality of pervasive transcription”. PLoS Biol. 9, e1001102 (2011).

  5. et al. The reality of pervasive transcription. PLoS Biol. 9, e1000625 (2011).

  6. , , & Most “dark matter” transcripts are associated with known genes. PLoS Biol. 8, e1000371 (2010).

  7. et al. Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol. 10, R115 (2009).

  8. et al. Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome Res. 20, 1420–1431 (2010).

  9. et al. A solution hybridization assay for ribosomal RNA from bacteria using biotinylated DNA probes and enzyme-labeled antibody to DNA:RNA. Mol. Cell. Probes 1, 177–193 (1987).

  10. et al. Novel transcribed sequences within the BWS/WT2 region in 11p15.5: tissue-specific expression correlates with cancer type. Genomics 46, 355–363 (1997).

  11. , , , & Anatomic demarcation by positional variation in fibroblast gene expression programs. PLoS Genet. 2, e119 (2006).

  12. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).

  13. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).

  14. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat. Genet. 42, 969–972 (2010).

  15. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).

  16. et al. Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res. 15, 987–997 (2005).

  17. , , , & Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).

  18. & The isoforms of the p53 protein. Cold Spring Harb. Perspect. Biol. 2, a000927 (2010).

  19. & p53 isoforms gain functions. Oncogene 29, 5113–5119 (2010).

  20. , & Binary function of mRNA. Biochimie 93, 1955–1961 (2011).

  21. , & Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 10, 155–159 (2009).

  22. et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693 (2010).

  23. & Widespread and subtle: alternative splicing at short-distance tandem sites. Trends Genet. 24, 246–255 (2008).

  24. , & Assembly of large genomes using second-generation sequencing. Genome Res. 20, 1165–1173 (2010).

  25. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).

  26. , & TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  27. et al. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray. Genome Biol. 6, R61 (2005).

  28. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).

  29. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).

  30. et al. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 145, 622–634 (2011).

  31. , , & WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006).

  32. et al. Repeat subtraction-mediated sequence capture from a complex genome. Plant J. 62, 898–909 (2010).

  33. & BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  34. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  35. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

  36. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345 (2007).

  37. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007).

  38. , , , & UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).

Acknowledgements

The authors thank M. Garber for designing the arrays and for constructive contributions to the manuscript; M. Koziol and K. Thomas for preparing cDNA and sequencing libraries; and T. Albert, T. Arnold, J. Affourtit, B. Dessany, T. Jarvie, D. Green and T. Millard for sequencing support. The authors also thank the following funding sources: Human Frontiers Science Program (to T.R.M.); Queensland Government Department of Employment, Economic Development and Innovation Smart Futures Fellowship (to M.E.D.); Australian Research Council/University of Queensland co-sponsored Federation Fellowship (FF0561986; to J.S.M.); Australian National Health and Medical Research Council Australia Fellowship (631668; to J.S.M.) and Career Development Award (CDA631542; to M.E.D.); Damon Runyon-Rachleff, Searle, Smith Family Foundation and Richard Merkin Foundation Scholar (to J.L.R.); and US National Institutes of Health (1DP2OD00667-01; to J.L.R. and C.T.).

Author information

Affiliations

  1. Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia.

    • Tim R Mercer
    • Marcel E Dinger
    • Joanna Crawford
    • John S Mattick
  2. Roche NimbleGen Inc., Research and Development, Madison, Wisconsin, USA.

    • Daniel J Gerhardt
    • Jeffrey A Jeddeloh
  3. Department of Stem Cell & Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA.

    • Cole Trapnell
    • John L Rinn

Authors

Tim R Mercer, Daniel J Gerhardt, Marcel E Dinger, Joanna Crawford, Cole Trapnell, Jeffrey A Jeddeloh, John S Mattick & John L Rinn

Contributions

T.R.M., J.A.J., J.S.M. and J.L.R. designed the experiments. D.J.G. performed array capture, quality assessments and supported the sequencing teams. J.C. performed RT-PCR. M.E.D., T.R.M. and C.T. performed alignment, transcript assembly and analysis. T.R.M., M.E.D., J.A.J., J.S.M. and J.L.R. wrote the manuscript.

Competing interests

T.R.M., M.E.D., C.T., J.C., J.S.M. and J.L.R. declare no competing financial interests. D.J.G. and J.A.J. are employees of Roche NimbleGen, Inc.

Corresponding authors

Correspondence to Jeffrey A Jeddeloh, John S Mattick or John L Rinn.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–8 and Supplementary Results

Excel files

  1. Supplementary Tables 1–4

    Summary of library sequencing and alignment used in this study.

Zip files

  1. Supplementary Data

    Size and genome coordinates of probed regions.

About this article

DOI

https://doi.org/10.1038/nbt.2024