Article

Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs

Nature Biotechnology volume 28, pages 503–510 (2010)

  • A Corrigendum to this article was published on 01 July 2010

This article has been updated

Abstract

Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein-coding genes, including thousands of novel 5′ start sites, 3′ ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

Change history

  • 09 July 2010

    In the version of this article initially published, the fourth sentence of the Methods section “RNA extraction and library preparation” described a “procedure that combines a random priming step with a shearing step [8,9,28] and results in fragments of ~700 bp in size”. It should have read “procedure that combines fragmentation of mRNA to a peak size of ~750 nucleotides by heating [6] followed by random-primed reverse transcription [8].” The error has been corrected in the HTML and PDF versions of the article.

Accessions

Gene Expression Omnibus

References

  1. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
  2. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
  3. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
  4. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
  5. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
  6. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
  7. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
  8. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  9. et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).
  10. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
  11. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).
  12. et al. De novo transcriptome assembly with ABySS. Bioinformatics 25, 2872–2877 (2009).
  13. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
  14. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
  15. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
  16. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).
  17. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput. Biol. 4, e1000067 (2008).
  18. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007).
  19. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009).
  20. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991).
  21. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
  22. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005).
  23. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).
  24. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566 (2005).
  25. Q. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc. Natl. Acad. Sci. USA 107, 5254–5259 (2010).
  26. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput. Biol. 5, e1000598 (2009).
  27. et al. Niche-independent symmetrical self-renewal of a mammalian tissue stem cell. PLoS Biol. 3, e283 (2005).
  28. F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010).
  29. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
  30. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
  31. Statistical Methods in Bioinformatics: An Introduction 2nd edn. (Springer, 2005).
  32. Scan Statistics (Springer, 2001).

Acknowledgements

We thank M. Wernig (MIT) for providing NPC; M. Lin and M. Kellis (MIT) for CSF code; the Broad Sequencing Platform for sample sequencing; L. Gaffney for assistance with graphics; and C. Burge, J. Merkin, R. Bradley and members of Lander and Regev laboratories—in particular, M. Yassour, T. Mikkelsen and I. Amit—for discussions. A.R. and J.L.R. were supported by the Merkin Family Foundation for Stem Cell Research at the Broad Institute. M. Guttman was supported by a Vertex scholarship. Work was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interface, a US National Institutes of Health PIONEER award, a US National Human Genome Research Institute (NHGRI) R01 grant and the Howard Hughes Medical Institute (A.R.), and NHGRI and the Broad Institute of MIT and Harvard (E.S.L.).

Author information

Author notes

    • Mitchell Guttman & Manuel Garber

    These authors contributed equally to this work.

Affiliations

  1. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    Mitchell Guttman, Manuel Garber, Joshua Z Levin, Julie Donaghey, James Robinson, Xian Adiconis, Lin Fan, Magdalena J Koziol, Andreas Gnirke, Chad Nusbaum, John L Rinn, Eric S Lander & Aviv Regev

  2. Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Mitchell Guttman, Eric S Lander & Aviv Regev

  3. Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA.

    Magdalena J Koziol & John L Rinn

  4. Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

    Eric S Lander

  5. Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Aviv Regev

Contributions

M. Guttman and M. Garber conceived the project, designed research, implemented Scripture, performed computational analysis and wrote the paper. A.G., C.N. and J.Z.L. oversaw cDNA sequencing, provided molecular biology advice and helped to edit the manuscript. J.D. constructed cDNA libraries, performed validation experiments and helped to edit the manuscript. J.R. implemented components of Scripture and provided computational support and technical advice. X.A., L.F. and M.J.K. constructed cDNA libraries. J.L.R. provided reagents and helped edit the manuscript. E.S.L. designed research direction and wrote the paper. A.R. provided cDNA sequencing guidance, conceived the project, designed research direction and wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Mitchell Guttman or Manuel Garber or Aviv Regev.

Supplementary information

PDF files

  1. Supplementary Text and Figures: Supplementary Notes 1 and 2, Supplementary Figures 1–7

Excel files

  1. Supplementary Table 1: Number of novel transcriptional events in ES, MLF and NPC
  2. Supplementary Table 2: Primer sequences used for validation of novel events

Zip files

  1. Supplementary Software: scripture.jar, scripture.src.tgz
  2. Supplementary Data: ES.gff.gz, ESTranscriptGraphs.tar.gz
  3. Supplementary Data: MLF.gff.gz, MLFTranscriptGraphs.tar.gz
  4. Supplementary Data: NPC.gff.gz, NPCTranscriptGraphs.tar.gz

About this article

DOI

https://doi.org/10.1038/nbt.1633
