Article | Published:

Mapping and quantifying mammalian transcriptomes by RNA-Seq

Nature Methods volume 5, pages 621–628 (2008)

Abstract

We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.
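The supplementary datasets for this article report expression as RPKM values. As a rough illustration of how a read count becomes such a value, the sketch below assumes the usual definition of RPKM (reads per kilobase of exon model per million mapped reads). The function name, argument names and example numbers are hypothetical, and the sketch ignores the handling of multireads and splice-crossing reads.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count         -- reads mapped to the gene's exon model
    exon_length_bp     -- summed exon length of the gene model, in base pairs
    total_mapped_reads -- all mapped reads in the sample
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is a single scaling by 1e9.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 2,000 reads on a 2 kb exon model in a 40-million-read sample.
example_value = rpkm(2000, 2000, 40_000_000)
print(example_value)
```

Normalizing by both transcript length and sequencing depth is what lets values be compared between genes of different sizes and between samples sequenced to different depths.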

Accessions

GenBank/EMBL/DDBJ

Acknowledgements

This work was supported by The Beckman Foundation, The Simons Foundation and US National Institutes of Health (NIH) grant U54 HG004576 to B.W. and R. Myers. A.M. was supported by an NIH training grant. The authors especially thank D. Trout and B. King for professional data handling and G. Schroth, I. Khrebtukova and S. Luo, of Illumina, for exchanges of preliminary data and protocols under development. M. Liu and J.L. Riechmann, along with others from the laboratories of B. Wold, R. Myers, J. Allman and P. Sternberg, are gratefully acknowledged for many helpful discussions, as are R. Myers and S. Mango for manuscript assistance.

Author information

Author notes

    • Ali Mortazavi
    • Brian A Williams

    These authors contributed equally to this work.

Affiliations

  1. Division of Biology, MC 156-29, California Institute of Technology, Pasadena, California 91125, USA.

    • Ali Mortazavi
    • Brian A Williams
    • Kenneth McCue
    • Lorian Schaeffer
    • Barbara Wold

Corresponding author

Correspondence to Barbara Wold.

Supplementary information

PDF files

  1.

    Supplementary Text and Figures

    Supplementary Figures 1–6, Supplementary Tables 1–4 and Supplementary Methods

Zip files

  1.

    Supplementary Software

Text files

  1.

    Supplementary Dataset 1

    Intermediate and final RPKM values for mouse brain.

  2.

    Supplementary Dataset 2

    Intermediate and final RPKM values for mouse liver.

  3.

    Supplementary Dataset 3

    Intermediate and final RPKM values for mouse muscle.

Excel files

  1.

    Supplementary Dataset 4

    Top 500 genes with strong multiread contributions in mouse liver.

About this article

DOI

https://doi.org/10.1038/nmeth.1226
