Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Abstract

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation (refs. 1–3). However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Figure 1: Overview of Cufflinks.
Figure 2: Distinction of transcriptional and post-transcriptional regulatory effects on overall transcript output.
Figure 3: Excluding isoforms discovered by Cufflinks from the transcript abundance estimation affects the abundance estimates of known isoforms, in some cases by orders of magnitude.
Figure 4: Robustness of assembly and abundance estimation as a function of expression level and depth of sequencing.

Accession codes

Accessions

Gene Expression Omnibus

References

  1. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).

  2. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

  3. Nagalakshmi, U., Wang, Z., Waern, K., Shou, C. & Raha, D. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).

  4. Wang, E. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).

  5. Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).

  6. Maher, C. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).

  7. Marioni, J., Mason, C., Mane, S., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).

  8. Hiller, D., Jiang, H., Xu, W. & Wong, W. Identifiability of isoform deconvolution from junction arrays and RNA-Seq. Bioinformatics 25, 3056–3059 (2009).

  9. Jiang, H. & Wong, W.H. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25, 1026–1032 (2009).

  10. Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A. & Dewey, C.N. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).

  11. Mortazavi, A., Williams, B., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

  12. Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-Seq and RNA-Seq studies. Nat. Methods 6, S22–S32 (2009).

  13. Yaffe, D. & Saxel, O. A myogenic cell line with altered serum requirements for differentiation. Differentiation 7, 159–166 (1977).

  14. Yun, K. & Wold, B. Skeletal muscle determination and differentiation: story of a core regulatory network and its context. Curr. Opin. Cell Biol. 8, 877–889 (1996).

  15. Tapscott, S.J. The circuitry of a master switch: Myod and the regulation of skeletal muscle gene transcription. Development 132, 2685–2695 (2005).

  16. Trapnell, C., Pachter, L. & Salzberg, S. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  17. Haas, B.J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).

  18. Dilworth, R. A decomposition theorem for partially ordered sets. Ann. Math. 51, 161–166 (1950).

  19. Eriksson, N. et al. Viral population estimation using pyrosequencing. PLOS Comput. Biol. 4, e1000074 (2008).

  20. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).

  21. Cordes, K.R. et al. miR-145 and miR-143 regulate smooth muscle cell fate and plasticity. Nature 460, 705–710 (2009).

  22. Lareau, L.F., Inada, M., Green, R.E., Wengrod, J.C. & Brenner, S.E. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926–929 (2007).

  23. Bullard, J., Purdom, E., Hansen, K., Durinck, S. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).

  24. Endo, T. & Nadal-Ginard, B. Transcriptional and posttranscriptional control of c-myc during myogenesis: its mRNA remains inducible in differentiated cells and does not suppress the differentiated phenotype. Mol. Cell. Biol. 6, 1412–1421 (1986).

  25. Fuglede, B. & Topsøe, F. in Proceedings of the IEEE International Symposium on Information Theory, 31 (2004).

  26. Cottle, D.L., McGrath, M.J., Cowling, B.S. & Coghill, I.D. FHL3 binds MyoD and negatively regulates myotube formation. J. Cell Sci. 120, 1423–1435 (2007).

  27. Sammeth, M., Lacroix, V., Ribeca, P. & Guigó, R. The FLUX Simulator. <http://flux.sammeth.net>.

  28. Johnson, D., Mortazavi, A., Myers, R. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).

  29. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  30. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

Acknowledgements

This work was supported in part by the US National Institutes of Health (NIH) grants R01-LM006845 and ENCODE U54-HG004576, as well as the Beckman Foundation, the Bren Foundation, the Moore Foundation (Cell Center Program) and the Miller Research Institute. We thank I. Antosechken and L. Schaeffer of the Caltech Jacobs Genome Center for DNA sequencing, and D. Trout, B. King and H. Amrhein for data pipeline and database design, operation and display. We are grateful to R. K. Bradley, K. Datchev, I. Hallgrímsdóttir, J. Landolin, B. Langmead, A. Roberts, M. Schatz and D. Sturgill for helpful discussions.

Author information

Contributions

C.T. and L.P. developed the mathematics and statistics and designed the algorithms; B.A.W. and G.K. performed the RNA-Seq and B.A.W. designed and executed experimental validations; C.T. implemented Cufflinks and Cuffdiff; G.P. implemented Cuffcompare; M.J.v.B. and A.M. tested the software; C.T., G.P. and A.M. performed the analysis; L.P., A.M. and B.J.W. conceived the project; C.T., L.P., A.M., B.J.W. and S.L.S. wrote the manuscript.

Corresponding author

Correspondence to Lior Pachter.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–3, Supplementary Figs. 1–11 and Supplementary Methods (PDF 2058 kb)

Supplementary Table 4

Genes with complex isoform expression dynamics in C2C12 myogenesis (XLS 80 kb)

Supplementary Data (ZIP 5773 kb)

About this article

Cite this article

Trapnell, C., Williams, B., Pertea, G. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511–515 (2010). https://doi.org/10.1038/nbt.1621

  • DOI: https://doi.org/10.1038/nbt.1621
