Analysis and design of RNA sequencing experiments for identifying isoform regulation

Abstract

Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporating the mRNA fragment length distribution from paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking–immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.
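To make the abstract's modeling ideas more concrete, here is a minimal sketch (not the MISO software) of how a Ψ value and an interval around it might be estimated for a hypothetical two-isoform exon-skipping event from paired-end reads. It assumes a normal fragment-length distribution with hypothetical parameters (mean 250 nt, s.d. 30 nt), a uniform prior on Ψ, and a probability of sampling a read from an isoform proportional to Ψ times an approximate effective isoform length. It also replaces sampling-based inference with a simple grid approximation of the posterior. All function names, parameters and read counts are illustrative assumptions.

```python
import math

# Hypothetical fragment-length distribution: Normal(mean=250, sd=30) nt.
FRAG_MEAN, FRAG_SD = 250.0, 30.0


def fragment_pdf(length):
    """Density of the assumed normal fragment-length distribution."""
    z = (length - FRAG_MEAN) / FRAG_SD
    return math.exp(-0.5 * z * z) / (FRAG_SD * math.sqrt(2.0 * math.pi))


def read_likelihood(implied_length, isoform_length):
    """P(read pair | isoform): fragment-length density times a uniform start
    position. implied_length is None when the pair cannot come from the isoform."""
    if implied_length is None or implied_length > isoform_length:
        return 0.0
    return fragment_pdf(implied_length) / (isoform_length - implied_length + 1)


def psi_posterior(reads, len1, len2, grid_size=1001, level=0.95):
    """Grid approximation to the posterior of Psi (fraction of isoform 1) under
    a uniform prior. reads holds (implied length under isoform 1, under isoform 2).
    Assumes every read is compatible with at least one isoform.
    Returns (posterior mean, (lower, upper)) for an equal-tailed interval."""
    # Approximate effective lengths: start positions for a mean-sized fragment.
    eff1 = len1 - FRAG_MEAN + 1
    eff2 = len2 - FRAG_MEAN + 1
    liks = [(read_likelihood(a, len1), read_likelihood(b, len2)) for a, b in reads]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for psi in grid:
        # A read comes from isoform k with probability proportional to psi_k * eff_k.
        w1, w2 = psi * eff1, (1.0 - psi) * eff2
        total = w1 + w2
        lp = 0.0
        for l1, l2 in liks:
            p = (w1 * l1 + w2 * l2) / total
            if p <= 0.0:
                lp = -math.inf  # this Psi value cannot explain the read
                break
            lp += math.log(p)
        log_post.append(lp)
    # Normalize on the log scale to avoid underflow.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    mean = sum(g * p for g, p in zip(grid, probs))
    # Equal-tailed interval from the cumulative posterior mass.
    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for g, p in zip(grid, probs):
        cum += p
        if lower is None and cum >= tail:
            lower = g
        if upper is None and cum >= 1.0 - tail:
            upper = g
    return mean, (lower, upper)


if __name__ == "__main__":
    # Hypothetical event: inclusion isoform 1500 nt, skipping isoform 1400 nt
    # (a 100-nt alternative exon); read counts are made up for illustration.
    reads = ([(250, None)] * 30     # one mate lies in the alternative exon
             + [(None, 250)] * 10   # a mate spans the skipping junction
             + [(250, 150)] * 20)   # mates in flanking exons; implied lengths differ by 100 nt
    mean, (lower, upper) = psi_posterior(reads, 1500, 1400)
    print(f"Psi ~ {mean:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

In this sketch, read pairs whose mates fall in the flanking exons are compatible with both isoforms, but each isoform implies a different fragment length, so the fragment-length density makes those pairs informative about Ψ. Treated as single-end reads, they would contribute little beyond a weak length effect.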

Figure 1: More accurate inference of splicing levels using MISO.
Figure 2: MISO CIs for Ψ values and qRT-PCR validation.
Figure 3: Bayes factor analysis of hnRNP H regulation of exon splicing.
Figure 4: Bayes factor analysis implicates hnRNP H in alternative cleavage and polyadenylation.
Figure 5: Improved estimation of isoform abundance using paired-end reads.
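Figures 3 and 4 use Bayes factors to identify regulated events. As a hedged illustration of how a Bayes factor for a change in Ψ between two conditions can be computed from posterior samples, the sketch below uses the Savage-Dickey density ratio, a standard construction for a point null hypothesis (ΔΨ = 0) nested in the full model. This is not necessarily how MISO computes its Bayes factors. The sketch assumes independent uniform priors on Ψ in each condition, uses Beta draws from binomial inclusion/exclusion counts as a stand-in for model-based posterior samples, and estimates the posterior density of ΔΨ at zero with a Gaussian kernel density estimate (Silverman's bandwidth). Function names and counts are hypothetical.

```python
import math
import random


def kde_density_at(samples, x):
    """Gaussian kernel density estimate at x (Silverman's rule-of-thumb bandwidth)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def savage_dickey_bf(psi_samples_1, psi_samples_2):
    """Bayes factor for 'delta Psi != 0' versus 'delta Psi = 0' via the
    Savage-Dickey ratio: prior density / posterior density of delta Psi at zero."""
    deltas = [a - b for a, b in zip(psi_samples_1, psi_samples_2)]
    # Under independent Uniform(0, 1) priors, delta Psi is triangular on [-1, 1]
    # with density 1 at zero.
    prior_at_zero = 1.0
    posterior_at_zero = kde_density_at(deltas, 0.0)
    return prior_at_zero / max(posterior_at_zero, 1e-300)  # guard against underflow


def beta_posterior_samples(inclusion, exclusion, n=4000, seed=0):
    """Stand-in posterior: Beta(1 + inclusion, 1 + exclusion) draws for Psi,
    i.e. a uniform prior with binomial read counts (no isoform-length correction)."""
    rng = random.Random(seed)
    return [rng.betavariate(1 + inclusion, 1 + exclusion) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical inclusion/exclusion read counts in two conditions.
    condition_a = beta_posterior_samples(80, 20, seed=1)
    condition_b = beta_posterior_samples(30, 70, seed=2)
    print(f"Bayes factor = {savage_dickey_bf(condition_a, condition_b):.3g}")
```

Larger values indicate stronger evidence that Ψ differs between the conditions; values below 1 favor no change. The threshold used to call an event differentially regulated is a separate choice that this sketch does not fix.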

Accession codes

Gene Expression Omnibus

Acknowledgements

We thank C. Wilusz (Colorado State University) for the gift of the CUGBP1-knockdown and control C2C12 cells; R. Darnell for advice regarding CLIP-seq protocols; S. Abou Elela, V. Butty, R. Nutiu and G. Schroth for sharing RNA-seq data; and J. Ernst, D. Gresham, M. Guttman, F. Jäkel, E. Jonas, F. Markowetz, D. Roy, R. Sandberg, T. Velho, X. Xiao and members of the Burge lab for insightful discussions and comments on the manuscript. This work was supported by grants from the US National Science Foundation (E.M.A.) and the US National Institutes of Health (E.M.A. and C.B.B.).

Author information

Contributions

Y.K., development of MISO model and software, analyses involving MISO, writing of main text and methods; E.T.W., hnRNP H CLIP-seq experiments and associated computational analyses, CUGBP1 knockdown RNA-seq experiments and associated computational analyses; E.M.A., development of model and statistical analysis, writing of methods; C.B.B., development of MISO model, contributions to computational analyses, writing of main text.

Corresponding authors

Correspondence to Edoardo M Airoldi or Christopher B Burge.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12, Supplementary Tables 1 and 2, Supplementary Note (PDF 1935 kb)

About this article

Cite this article

Katz, Y., Wang, E., Airoldi, E. et al. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7, 1009–1015 (2010). https://doi.org/10.1038/nmeth.1528

  • DOI: https://doi.org/10.1038/nmeth.1528
