Article

Analysis and design of RNA sequencing experiments for identifying isoform regulation

Nature Methods volume 7, pages 1009–1015 (2010)

Abstract

Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking–immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.
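For the simplest case, a skipped exon with two isoforms, the core idea of the abstract can be illustrated in a few lines: estimate an exon's inclusion level from reads, together with a measure of confidence. The Python sketch below is a minimal, hypothetical illustration and is not the MISO software.

It rests on several simplifying assumptions:
- Single-end reads have a fixed length.
- The prior on psi (the fraction of transcripts that include the exon) is a flat Beta(1, 1).
- Reads have already been sorted into three classes: compatible with the inclusion isoform only, with the exclusion isoform only, or with both.

A read is drawn from an isoform with probability proportional to psi times that isoform's effective length (its number of possible read start positions). It is then placed uniformly within that isoform. The posterior is evaluated on a grid instead of with a Monte Carlo sampler, which is only practical for two isoforms. Paired-end fragment-length information is not modeled. The names `log_likelihood`, `posterior_psi`, `eff_len_inc` and `eff_len_exc` are made up for this example.

```python
import math


def log_likelihood(psi, n_inc_only, n_exc_only, n_both, eff_len_inc, eff_len_exc):
    """Log-likelihood of classified read counts for a two-isoform skipped-exon event.

    Hypothetical simplified model (not the MISO implementation): a read comes from
    isoform k with probability proportional to psi_k * eff_len_k and is then placed
    uniformly over that isoform's eff_len_k start positions, so a read at a given
    position has probability (sum of psi_k over compatible isoforms) / z.
    """
    z = psi * eff_len_inc + (1.0 - psi) * eff_len_exc
    ll = 0.0
    if n_inc_only:
        ll += n_inc_only * math.log(psi / z)
    if n_exc_only:
        ll += n_exc_only * math.log((1.0 - psi) / z)
    if n_both:
        # Reads compatible with both isoforms: psi + (1 - psi) = 1.
        ll += n_both * math.log(1.0 / z)
    return ll


def posterior_psi(n_inc_only, n_exc_only, n_both, eff_len_inc, eff_len_exc, grid_size=1000):
    """Posterior mean and equal-tailed 95% credible interval for psi.

    Uses a flat Beta(1, 1) prior, so the posterior is proportional to the likelihood.
    The grid uses interval midpoints to avoid log(0) at psi = 0 or psi = 1.
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = [
        log_likelihood(p, n_inc_only, n_exc_only, n_both, eff_len_inc, eff_len_exc)
        for p in grid
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean = sum(p * q for p, q in zip(grid, probs))

    # Read the 2.5% and 97.5% quantiles off the cumulative posterior.
    lower = upper = None
    cdf = 0.0
    for p, q in zip(grid, probs):
        cdf += q
        if lower is None and cdf >= 0.025:
            lower = p
        if upper is None and cdf >= 0.975:
            upper = p
    return mean, (lower, upper)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    mean, (lo, hi) = posterior_psi(30, 10, 100, eff_len_inc=1500, eff_len_exc=1400)
    print(f"psi ~ {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

When the two isoforms have equal effective lengths, the likelihood reduces to psi^a (1 - psi)^b for a inclusion-only and b exclusion-only reads. The grid posterior should then closely match a Beta(a + 1, b + 1) distribution. Adding more reads narrows the credible interval, which is how a model of this kind expresses confidence in an estimate.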

Accessions

Gene Expression Omnibus

Acknowledgements

We thank C. Wilusz (Colorado State University) for the gift of the CUGBP1-knockdown and control C2C12 cells; R. Darnell for advice regarding CLIP-seq protocols; S. Abou Elela, V. Butty, R. Nutiu and G. Schroth for sharing RNA-seq data; and J. Ernst, D. Gresham, M. Guttman, F. Jäkel, E. Jonas, F. Markowetz, D. Roy, R. Sandberg, T. Velho, X. Xiao and members of the Burge lab for insightful discussions and comments on the manuscript. This work was supported by grants from the US National Science Foundation (E.M.A.) and the US National Institutes of Health (E.M.A. and C.B.B.).

Author information

Affiliations

  1. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA.

    • Yarden Katz
  2. Department of Biology, MIT, Cambridge, Massachusetts, USA.

    • Yarden Katz
    • Eric T Wang
    • Christopher B Burge
  3. Harvard-MIT Division of Health Sciences and Technology, Cambridge, Massachusetts, USA.

    • Eric T Wang
  4. Department of Statistics and FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, USA.

    • Edoardo M Airoldi
  5. Department of Biological Engineering, MIT, Cambridge, Massachusetts, USA.

    • Christopher B Burge

Contributions

Y.K., development of MISO model and software, analyses involving MISO, writing of main text and methods; E.T.W., hnRNP H CLIP-seq experiments and associated computational analyses, CUGBP1 knockdown RNA-seq experiments and associated computational analyses; E.M.A., development of model and statistical analysis, writing of methods; C.B.B., development of MISO model, contributions to computational analyses, writing of main text.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Edoardo M Airoldi or Christopher B Burge.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–12, Supplementary Tables 1 and 2, Supplementary Note

About this article

DOI

https://doi.org/10.1038/nmeth.1528