Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking–immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.
We thank C. Wilusz (Colorado State University) for the gift of the CUGBP1-knockdown and control C2C12 cells; R. Darnell for advice regarding CLIP-seq protocols; S. Abou Elela, V. Butty, R. Nutiu and G. Schroth for sharing RNA-seq data; and J. Ernst, D. Gresham, M. Guttman, F. Jäkel, E. Jonas, F. Markowetz, D. Roy, R. Sandberg, T. Velho, X. Xiao and members of the Burge lab for insightful discussions and comments on the manuscript. This work was supported by grants from the US National Science Foundation (E.M.A.) and the US National Institutes of Health (E.M.A. and C.B.B.).
The authors declare no competing financial interests.
Katz, Y., Wang, E., Airoldi, E. et al. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7, 1009–1015 (2010). https://doi.org/10.1038/nmeth.1528