  • Protocol

rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data

Abstract

Pre-mRNA alternative splicing is a prevalent mechanism for diversifying eukaryotic transcriptomes and proteomes. Regulated alternative splicing plays a role in many biological processes, and dysregulated alternative splicing is a feature of many human diseases. Short-read RNA sequencing (RNA-seq) is now the standard approach for transcriptome-wide analysis of alternative splicing. Since 2011, our laboratory has developed and maintained Replicate Multivariate Analysis of Transcript Splicing (rMATS), a computational tool for discovering and quantifying alternative splicing events from RNA-seq data. Here we provide a protocol for the contemporary version of rMATS, rMATS-turbo, a fast and scalable re-implementation that maintains the statistical framework and user interface of the original rMATS software, while incorporating a revamped computational workflow with a substantial improvement in speed and data storage efficiency. The rMATS-turbo software scales up to massive RNA-seq datasets with tens of thousands of samples. To illustrate the utility of rMATS-turbo, we describe two representative application scenarios. First, we describe a broadly applicable two-group comparison to identify differential alternative splicing events between two sample groups, including both annotated and novel alternative splicing events. Second, we describe a quantitative analysis of alternative splicing in a large-scale RNA-seq dataset (~1,000 samples), including the discovery of alternative splicing events associated with distinct cell states. We detail the workflow and features of rMATS-turbo that enable efficient parallel processing and analysis of large-scale RNA-seq datasets on a compute cluster. We anticipate that this protocol will help the broad user base of rMATS-turbo make the best use of this software for studying alternative splicing in diverse biological systems.
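As a rough illustration of what quantifying an alternative splicing event involves, the sketch below computes a length-normalized exon inclusion level (percent spliced in, PSI) from read counts supporting the inclusion and skipping isoforms, following the general form described for rMATS in ref. 18. The function names, example counts and effective lengths are hypothetical. This is a minimal sketch under those assumptions, not the rMATS-turbo implementation, which additionally fits a statistical model across replicates.

```python
def psi(inclusion_reads, skipping_reads, inclusion_len, skipping_len):
    """Length-normalized inclusion level for one event in one sample.

    inclusion_len and skipping_len are the effective lengths of the
    inclusion and skipping isoforms (assumed to be known and positive).
    Returns None when no reads support either isoform.
    """
    inc_density = inclusion_reads / inclusion_len
    skip_density = skipping_reads / skipping_len
    total = inc_density + skip_density
    if total == 0:
        return None  # event is not covered in this sample
    return inc_density / total


def mean_psi(samples, inclusion_len, skipping_len):
    """Average PSI over (inclusion_reads, skipping_reads) pairs, ignoring uncovered samples."""
    values = [psi(i, s, inclusion_len, skipping_len) for i, s in samples]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical counts for one skipped-exon event in two sample groups.
group_1 = [(80, 10), (90, 15)]
group_2 = [(20, 40), (30, 45)]
delta_psi = mean_psi(group_1, 2, 1) - mean_psi(group_2, 2, 1)
print(f"difference in mean inclusion level: {delta_psi:.3f}")
```

A simple difference of group means is shown only to make the quantity concrete; it does not replace the replicate-aware significance test that rMATS performs.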

Key points

  • This protocol provides detailed guidelines for using rMATS-turbo, the latest implementation of the popular software for the discovery and quantification of alternative splicing events from RNA sequencing data. The software is exemplified in two representative scenarios.

  • rMATS-turbo incorporates a revamped computational workflow with a substantial improvement in speed and data storage efficiency. The software scales up to massive RNA sequencing datasets with tens of thousands of samples.
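To make the large-scale workflow more concrete, the sketch below assembles command lines that split an analysis into per-batch preparation steps followed by a single combined step, which is how the processing can be parallelized on a compute cluster. The flag names (`--b1`, `--gtf`, `-t`, `--readLength`, `--nthread`, `--od`, `--tmp`, `--task`, `--statoff`) reflect our understanding of the rMATS-turbo command-line interface and should be checked against the software's README. The file names, the `python rmats.py` invocation and the helper functions are hypothetical.

```python
import shlex


def prep_command(bam_list, gtf, read_length, out_dir, tmp_dir, nthread=1):
    """Build a 'prep' command for one batch of BAM files (paths are placeholders)."""
    return [
        "python", "rmats.py",
        "--b1", bam_list,            # text file listing comma-separated BAM paths
        "--gtf", gtf,
        "-t", "paired",              # assumes paired-end reads
        "--readLength", str(read_length),
        "--nthread", str(nthread),
        "--od", out_dir,
        "--tmp", tmp_dir,
        "--task", "prep",
    ]


def post_command(all_bams_list, gtf, read_length, out_dir, tmp_dir, nthread=1):
    """Build the 'post' command that combines the prep results of all batches."""
    return [
        "python", "rmats.py",
        "--b1", all_bams_list,
        "--gtf", gtf,
        "-t", "paired",
        "--readLength", str(read_length),
        "--nthread", str(nthread),
        "--od", out_dir,
        "--tmp", tmp_dir,
        "--task", "post",
        "--statoff",                 # assumed: skip the two-group statistical test
    ]


# Print the commands instead of running them, so they can be submitted as cluster jobs.
for batch in range(3):
    cmd = prep_command(f"batch_{batch}.txt", "genes.gtf", 101, "out", f"tmp_{batch}", nthread=4)
    print(shlex.join(cmd))
print(shlex.join(post_command("all_bams.txt", "genes.gtf", 101, "out", "tmp_all", nthread=4)))
```

In practice, the intermediate files written by each preparation job would probably need to be gathered into the directory passed to `--tmp` before the combined step runs; the README describes the supported way to do this.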

Fig. 1: An overview of the rMATS-turbo workflow to discover and quantify alternative splicing events in large-scale RNA-seq datasets.
Fig. 2: General two-group comparison to identify differential alternative splicing events between PC3E and GS689 cell lines using rMATS-turbo.
Fig. 3: Alternative splicing analysis in a large-scale RNA-seq dataset (CCLE) using rMATS-turbo.

Data availability

RNA-seq data for PC3E and GS689 cell lines (Procedure 1) and RNA-seq data for 1,019 CCLE human cancer cell lines (Procedure 2) can be downloaded from the Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) under BioProject accessions PRJNA438990 and PRJNA523380, respectively. rMATS-turbo output files for both Procedure 1 and Procedure 2 datasets are available at https://doi.org/10.5281/zenodo.6647023, and other result files are provided in the companion GitHub repository of this protocol (https://github.com/Xinglab/rmats-turbo-tutorial; ref. 63).
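For readers who download the output files, the sketch below shows one way to screen a tab-delimited event table for differential splicing events. The column names (`geneSymbol`, `FDR`, `IncLevelDifference`) are what we understand rMATS output tables to contain and should be verified against the header of the actual file. The thresholds, the function name and the three example rows are hypothetical and are not the cut-offs used in the protocol.

```python
import csv
import io


def significant_events(handle, max_fdr=0.05, min_abs_delta=0.1):
    """Return (gene, inclusion-level difference, FDR) for rows passing both thresholds.

    handle is any open text stream over a tab-delimited table with a header row.
    Rows with missing ('NA' or empty) values are skipped.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["FDR"] in ("NA", "") or row["IncLevelDifference"] in ("NA", ""):
            continue
        fdr = float(row["FDR"])
        delta = float(row["IncLevelDifference"])
        if fdr <= max_fdr and abs(delta) >= min_abs_delta:
            hits.append((row["geneSymbol"], delta, fdr))
    return hits


# Made-up rows standing in for a real output table.
example = (
    "geneSymbol\tFDR\tIncLevelDifference\n"
    "GENE_A\t0.001\t0.45\n"
    "GENE_B\t0.2\t0.5\n"
    "GENE_C\t0.01\t-0.02\n"
    "GENE_D\tNA\tNA\n"
)
for gene, delta, fdr in significant_events(io.StringIO(example)):
    print(gene, delta, fdr)
```

To apply the same function to a downloaded table, pass an open file object in place of the in-memory example.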

Code availability

rMATS-turbo is publicly available on GitHub (https://github.com/Xinglab/rmats-turbo) and Bioconda (https://anaconda.org/bioconda/rmats). rmats2sashimiplot is publicly available on GitHub (https://github.com/Xinglab/rmats2sashimiplot). Custom scripts for data analysis and visualization of the results generated in Procedures 1 and 2 are provided in the companion GitHub repository of this protocol (https://github.com/Xinglab/rmats-turbo-tutorial; ref. 63).

References

  1. Nilsen, T. W. & Graveley, B. R. Expansion of the eukaryotic proteome by alternative splicing. Nature 463, 457–463 (2010).

  2. Sharp, P. A. Split genes and RNA splicing. Cell 77, 805–815 (1994).

  3. Wang, Z. & Burge, C. B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813 (2008).

  4. Fu, X. D. & Ares, M. Jr Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701 (2014).

  5. Kalsotra, A. & Cooper, T. A. Functional consequences of developmentally regulated alternative splicing. Nat. Rev. Genet. 12, 715–729 (2011).

  6. Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451 (2017).

  7. Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).

  8. Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).

  9. Park, E., Pan, Z., Zhang, Z., Lin, L. & Xing, Y. The expanding landscape of alternative splicing variation in human populations. Am. J. Hum. Genet. 102, 11–26 (2018).

  10. Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).

  11. Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).

  12. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

  13. Alamancos, G. P., Agirre, E. & Eyras, E. Methods to study splicing from high-throughput RNA sequencing data. Methods Mol. Biol. 1126, 357–397 (2014).

  14. Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13 (2016).

  15. Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019).

  16. Pan, Y. et al. RNA dysregulation: an expanding source of cancer immunotherapy targets. Trends Pharmacol. Sci. 42, 268–282 (2021).

  17. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  18. Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).

  19. Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).

  20. Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).

  21. Begg, B. E., Jens, M., Wang, P. Y., Minor, C. M. & Burge, C. B. Concentration-dependent splicing is enabled by Rbfox motifs of intermediate affinity. Nat. Struct. Mol. Biol. 27, 901–912 (2020).

  22. Hu, X. et al. The RNA-binding protein AKAP8 suppresses tumor metastasis by antagonizing EMT-associated alternative splicing. Nat. Commun. 11, 486 (2020).

  23. Jourdain, A. A. et al. Loss of LUC7L2 and U1 snRNP subunits shifts energy metabolism from glycolysis to OXPHOS. Mol. Cell 81, 1905–1919 e1912 (2021).

  24. Leclair, N. K. et al. Poison exon splicing regulates a coordinated network of SR protein expression during differentiation and tumorigenesis. Mol. Cell 80, 648–665 e649 (2020).

  25. Liu, W. et al. Ectopic targeting of CG DNA methylation in Arabidopsis with the bacterial SssI methyltransferase. Nat. Commun. 12, 3130 (2021).

  26. Wang, L. et al. RALF1–FERONIA complex affects splicing dynamics to modulate stress responses and growth in plants. Sci. Adv. 6, eaaz1622 (2020).

  27. Phillips, J. W. et al. Pathway-guided analysis identifies Myc-dependent alternative pre-mRNA splicing in aggressive prostate cancers. Proc. Natl Acad. Sci. USA 117, 5269–5279 (2020).

  28. Wang, Y. et al. Role of Hakai in m(6)A modification pathway in Drosophila. Nat. Commun. 12, 2159 (2021).

  29. Lau, E. et al. Splice-junction-based mapping of alternative isoforms in the human proteome. Cell Rep. 29, 3751–3765 e3755 (2019).

  30. Daniels, N. J. et al. Functional analyses of human LUC7-like proteins involved in splicing regulation and myeloid neoplasms. Cell Rep. 35, 108989 (2021).

  31. Zhang, Y. et al. Regional variation of splicing QTLs in human brain. Am. J. Hum. Genet. 107, 196–210 (2020).

  32. Heber, S., Alekseyev, M., Sze, S. H., Tang, H. & Pevzner, P. A. Splicing graphs and EST assembly problem. Bioinformatics 18, S181–S188 (2002).

  33. Xing, Y., Resch, A. & Lee, C. The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res. 14, 426–441 (2004).

  34. Rahman, M. A., Krainer, A. R. & Abdel-Wahab, O. SnapShot: splicing alterations in cancer. Cell 180, 208–208 e201 (2020).

  35. Anczukow, O. & Krainer, A. R. Splicing-factor alterations in cancers. RNA 22, 1285–1301 (2016).

  36. Mironov, A., Denisov, S., Gress, A., Kalinina, O. V. & Pervouchine, D. D. An extended catalogue of tandem alternative splice sites in human tissue transcriptomes. PLoS Comput. Biol. 17, e1008329 (2021).

  37. Demirdjian, L. et al. Detecting allele-specific alternative splicing from population-scale RNA-seq data. Am. J. Hum. Genet. 107, 461–472 (2020).

  38. Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).

  39. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

  40. Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).

  41. Wu, J. et al. SpliceTrap: a method to quantify alternative splicing under single cellular conditions. Bioinformatics 27, 3010–3016 (2011).

  42. Alamancos, G. P., Pages, A., Trincado, J. L., Bellora, N. & Eyras, E. Leveraging transcript quantification for fast computation of alternative splicing profiles. RNA 21, 1521–1531 (2015).

  43. Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).

  44. Vaquero-Garcia, J. et al. RNA splicing analysis using heterogeneous and large RNA-seq datasets. Nat. Commun. 14, 1230 (2023).

  45. Li, Y. I. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 50, 151–158 (2018).

  46. Lin, K. T. & Krainer, A. R. PSI-Sigma: a comprehensive splicing-detection method for short-read and long-read RNA-seq analysis. Bioinformatics 35, 5048–5054 (2019).

  47. Sterne-Weiler, T., Weatheritt, R. J., Best, A. J., Ha, K. C. H. & Blencowe, B. J. Efficient and accurate quantitative profiling of alternative splicing patterns of any complexity on a laptop. Mol. Cell 72, 187–200 e186 (2018).

  48. Wang, Q. & Rio, D. C. JUM is a computational method for comprehensive annotation-free analysis of alternative pre-mRNA splicing patterns. Proc. Natl Acad. Sci. USA 115, E8181–E8190 (2018).

  49. Mehmood, A. et al. Systematic evaluation of differential splicing tools for RNA-seq studies. Brief. Bioinform. 21, 2052–2065 (2020).

  50. Muller, I. B. et al. Computational comparison of common event-based differential splicing tools: practical considerations for laboratory researchers. BMC Bioinform. 22, 347 (2021).

  51. Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).

  52. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).

  53. Byrne, A., Cole, C., Volden, R. & Vollmers, C. Realizing the potential of full-length transcriptome sequencing. Philos. Trans. R. Soc. Lond. B 374, 20190097 (2019).

  54. Gao, Y. et al. ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci. Adv. 9, eabq5072 (2023).

  55. Zhang, Z. et al. Deep-learning augmented RNA-seq analysis of transcript splicing. Nat. Methods 16, 307–310 (2019).

  56. Lu, Z. X. et al. Transcriptome-wide landscape of pre-mRNA alternative splicing associated with metastatic colonization. Mol. Cancer Res. 13, 305–318 (2015).

  57. Yang, J. et al. Guidelines and definitions for research on epithelial–mesenchymal transition. Nat. Rev. Mol. Cell Biol. 21, 341–352 (2020).

  58. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).

  59. Chakraborty, P., George, J. T., Tripathi, S., Levine, H. & Jolly, M. K. Comparative study of transcriptomics-based scoring metrics for the epithelial–hybrid–mesenchymal spectrum. Front. Bioeng. Biotechnol. 8, 220 (2020).

  60. Tan, T. Z. et al. Epithelial–mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients. EMBO Mol. Med. 6, 1279–1293 (2014).

  61. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  62. Veeneman, B. A., Shukla, S., Dhanasekaran, S. M., Chinnaiyan, A. M. & Nesvizhskii, A. I. Two-pass alignment improves novel splice junction quantification. Bioinformatics 32, 43–49 (2016).

  63. Wang, Y. et al. rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data. rMATS-turbo-tutorial https://doi.org/10.5281/zenodo.7931186 (2023).

  64. Klijn, C. et al. A comprehensive transcriptional portrait of human cancer cell lines. Nat. Biotechnol. 33, 306–312 (2015).

Acknowledgements

This work was supported by National Institutes of Health grants R01GM088342 and R01GM117624.

Author information

Contributions

Y.X. conceived the study. Y.W., Z.X. and Y.X. designed the research and developed the methodology. Y.W., Z.X. and E.K. contributed to analytic tools. Y.W., E.K., J.I.A. and Y.X. analyzed the data. Y.W., K.E.K.-E. and Y.X. wrote the paper with input from all other authors.

Corresponding author

Correspondence to Yi Xing.

Ethics declarations

Competing interests

Y.X. is a scientific cofounder of Panorama Medicine. Y.X. and Z.X. receive licensing income for commercial usage of rMATS-turbo. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks Yiwen Chen and Julien Gagneur for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Phillips, J. W. et al. Proc. Natl Acad. Sci. USA 117, 5269–5279 (2020): https://doi.org/10.1073/pnas.1915975117

Shen, S. et al. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014): https://doi.org/10.1073/pnas.1419161111

Key data used in this protocol

Zhang, Z. et al. Nat. Methods 16, 307–310 (2019): https://doi.org/10.1038/s41592-019-0351-9

Supplementary information

Supplementary Information

Supplementary Figs. 1–3.

Reporting Summary

Supplementary Tables 1, 2

SRA accession numbers and sample information for examples 1 and 2.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Xie, Z., Kutschera, E. et al. rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data. Nat Protoc 19, 1083–1104 (2024). https://doi.org/10.1038/s41596-023-00944-2

  • DOI: https://doi.org/10.1038/s41596-023-00944-2