Detection of aberrant gene expression events in RNA sequencing data


RNA sequencing (RNA-seq) has emerged as a powerful approach to discover disease-causing gene regulatory defects in individuals affected by genetically undiagnosed rare disorders. Pioneering studies have shown that RNA-seq could increase the diagnosis rates over DNA sequencing alone by 8–36%, depending on the disease entity and tissue probed. To accelerate adoption of RNA-seq by human genetics centers, detailed analysis protocols are now needed. We present a step-by-step protocol that details how to robustly detect aberrant expression levels, aberrant splicing and mono-allelic expression in RNA-seq data using dedicated statistical methods. We describe how to generate and assess quality control plots and interpret the analysis results. The protocol is based on the detection of RNA outliers pipeline (DROP), a modular computational workflow that integrates all the analysis steps, can leverage parallel computing infrastructures and generates browsable web page reports.
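To make the notion of an expression outlier concrete, the following is a minimal sketch that flags samples whose log-transformed read count for a gene deviates strongly from the rest of the cohort. It is an illustration only: it uses plain z-scores, not the dedicated statistical methods applied by DROP, and the function names, the threshold of 3 and the toy counts are all assumptions made for this example.

```python
import math
import statistics


def expression_zscores(counts):
    """Return z-scores of log2(count + 1) for one gene across samples."""
    logs = [math.log2(c + 1) for c in counts]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    if sd == 0:
        # No variation across samples: nothing can be called an outlier.
        return [0.0 for _ in logs]
    return [(x - mean) / sd for x in logs]


def flag_outliers(counts_by_gene, threshold=3.0):
    """Return (gene, sample_index, z) tuples where |z| exceeds the threshold.

    The threshold is a hypothetical default chosen for illustration.
    """
    hits = []
    for gene, counts in counts_by_gene.items():
        for i, z in enumerate(expression_zscores(counts)):
            if abs(z) > threshold:
                hits.append((gene, i, z))
    return hits


# Toy cohort of 20 samples (made-up numbers, not real data).
toy_counts = {
    "GENE_A": [100] * 19 + [0],   # last sample has lost expression
    "GENE_B": [50] * 20,          # no variation, so no outliers
}

for gene, sample_index, z in flag_outliers(toy_counts):
    print(f"{gene}: sample {sample_index} has z-score {z:.2f}")
```

A naive z-score like this ignores count noise, sequencing depth and hidden confounders, which is the motivation for using purpose-built statistical models in practice.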

Fig. 1: Workflow overview.
Fig. 2: The aberrant expression module.
Fig. 3: The aberrant splicing module.
Fig. 4: MAE module.
Fig. 5: Downstream analysis of outlier results.

Data availability

A subset of the Geuvadis dataset (ref. 15) comprising 100 samples (Supplementary Data 1) was used to test and demonstrate the workflow. The original dataset is accessible without restriction. The analyses performed in the ‘Dataset design’ section used the GTEx and Kremer et al. (ref. 7) datasets. The GTEx dataset was downloaded from the GTEx Portal on 12 June 2017, under the dbGaP accession number phs000424.v6.p1. The count matrices from the Kremer et al. dataset are available on Zenodo.

Code availability

DROP, including a small demo dataset of 10 samples restricted to chromosome 21, is publicly available under the MIT license. The current version is 0.9.2. All the plots, results, and analyses of the test dataset are available online.


References

1. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
2. Yang, Y. et al. Clinical whole-exome sequencing for the diagnosis of Mendelian disorders. N. Engl. J. Med. 369, 1502–1511 (2013).
3. Taylor, J. C. et al. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. Nat. Genet. 47, 717–726 (2015).
4. Lionel, A. C. et al. Improved diagnostic yield compared with targeted gene sequencing panels suggests a role for whole-genome sequencing as a first-tier genetic test. Genet. Med. 20, 435–443 (2018).
5. Chong, J. X. et al. The genetic basis of Mendelian phenotypes: discoveries, challenges, and opportunities. Am. J. Hum. Genet. 97, 199–215 (2015).
6. Cooper, G. M. Parlez-vous VUS? Genome Res. 25, 1423–1426 (2015).
7. Kremer, L. S. et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat. Commun. 8, 15824 (2017).
8. Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).
9. Frésard, L. et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat. Med. 25, 911–919 (2019).
10. Gonorazky, H. D. et al. Expanding the boundaries of RNA sequencing as a diagnostic tool for rare Mendelian disease. Am. J. Hum. Genet. 104, 466–483 (2019).
11. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
12. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
13. Murdock, D. R. et al. Transcriptome-directed analysis for Mendelian disease diagnosis overcomes limitations of conventional genomic testing. J. Clin. Investig. (2020).
14. Köster, J. & Rahmann, S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
15. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
16. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
17. Li, Y. I. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 50, 151–158 (2018).
18. Brechtmann, F. et al. OUTRIDER: a statistical method for detecting aberrantly expressed genes in RNA sequencing data. Am. J. Hum. Genet. 103, 907–917 (2018).
19. Mertes, C. et al. Detection of aberrant splicing events in RNA-Seq data with FRASER. Preprint at bioRxiv (2019).
20. Köhler, S. et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 47, D1018–D1027 (2019).
21. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
22. Papatheodorou, I. et al. Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res. 46, D246–D251 (2018).
23. Aicher, J. K., Jewell, P., Vaquero-Garcia, J., Barash, Y. & Bhoj, E. J. Mapping RNA splicing variations in clinically accessible and nonaccessible tissues to facilitate Mendelian disease diagnosis using RNA-seq. Genet. Med. 22, 1181–1190 (2020).
24. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
25. Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).
26. Singh, R. K. & Cooper, T. A. Pre-mRNA splicing in disease and therapeutics. Trends Mol. Med. 18, 472–482 (2012).
27. Cheng, J. et al. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biol. 20, 48 (2019).
28. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
29. Lee, H. et al. Diagnostic utility of transcriptome sequencing for rare Mendelian diseases. Genet. Med. 22, 490–499 (2019).
30. Gonorazky, H. et al. RNAseq analysis for the diagnosis of muscular dystrophy. Ann. Clin. Transl. Neurol. 3, 55–60 (2016).
31. Kernohan, K. D. et al. Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum. Mutat. 38, 611–614 (2017).
32. Hamanaka, K. et al. RNA sequencing solved the most common but unrecognized NEB pathogenic variant in Japanese nemaline myopathy. Genet. Med. 21, 1629–1638 (2019).
33. Wang, K. et al. Whole-genome DNA/RNA sequencing identifies truncating mutations in RBCK1 in a novel Mendelian disease with neuromuscular and cardiac involvement. Genome Med. 5, 67 (2013).
34. Pervouchine, D. D., Knowles, D. G. & Guigo, R. Intron-centric estimation of alternative splicing from RNA-seq data. Bioinformatics 29, 273–274 (2013).
35. Kapustin, Y. et al. Cryptic splice sites and split genes. Nucleic Acids Res. 39, 5837–5844 (2011).
36. Mohammadi, P. et al. Genetic regulatory variation in populations informs transcriptome analysis in rare disease. Science 366, 351–356 (2019).
37. Albers, C. A. et al. Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat. Genet. 44, 435–439 (2012).
38. van Haelst, M. M. et al. Further confirmation of the MED13L haploinsufficiency syndrome. Eur. J. Hum. Genet. 23, 135–138 (2015).
39. Lindstrand, A. et al. Different mutations in PDE4D associated with developmental disorders with mirror phenotypes. J. Med. Genet. 51, 45–54 (2014).
40. ’t Hoen, P. A. C. et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).
41. Lee, S. et al. NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types. Nucleic Acids Res. 45, e103–e103 (2017).
42. Castel, S. E., Mohammadi, P., Chung, W. K., Shen, Y. & Lappalainen, T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat. Commun. 7, 12817 (2016).
43. Mitelman, F., Johansson, B. & Mertens, F. The impact of translocations and gene fusions on cancer causation. Nat. Rev. Cancer 7, 233–245 (2007).
44. Dai, X., Theobard, R., Cheng, H., Xing, M. & Zhang, J. Fusion genes: a promising tool combating against cancer. Biochim. Biophys. Acta Rev. Cancer 1869, 149–160 (2018).
45. van Heesch, S. et al. Genomic and functional overlap between somatic and germline chromosomal rearrangements. Cell Rep. 9, 2001–2010 (2014).
46. Oliver, G. R. et al. A tailored approach to fusion transcript identification increases diagnosis of rare inherited disease. PLoS One 14, e0223337 (2019).
47. Tian, L. et al. CICERO: a versatile method for detecting complex and diverse driver fusions using cancer RNA sequencing data. Genome Biol. 21, 126 (2020).
48. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
49. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
50. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
51. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
52. Van der Auwera, G. A. et al. in Current Protocols in Bioinformatics 11.10.1–11.10.33 (Wiley, 2013).
53. Li, H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27, 718–719 (2011).
54. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
55. Haeussler, M. et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res. 47, D853–D858 (2019).
56. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
57. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 3 (2011).
58. Ben-Kiki, O. & Evans, C. YAML Ain’t Markup Language (YAML™) Version 1.2. 80 (2009).
59. Anders, S., Pyl, P. T. & Huber, W. HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
60. McCarthy, D. J., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 40, 4288–4297 (2012).
61. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
62. Katz, Y. et al. Quantitative visualization of alternative exon expression from RNA-seq data. Bioinformatics 31, 2400–2402 (2015).
63. Amberger, J. S., Bocchini, C. A., Scott, A. F. & Hamosh, A. OMIM.org: leveraging knowledge across phenotype–gene relationships. Nucleic Acids Res. 47, D1038–D1043 (2019).
64. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).


Acknowledgements

The authors thank all the users who helped with their feedback during the revision, especially D. R. Murdock. We also thank C. Andrade for helping with the DROP logo, as well as the members of the Gagneur lab for input. The Bavaria California Technology Center supported C.M. through a fellowship. The German Bundesministerium für Bildung und Forschung (BMBF) supported the study through the e:Med Networking fonds AbCD-Net (FKZ 01ZX1706A to V.A.Y., C.M., and J.G.), the German Network for Mitochondrial Disorders (mitoNET; 01GM1113C to H.P.), the E-Rare project GENOMIT (01GM1920A to M.G. and H.P.), the Medical Informatics Initiative CORD-MI (Collaboration on Rare Diseases) to V.A.Y., and the ERA PerMed project PerMiM (01KU2016A to H.P. and J.G.). The Genotype-Tissue Expression (GTEx) project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS.

Author information




V.A.Y., C.M., M.F.M., and J.G. participated in the design of the workflow. V.A.Y., C.M., M.F.M., D.K-A., I.F.S., and P.F.G. contributed to the computational workflow. L.F. implemented the candidate prioritization workflow. L.W. designed and implemented wBuild. V.A.Y. and J.G. wrote the manuscript with the help of L.F., D.K-A., M.G., I.F.S., and H.P. C.M., H.P., and J.G. supervised the research. All authors revised the manuscript.

Corresponding author

Correspondence to Julien Gagneur.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Anna Esteve-Codina and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Kremer, L. et al. Nat. Commun. 8, 15824 (2017)

Murdock, D. R. et al. J. Clin. Invest. (2020)

Key data used in this protocol

Kremer, L. et al. Nat. Commun. 8, 15824 (2017)

GTEx Consortium. Nature 550, 7675 (2017)

Lappalainen, T. et al. Nature 501, 7468 (2013)

Supplementary information

Supplementary Information

Supplementary Figs. 1–8 and Supplementary Methods.

Reporting Summary

Supplementary Data 1

Sample annotation of the test dataset.

Supplementary Data 2

Configuration file for the test dataset.

About this article

Cite this article

Yépez, V.A., Mertes, C., Müller, M.F. et al. Detection of aberrant gene expression events in RNA sequencing data. Nat Protoc 16, 1276–1296 (2021).
