
Pathogenic variants that alter protein code often disrupt splicing


The lack of tools to identify causative variants from sequencing data greatly limits the promise of precision medicine. Previous studies suggest that one-third of disease-associated alleles alter splicing. We discovered that the alleles causing splicing defects cluster in disease-associated genes (for example, haploinsufficient genes). We analyzed 4,964 published disease-causing exonic mutations using a massively parallel splicing assay (MaPSy), which showed an 81% concordance rate with splicing in patient tissue. Approximately 10% of exonic mutations altered splicing, mostly by disrupting multiple stages of spliceosome assembly. We present a large-scale characterization of exonic splicing mutations using a new technology that facilitates variant classification and keeps pace with variant discovery.


Figure 1: MaPSy on the 5K panel.
Figure 2: Prevalence of splicing mutations in disease-associated genes.
Figure 3: Random forest classification of exonic mutations that disrupt splicing.
Figure 4: Detection of RBP motifs that affect splicing.
Figure 5: Isolation of spliceosomal intermediates.
Figure 6: Clustering of allelic ratios provides ESM mechanistic insights.


Accession codes


NCBI Reference Sequence


  1. Baird, P.A., Anderson, T.W., Newcombe, H.B. & Lowry, R.B. Genetic disorders in children and young adults: a population study. Am. J. Hum. Genet. 42, 677–693 (1988).
  2. Yang, Y. et al. Molecular findings among patients referred for clinical whole-exome sequencing. J. Am. Med. Assoc. 312, 1870–1879 (2014).
  3. Bamshad, M.J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
  4. Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
  5. Xue, Y. et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am. J. Hum. Genet. 91, 1022–1032 (2012).
  6. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  7. Lim, K.H., Ferraris, L., Filloux, M.E., Raphael, B.J. & Fairbrother, W.G. Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes. Proc. Natl. Acad. Sci. USA 108, 11093–11098 (2011).
  8. Stenson, P.D. et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003).
  9. Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E. & Fairbrother, W.G. Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat. Struct. Mol. Biol. 19, 719–721 (2012).
  10. Huang, N., Lee, I., Marcotte, E.M. & Hurles, M.E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
  11. Ke, S. et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21, 1360–1374 (2011).
  12. Fairbrother, W.G., Yeh, R.F., Sharp, P.A. & Burge, C.B. Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013 (2002).
  13. Amit, M. et al. Differential GC content between exons and introns establishes distinct strategies of splice-site recognition. Cell Rep. 1, 543–556 (2012).
  14. Mort, M. et al. MutPred Splice: machine learning–based prediction of exonic variants that disrupt splicing. Genome Biol. 15, R19 (2014).
  15. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  16. Wang, Z. et al. Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845 (2004).
  17. Ke, S., Zhang, X.H. & Chasin, L.A. Positive selection acting on splicing motifs reflects compensatory evolution. Genome Res. 18, 533–543 (2008).
  18. Smith, P.J. et al. An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet. 15, 2490–2508 (2006).
  19. Zhang, X.H. & Chasin, L.A. Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev. 18, 1241–1250 (2004).
  20. Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
  21. Ray, D. et al. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27, 667–670 (2009).
  22. Long, J.C. & Caceres, J.F. The SR protein family of splicing factors: master regulators of gene expression. Biochem. J. 417, 15–27 (2009).
  23. Rahman, M.A. et al. SRSF1 and hnRNP H antagonistically regulate splicing of COLQ exon 16 in a congenital myasthenic syndrome. Sci. Rep. 5, 13208 (2015).
  24. Shen, H., Kan, J.L., Ghigna, C., Biamonti, G. & Green, M.R. A single polypyrimidine tract binding protein (PTB) binding site mediates splicing inhibition at mouse IgM exons M1 and M2. RNA 10, 787–794 (2004).
  25. Sterne-Weiler, T., Howard, J., Mort, M., Cooper, D.N. & Sanford, J.R. Loss of exon identity is a common mechanism of human inherited disease. Genome Res. 21, 1563–1571 (2011).
  26. Wang, J., Xiao, S.H. & Manley, J.L. Genetic analysis of the SR protein ASF/SF2: interchangeability of RS domains and negative control of splicing. Genes Dev. 12, 2222–2233 (1998).
  27. Lim, K.H. & Fairbrother, W.G. Spliceman—a computational web server that predicts sequence variations in pre-mRNA splicing. Bioinformatics 28, 1031–1032 (2012).
  28. Padgett, R.A., Grabowski, P.J., Konarska, M.M., Seiler, S. & Sharp, P.A. Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119–1150 (1986).
  29. Konarska, M.M. & Sharp, P.A. Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 46, 845–855 (1986).
  30. Das, R. & Reed, R. Resolution of the mammalian E complex and the ATP-dependent spliceosomal complexes on native agarose mini-gels. RNA 5, 1504–1508 (1999).
  31. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
  32. MacArthur, D.G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).
  33. Wang, Y., Ma, M., Xiao, X. & Wang, Z. Intronic splicing enhancers, cognate splicing factors and context-dependent regulation rules. Nat. Struct. Mol. Biol. 19, 1044–1052 (2012).
  34. Rosenberg, A.B., Patwardhan, R.P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698–711 (2015).
  35. Yeo, G. & Burge, C.B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).
  36. Gozani, O., Patton, J.G. & Reed, R. A novel set of spliceosome-associated proteins and the essential splicing factor PSF bind stably to pre-mRNA prior to catalytic step II of the splicing reaction. EMBO J. 13, 3356–3367 (1994).
  37. Reichert, V. & Moore, M.J. Better conditions for mammalian in vitro splicing provided by acetate and glutamate as potassium counterions. Nucleic Acids Res. 28, 416–423 (2000).
  38. Dobin, A. et al. STAR: ultrafast universal RNA–seq aligner. Bioinformatics 29, 15–21 (2013).
  39. Kursa, M.B., Jankowski, A. & Rudnicki, W.R. Boruta—a system for feature selection. Fundam. Inform. 101, 271–285 (2010).
  40. Fairbrother, W.G. et al. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res. 32, W187–W190 (2004).
  41. Lin, C.L. et al. RNA structure replaces the need for U2AF2 in splicing. Genome Res. 26, 12–23 (2016).
  42. Wasserman, W.W. & Sandelin, A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287 (2004).
  43. Chambers, J.M. & Hastie, T. Statistical Models in S (Wadsworth & Brooks/Cole Advanced Books & Software, 1992).
  44. Fraley, C. & Raftery, A.E. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002).
  45. Pesarin, F. Multivariate Permutation Tests: With Applications in Biostatistics (J. Wiley, 2001).


We thank K. Villanueva for generating the list of SNPs used in this study and A. Leblang for compiling the variants to make the oligonucleotide library. We thank M. Jurica and M. Moore for suggestions and protocols for the in vitro spliceosome assembly assay and nuclear extract preparation. We thank A. Janssens for contacting investigators for patient samples. We thank A. Toland (Ohio State University), J. Marini (NIH/NICHD) and A. Goate (Washington University Alzheimer's Disease Research Center) for contributing patient samples for validation. R.S. was supported by a Postdoctoral Fellowship from the Center for Computational Molecular Biology (CCMB), Brown University. C.R. was supported by a Graduate Research Fellowship from the National Science Foundation (NSF). This work was supported by US National Institutes of Health (NIH) grants R01GM095612 (to W.G.F.), R01GM105681 (to W.G.F.) and R21HG007905 (to W.G.F.) and by SFARI award 342705 (to W.G.F.). Part of this research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University and the Genomics Core Facility, Brown University.

Author information

Authors and Affiliations



W.G.F. and R.S. designed the experiments. R.S. performed MaPSy experiments. R.S., J.W., P.B.-T. and J.M. performed validation experiments. K.J.C. performed alignment, counting and RBP motif analyses. R.S. performed ESM analyses, machine learning and MaPSy SELEX analyses. C.L.R. performed HGMD gene analyses. C.B. and J.Y. developed the visualization web browser. W.G.F. and R.S. wrote the paper with contributions from all authors.

Corresponding author

Correspondence to William G Fairbrother.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Alternative splicing events in the 5K panel.

The majority of cryptic splicing occurred through creation of an AG or GT (type I), while some other mutations increased the usage of a nearby weaker splice site (type II). Very few mutations were found to abolish alternative splice-site usage (type III).

Supplementary Figure 2 MaPSy performance.

(a–d) Agreement between allelic splicing ratios (log2) of three cell culture replicates of MaPSy in vivo (a–c) and two experimental replicates of MaPSy in vitro (d). (e) Stacked histogram of mutant (red) and wild-type (blue) relative splicing efficiency in MaPSy in vivo (top) and in vitro (bottom). (f) Full gel of output (spliced species) from MaPSy in vivo.
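The log2 allelic splicing ratio compared across replicates in this legend can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the authors' pipeline: the legend does not define the ratio, so the sketch assumes it is the mutant's spliced-to-input read ratio divided by the wild type's. The function name, argument names and the pseudocount are hypothetical.

```python
import math


def log2_allelic_ratio(mut_spliced, mut_input, wt_spliced, wt_input, pseudocount=1.0):
    """Return log2 of (mutant spliced/input) over (wild-type spliced/input).

    Assumed definition (not stated in the legend). The pseudocount is a
    hypothetical guard against zero read counts.
    """
    if min(mut_spliced, mut_input, wt_spliced, wt_input) < 0 or pseudocount < 0:
        raise ValueError("read counts and pseudocount must be non-negative")
    mut_num = mut_spliced + pseudocount
    mut_den = mut_input + pseudocount
    wt_num = wt_spliced + pseudocount
    wt_den = wt_input + pseudocount
    if 0 in (mut_num, mut_den, wt_num, wt_den):
        raise ValueError("zero counts require a positive pseudocount")
    mut_efficiency = mut_num / mut_den  # relative splicing efficiency, mutant
    wt_efficiency = wt_num / wt_den     # relative splicing efficiency, wild type
    return math.log2(mut_efficiency / wt_efficiency)
```

Under this assumed definition, a negative value means the mutant allele is spliced less efficiently than its wild-type counterpart, and values near zero mean the two alleles behave alike.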

Supplementary Figure 3 MaPSy validation in patient samples and ENCODE data.

(a–f) ESMs identified by MaPSy among mutations causing inosine triphosphatase deficiency (a), galactosemia (b), haemorrhagic telangiectasia (c), Menkes syndrome (d) and Barth syndrome (e,f) were shown to exhibit splicing aberrations (exon skipping and/or intron retention) in RNAs derived from patient tissue samples. (g) Splicing efficiency in MaPSy corresponds to splicing in ENCODE data.

Supplementary Figure 4 Mode of inheritance in the 5K panel.

(a) Percent ESM in the 5K panel stratified by modes of inheritance in haploinsufficient genes (prediction score = 1), haplosufficient genes (prediction score < 0.7) and moderately haploinsufficient genes (1 > prediction score ≥ 0.7) (ref. 8). Error bars, 95% confidence intervals. (b) Number of mutations in the different modes of inheritance in the 5K panel.

Supplementary Figure 5 Genes intolerant to protein-truncating variants (PTVs) in the ExAC population are predisposed to disease-associated splicing mutations.

(a) Mean fraction of ESMs in PTV-intolerant (pLI ≥ 0.9), semitolerant (0.1 < pLI < 0.9) and tolerant (pLI ≤ 0.1) genes in dominant and recessive traits. Error bars, s.e.m. (b) PTV-intolerant genes also have more introns than other genes, similar to disease genes that lose function via splicing mutations.
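The pLI thresholds in panel a can be written out as a small Python sketch. The three cut-offs come from the legend; the function names and the `(pli, esm_fraction)` input layout are hypothetical, and the grouping step is only an illustration of how a per-class mean could be computed, not the authors' analysis code.

```python
from collections import defaultdict


def classify_pli(pli):
    """Assign a gene to a PTV-tolerance class using the legend's pLI cut-offs."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI must lie between 0 and 1")
    if pli >= 0.9:
        return "intolerant"
    if pli <= 0.1:
        return "tolerant"
    return "semitolerant"  # 0.1 < pLI < 0.9


def mean_esm_fraction_by_class(genes):
    """Average the per-gene ESM fraction within each pLI class.

    `genes` is an iterable of (pli, esm_fraction) pairs (hypothetical layout).
    """
    grouped = defaultdict(list)
    for pli, esm_fraction in genes:
        grouped[classify_pli(pli)].append(esm_fraction)
    return {label: sum(values) / len(values) for label, values in grouped.items()}
```

A call such as `mean_esm_fraction_by_class([(0.95, 0.2), (0.05, 0.1)])` returns one mean per class that is present in the input; classes with no genes are simply absent from the result.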

Supplementary Figure 6 Features of splicing.

(a) The mean relative splicing efficiency of wild-type species in vivo (n = 2,086) is plotted against the increasing mean of each feature measure in a sliding window (size = 200, step = 1). Shaded regions represent 95% confidence intervals. Intron length is plotted on a log10 scale. The mean PhastCons score across all bases of the exon was used to measure conservation. Genomic features that have previously been associated with splicing are shown to display similar trends in MaPSy. P values were obtained from linear regression analyses. (b) The 5K panel is divided into five bins of increasing feature measures, and percent ESM in each bin is plotted. Error bars, 95% confidence intervals. Low differential GC content between exon and intron, fewer ESEs, more ESSs and less agreement with the splice-site consensus sequence, which are all associated with weaker splicing, are shown to sensitize exons to ESMs. The Kruskal–Wallis test was used to obtain P values.
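The sliding-window averaging described for panel a can be sketched in Python as follows. This is an illustrative reading of the legend rather than the authors' code: it assumes species are first sorted by the feature value and that each window reports the mean feature value and the mean splicing efficiency. The function and argument names are hypothetical; only the window size (200) and step (1) come from the legend.

```python
def sliding_window_means(features, efficiencies, size=200, step=1):
    """Return (mean feature, mean efficiency) for each window over sorted species."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(features) != len(efficiencies):
        raise ValueError("features and efficiencies must have the same length")
    # Sort species by increasing feature value so windows move along the feature axis.
    pairs = sorted(zip(features, efficiencies), key=lambda pair: pair[0])
    means = []
    for start in range(0, len(pairs) - size + 1, step):
        window = pairs[start:start + size]
        mean_feature = sum(f for f, _ in window) / size
        mean_efficiency = sum(e for _, e in window) / size
        means.append((mean_feature, mean_efficiency))
    return means
```

With the legend's 2,086 wild-type species, a window of 200 and a step of 1 give 1,887 points per feature under this reading.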

Supplementary Figure 7 The role of PTBP1 and SRSF1 in ESM phenotypes.

(a) The splicing phenotype of a mutation in exon 20 of COL1A2 that creates a PTBP1-binding motif was partially rescued when PTBP1 was knocked down. (b) A mutation that weakens an SRSF1-binding motif in exon 8 of MLH1 caused a modest but not significant increase in skipping events in the absence of SRSF1, whereas the wild-type exon that contains an SRSF1-binding site had a significant increase in skipping events when SRSF1 was knocked down.

Supplementary Figure 8 Overlap of intronic and exonic splicing regulatory motifs.

(a) The density for each RBP motif was calculated in all wild-type species (n = 2,048). (b) Clustering of intronic data reveals similar trends in vivo and in vitro. (c) Intronic splicing activators and exonic splicing repressors show a high degree of overlap. (d) Intronic splicing repressor motifs and exonic splicing activator motifs display a high degree of overlap. (e) Table of exonic splicing repressors and exonic splicing activators that exhibit the same function in vivo and in vitro.

Supplementary Figure 9 In vitro functional SELEX.

(a) Series of functional SELEX with MaPSy. (b) Mutant/wild type ratio in the B/C fraction in comparison to spliced species (left) and in the A fraction in comparison to spliced species (right). Enrichment in B/C complex is positively correlated with splicing, while enrichment in A complex is negatively correlated with splicing. (c) Clustering the effects of exonic mutation disruptions on different stages of spliceosomal assembly revealed mechanistic signatures of ESM. Only clusters with ≥8 members are shown.

Supplementary Figure 10 Mutant feature analyses in different clusters revealed distinct ESM mechanistic signatures.

Horizontal dotted lines indicate the mean value of the features in the 5K panel. Box plots of feature values that are significantly different than background (permuted cluster assignment) are colored red. The medians are indicated as horizontal bold lines, and the means as black hollow dots.

Supplementary Figure 11 ESM visualization browser.

A web browser was developed to visualize raw counts and information on individual mutations from original publications. Mutations can be queried by HGMD ID, gene or author.

Supplementary Figure 12 Common sequences of the 5K panel reporters.

(a) In vivo reporter sequence: CMV enhancer and promoter sequence (blue), adenovirus sequence (green; exon in uppercase and intron in lowercase), 200-mer library (red), ACTN1 intron 15 (lowercase, purple) and exon 16 (uppercase, purple), bGH poly(A) (cyan). (b) In vitro reporter sequence: adenovirus sequence (green; including T7 promoter sequence in bold), 200-mer oligo library (red) and additional intronic sequence (purple).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and Supplementary Tables 2 and 4. (PDF 2725 kb)

Supplementary Table 1

SNPs evaluated with MaPSy. (XLSX 46 kb)

Supplementary Table 3

Genes that are enriched with SSM. (XLSX 31 kb)


Cite this article

Soemedi, R., Cygan, K., Rhine, C. et al. Pathogenic variants that alter protein code often disrupt splicing. Nat Genet 49, 848–855 (2017).
