Pathogenic variants that alter protein code often disrupt splicing


The lack of tools to identify causative variants from sequencing data greatly limits the promise of precision medicine. Previous studies suggest that one-third of disease-associated alleles alter splicing. We discovered that the alleles causing splicing defects cluster in disease-associated genes (for example, haploinsufficient genes). We analyzed 4,964 published disease-causing exonic mutations using a massively parallel splicing assay (MaPSy), which showed an 81% concordance rate with splicing in patient tissue. Approximately 10% of exonic mutations altered splicing, mostly by disrupting multiple stages of spliceosome assembly. We present a large-scale characterization of exonic splicing mutations using a new technology that facilitates variant classification and keeps pace with variant discovery.

Figure 1: MaPSy on the 5K panel.
Figure 2: Prevalence of splicing mutations in disease-associated genes.
Figure 3: Random forest classification of exonic mutations that disrupt splicing.
Figure 4: Detection of RBP motifs that affect splicing.
Figure 5: Isolation of spliceosomal intermediates.
Figure 6: Clustering of allelic ratios provides ESM mechanistic insights.

Accession codes


NCBI Reference Sequence


References

  1. Baird, P.A., Anderson, T.W., Newcombe, H.B. & Lowry, R.B. Genetic disorders in children and young adults: a population study. Am. J. Hum. Genet. 42, 677–693 (1988).
  2. Yang, Y. et al. Molecular findings among patients referred for clinical whole-exome sequencing. J. Am. Med. Assoc. 312, 1870–1879 (2014).
  3. Bamshad, M.J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
  4. Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
  5. Xue, Y. et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am. J. Hum. Genet. 91, 1022–1032 (2012).
  6. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  7. Lim, K.H., Ferraris, L., Filloux, M.E., Raphael, B.J. & Fairbrother, W.G. Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes. Proc. Natl. Acad. Sci. USA 108, 11093–11098 (2011).
  8. Stenson, P.D. et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003).
  9. Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E. & Fairbrother, W.G. Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat. Struct. Mol. Biol. 19, 719–721 (2012).
  10. Huang, N., Lee, I., Marcotte, E.M. & Hurles, M.E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
  11. Ke, S. et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21, 1360–1374 (2011).
  12. Fairbrother, W.G., Yeh, R.F., Sharp, P.A. & Burge, C.B. Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013 (2002).
  13. Amit, M. et al. Differential GC content between exons and introns establishes distinct strategies of splice-site recognition. Cell Rep. 1, 543–556 (2012).
  14. Mort, M. et al. MutPred Splice: machine learning–based prediction of exonic variants that disrupt splicing. Genome Biol. 15, R19 (2014).
  15. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  16. Wang, Z. et al. Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845 (2004).
  17. Ke, S., Zhang, X.H. & Chasin, L.A. Positive selection acting on splicing motifs reflects compensatory evolution. Genome Res. 18, 533–543 (2008).
  18. Smith, P.J. et al. An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet. 15, 2490–2508 (2006).
  19. Zhang, X.H. & Chasin, L.A. Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev. 18, 1241–1250 (2004).
  20. Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
  21. Ray, D. et al. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27, 667–670 (2009).
  22. Long, J.C. & Caceres, J.F. The SR protein family of splicing factors: master regulators of gene expression. Biochem. J. 417, 15–27 (2009).
  23. Rahman, M.A. et al. SRSF1 and hnRNP H antagonistically regulate splicing of COLQ exon 16 in a congenital myasthenic syndrome. Sci. Rep. 5, 13208 (2015).
  24. Shen, H., Kan, J.L., Ghigna, C., Biamonti, G. & Green, M.R. A single polypyrimidine tract binding protein (PTB) binding site mediates splicing inhibition at mouse IgM exons M1 and M2. RNA 10, 787–794 (2004).
  25. Sterne-Weiler, T., Howard, J., Mort, M., Cooper, D.N. & Sanford, J.R. Loss of exon identity is a common mechanism of human inherited disease. Genome Res. 21, 1563–1571 (2011).
  26. Wang, J., Xiao, S.H. & Manley, J.L. Genetic analysis of the SR protein ASF/SF2: interchangeability of RS domains and negative control of splicing. Genes Dev. 12, 2222–2233 (1998).
  27. Lim, K.H. & Fairbrother, W.G. Spliceman—a computational web server that predicts sequence variations in pre-mRNA splicing. Bioinformatics 28, 1031–1032 (2012).
  28. Padgett, R.A., Grabowski, P.J., Konarska, M.M., Seiler, S. & Sharp, P.A. Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119–1150 (1986).
  29. Konarska, M.M. & Sharp, P.A. Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 46, 845–855 (1986).
  30. Das, R. & Reed, R. Resolution of the mammalian E complex and the ATP-dependent spliceosomal complexes on native agarose mini-gels. RNA 5, 1504–1508 (1999).
  31. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
  32. MacArthur, D.G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).
  33. Wang, Y., Ma, M., Xiao, X. & Wang, Z. Intronic splicing enhancers, cognate splicing factors and context-dependent regulation rules. Nat. Struct. Mol. Biol. 19, 1044–1052 (2012).
  34. Rosenberg, A.B., Patwardhan, R.P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698–711 (2015).
  35. Yeo, G. & Burge, C.B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).
  36. Gozani, O., Patton, J.G. & Reed, R. A novel set of spliceosome-associated proteins and the essential splicing factor PSF bind stably to pre-mRNA prior to catalytic step II of the splicing reaction. EMBO J. 13, 3356–3367 (1994).
  37. Reichert, V. & Moore, M.J. Better conditions for mammalian in vitro splicing provided by acetate and glutamate as potassium counterions. Nucleic Acids Res. 28, 416–423 (2000).
  38. Dobin, A. et al. STAR: ultrafast universal RNA–seq aligner. Bioinformatics 29, 15–21 (2013).
  39. Kursa, M.B., Jankowski, A. & Rudnicki, W.R. Boruta—a system for feature selection. Fundam. Inform. 101, 271–285 (2010).
  40. Fairbrother, W.G. et al. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res. 32, W187–W190 (2004).
  41. Lin, C.L. et al. RNA structure replaces the need for U2AF2 in splicing. Genome Res. 26, 12–23 (2016).
  42. Wasserman, W.W. & Sandelin, A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287 (2004).
  43. Chambers, J.M. & Hastie, T. Statistical Models in S (Wadsworth & Brooks/Cole Advanced Books & Software, 1992).
  44. Fraley, C. & Raftery, A.E. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002).
  45. Pesarin, F. Multivariate Permutation Tests: With Applications in Biostatistics (J. Wiley, 2001).

Acknowledgements

We thank K. Villanueva for generating the list of SNPs used in this study and A. Leblang for compiling the variants to make the oligonucleotide library. We thank M. Jurica and M. Moore for suggestions and protocols for the in vitro spliceosome assembly assay and nuclear extract preparation. We thank A. Janssens for contacting investigators for patient samples. We thank A. Toland (Ohio State University), J. Marini (NIH/NICHD) and A. Goate (Washington University Alzheimer's Disease Research Center) for contributing patient samples for validation. R.S. was supported by a Postdoctoral Fellowship from the Center for Computational Molecular Biology (CCMB), Brown University. C.R. was supported by a Graduate Research Fellowship from the National Science Foundation (NSF). This work was supported by US National Institutes of Health (NIH) grants R01GM095612 (to W.G.F.), R01GM105681 (to W.G.F.) and R21HG007905 (to W.G.F.) and by SFARI award 342705 (to W.G.F.). Part of this research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University and the Genomics Core Facility, Brown University.

Author information

Contributions

W.G.F. and R.S. designed the experiments. R.S. performed MaPSy experiments. R.S., J.W., P.B.-T. and J.M. performed validation experiments. K.J.C. performed alignment, counting and RBP motif analyses. R.S. performed ESM analyses, machine learning and MaPSy SELEX analyses. C.L.R. performed HGMD gene analyses. C.B. and J.Y. developed the visualization web browser. W.G.F. and R.S. wrote the paper with contributions from all authors.

Corresponding author

Correspondence to William G Fairbrother.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Alternative splicing events in the 5K panel.

The majority of cryptic splicing occurred through creation of an AG or GT (type I), whereas some other mutations increased the usage of a nearby weaker splice site (type II). Very few mutations were found to abolish alternative splice-site usage (type III).

Supplementary Figure 2 MaPSy performance.

(a–d) Agreement between allelic splicing ratios (log2) of three cell culture replicates of MaPSy in vivo (a–c) and two experimental replicates of MaPSy in vitro (d). (e) Stacked histogram of mutant (red) and wild-type (blue) relative splicing efficiency in MaPSy in vivo (top) and in vitro (bottom). (f) Full gel of output (spliced species) from MaPSy in vivo.
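
As a rough illustration of the log2 allelic splicing ratios compared across replicates in this legend, the sketch below computes one ratio from read counts. The legend does not define the calculation, so the formula (mutant spliced/input enrichment divided by wild-type spliced/input enrichment), the pseudocount, and every function and parameter name are assumptions made for illustration only.

```python
import math


def allelic_splicing_ratio(mut_spliced, mut_input, wt_spliced, wt_input,
                           pseudocount=1.0):
    """Return an assumed log2 allelic splicing ratio from four read counts.

    Assumption: splicing efficiency of an allele is its spliced-read count
    divided by its input-read count; the ratio is mutant over wild type.
    The pseudocount (hypothetical) avoids division by zero for sparse counts.
    """
    counts = (mut_spliced, mut_input, wt_spliced, wt_input)
    if any(c < 0 for c in counts):
        raise ValueError("read counts must be non-negative")
    mut_eff = (mut_spliced + pseudocount) / (mut_input + pseudocount)
    wt_eff = (wt_spliced + pseudocount) / (wt_input + pseudocount)
    return math.log2(mut_eff / wt_eff)
```

Under these assumptions, a negative value indicates that the mutant allele is spliced less efficiently than its wild-type counterpart, and a value of zero indicates no allelic difference.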

Supplementary Figure 3 MaPSy validation in patient samples and ENCODE data.

(a–f) ESMs identified by MaPSy among mutations causing inosine triphosphatase deficiency (a), galactosemia (b), haemorrhagic telangiectasia (c), Menkes syndrome (d) and Barth syndrome (e,f) were shown to exhibit splicing aberrations (exon skipping and/or intron retention) in RNAs derived from patient tissue samples. (g) Splicing efficiency in MaPSy corresponds to splicing in ENCODE data.

Supplementary Figure 4 Mode of inheritance in the 5K panel.

(a) Percent ESM in the 5K panel stratified by modes of inheritance in haploinsufficient genes (prediction score = 1), haplosufficient genes (prediction score < 0.7) and moderately haploinsufficient genes (1 > prediction score ≥ 0.7) (ref. 8). Error bars, 95% confidence intervals. (b) Number of mutations in the different modes of inheritance in the 5K panel.

Supplementary Figure 5 Genes intolerant to protein-truncating variants (PTVs) in the ExAC population are predisposed to disease-associated splicing mutations.

(a) Mean fraction of ESMs in PTV-intolerant (pLI ≥ 0.9), semitolerant (0.1 < pLI < 0.9) and tolerant (pLI ≤ 0.1) genes in dominant and recessive traits. Error bars, s.e.m. (b) PTV-intolerant genes also have more introns than other genes, similar to disease genes that lose function via splicing mutations.
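
The pLI thresholds in panel a can be written out as a small grouping routine. This is a minimal sketch: the thresholds come from the legend, but the function names, the input layout (pairs of pLI score and per-gene ESM fraction) and the choice of a plain arithmetic mean are assumptions, not the authors' pipeline.

```python
from collections import defaultdict


def pli_category(pli):
    """Map a pLI score to the three categories named in the legend."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI must lie in [0, 1]")
    if pli >= 0.9:
        return "intolerant"
    if pli <= 0.1:
        return "tolerant"
    return "semitolerant"


def mean_esm_fraction_by_category(genes):
    """Average the per-gene ESM fraction within each pLI category.

    `genes` is a hypothetical iterable of (pli, esm_fraction) pairs.
    """
    grouped = defaultdict(list)
    for pli, esm_fraction in genes:
        grouped[pli_category(pli)].append(esm_fraction)
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}
```

Note that the boundary values 0.9 and 0.1 fall in the intolerant and tolerant groups respectively, matching the inequalities given in the legend.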

Supplementary Figure 6 Features of splicing.

(a) The mean relative splicing efficiency of wild-type species in vivo (n = 2,086) is plotted against the increasing mean of each feature measure in a sliding window (size = 200, step = 1). Shaded regions represent 95% confidence intervals. Intron length is plotted on a log10 scale. The mean PhastCons score across all bases of the exon was used to measure conservation. Genomic features that have previously been associated with splicing display similar trends in MaPSy. P values were obtained from linear regression analyses. (b) The 5K panel is divided into five bins of increasing feature measures, and percent ESM in each bin is plotted. Error bars, 95% confidence intervals. Low differential GC content between exon and intron, fewer ESEs, more ESSs and less agreement with the splice-site consensus sequence, all of which are associated with weaker splicing, are shown to sensitize exons to ESMs. The Kruskal–Wallis test was used to obtain P values.
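
The sliding-window averaging described for panel a might be sketched as follows. The window size and step are taken from the legend; sorting species by feature value before windowing, and all names used here, are assumptions about how such a plot could be produced rather than a description of the authors' code.

```python
def sliding_window_means(feature_values, efficiencies, size=200, step=1):
    """Return (mean feature, mean splicing efficiency) for each window.

    Assumption: species are first ordered by increasing feature value,
    then a window of `size` species is advanced `step` species at a time.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(feature_values) != len(efficiencies):
        raise ValueError("inputs must have the same length")
    # Pair each feature value with its efficiency and order by the feature.
    pairs = sorted(zip(feature_values, efficiencies))
    results = []
    for start in range(0, len(pairs) - size + 1, step):
        window = pairs[start:start + size]
        mean_feature = sum(f for f, _ in window) / size
        mean_efficiency = sum(e for _, e in window) / size
        results.append((mean_feature, mean_efficiency))
    return results
```

For example, `sliding_window_means([3, 1, 2, 4], [30, 10, 20, 40], size=2)` yields three windows over the feature-sorted data. With fewer species than the window size, the function returns an empty list.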

Supplementary Figure 7 The role of PTBP1 and SRSF1 in ESM phenotypes.

(a) The splicing phenotype of a mutation in exon 20 of COL1A2 that creates a PTBP1-binding motif was partially rescued when PTBP1 was knocked down. (b) A mutation that weakens an SRSF1-binding motif in exon 8 of MLH1 caused a modest but not significant increase in skipping events in the absence of SRSF1, whereas the wild-type exon, which contains an SRSF1-binding site, had a significant increase in skipping events when SRSF1 was knocked down.

Supplementary Figure 8 Overlap of intronic and exonic splicing regulatory motifs.

(a) The density for each RBP motif was calculated in all wild-type species (n = 2,048). (b) Clustering of intronic data reveals similar trends in vivo and in vitro. (c) Intronic splicing activators and exonic splicing repressors show a high degree of overlap. (d) Intronic splicing repressor motifs and exonic splicing activator motifs display a high degree of overlap. (e) Table of exonic splicing repressors and exonic splicing activators that exhibit the same function in vivo and in vitro.

Supplementary Figure 9 In vitro functional SELEX.

(a) Series of functional SELEX with MaPSy. (b) Mutant/wild-type ratio in the B/C fraction in comparison to spliced species (left) and in the A fraction in comparison to spliced species (right). Enrichment in the B/C complex is positively correlated with splicing, whereas enrichment in the A complex is negatively correlated with splicing. (c) Clustering the effects of exonic mutations on different stages of spliceosomal assembly revealed mechanistic signatures of ESMs. Only clusters with ≥8 members are shown.

Supplementary Figure 10 Mutant feature analyses in different clusters revealed distinct ESM mechanistic signatures.

Horizontal dotted lines indicate the mean value of the features in the 5K panel. Box plots of feature values that are significantly different from background (permuted cluster assignment) are colored red. Medians are indicated by horizontal bold lines, and means by black hollow dots.

Supplementary Figure 11 ESM visualization browser.

A web browser was developed to visualize raw counts and information on individual mutations from original publications. Mutations can be queried by HGMD ID, gene or author.

Supplementary Figure 12 Common sequences of the 5K panel reporters.

(a) In vivo reporter sequence: CMV enhancer and promoter sequence (blue), adenovirus sequence (green; exon in uppercase and intron in lowercase), 200-mer library (red), ACTN1 intron 15 (lowercase, purple) and exon 16 (uppercase, purple), bGH poly(A) (cyan). (b) In vitro reporter sequence: adenovirus sequence (green; including T7 promoter sequence in bold), 200-mer oligo library (red) and additional intronic sequence (purple).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and Supplementary Tables 2 and 4. (PDF 2725 kb)

Supplementary Table 1

SNPs evaluated with MaPSy. (XLSX 46 kb)

Supplementary Table 3

Genes that are enriched with SSM. (XLSX 31 kb)

About this article

Cite this article

Soemedi, R., Cygan, K., Rhine, C. et al. Pathogenic variants that alter protein code often disrupt splicing. Nat Genet 49, 848–855 (2017).
