Deciphering the splicing code


Alternative splicing has a crucial role in the generation of biological complexity, and its misregulation is often involved in human disease. Here we describe the assembly of a ‘splicing code’, which uses combinations of hundreds of RNA features to predict tissue-dependent changes in alternative splicing for thousands of exons. The code determines new classes of splicing patterns, identifies distinct regulatory programs in different tissues, and identifies mutation-verified regulatory sequences. Widespread regulatory strategies are revealed, including the use of unexpectedly large combinations of features, the establishment of low exon inclusion levels that are overcome by features in specific tissues, the appearance of features deeper into introns than previously appreciated, and the modulation of splice variant levels by transcript structure characteristics. The code detected a class of exons whose inclusion silences expression in adult tissues by activating nonsense-mediated messenger RNA decay, but whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed characterization of regulated alternative splicing events on a genome-wide scale.
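The caption for "Assembling the splicing code" describes code assembly as recursively adding features to maximize an information measure of code quality. A minimal sketch of that kind of greedy forward selection follows. It assumes binary features and uses the reduction in label entropy as a stand-in quality measure. The paper's actual measure, model and stopping rule are not given in this summary, and the feature names and toy exons below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, selected):
    """Bits of label uncertainty removed by knowing the selected features."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(tuple(row[f] for f in selected), []).append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def assemble_code(rows, labels, feature_names, min_gain=1e-9):
    """Greedily add the feature that most improves the quality measure."""
    selected, quality = [], 0.0
    remaining = list(feature_names)
    while remaining:
        gains = {f: information_gain(rows, labels, selected + [f]) for f in remaining}
        best = max(remaining, key=gains.get)
        if gains[best] - quality <= min_gain:
            break  # no remaining feature adds information
        selected.append(best)
        remaining.remove(best)
        quality = gains[best]
    return selected, quality


# Hypothetical toy data: the splicing change depends on two motif features;
# the third feature is uninformative.
FEATURES = ["motif_a_downstream", "motif_b_upstream", "short_exon"]
rows, labels = [], []
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            rows.append({"motif_a_downstream": a, "motif_b_upstream": b, "short_exon": c})
            if a and not b:
                labels.append("increase")
            elif b and not a:
                labels.append("decrease")
            else:
                labels.append("no_change")

selected, quality = assemble_code(rows, labels, FEATURES)
print(selected, round(quality, 2))
```

On this toy data the two motif features are selected and the uninformative one is left out. With real data, a quality measure estimated on held-out exons would be needed to avoid rewarding features that merely fit noise.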

At a glance


  1. Figure 1: Assembling the splicing code.

    a, The code extracts hundreds of RNA features (known/new/short motifs and transcript structure features) from any exon of interest (red), its neighbouring exons (yellow) and intervening introns (blue). It then predicts whether or not the exon is alternatively spliced, and if so, whether the exon’s inclusion level will increase or decrease in a given tissue, relative to others. b, c, Code assembly proceeds by recursively adding features to maximize an information measure of code quality (b), and different feature types are preferred at different stages of assembly (c). d, The final assembled code achieves higher code quality than simpler codes derived using previously reported features and feature subsets. Cons, conservation; w/o, without. Error bars represent 1 s.d.

  2. Figure 2: Predicting tissue-regulated alternative splicing.

    a, Classification rates for the final assembled code and simpler codes, assessed using microarray data (n = 28,920). b, Accuracy of the code in predicting microarray- and RT–PCR-measured changes in exon inclusion levels between pairs of tissues (n = 346 and n = 208). Error bars represent 1 s.d. c, For each exon and pair of tissues, the RT–PCR-measured change in the percentage inclusion is plotted against the code-predicted change in the probability of exon inclusion. Dashed lines indicate RT–PCR differences exceeding 1 s.d. in measurement error. d, RT–PCR data for four exons, plus code predictions indicating relative increases (dark shading) or decreases (light shading) in the exon inclusion level.

  3. Figure 3: Graphical depiction of the splicing code.

    a, The region-specific activity of each feature in increased exon inclusion (red bar) or exclusion (blue bar) is shown for CNS (C), muscle (M), embryo (E) and digestive (D) tissues, plus a tissue-independent mixture (I). A bar with/without a black hat indicates activity due to feature depletion/enrichment. Bar size conveys enrichment P-value; P < 0.005 in all cases. Potential feature binding proteins are shown in parentheses. b–d, Unexpectedly frequent feature pairs were identified and used to generate feature interaction networks for CNS (b), muscle (c) and embryonic (d) tissues. Node size and colour indicate the feature’s P-value and region (see colour key in a). Red/blue edges correspond to increased inclusion/exclusion and edge thickness conveys interaction P-value (false discovery rate-corrected Fisher test); P < 0.05 in all cases. A thick/thin node boundary indicates activity due to feature depletion/enrichment.

  4. Figure 4: Validation of a regulatory feature map.

    Regulatory elements in the intron upstream of exon 16 in Daam1 predicted to be associated with CNS-specific increased exon inclusion. a, Putative features (grey blocks), along with code-selected features from the compendium and the unbiased motif set (red blocks). Twelve segments were selected for testing (blue blocks), including one not overlapping with predictions (7), and 15 minigene reporters with single- or combined-segment substitutions were constructed and transfected into neuroblastoma (N2A) and epithelial (NIH-3T3) cells. b, RT–PCR results for the wild type and 15 mutants. c, Mutations of several nPTB-like elements support code-predicted synergistic interactions. d, Mutations of several CU and CUG elements between 55 and 90 nucleotides support code-predicted antagonistic interactions. Symbol size indicates the percentage exon inclusion (0–83.7%).

  5. Figure 5: The code predicts a mechanism for developmental regulation.

    a, The code identified a class of PTC-introducing exons predicted to activate NMD when included in adult tissues, but to allow mRNA expression when skipped in embryonic tissues. b, RT–PCR data monitoring splicing and mRNA expression levels of transcripts from Xpo4, which contains a code-predicted PTC-introducing exon, in four adult tissues (cortex, cerebellum, kidney and liver) and three embryonic samples (embryonic day (E)9.5, E12.5 and E15). c, RT–PCR data monitoring mRNA levels of the NMD factor Upf1 and the PTC-containing Xpo4 isoform in neuroblastoma (N2A) cells transfected with control siRNAs or Upf1 siRNAs. The Xpo4 PTC-containing isoform was selectively amplified using an exon-specific primer. Gapdh mRNA levels represent a loading control.
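To make the notion of a PTC-introducing exon in panel a of the caption above concrete, here is a minimal sketch that checks whether including a cassette exon places an in-frame stop codon upstream of where the normal stop would fall. It assumes DNA-alphabet coding sequences that start in frame. The function names and toy sequences are hypothetical. The sketch does not model the further criteria that decide whether a PTC actually triggers NMD, and it is not the method used by the splicing code itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds):
    """Return the 0-based start of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def introduces_ptc(upstream, exon, downstream):
    """True if including `exon` creates a stop before the normal stop position."""
    normal_stop = first_stop_index(upstream + downstream)
    if normal_stop is None or normal_stop < len(upstream):
        raise ValueError("expected the normal stop codon in the downstream segment")
    included_stop = first_stop_index(upstream + exon + downstream)
    if included_stop is None:
        return False  # no in-frame stop at all; not a PTC under this definition
    # With the exon included, the normal stop would sit len(exon) bases later.
    return included_stop < normal_stop + len(exon)


# Hypothetical toy sequences (not taken from Xpo4 or any real gene).
upstream = "ATGGCT"
downstream = "GAACTGTAA"

stop_containing_exon = "GCATGACCT"  # contains an in-frame TGA
frame_preserving_exon = "GCAGGACCT"  # 9 nt, no stop codon
frame_shifting_exon = "GCAT"  # 4 nt; shifts the frame so TGA spans the junction

print(introduces_ptc(upstream, stop_containing_exon, downstream))
print(introduces_ptc(upstream, frame_preserving_exon, downstream))
print(introduces_ptc(upstream, frame_shifting_exon, downstream))
```

The first and third toy exons introduce a premature stop, one directly and one through a frameshift. The second only lengthens the reading frame.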



Author information

  1. These authors contributed equally to this work.

    • Yoseph Barash &
    • John A. Calarco


  1. Biomedical Engineering, Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Road, Toronto M5S 3G4, Canada

    • Yoseph Barash,
    • Weijun Gao,
    • Xinchen Wang,
    • Ofer Shai &
    • Brendan J. Frey
  2. Banting and Best Department of Medical Research and Department of Molecular Genetics, Donnelly Centre, University of Toronto, 160 College Street, Toronto M5S 3E1, Canada

    • Yoseph Barash,
    • John A. Calarco,
    • Qun Pan,
    • Xinchen Wang,
    • Benjamin J. Blencowe &
    • Brendan J. Frey
  3. Microsoft Research, 7 J. J. Thomson Avenue, Cambridge CB3 0FB, UK

    • Brendan J. Frey


Y.B. and B.J.F. developed the predictive framework and code assembly algorithms, analysed validation rates, and with B.J.B. and J.A.C. extracted predictions for regulatory mechanisms. Y.B., B.J.B. and B.J.F. produced the feature compendium. J.A.C. performed wet laboratory experiments. Q.P. generated exon and intron datasets. W.G. and Y.B. developed the web tool with input from the other authors. X.W. analysed exons from neurological disorder-associated genes. O.S. estimated the percentage inclusion values. B.J.F., B.J.B. and Y.B. designed the study and wrote the manuscript with input from the other authors.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary information (2.4M)

    This file contains Supplementary Information and Data, Supplementary Figures 1-3, Supplementary Tables 1-2 and References.
