Recursive splicing in long vertebrate genes


It is generally believed that splicing removes introns as single units from precursor messenger RNA transcripts. However, some long Drosophila melanogaster introns contain a cryptic site, known as a recursive splice site (RS-site), that enables a multi-step process of intron removal termed recursive splicing (refs 1,2). The extent to which recursive splicing occurs in other species and its mechanistic basis have not been examined. Here we identify highly conserved RS-sites in genes expressed in the mammalian brain that encode proteins functioning in neuronal development. Moreover, the RS-sites are found in some of the longest introns across vertebrates. We find that vertebrate recursive splicing requires initial definition of an ‘RS-exon’ that follows the RS-site. The RS-exon is then excluded from the dominant mRNA isoform owing to competition with a reconstituted 5′ splice site formed at the RS-site after the first splicing step. Conversely, the RS-exon is included when preceded by cryptic promoters or exons that fail to reconstitute an efficient 5′ splice site. Most RS-exons contain a premature stop codon such that their inclusion can decrease mRNA stability. Thus, by establishing a binary splicing switch, RS-sites demarcate different mRNA isoforms emerging from long genes by coupling cryptic elements with inclusion of RS-exons.

Figure 1: Detection of recursive splice sites within long genes expressed in the human brain.
Figure 2: Recursive splicing requires initial definition of RS-exons.
Figure 3: The reconstituted 5′ splice site is required for RS-exon skipping.
Figure 4: Splice site competition allows a binary splicing switch for RS-exons.

Accession codes

Data deposits

The sequence data and scripts are publicly available from the European Genome-phenome Archive under accession number EGAS00001001170, from ArrayExpress (E-MTAB-3534), and


  1. Burnette, J. M., Miyamoto-Sato, E., Schaub, M. A., Conklin, J. & Lopez, A. J. Subdivision of large introns in Drosophila by recursive splicing at nonexonic elements. Genetics 170, 661–674 (2005)

  2. Hatton, A. R., Subramaniam, V. & Lopez, A. J. Generation of alternative Ultrabithorax isoforms and stepwise removal of a large intron by resplicing at exon–exon junctions. Mol. Cell 2, 787–796 (1998)

  3. Grellscheid, S. N. & Smith, C. W. An apparent pseudo-exon acts both as an alternative exon that leads to nonsense-mediated decay and as a zero-length exon. Mol. Cell. Biol. 26, 2237–2246 (2006)

  4. Shepard, S., McCreary, M. & Fedorov, A. The peculiarities of large intron splicing in animals. PLoS ONE 4, e7853 (2009)

  5. Thakurela, S. et al. Gene regulation and priming by topoisomerase IIα in embryonic stem cells. Nature Commun. 4, 2478 (2013)

  6. Ameur, A. et al. Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nature Struct. Mol. Biol. 18, 1435–1440 (2011)

  7. Rogelj, B. et al. Widespread binding of FUS along nascent RNA regulates alternative splicing in the brain. Sci. Rep. 2, 603 (2012)

  8. Ke, S. & Chasin, L. A. Context-dependent splicing regulation: exon definition, co-occurring motif pairs and tissue specificity. RNA Biol. 8, 384–388 (2011)

  9. Robberson, B. L., Cote, G. J. & Berget, S. M. Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol. 10, 84–94 (1990)

  10. McGlincy, N. J. & Smith, C. W. Alternative splicing resulting in nonsense-mediated mRNA decay: what is the meaning of nonsense? Trends Biochem. Sci. 33, 385–393 (2008)

  11. Parra, M. K., Tan, J. S., Mohandas, N. & Conboy, J. G. Intrasplicing coordinates alternative first exons with alternative splicing in the protein 4.1R gene. EMBO J. 27, 122–131 (2008)

  12. Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004)

  13. Jaillon, O. et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431, 946–957 (2004)

  14. Roy, M., Kim, N., Xing, Y. & Lee, C. The effect of intron length on exon creation ratios during the evolution of mammalian genomes. RNA 14, 2261–2273 (2008)

  15. Pickrell, J. K., Pai, A. A., Gilad, Y. & Pritchard, J. K. Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet. 6, e1001236 (2010)

  16. Lagier-Tourenne, C. et al. Divergent roles of ALS-linked proteins FUS/TLS and TDP-43 intersect in processing long pre-mRNAs. Nature Neurosci. 15, 1488–1497 (2012)

  17. Polymenidou, M. et al. Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43. Nature Neurosci. 14, 459–468 (2011)

  18. King, I. F. et al. Topoisomerases facilitate transcription of long genes linked to autism. Nature 501, 58–62 (2013)

  19. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)

  20. Trabzuni, D. et al. Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies. J. Neurochem. 119, 275–282 (2011)

  21. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)

  22. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)

  23. König, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nature Struct. Mol. Biol. 17, 909–915 (2010)

  24. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)

  25. Singh, J. & Padgett, R. A. Rates of in situ transcription and splicing in large human genes. Nature Struct. Mol. Biol. 16, 1128–1133 (2009)

  26. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013)

  27. Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009)

  28. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004)

  29. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnol. 28, 511–515 (2010)

  30. Herrera, F. J., Yamaguchi, T., Roelink, H. & Tjian, R. Core promoter factor TAF9B regulates neuronal gene expression. eLife 3, e02559 (2014)

  31. Madzo, J. et al. Hydroxymethylation at gene regulatory regions directs stem/early progenitor cell commitment during erythropoiesis. Cell Rep. 6, 231–244 (2014)

We thank S. El-Andaloussi for technical support, J. Witten, J. König and Ule laboratory members for comments on the manuscript, and the remaining members of the UK Brain Expression Consortium: S. Guelfi, K. D’Sa, M. Matarin, J. Vandrovcova, A. Ramasamy, J. A. Botia, C. Smith and P. Forabosco. This work was supported by the European Research Council (206726-CLIP and 617837-Translate) to J.U.; a Marie Curie Post-doctoral Research Fellowship (627783-NeuroCRYSP) to L.B.; the Slovenian Research Agency (J7-5460) to J.U. and T.C.; the UK NIHR Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology to V.P. and W.E.; the Wellcome Trust to S.W.W. and A.F.; the UK Medical Research Council (MRC) (U105185858) to J.U.; MRC training fellowships to C.R.S. and M.B.; and an MRC project grant (G0901254), an MRC training fellowship (G0802462) and the MRC Sudden Death Brain Bank.

Author information

Contributions

C.R.S., M.B. and J.U. conceived and designed the project; C.R.S., L.B., A.F., M.B., M.M. and D.T. performed experiments; C.R.S., W.E., L.B., V.P., T.C. and J.U. analysed the data and interpreted results with contributions from M.R., M.E.W. and J.H.; C.R.S. and J.U. wrote the manuscript with contributions from W.E., V.P., L.B. and S.W.W.

Corresponding authors

Correspondence to Vincent Plagnol or Jernej Ule.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Long gene expression is enriched in the brain.

a, GO term analysis of genes >150 kb relative to all human genes. All GO terms have enrichment scores >2. b, Log2 fold changes in gene expression between the brain and all other tissues for all human protein-coding genes, from DESeq (ref. 19) analysis. Data are represented as Loess smoothing curves after sorting the genes by their maximum length in kilobases. The hashed vertical line indicates a gene length of 150 kb. RNA-seq data were obtained from the GTEx consortium. c, Individual scatterplots used to create Fig. 1b, representing DESeq (ref. 19) analysis of individual genes within the indicated tissues compared to the brain. Red dots indicate genes that contain RS-sites, blue dots indicate dystrophin and black dots indicate titin (two long genes most highly expressed in muscle tissues). Grey dots are all remaining genes. d, DESeq (ref. 19) analysis of individual gene expression after versus before differentiation of C2C12 mouse myoblasts (GSM521256) into the myogenic lineage (GSM521259; ref. 29), of mouse embryonic stem cells (GSM1346027) into motor neurons (GSM1346035; ref. 30), and of haematopoietic stem cells (GSM992931) into the erythroid lineage (GSM992934; ref. 31). Loess smoothing curves are shown after sorting the genes by their maximum length in kilobases. The hashed vertical line indicates a gene length of 150 kb.
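The Loess curves in b and d summarize per-gene log2 fold changes as a smooth function of gene length. As a rough illustration of that smoothing step only, the sketch below implements a degree-1 local regression with tricube weights (Loess without robustness iterations) in plain Python. The function names, the `span` default and the input layout (parallel lists of gene length in kb and log2 fold change, for example taken from a DESeq results table) are assumptions; this is not the authors' code, and real analyses would normally use an established implementation such as R's `loess`.

```python
from typing import List, Sequence, Tuple


def tricube(d: float) -> float:
    """Tricube kernel: (1 - |d|^3)^3 for |d| < 1, otherwise 0."""
    d = abs(d)
    return (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0


def loess_curve(
    lengths_kb: Sequence[float],
    log2_fc: Sequence[float],
    span: float = 0.3,  # hypothetical default: fraction of genes per local fit
) -> List[Tuple[float, float]]:
    """Degree-1 local regression (Loess without robustness iterations).

    Genes are first sorted by length; at each gene length the fitted value
    comes from a weighted least-squares line through the nearest `span`
    fraction of genes, weighted by the tricube kernel.
    """
    pts = sorted(zip(lengths_kb, log2_fc))  # sort genes by length
    k = max(2, int(round(span * len(pts))))  # neighbours per local fit
    curve = []
    for x0, _ in pts:
        nearest = sorted(pts, key=lambda p: abs(p[0] - x0))[:k]
        h = max(abs(p[0] - x0) for p in nearest) or 1.0  # local bandwidth
        # slightly inflate h so the k-th neighbour keeps a small weight
        w = [tricube(abs(x - x0) / (h * 1.0001)) for x, _ in nearest]
        sw = sum(w)  # >= 1 because the gene at x0 itself has weight 1
        xm = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw
        ym = sum(wi * y for wi, (_, y) in zip(w, nearest)) / sw
        sxx = sum(wi * (x - xm) ** 2 for wi, (x, _) in zip(w, nearest))
        sxy = sum(wi * (x - xm) * (y - ym) for wi, (x, y) in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        curve.append((x0, ym + slope * (x0 - xm)))
    return curve
```

Because a local linear fit reproduces a straight line exactly, a quick sanity check is to feed it perfectly linear data in arbitrary order and confirm that the returned curve is sorted by length and lies on that line.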

Extended Data Figure 2 Linear regression analysis and novel junction sequence considerations used to identify mammalian recursive splice sites.

a, Examples of RNA-seq read density patterns for three genes, together with their calculated gradients across (1) the first intron >50 kb and (2) all other >50-kb introns within the same gene (averaged). Gradients represent the change in summed read count per 5 kb, because RNA-seq reads are grouped in 5-kb windows and linear regression is performed on the resulting histograms. b, Density plot of the ratio between the average gradient of all other >50-kb introns within the same gene and the gradient of the first >50-kb intron. The blue hashed line marks a ratio of 1, which would indicate that gradients of long introns within the same gene are comparable and that transcription proceeds at a largely constant rate. c, Schematic of the bioinformatics pipeline used to identify novel junctions. d, Ranking of human 5′ splice site pentamer usage genome-wide. e, Nucleotide usage frequency at human 3′ splice sites genome-wide, and branch-point positioning relative to the 3′ splice site genome-wide.
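Panels a and b rest on a simple computation: reads in an intron are counted in 5-kb windows, a straight line is fitted to the resulting histogram, and its slope (the gradient) is compared between the first >50-kb intron and the other >50-kb introns of the same gene. A minimal sketch of that computation, assuming read start positions are already expressed as 0-based offsets from the 5′ end of the intron; the helper names are hypothetical and the sketch is not the authors' pipeline.

```python
from typing import List, Sequence

WINDOW_BP = 5_000  # reads are grouped in 5-kb windows, as in the legend


def bin_reads(read_starts: Sequence[int], intron_length: int,
              window: int = WINDOW_BP) -> List[int]:
    """Histogram of read start positions (intron-relative, 0-based)."""
    n_bins = -(-intron_length // window)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < intron_length:  # ignore reads outside the intron
            counts[pos // window] += 1
    return counts


def gradient(counts: Sequence[int]) -> float:
    """Least-squares slope of count versus window index (reads per 5-kb window)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two windows to fit a line")
    xm = (n - 1) / 2
    ym = sum(counts) / n
    sxx = sum((i - xm) ** 2 for i in range(n))
    sxy = sum((i - xm) * (c - ym) for i, c in enumerate(counts))
    return sxy / sxx


def gradient_ratio(first_intron: Sequence[int],
                   other_introns: Sequence[Sequence[int]]) -> float:
    """Average gradient of the other >50-kb introns divided by that of the first."""
    mean_other = sum(gradient(c) for c in other_introns) / len(other_introns)
    return mean_other / gradient(first_intron)
```

A ratio close to 1, as marked by the hashed line in panel b, means the long introns of a gene show similar gradients.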

Extended Data Figure 3 Inferred splicing patterns identify recursive splice sites in mammalian genes with introns >150 kb.

a–g, RNA-seq (red) read density patterns and normalized FUS iCLIP (green) cross-link density patterns for the OPCML (a), ROBO2 (b), HS6ST3 (c), ANK3 (d), CADM2 (e), NCAM1 (f) and PDE4D (g) genes in human brains. RNA-seq reads and normalized FUS iCLIP cross-links are grouped in 5-kb windows. RefSeq introns >150 kb were searched for novel junctions, and linear regression was performed on all Ensembl introns >50 kb in which novel junctions were located. The gene isoforms displayed are those that include the introns in which significant junctions were identified. Red novel junctions represent significant improvements in goodness-of-fit in both the RNA-seq and the FUS iCLIP regression analyses (P < 0.01 in both data sets, F-test). Blue novel junctions contact RS-exons. Grey novel junctions were not deemed significant following regression analysis. The zoomed area shows the sequence at the deep intronic locus surrounding each novel junction. The PhyloP conservation track indicates sequence conservation across 46 levels of mammalian evolution.
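The legend reports an F-test for the improvement in goodness-of-fit at each novel junction. One way to set up such a test, sketched below under our own assumptions, is to compare a single straight line fitted to the binned read (or cross-link) counts of an intron with two independent lines fitted on either side of the window containing the candidate junction; we assume that multi-step intron removal shows up as a sawtooth with a second peak at the junction. The two-segment model, the breakpoint convention and the function names are illustrative assumptions rather than the authors' exact procedure. The returned statistic has (2, n − 4) degrees of freedom, and a P value could be obtained from the F distribution, for example with `scipy.stats.f.sf`.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares; returns (slope, intercept, residual sum of squares)."""
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sxx
    intercept = ym - slope * xm
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def breakpoint_f_test(counts: Sequence[float], bp: int) -> Tuple[float, int, int]:
    """Compare one line against two lines split before window index `bp`.

    Returns (F, df1, df2): the single-line model has 2 parameters and the
    two-segment model has 4, so df1 = 2 and df2 = n - 4.
    """
    y: List[float] = [float(c) for c in counts]
    n = len(y)
    if n < 5 or not 2 <= bp <= n - 2:
        raise ValueError("need n >= 5 windows and >= 2 windows per segment")
    x = [float(i) for i in range(n)]
    rss_one = fit_line(x, y)[2]
    rss_two = fit_line(x[:bp], y[:bp])[2] + fit_line(x[bp:], y[bp:])[2]
    df1, df2 = 2, n - 4
    if rss_two == 0.0:  # perfect two-segment fit
        return (math.inf if rss_one > 0 else 0.0), df1, df2
    f_stat = ((rss_one - rss_two) / df1) / (rss_two / df2)
    return f_stat, df1, df2
```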

Extended Data Figure 4 Inferred recursive splicing patterns in the OPCML gene across four separate brains.

a, RNA-seq read density patterns for the OPCML gene across 12 different regions of four separate brains. The gene isoform displayed is the one that includes the long first intron in which a significant novel junction was identified. RNA-seq reads are grouped in 5-kb windows. Dotted arrows indicate the location of the experimentally derived RS-site.

Extended Data Figure 5 RT–PCR confirmation of RS-sites in human and zebrafish samples, and prediction of mouse RS-exons.

a, Schematic of the primer design used for RT–PCR validation of novel junctions. b–g, RT–PCR analysis of the CADM2 (b), HS6ST3 (c), ROBO2 (d), PDE4D_1_1 (e), PDE4D_1_2 (f) and PDE4D_2_2 (g) genes around RS-sites using the indicated primers. For PDE4D sites, the first number after the gene name indicates the RS-site studied and the second number indicates the upstream exon used. See Extended Data Fig. 3g for the junctions detected. h, RT–PCR analysis of the cadm2a RS-site junction in adult male and female zebrafish embryos, together with an alignment of the zebrafish (ZF) cadm2a RS-site to the human (HS) CADM2 RS-site. i, Map of consensus splice site locations and in-frame termination codons following RS-sites in the indicated mouse genes. Strong consensus splice sites are GTAAG, GTGAG, GTAGG and GTATG. Weak consensus splice sites are GTAAA, GTAAT, GTGGG, GTAAC, GTCAG and GTACG.
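Panel i relies on two explicit pentamer lists and on scanning for in-frame termination codons after each RS-site. A minimal sketch of both steps, assuming the input is the genomic DNA sequence beginning at the first nucleotide after the RS-site AG (so its first five nucleotides are the candidate 5′ splice site pentamer), and that the caller supplies the reading-frame offset inherited from the upstream exon. The function names are hypothetical; pentamers outside both lists are simply labelled "other".

```python
from typing import Optional

# Pentamer lists as given in the legend of Extended Data Fig. 5i.
STRONG = {"GTAAG", "GTGAG", "GTAGG", "GTATG"}
WEAK = {"GTAAA", "GTAAT", "GTGGG", "GTAAC", "GTCAG", "GTACG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_pentamer(seq: str) -> str:
    """Label the first five nucleotides as 'strong', 'weak' or 'other'."""
    penta = seq[:5].upper().replace("U", "T")  # accept RNA or DNA input
    if penta in STRONG:
        return "strong"
    if penta in WEAK:
        return "weak"
    return "other"


def first_inframe_stop(seq: str, frame: int = 0) -> Optional[int]:
    """0-based position of the first stop codon read in `frame`, or None."""
    s = seq.upper().replace("U", "T")
    for i in range(frame, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return i
    return None
```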

Extended Data Figure 6 Conservation of inferred recursive splicing patterns in the mouse brain.

a–h, Normalized Fus iCLIP read density patterns for the Opcml (a), Robo2 (b), Hs6st3 (c), Ank3 (d), Cadm1 (e), Ncam1 (f), Cadm2 (g) and Pde4d (h) genes in the mouse brain. Normalized Fus iCLIP cross-link sites are grouped in 5-kb windows, and the displayed linear regression lines were computed on the resulting histograms. The zoomed area at deep intronic loci shows RS-site sequences conserved between human and mouse.

Extended Data Figure 7 Promoter-dependent inclusion of RS-exons in CADM2 and NTM genes.

a, Number of cassette and constitutive exons starting with the motif GURAG. b–d, RT–PCR of the CADM2 gene in the frontal cortex using the primers indicated in b or Fig. 4a. RT–PCR was carried out on one (b) or four (c, d) human brains. In c, inclusion of the second RS-exon occurs together with use of the minor promoter. Both PCR reactions yield two bands because an alternatively spliced exon follows the RS-exon, which can produce distinct long and short isoforms. In d, the second RS-exon is included when the first RS-exon is included. Schematics in c and d show the examined splicing products together with the expected product lengths. e, RNA-seq read density patterns for the NTM gene and expected human isoforms. RNA-seq reads are grouped in 5-kb windows and linear regression is performed on the resulting histograms. A cryptic minor promoter/exon detected by RNA-seq is indicated by the vertical red line. The annotated RS-exon is indicated by the vertical blue line. The zoomed area shows the RS-site sequence at the start of the annotated RS-exon. Primers used to assess the major and minor promoter products associated with the RS-exon are indicated by coloured arrows. f, RT–PCR of the NTM gene around the RS-exon using the indicated primers. g, RT–PCR analysis of NTM products in which the upstream exon is derived either from the major upstream promoter or from the cryptic upstream promoter/exon. RT–PCR was performed in the frontal cortex of three human brains using the primer sets indicated by coloured arrows in e. Schematics show possible splicing products together with the expected product lengths. The top panel assesses RS-exon inclusion; the bottom panel assesses RS-site junction detection.
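Panel a counts exons whose first five nucleotides match GURAG, where R is a purine (A or G); in DNA coordinates this corresponds to GT[AG]AG, a pattern that overlaps the 5′ splice site pentamers listed for Extended Data Fig. 5i. A minimal sketch of the count, assuming exons are supplied as (category, sequence) pairs with each sequence starting at the first exonic nucleotide; the function name and input format are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# GURAG in RNA corresponds to GT[AG]AG in DNA (R = purine: A or G);
# [TU] lets the same pattern match either alphabet.
GURAG = re.compile(r"^G[TU][AG]AG", re.IGNORECASE)


def count_gurag_exons(exons: Iterable[Tuple[str, str]]) -> Counter:
    """Count exons per category (e.g. 'cassette', 'constitutive') that start with GURAG."""
    counts: Counter = Counter()
    for category, seq in exons:
        if GURAG.match(seq):
            counts[category] += 1
    return counts
```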

Extended Data Figure 8 Recursive splicing regulates the alternative splicing of RS-exons.

a, QIAxcel analysis and quantification of the splicing intermediates of the indicated CADM2 splicing reporter products following transfection in SH-SY5Y cells. Primers used are indicated by red arrows in the schematic, together with the expected products and their sizes. b, RT–PCR analysis of the zebrafish cadm2a mRNA after in vivo injection of AON-2. Sequencing revealed that RS-exon inclusion is followed by splicing to additional downstream cryptic elements before the second exon, which explains why the RS-exon-included product is larger than expected. c, qRT–PCR analysis of exon–exon junctions surrounding the RS-site-containing introns following AON-A1-mediated inhibition of RS-site use in the human CADM1 and ANK3 genes (n = 3, 1 experiment) or the zebrafish cadm2a gene (n = 7, 3 separate experiments). d, Splice site scores of the reconstituted 5′ splice sites formed after the first step of recursive splicing versus the 5′ splice sites of the corresponding RS-exons.
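Panel d compares two competing 5′ splice sites: the one reconstituted when the upstream exon is joined to the RS-site in the first splicing step, and the one at the 3′ end of the RS-exon. Splice site scores of this kind are often computed with the maximum entropy model of ref. 12 (MaxEntScan), whose 5′ splice site model scores 9-mers made of the last 3 exonic and first 6 intronic nucleotides. The sketch below only assembles the two 9-mers, assuming plain DNA strings for the upstream exon, the RS-exon (starting at the nucleotide after the RS-site AG) and the downstream intron; it does not implement any scoring model, and the function names are hypothetical.

```python
from typing import Tuple


def five_prime_9mer(exon_end: str, intron_start: str) -> str:
    """3 exonic + 6 intronic nucleotides, the 9-mer layout of MaxEntScan's 5' model."""
    if len(exon_end) < 3 or len(intron_start) < 6:
        raise ValueError("need >= 3 exonic and >= 6 intronic nucleotides")
    return (exon_end[-3:] + intron_start[:6]).upper().replace("U", "T")


def competing_sites(upstream_exon: str, rs_exon: str,
                    downstream_intron: str) -> Tuple[str, str]:
    """Return (reconstituted 5' site, RS-exon 5' site) as 9-mers.

    After the first step the upstream exon abuts the RS-exon, whose first
    nucleotides (GURAG) then occupy the intronic positions of a 5' site.
    """
    reconstituted = five_prime_9mer(upstream_exon, rs_exon)
    rs_exon_site = five_prime_9mer(rs_exon, downstream_intron)
    return reconstituted, rs_exon_site
```

In the model described in the abstract, a strong reconstituted site outcompetes the RS-exon's own 5′ splice site and leads to RS-exon skipping, whereas upstream cryptic exons that fail to reconstitute an efficient site allow RS-exon inclusion.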

Extended Data Figure 9 Cryptic elements are frequent in long first introns.

a, UCSC-annotated isoforms of the OPCML gene together with spliced expressed sequence tags (ESTs) detected across the OPCML locus. The RS-exon is marked in blue, and the preceding exons produced by minor promoter use or cryptic splicing of the long first intron are marked in red. b, Lengths of the 9 introns containing high-confidence RS-sites compared to other introns across vertebrates. Results are an extension of Fig. 4g. c, Boxplot showing the number of detected unannotated alternative start exons that splice to the dominant second exon of brain-expressed genes. Only novel junctions that do not match UCSC/GENCODE transcripts are considered. Genes are separated into bins based on the first intron length of the canonical isoform. Boxes show the median and the first and third quartiles for each bin; red diamonds indicate the mean for each bin. *P < 10^−10 (Mann–Whitney U test). Only tests between the 100 kb+ bin and the other bins are shown. The right panel shows a cartoon of the implications of the boxplot results.
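Panel c bins genes by the length of their first intron and compares the 100 kb+ bin with each other bin using a Mann–Whitney U test. A minimal sketch of the binning and of the U statistic (with mid-ranks for ties), assuming each gene is given as a (first intron length in kb, number of unannotated start exons) pair. The bin edges other than 100 kb are hypothetical, since the legend does not list them, and the sketch computes U only; a P value would need a normal approximation or a library routine such as `scipy.stats.mannwhitneyu`.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# Hypothetical bin edges in kb; only the "100 kb+" bin is named in the legend.
EDGES_KB = [10, 50, 100]
LABELS = ["<10 kb", "10-50 kb", "50-100 kb", "100 kb+"]


def bin_genes(genes: Sequence[Tuple[float, int]]) -> Dict[str, List[int]]:
    """Group novel start-exon counts by first-intron length (kb)."""
    bins: Dict[str, List[int]] = {label: [] for label in LABELS}
    for intron_kb, n_novel in genes:
        bins[LABELS[bisect_right(EDGES_KB, intron_kb)]].append(n_novel)
    return bins


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U statistic for sample `a` against `b`, using mid-ranks for ties."""
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid
        i = j + 1
    rank_sum_a = sum(ranks[:len(a)])  # first len(a) entries belong to `a`
    return rank_sum_a - len(a) * (len(a) + 1) / 2
```

For example, `mann_whitney_u(bins["100 kb+"], bins["<10 kb"])` would give the statistic for one of the comparisons shown in the panel.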

Supplementary information

Supplementary Information

This file contains a Supplementary Note, Supplementary References and full legends for Supplementary Tables 1–4. (PDF 285 kb)

Supplementary Table 1

This table contains the results of novel junction detection and linear regression analysis – see Supplementary Information file for full legend. (XLSX 23108 kb)

Supplementary Table 2

This table contains the functions and disease associations of high-confidence RS-site-containing genes. (XLSX 13 kb)

Supplementary Table 3

This table contains RS-site splice site competition scores and cryptic splice site usage across the transcriptome – see Supplementary Information file for full legend. (XLSX 3457 kb)

Supplementary Table 4

This table contains reporter constructs and primer sequences used in this study – see Supplementary Information file for full legend. (XLSX 15 kb)

About this article

Cite this article

Sibley, C., Emmett, W., Blazquez, L. et al. Recursive splicing in long vertebrate genes. Nature 521, 371–375 (2015).
