Mechanism for DNA transposons to generate introns on genomic scales

This article has been updated


The discovery of introns four decades ago was one of the most unexpected findings in molecular biology [1]. Introns are sequences interrupting genes that must be removed as part of messenger RNA production. Genome sequencing projects have shown that most eukaryotic genes contain at least one intron, and frequently many [2,3]. Comparison of these genomes reveals a history of long evolutionary periods during which few introns were gained, punctuated by episodes of rapid, extensive gain [2,3]. However, although several detailed mechanisms for such episodic intron generation have been proposed [4,5,6,7,8], none has been empirically supported on a genomic scale. Here we show how short, non-autonomous DNA transposons independently generated hundreds to thousands of introns in the prasinophyte Micromonas pusilla and the pelagophyte Aureococcus anophagefferens. Each transposon carries one splice site. The other splice site is co-opted from the gene sequence that is duplicated upon transposon insertion, allowing perfect splicing out of the RNA. The distributions of sequences that can be co-opted are biased with respect to codons, and phasing of transposon-generated introns is similarly biased. These transposons insert between pre-existing nucleosomes, so that multiple nearby insertions generate nucleosome-sized intervening segments. Thus, transposon insertion and sequence co-option may explain the intron phase biases [2] and prevalence of nucleosome-sized exons [9] observed in eukaryotes. Overall, the two independent examples of proliferating elements illustrate a general DNA transposon mechanism that can plausibly account for episodes of rapid, extensive intron gain during eukaryotic evolution [2,3].


Figure 1: M. pusilla introner elements insert between pre-existing nucleosomes.
Figure 2: Identification of introner elements in A. anophagefferens.
Figure 3: Introner elements are DNA transposons that carry one splice site and co-opt the other.
Figure 4: Introner element dynamics and genomic implications.

Change history

  • 20 October 2016

    Fig. 3 was replaced owing to image corruption, and the occurrences of the bold text in Fig. 4b were corrected.


References

  1. Gilbert, W. Why genes in pieces? Nature 271, 501 (1978)
  2. Rogozin, I. B., Carmel, L., Csuros, M. & Koonin, E. V. Origin and evolution of spliceosomal introns. Biol. Direct 7, 11 (2012)
  3. Irimia, M. & Roy, S. W. Origin of spliceosomal introns and alternative splicing. Cold Spring Harb. Perspect. Biol. 6, a016071 (2014)
  4. Cavalier-Smith, T. Selfish DNA and the origin of introns. Nature 315, 283–284 (1985)
  5. Purugganan, M. & Wessler, S. The splicing of transposable elements and its role in intron evolution. Genetica 86, 295–303 (1992)
  6. Wang, W., Yu, H. & Long, M. Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat. Genet. 36, 523–527 (2004)
  7. Li, W., Tucker, A. E., Sung, W., Thomas, W. K. & Lynch, M. Extensive, recent intron gains in Daphnia populations. Science 326, 1260–1262 (2009)
  8. Yenerall, P. & Zhou, L. Identifying the mechanisms of intron gain: progress and trends. Biol. Direct 7, 29 (2012)
  9. Schwartz, S., Meshorer, E. & Ast, G. Chromatin organization marks exon-intron structure. Nat. Struct. Mol. Biol. 16, 990–995 (2009)
  10. Worden, A. Z. et al. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science 324, 268–272 (2009)
  11. Verhelst, B., Van de Peer, Y. & Rouzé, P. The complex intron landscape and massive intron invasion in a picoeukaryote provides insights into intron evolution. Genome Biol. Evol. 5, 2393–2401 (2013)
  12. Huff, J. T. & Zilberman, D. Dnmt1-independent CG methylation contributes to nucleosome positioning in diverse eukaryotes. Cell 156, 1286–1297 (2014)
  13. Gobler, C. J. et al. Niche of harmful alga Aureococcus anophagefferens revealed through ecogenomics. Proc. Natl Acad. Sci. USA 108, 4352–4357 (2011)
  14. van der Burgt, A., Severing, E., de Wit, P. J. G. M. & Collemare, J. Birth of new spliceosomal introns in fungi by multiplication of introner-like elements. Curr. Biol. 22, 1260–1265 (2012)
  15. Simmons, M. P. et al. Intron invasions trace algal speciation and reveal nearly identical Arctic and Antarctic Micromonas populations. Mol. Biol. Evol. 32, 2219–2235 (2015)
  16. Lambowitz, A. M. & Zimmerly, S. Group II introns: mobile ribozymes that invade DNA. Cold Spring Harb. Perspect. Biol. 3, a003616 (2011)
  17. Calos, M. P., Johnsrud, L. & Miller, J. H. DNA sequence at the integration sites of the insertion element IS1. Cell 13, 411–418 (1978)
  18. Grindley, N. D. F. IS1 insertion generates duplication of a nine base pair sequence at its target site. Cell 13, 419–426 (1978)
  19. Wessler, S. R., Bureau, T. E. & White, S. E. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5, 814–821 (1995)
  20. van Baren, M. J. et al. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants. BMC Genomics 17, 267 (2016)
  21. Gangadharan, S., Mularoni, L., Fain-Thornton, J., Wheelan, S. J. & Craig, N. L. DNA transposon Hermes inserts into DNA in nucleosome-free regions in vivo. Proc. Natl Acad. Sci. USA 107, 21966–21972 (2010)
  22. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013)
  23. Parfrey, L. W., Lahr, D. J. G., Knoll, A. H. & Katz, L. A. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc. Natl Acad. Sci. USA 108, 13624–13629 (2011)
  24. Feschotte, C. & Pritham, E. J. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet. 41, 331–368 (2007)
  25. Qu, G. et al. RNA-RNA interactions and pre-mRNA mislocalization as drivers of group II intron loss from nuclear genomes. Proc. Natl Acad. Sci. USA 111, 6612–6617 (2014)
  26. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013)
  27. Roberts, A., Trapnell, C., Donaghey, J., Rinn, J. L. & Pachter, L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 12, R22 (2011)
  28. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004)
  29. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
  30. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)
  31. Nei, M. & Kumar, S. Molecular Evolution and Phylogenetics (Oxford Univ. Press, 2000)
  32. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)
  33. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at (2013)
  34. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at (2012)
  35. Fiston-Lavier, A.-S., Barrón, M. G., Petrov, D. A. & González, J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic Acids Res. 43, e22 (2015)
  36. Keane, T. M., Wong, K. & Adams, D. J. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29, 389–390 (2013)
  37. Wildschutte, J. H., Baron, A., Diroff, N. M. & Kidd, J. M. Discovery and characterization of Alu repeat sequences via precise local read assembly. Nucleic Acids Res. 43, 10292–10307 (2015)
  38. Gibbs, A. J. & McIntyre, G. A. The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1–11 (1970)


We thank C. Guthrie’s group for discussion of splicing, J. Stabile for discussion of A. anophagefferens isolates, K. Bi for discussion of sequence variation and the Vincent J. Coates Genomics Sequencing Laboratory for sequencing. This work was supported by NIH grant T32HG000047 (J.T.H.) and a Beckman Young Investigator Award (D.Z.).

Author information




J.T.H. and S.W.R. performed the initial search for introner elements. J.T.H. performed the remaining experiments. J.T.H., D.Z. and S.W.R. designed the project, interpreted data and wrote the manuscript.

Corresponding authors

Correspondence to Daniel Zilberman or Scott W. Roy.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

The A. anophagefferens introner element tree is available at TreeBASE (study 18167). Newly sequenced A. anophagefferens data are available at the SRA (SRP083781).

Reviewer Information

Nature thanks R. Chalmers, L. Hurst, D. Penny and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 M. pusilla introner elements are in phase with nucleosome linker DNA, even without methylation.

Unmethylated regions (indicated by the line with arrowheads) are defined as containing no base positions with fractional methylation 0.5 or greater in a window starting from 50 bp upstream of the 5′ end of the introner element intron and continuing 234 bp downstream, which is 50 bp beyond the predominant M. pusilla introner element intron size of 184 bp (Fig. 2a). Mean values at each base position are shown for chromatin maps [12] aligned to the subset (7%) of introner element introns residing in unmethylated regions (dark grey and dark blue for nucleosome centres and DNA methylation, respectively), compared with alignment to all introner element introns (light grey and light blue; same data as in Fig. 1b for introner element introns). On the other hand, to assess whether introner elements could be in phase with methylated regions that are not also nucleosome linkers, we looked for introner elements that had both ends in methylated DNA regions [12] but not in nucleosome linkers, which gave 35 potential candidates (1% of introner elements). Manual inspection revealed that 34 of the 35 candidates nonetheless appear to have ends in nucleosome linkers that were simply missed by the filtering criteria we used for calling linkers. This leaves one candidate, indicating little evidence that introner element ends lie in methylated DNA regions that are not also nucleosome linkers. Thus, unmethylated nucleosome linkers could be the primary determinant of introner element insertion in at least some cases, whereas we find virtually no evidence that methylated regions could be the primary determinant of introner element insertion without also being nucleosome linkers.
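The window criterion in this legend can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name, the dictionary input, the half-open window on the forward strand, and the treatment of positions without data as unmethylated are all assumptions made for the example.

```python
def is_unmethylated_region(methylation, intron_start,
                           upstream=50, downstream=234, threshold=0.5):
    """Return True if no position in the window reaches the threshold.

    methylation: dict mapping base position -> fractional methylation (0 to 1).
    intron_start: coordinate of the 5' end of the intron (forward strand assumed).
    Positions missing from the dict count as unmethylated (an assumption).
    """
    window = range(intron_start - upstream, intron_start + downstream)
    return all(methylation.get(pos, 0.0) < threshold for pos in window)
```

The default of 234 bp downstream follows the legend: the predominant 184 bp intron size plus 50 bp.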

Extended Data Figure 2 A. anophagefferens introner elements insert into pre-existing nucleosome linkers.

a, Introner element (IE) introns are generally in phase with nucleosome positions, whereas other introns are not. DNA methylation [12] was aligned to the 5′ ends of introner element introns (dark blue) or other introns (light blue). We did not generate nucleosome data previously for A. anophagefferens but DNA methylation is a reliable indicator of linker locations [12]. b, Introner elements are in phase with the starts of genes, indicating insertion between pre-existing nucleosomes. The 5′ ends of introner element introns and DNA methylation [12] were aligned to gene starts. A kernel density estimate of introner element ends is displayed with peaks marked by vertical broken lines.

Extended Data Figure 3 Target site duplications (TSDs) at introner element introns.

a, c, Intron sequences contain directly repeated sequences at their ends. Each A. anophagefferens (a) and M. pusilla (c) intron 5′ and 3′ end is directly aligned in each possible offset from −10 to 10 bp apart. Positions relative to the 5′ splice site from 10 bp upstream to 10 bp downstream are shown. Introner element (IE) introns are shown on the left and other regular non-introner element introns are in the centre, and the differences obtained by subtracting the identity percentages of other introns from those of introner element introns are on the right. Each panel is separated by a vertical black line and a diagonally stepped black line to delineate different regions: the upper left region represents alignment of upstream exon versus 3′ intron end sequence; the upper right represents 5′ intron end versus 3′ intron end; the lower right represents 5′ intron end versus downstream exon; and the lower left represents upstream exon versus downstream exon. The red arrowheads on the right indicate the offset with maximum average identity (0 in both cases). The red boxes in the right panels highlight the identified TSD length and position (see Supplementary Discussion). b, d, An example of an aligned 5′ (above) and 3′ (below) intron end of an introner element for the offset with maximum identity is shown in b for A. anophagefferens and d for M. pusilla. Exonic sequence is uppercase and boxed; intronic is lowercase. Vertical lines show identities that are part of at least an identical 2-mer with the red lines corresponding to the boxed regions in a and c.
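The offset scan described in this legend can be sketched as follows. The sketch assumes each intron is represented by two equal-length strings centred on its 5′ and 3′ splice sites; the function names and input format are hypothetical, and the authors' implementation may differ.

```python
def identity_at_offset(five_prime, three_prime, offset):
    """Fraction of overlapping positions that are identical at a given offset."""
    matches = total = 0
    for i, base in enumerate(five_prime):
        j = i + offset
        if 0 <= j < len(three_prime):
            total += 1
            matches += base == three_prime[j]
    return matches / total if total else 0.0


def best_direct_repeat_offset(pairs, max_offset=10):
    """Average identity over all introns per offset; return (best offset, scores)."""
    if not pairs:
        raise ValueError("need at least one (5' window, 3' window) pair")
    scores = {}
    for k in range(-max_offset, max_offset + 1):
        values = [identity_at_offset(a, b, k) for a, b in pairs]
        scores[k] = sum(values) / len(values)
    return max(scores, key=scores.get), scores
```

A direct repeat such as a TSD shows up as a single offset whose average identity stands out from the rest.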

Extended Data Figure 4 Terminal inverted repeats (TIRs) in introner element introns.

a, c, Intron end sequences contain inverted repeats. Each A. anophagefferens (a) and M. pusilla (c) intron 5′ and reverse of the 3′ end is aligned in each possible offset from −30 to 30 bp apart. Positions relative to the 5′ splice site from 30 bp upstream to 30 bp downstream are shown. Introner element (IE) introns are shown on the left and other regular non-introner element introns are on the right. In each panel the upper left region represents upstream exon versus downstream exon sequence, the upper right represents 5′ intron end versus downstream exon, the lower right represents 5′ intron end versus 3′ intron end, and the lower left represents upstream exon versus 3′ intron end. The red arrowheads (right) indicate the offset with maximum average complementarity. b, d, An example of an aligned 5′ (top) and 3′ (bottom, reversed so that it is 3′ to 5′) end of an introner element intron for the offset with maximum complementarity is shown in b for A. anophagefferens (offset of +8) and d for M. pusilla (offset of −5). Exonic sequence is uppercase and boxed; intronic is lowercase. Vertical lines show complementarities that are part of at least an identical 2-mer.
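For inverted repeats the comparison changes from identity to base-pair complementarity against the reversed 3′ end. The following is a minimal sketch under that reading of the legend; the function name and the plain A/C/G/T alphabet are assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementarity_at_offset(five_prime_end, three_prime_end, offset=0):
    """Fraction of positions where the 5' end pairs with the reversed 3' end."""
    reversed_three = three_prime_end.upper()[::-1]  # read 3' to 5'
    matches = total = 0
    for i, base in enumerate(five_prime_end.upper()):
        j = i + offset
        if 0 <= j < len(reversed_three):
            total += 1
            matches += COMPLEMENT.get(base) == reversed_three[j]
    return matches / total if total else 0.0
```

A perfect terminal inverted repeat, where the 3′ end is the reverse complement of the 5′ end, scores 1.0 at the matching offset.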

Extended Data Figure 5 Intron gain templated by nucleosomes and co-opted sequences.

Model for intron generation by introner elements acting as short non-autonomous DNA transposons that carry a splice site and insert between nucleosomes with co-option of the other splice site sequence.

Extended Data Figure 6 Diploid genomic sequence variation in a more recent isolate of A. anophagefferens.

a, Calling of sequence variation from genomic sequencing reads without an assumption of ploidy reveals a peak at an alternate allele fraction of approximately 0.5. The most likely scenario is that this A. anophagefferens isolate has a diploid genome. It is not physically plausible for it to have higher ploidy because that amount of chromatin could not fit into its extremely compact nucleus [12]. b, An example reference introner element (IE) is present within one allele and absent from the alternate allele. The locus is displayed as in Fig. 3a. The reference introner element is located in an annotated protein-coding gene with a 200-bp RNA sequencing-validated intron in the reference isolate. The alternate allele is probably exonic without an intron (broken lines), so that it encodes the same amino acid sequence. The TSD within the reference allele is 8 bp, immediately flanking the introner element TIRs. c, An example introner element not found within the reference allele is present within the alternate allele. The locus is displayed as in Fig. 3a. The alternate introner element is within an annotated protein-coding gene with a predicted 200-bp intron (broken lines). If the predicted intron is indeed spliced out of the RNA, then the alternate allele encodes the same amino acid sequence. The TSD within the alternate allele is 8 bp, immediately flanking the introner element TIRs.
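The reasoning in panel a rests on the alternate allele fraction, the share of reads at a variant site that support the non-reference allele. A heterozygous site in a diploid genome is expected to sit near 0.5. The sketch below illustrates the calculation and a coarse histogram peak; the function names and the bin count are assumptions, and it does not reproduce the variant caller used in the study.

```python
def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / total


def modal_fraction_bin(fractions, n_bins=5):
    """Return the (low, high) bounds of the most populated histogram bin."""
    counts = [0] * n_bins
    for f in fractions:
        counts[min(int(f * n_bins), n_bins - 1)] += 1
    peak = counts.index(max(counts))
    return peak / n_bins, (peak + 1) / n_bins
```

With five bins, the middle bin spans 0.4 to 0.6, so a peak there is consistent with the roughly 0.5 fraction described in the legend.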

Extended Data Figure 7 Splice site sequences.

Logos for the 10 bp upstream and downstream of 5′ and 3′ splice sites for introner element and other introns are shown for each organism. The rectangles show exonic positions. The core splice sites are GY (Y is C or T) and AG. Introner elements (IEs) combined with co-opted exonic sequence that is duplicated (Fig. 3) to generate particular sequences that extend beyond the core sites (bracketed). Specifically, this results in a predominance of AG|GY sequences (| denotes the position of splicing that ultimately occurs) at 5′ splice sites in M. pusilla introner element introns and 3′ splice sites in A. anophagefferens introner element introns. Similar respective sequences are observed in other introns in each organism: G|GT for M. pusilla 5′ splice sites and AG|G for A. anophagefferens 3′ splice sites. In non-introner element introns, these sequences have been under selection for long periods of time to promote RNA splicing, revealing the sequences extending beyond core sites that probably contribute to optimal splicing in each organism. The similarity of introner element intron splice sites to other intron splice sites thus suggests that introner elements in each organism generate new introns that are spliced reasonably well.
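How a duplicated target site can supply a splice site is easiest to see with a toy simulation. The sketch below assumes one particular arrangement for illustration only: the element carries the 3′ AG at its own end, and the target site happens to begin with GT, so the first TSD copy provides the 5′ splice site. The sequences, coordinates, and function names are invented, and real elements may be arranged differently.

```python
def insert_with_tsd(gene, site_start, tsd_len, element):
    """Insert `element` at a target site, duplicating the site on both sides."""
    site_end = site_start + tsd_len
    target = gene[site_start:site_end]
    return gene[:site_end] + element + target + gene[site_end:]


def splice_out(pre_mrna, intron_start, intron_len):
    """Remove an intron, requiring the core GY...AG splice sites."""
    intron = pre_mrna[intron_start:intron_start + intron_len]
    if intron[:2] not in ("GT", "GC") or not intron.endswith("AG"):
        raise ValueError("intron lacks GY...AG core splice sites")
    return pre_mrna[:intron_start] + pre_mrna[intron_start + intron_len:]


# Toy gene with an 8-bp target site (GTACGTTC) preceded by AG.
gene = "ATGCCAAGGTACGTTCTGGA"
element = "CATTTCCGGAAATCAG"  # hypothetical element ending in the 3' AG
inserted = insert_with_tsd(gene, 8, 8, element)
mrna = splice_out(inserted, 8, 8 + len(element))
```

Because the intron spans exactly one TSD copy plus the element, splicing restores the original coding sequence, and the exon-intron boundary reads AG|GT.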

Extended Data Figure 8 Most introner elements are located in genes expressing low to average RNA levels.

Distributions of detectable RNA levels of all transcripts (black) and only those containing at least one introner element (IE-containing, green) are shown as measured by RNA sequencing. Box plots indicate the median, first and third quartiles with whiskers extending up to data 1.5 times the interquartile range away from the box. For M. pusilla, introner element-containing gene expression does not differ significantly from that of all genes, P = 0.59. For A. anophagefferens, introner element-containing gene expression is slightly lower than that of all genes, P = 0.041.
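The box plot convention in this legend can be written out explicitly. This is a sketch using Python's standard library; the quartile interpolation method is an assumption, since the legend does not say which convention was used, and the function name is hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles plus whiskers reaching the furthest data within 1.5 x IQR."""
    data = sorted(values)
    # "inclusive" interpolation is an assumption; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Whiskers stop at the most extreme observed value inside the fences rather than at the fences themselves, which is why they extend "up to" 1.5 times the interquartile range.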

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion regarding TSD and TIR identification and splice site orientation in introner elements. (PDF 92 kb)

Supplementary Data 1

RNA sequencing-validated introns in Micromonas pusilla CCMP1545. (XLS 932 kb)

Supplementary Data 2

RNA sequencing-validated introns in Aureococcus anophagefferens CCMP1984. (XLS 588 kb)



Cite this article

Huff, J., Zilberman, D. & Roy, S. Mechanism for DNA transposons to generate introns on genomic scales. Nature 538, 533–536 (2016).
