Mechanism for DNA transposons to generate introns on genomic scales

Journal name: Nature
Volume: 538
Pages: 533–536
DOI: 10.1038/nature20110

The discovery of introns four decades ago was one of the most unexpected findings in molecular biology1. Introns are sequences interrupting genes that must be removed as part of messenger RNA production. Genome sequencing projects have shown that most eukaryotic genes contain at least one intron, and frequently many2, 3. Comparison of these genomes reveals a history of long evolutionary periods during which few introns were gained, punctuated by episodes of rapid, extensive gain2, 3. However, although several detailed mechanisms for such episodic intron generation have been proposed4, 5, 6, 7, 8, none has been empirically supported on a genomic scale. Here we show how short, non-autonomous DNA transposons independently generated hundreds to thousands of introns in the prasinophyte Micromonas pusilla and the pelagophyte Aureococcus anophagefferens. Each transposon carries one splice site. The other splice site is co-opted from the gene sequence that is duplicated upon transposon insertion, allowing perfect splicing out of the RNA. The distributions of sequences that can be co-opted are biased with respect to codons, and phasing of transposon-generated introns is similarly biased. These transposons insert between pre-existing nucleosomes, so that multiple nearby insertions generate nucleosome-sized intervening segments. Thus, transposon insertion and sequence co-option may explain the intron phase biases2 and prevalence of nucleosome-sized exons9 observed in eukaryotes. Overall, the two independent examples of proliferating elements illustrate a general DNA transposon mechanism that can plausibly account for episodes of rapid, extensive intron gain during eukaryotic evolution2, 3.


Figures

    Figure 1: M. pusilla introner elements insert between pre-existing nucleosomes.

    a, Each introner element (IE) contains a nucleosome with ends in linker DNA, which is specifically marked by methylation in this organism. Validated introns and chromatin data12 are displayed. HEME1 contains two introner elements (green). b, Introner element introns are generally in phase with nucleosome positions, whereas other introns are not. Chromatin maps12 are aligned to 5′ introner element intron ends (dark lines) or other intron ends (light lines). c, Introner elements are in phase with the starts of genes, indicating insertion between pre-existing nucleosomes. Chromatin maps12 and 5′ introner element ends are aligned to gene starts. A kernel density estimate of introner element ends is shown with peaks marked.

    Figure 2: Identification of introner elements in A. anophagefferens.

    a, Validated lengths for introner element (IE, green) and other (grey) introns. b, A. anophagefferens introner elements share sequence similarity in intronic sequence but not in neighbouring exonic sequence. Six example introner elements contain regions with maximal pairwise identities from 96 to 100%. Bases identical in at least 5 of the 6 sequences are green. c, Most A. anophagefferens introner elements can be aligned to form one or more related groups. Nodes present in >50% of 1,000 bootstraps are indicated with black dots on the ML tree. Introner elements are found in either orientation with respect to the intron (orange and blue). Many elements carry 3′ splice sites in both orientations (black lines on right).

    Figure 3: Introner elements are DNA transposons that carry one splice site and co-opt the other.

    a, Introner elements (IEs, green) exhibit hallmarks of DNA transposons. Direct duplications (bold; target site duplications, TSDs) of 8 bp in A. anophagefferens and 3 bp in M. pusilla flank the element ends. Inverted repeats (underlined) lie at the introner element ends (terminal inverted repeats, TIRs). b, Introner elements carry one splice site and co-opt the other. Logos for the ends of the most abundant intron size classes are shown: 200 bp for A. anophagefferens and 184 bp for M. pusilla. In A. anophagefferens the 5′ splice site (bracketed) is constructed from a TSD (gene sequence before duplication), and the 3′ splice site (underlined) is carried in a transposon TIR. In M. pusilla the 5′ splice site (underlined) is carried in a transposon TIR, and the 3′ splice site (bracketed) is constructed from a TSD.
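Why co-opting a TSD allows perfect splicing can be checked with a toy model. The sketch below is an illustration, not the authors' code: it inserts an element into a gene so that the target site appears on both sides, then removes one element plus one TSD copy. In the M. pusilla configuration the intron starts at the element (5′ splice site carried in the TIR) and ends with the downstream TSD copy, which must therefore end in AG. In the A. anophagefferens configuration the intron starts with the upstream TSD copy, which must begin with GY, and ends at the element (3′ splice site in the TIR). All sequences and helper names are hypothetical.

```python
def insert_with_tsd(gene, site, tsd_len, element):
    """Insert `element` so the target site gene[site:site + tsd_len] flanks it on both sides."""
    tsd = gene[site:site + tsd_len]
    return gene[:site + tsd_len] + element + tsd + gene[site + tsd_len:]


def introner_intron(gene, site, tsd_len, element, carried):
    """Insert the element and return (genomic sequence, intron start, intron end).

    carried="5ss": element supplies the 5' splice site; intron = element + downstream TSD
                   (M. pusilla-like, so the TSD must end in AG).
    carried="3ss": element supplies the 3' splice site; intron = upstream TSD + element
                   (A. anophagefferens-like, so the TSD must start with GY).
    """
    seq = insert_with_tsd(gene, site, tsd_len, element)
    if carried == "5ss":
        start = site + tsd_len
        end = start + len(element) + tsd_len
    elif carried == "3ss":
        start = site
        end = site + tsd_len + len(element)
    else:
        raise ValueError("carried must be '5ss' or '3ss'")
    return seq, start, end


def splice(seq, start, end):
    """Remove the intron seq[start:end] and join the flanking exons."""
    return seq[:start] + seq[end:]


def canonical_sites(seq, start, end):
    """True if the intron starts with GT or GC (GY) and ends with AG."""
    return seq[start:start + 2] in ("GT", "GC") and seq[end - 2:end] == "AG"


# Hypothetical M. pusilla-like case: 3-bp TSD "CAG" (ends in AG); element starts with GT.
gene_mp = "ATGGCTCAGGAAACCTGA"
seq_mp, s_mp, e_mp = introner_intron(gene_mp, 6, 3, "GTGAGTAAACCCACTCAC", "5ss")

# Hypothetical A. anophagefferens-like case: 8-bp TSD "GTCAAGCT" (starts with GT); element ends in AG.
gene_aa = "ATGCCAGTCAAGCTGAATGA"
seq_aa, s_aa, e_aa = introner_intron(gene_aa, 6, 8, "CTTACCCCGGGGTAAG", "3ss")

for gene, seq, s, e in [(gene_mp, seq_mp, s_mp, e_mp), (gene_aa, seq_aa, s_aa, e_aa)]:
    # Canonical splice sites, and splicing restores the original gene sequence.
    print(canonical_sites(seq, s, e), splice(seq, s, e) == gene)
```

In both configurations the removed segment is exactly one element plus one TSD copy, so the mature sequence is unchanged and the insertion is invisible at the protein level.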

    Figure 4: Introner element dynamics and genomic implications.

    a, Presence–absence variation in a newer isolate of A. anophagefferens. *Because non-reference introner elements (IEs) were identified by their presence in this isolate, they cannot be absent/absent. b, Sequences that can be co-opted to construct splice sites are biased with respect to codon phasing. For M. pusilla, introner element introns should be biased by the availability of AG sequences that can be co-opted as 3′ splice sites (3′ss). For A. anophagefferens, introner element introns should be biased by the availability of GY (Y is C or T) sequences that can be co-opted as 5′ splice sites (5′ss). Introner element introns have phase biases more similar to those of the respective co-opted sequence (bold). c, Nearby introner element insertions generate nucleosome-sized segments. Distances between neighbouring introner element introns (solid) and between other neighbouring introns (broken) are displayed as kernel density estimates. Vertical lines mark integer multiples of the nucleosome repeat length12 (206 bp for M. pusilla and 168 bp for A. anophagefferens), the expected sizes of segments spanning whole numbers of nucleosomes.
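The expectation in panel b follows from a counting argument: the position of a co-optable dinucleotide within the reading frame fixes the phase of the intron it could produce. The sketch assumes the configurations of the introner element model (an intron placed immediately after a co-opted AG in M. pusilla, immediately before a co-opted GY in A. anophagefferens) and defines phase as the number of coding bases upstream of the intron modulo 3. The function name and the coding sequence are hypothetical.

```python
def co_option_phases(cds, motif_type):
    """Count intron phases (0, 1, 2) implied by co-optable sites in a coding sequence.

    motif_type="AG": M. pusilla-like; the intron would sit immediately after the AG.
    motif_type="GY": A. anophagefferens-like; the intron would sit immediately before the GY.
    Only positions strictly inside the coding sequence are counted.
    """
    counts = {0: 0, 1: 0, 2: 0}
    for i in range(len(cds) - 1):
        pair = cds[i:i + 2]
        if motif_type == "AG" and pair == "AG" and i + 2 < len(cds):
            counts[(i + 2) % 3] += 1  # coding bases upstream of the intron = i + 2
        elif motif_type == "GY" and pair in ("GC", "GT") and i > 0:
            counts[i % 3] += 1        # coding bases upstream of the intron = i
    return counts


cds = "ATGCCAGTCAAGCTGAATGA"  # hypothetical coding sequence
print(co_option_phases(cds, "AG"), co_option_phases(cds, "GY"))
```

Applied genome-wide, such counts give the phase distribution expected if introner elements could insert at any co-optable site, which is the baseline the figure compares observed intron phases against.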

    Extended Data Fig. 1: M. pusilla introner elements are in phase with nucleosome linker DNA, even without methylation.

    Unmethylated regions (indicated by the line with arrowheads) are defined as windows containing no base position with fractional methylation of 0.5 or greater; each window starts 50 bp upstream of the 5′ end of the introner element intron and extends 234 bp downstream of that end, 50 bp beyond the predominant M. pusilla introner element intron size of 184 bp (Fig. 2a). Mean values at each base position are shown for chromatin maps12 aligned to the subset (7%) of introner element introns residing in unmethylated regions (dark grey and dark blue for nucleosome centres and DNA methylation, respectively), compared with alignment to all introner element introns (light grey and light blue; same data as in Fig. 1b for introner element introns). Conversely, to assess whether introner elements could be in phase with methylated regions that are not also nucleosome linkers, we looked for introner elements that had both ends in methylated DNA regions12 but not in nucleosome linkers, which gave 35 potential candidates (1% of introner elements). Manual inspection revealed that 34 of the 35 candidates nonetheless appear to have ends in nucleosome linkers that were simply missed by the filtering criteria we used for calling linkers. The single remaining candidate provides little evidence that introner element ends fall in methylated regions that are not also nucleosome linkers. Thus, unmethylated nucleosome linkers could be the primary determinant of introner element insertion in at least some cases, whereas we find virtually no evidence that methylated regions could be the primary determinant of introner element insertion without also being nucleosome linkers.
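The unmethylated-region criterion amounts to a window test. This sketch assumes methylation is stored as a per-base list of fractional values indexed by position on the intron's strand, treats the window as half-open, and skips positions without data (None); these details are assumptions not stated in the legend, and the function name is hypothetical.

```python
def in_unmethylated_region(methylation, five_prime_end, upstream=50, downstream=234, threshold=0.5):
    """True if no base in the window has fractional methylation >= threshold.

    Window: [five_prime_end - upstream, five_prime_end + downstream), half-open (assumed).
    Positions with no data (None) are ignored (assumed).
    """
    lo = max(0, five_prime_end - upstream)
    hi = min(len(methylation), five_prime_end + downstream)
    return all(m is None or m < threshold for m in methylation[lo:hi])


methylation = [0.0] * 1000
methylation[500] = 0.8  # one hypothetical methylated position
print(in_unmethylated_region(methylation, 300), in_unmethylated_region(methylation, 700))
```

The 234-bp downstream extent is the predominant 184-bp intron size plus 50 bp, so the window covers the whole typical intron with a margin on each side.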

    Extended Data Fig. 2: A. anophagefferens introner elements insert into pre-existing nucleosome linkers.

    a, Introner element (IE) introns are generally in phase with nucleosome positions, whereas other introns are not. DNA methylation12 was aligned to the 5′ ends of introner element introns (dark blue) or other introns (light blue). Nucleosome maps were not previously generated for A. anophagefferens, but DNA methylation is a reliable indicator of linker locations12. b, Introner elements are in phase with the starts of genes, indicating insertion between pre-existing nucleosomes. The 5′ ends of introner element introns and DNA methylation12 were aligned to gene starts. A kernel density estimate of introner element ends is displayed with peaks marked by vertical broken lines.
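Aligning a signal such as DNA methylation to a set of anchor points (5′ intron ends in panel a, gene starts in panel b) amounts to averaging the signal at each relative offset. A minimal sketch, assuming a per-base signal list with None for missing data and anchors on the forward strand (strand handling is omitted); the function name and data are hypothetical.

```python
def meta_profile(signal, anchors, flank):
    """Mean signal at each offset in [-flank, flank] around the anchor positions.

    Offsets with no observed values across anchors are reported as None.
    """
    profile = []
    for offset in range(-flank, flank + 1):
        values = [
            signal[a + offset]
            for a in anchors
            if 0 <= a + offset < len(signal) and signal[a + offset] is not None
        ]
        profile.append(sum(values) / len(values) if values else None)
    return profile


signal = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # hypothetical per-base methylation
print(meta_profile(signal, anchors=[2, 6], flank=1))
```

If the anchors share a consistent position relative to linkers, the averaged profile keeps a periodic pattern; if not, the pattern averages out, which is the contrast between introner element and other introns in panel a.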

    Extended Data Fig. 3: Target site duplications (TSDs) at introner element introns.

    a, c, Intron sequences contain directly repeated sequences at their ends. For each A. anophagefferens (a) and M. pusilla (c) intron, the 5′ and 3′ ends are directly aligned at each possible offset from −10 to 10 bp. Positions relative to the 5′ splice site from 10 bp upstream to 10 bp downstream are shown. Introner element (IE) introns are shown on the left, other (non-introner element) introns in the centre, and the differences obtained by subtracting the identity percentages of other introns from those of introner element introns on the right. Within each panel, a vertical black line and a diagonally stepped black line delineate four regions: the upper left represents alignment of upstream exon versus 3′ intron end sequence; the upper right, 5′ intron end versus 3′ intron end; the lower right, 5′ intron end versus downstream exon; and the lower left, upstream exon versus downstream exon. The red arrowheads on the right indicate the offset with maximum average identity (0 in both cases). The red boxes in the right panels highlight the identified TSD length and position (see Supplementary Discussion). b, d, An example of an aligned 5′ (above) and 3′ (below) intron end of an introner element at the offset with maximum identity is shown in b for A. anophagefferens and d for M. pusilla. Exonic sequence is uppercase and boxed; intronic is lowercase. Vertical lines show identities that are part of an identical run of at least 2 bp, with red lines corresponding to the boxed regions in a and c.
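The figure scans offsets from −10 to 10 bp. The simpler sketch below checks only offset 0, the best offset in both organisms, and measures how far an exact direct repeat extends on either side of the splice-site boundaries. `k_before` corresponds to the upstream exon versus 3′ intron end region (where an M. pusilla-type TSD sits), and `k_after` to the 5′ intron end versus downstream exon region (A. anophagefferens-type). The function name and the toy loci are hypothetical.

```python
def flanking_direct_repeat(seq, start, end, max_len=10):
    """Lengths of exact direct repeats at offset 0 between the two splice-site boundaries.

    k_before: matching bases ending at both boundaries (upstream exon end vs 3' intron end).
    k_after:  matching bases starting at both boundaries (5' intron end vs downstream exon).
    """
    k_before = 0
    while (k_before < max_len and start - k_before - 1 >= 0
           and seq[start - k_before - 1] == seq[end - k_before - 1]):
        k_before += 1
    k_after = 0
    while (k_after < max_len and end + k_after < len(seq)
           and seq[start + k_after] == seq[end + k_after]):
        k_after += 1
    return k_before, k_after


# Hypothetical loci: exon + intron + exon, with intron coordinates [start, end).
mp = "ATGGCTCAG" + "GTGAGTAAACCCACTCAC" + "CAG" + "GAAACCTGA"          # 3-bp TSD "CAG"
aa = "ATGCCA" + "GTCAAGCT" + "CTTACCCCGGGGTAAG" + "GTCAAGCT" + "GAATGA"  # 8-bp TSD
print(flanking_direct_repeat(mp, 9, 30), flanking_direct_repeat(aa, 6, 30))
```

The M. pusilla-like locus also shows a 1-bp chance match on the other side. Short coincidental matches like this are why the figure subtracts the identities of other introns from those of introner element introns.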

    Extended Data Fig. 4: Terminal inverted repeats (TIRs) in introner element introns.

    a, c, Intron end sequences contain inverted repeats. For each A. anophagefferens (a) and M. pusilla (c) intron, the 5′ end and the reversed 3′ end are aligned at each possible offset from −30 to 30 bp. Positions relative to the 5′ splice site from 30 bp upstream to 30 bp downstream are shown. Introner element (IE) introns are shown on the left and other (non-introner element) introns on the right. In each panel the upper left region represents upstream exon versus downstream exon sequence, the upper right represents 5′ intron end versus downstream exon, the lower right represents 5′ intron end versus 3′ intron end, and the lower left represents upstream exon versus 3′ intron end. The red arrowheads (right) indicate the offset with maximum average complementarity. b, d, An example of an aligned 5′ (top) and 3′ (bottom, reversed so that it reads 3′ to 5′) end of an introner element intron at the offset with maximum complementarity is shown in b for A. anophagefferens (offset of +8) and d for M. pusilla (offset of −5). Exonic sequence is uppercase and boxed; intronic is lowercase. Vertical lines show complementarities that are part of a complementary run of at least 2 bp.
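A terminal inverted repeat means the first bases of an element are the reverse complement of its last bases. The figure searches offsets between intron ends. This sketch is simpler: it assumes the element boundaries are already known and measures how far the two ends match exactly. The element sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat(element, max_len=30):
    """Length of the exact inverted repeat shared by the two ends of `element`."""
    rc = reverse_complement(element)
    limit = min(max_len, len(element) // 2)
    k = 0
    # element[k] == rc[k] means base k pairs with base -(k + 1) of the element.
    while k < limit and element[k] == rc[k]:
        k += 1
    return k


print(terminal_inverted_repeat("GTGAGTAAACCCACTCAC"))
```

Real TIRs tolerate mismatches, so a practical detector would score complementarity (as the figure does) rather than stop at the first mismatch.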

    Extended Data Fig. 5: Intron gain templated by nucleosomes and co-opted sequences.

    Model for intron generation by introner elements acting as short non-autonomous DNA transposons that carry a splice site and insert between nucleosomes with co-option of the other splice site sequence.

    Extended Data Fig. 6: Diploid genomic sequence variation in a more recent isolate of A. anophagefferens.

    a, Calling of sequence variation from genomic sequencing reads without an assumption of ploidy reveals a peak at an alternate allele fraction of approximately 0.5. The most likely scenario is that this A. anophagefferens isolate has a diploid genome. It is not physically plausible for it to have higher ploidy because that amount of chromatin could not fit into its extremely compact nucleus12. b, An example reference introner element (IE) is present within one allele and absent from the alternate allele. The locus is displayed as in Fig. 3a. The reference introner element is located in an annotated protein-coding gene with a 200-bp RNA sequencing-validated intron in the reference isolate. The alternate allele is probably exonic without an intron (broken lines), so that it encodes the same amino acid sequence. The TSD within the reference allele is 8 bp, immediately flanking the introner element TIRs. c, An example introner element not found within the reference allele is present within the alternate allele. The locus is displayed as in Fig. 3a. The alternate introner element is within an annotated protein-coding gene with a predicted 200-bp intron (broken lines). If the predicted intron is indeed spliced out of the RNA, then the alternate allele encodes the same amino acid sequence. The TSD within the alternate allele is 8 bp, immediately flanking the introner element TIRs.
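The allele-fraction reasoning in panel a can be made concrete. A heterozygous site in an organism of ploidy n can show alternate allele fractions of k/n. A peak near 0.5 is therefore the only heterozygous peak expected for a diploid, while higher ploidies predict additional peaks (for example 0.25 and 0.75 for a tetraploid). The sketch bins per-site fractions from read counts. The read counts, the depth filter and the function names are hypothetical, and the variant caller's own output format is not modelled.

```python
def allele_fraction_mode(ref_counts, alt_counts, n_bins=21, min_depth=10):
    """Centre of the most populated alternate-allele-fraction bin.

    An odd number of bins keeps one bin centred on 0.5. Sites with fewer than
    `min_depth` reads are skipped (an assumed filter).
    """
    bins = [0] * n_bins
    for ref, alt in zip(ref_counts, alt_counts):
        depth = ref + alt
        if depth < min_depth:
            continue
        idx = min(int(alt / depth * n_bins), n_bins - 1)
        bins[idx] += 1
    best = max(range(n_bins), key=lambda i: bins[i])
    return (best + 0.5) / n_bins


def expected_heterozygous_fractions(ploidy):
    """Alternate allele fractions possible at a heterozygous site for a given ploidy."""
    return [k / ploidy for k in range(1, ploidy)]


mode = allele_fraction_mode([12, 10, 15, 9, 20], [11, 10, 14, 11, 2])
print(mode, expected_heterozygous_fractions(2), expected_heterozygous_fractions(4))
```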

    Extended Data Fig. 7: Splice site sequences.

    Logos for the 10 bp upstream and downstream of 5′ and 3′ splice sites for introner element and other introns are shown for each organism. The rectangles show exonic positions. The core splice sites are GY (Y is C or T) and AG. Introner elements (IEs), combined with co-opted exonic sequence that is duplicated on insertion (Fig. 3), generate particular sequences that extend beyond the core sites (bracketed). Specifically, this results in a predominance of AG|GY sequences (| denotes the position where splicing ultimately occurs) at 5′ splice sites in M. pusilla introner element introns and at 3′ splice sites in A. anophagefferens introner element introns. Similar sequences are observed in other introns in each organism: G|GT for M. pusilla 5′ splice sites and AG|G for A. anophagefferens 3′ splice sites. In non-introner element introns, these sequences have been under selection for long periods to promote RNA splicing, revealing the sequences beyond the core sites that probably contribute to optimal splicing in each organism. The similarity of introner element intron splice sites to other intron splice sites thus suggests that introner elements in each organism generate new introns that are spliced reasonably well.
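The AG|GY pattern arises because the duplicated target site appears twice. In M. pusilla the TSD copy that supplies the 3′ splice site AG also ends the upstream exon, right before the element's GY. In A. anophagefferens the TSD copy that supplies the 5′ splice site GY also starts the downstream exon, right after the element's AG. The toy loci below are hypothetical, built from an element flanked by a duplicated target site, and the helper names are illustrative.

```python
def junction_context(seq, boundary, left=2, right=2):
    """Bases around a splice boundary, with '|' marking where splicing cuts."""
    return seq[boundary - left:boundary] + "|" + seq[boundary:boundary + right]


def is_ag_gy(context):
    """True for AG|GC or AG|GT (Y is C or T)."""
    return context in ("AG|GC", "AG|GT")


# Hypothetical M. pusilla-like locus: TSD "CAG" flanks an element that begins with GT.
mp = "ATGGCTCAG" + "GTGAGTAAACCCACTCAC" + "CAG" + "GAAACCTGA"
print(junction_context(mp, 9))   # 5' splice site; → AG|GT

# Hypothetical A. anophagefferens-like locus: TSD "GTCAAGCT" flanks an element that ends with AG.
aa = "ATGCCA" + "GTCAAGCT" + "CTTACCCCGGGGTAAG" + "GTCAAGCT" + "GAATGA"
print(junction_context(aa, 30))  # 3' splice site; → AG|GT
```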

    Extended Data Fig. 8: Most introner elements are located in genes expressing low to average RNA levels.

    Distributions of detectable RNA levels, measured by RNA sequencing, are shown for all transcripts (black) and for only those containing at least one introner element (IE-containing, green). Box plots indicate the median and the first and third quartiles, with whiskers extending to the most extreme data points within 1.5 times the interquartile range of the box. For M. pusilla, expression of introner element-containing genes does not differ significantly from that of all genes (P = 0.59). For A. anophagefferens, expression of introner element-containing genes is slightly lower than that of all genes (P = 0.041).
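The box-plot convention in this legend (quartiles, plus whiskers reaching the most extreme data within 1.5 times the interquartile range) can be computed as below. Quartile definitions differ between tools. This sketch uses Python's `statistics.quantiles` with its default ("exclusive") method, which may not match the software used for the figure. The values and the function name are invented.

```python
import statistics


def tukey_box(values):
    """Median, quartiles and whisker ends (most extreme data within 1.5 x IQR of the box)."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


print(tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # 100 lies beyond the upper fence
```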

Change history

Corrected online 20 October 2016
Fig. 3 was replaced owing to image corruption, and the occurrences of the bold text in Fig. 4b were corrected.

References

  1. Gilbert, W. Why genes in pieces? Nature 271, 501 (1978)
  2. Rogozin, I. B., Carmel, L., Csuros, M. & Koonin, E. V. Origin and evolution of spliceosomal introns. Biol. Direct 7, 11 (2012)
  3. Irimia, M. & Roy, S. W. Origin of spliceosomal introns and alternative splicing. Cold Spring Harb. Perspect. Biol. 6, a016071 (2014)
  4. Cavalier-Smith, T. Selfish DNA and the origin of introns. Nature 315, 283–284 (1985)
  5. Purugganan, M. & Wessler, S. The splicing of transposable elements and its role in intron evolution. Genetica 86, 295–303 (1992)
  6. Wang, W., Yu, H. & Long, M. Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat. Genet. 36, 523–527 (2004)
  7. Li, W., Tucker, A. E., Sung, W., Thomas, W. K. & Lynch, M. Extensive, recent intron gains in Daphnia populations. Science 326, 1260–1262 (2009)
  8. Yenerall, P. & Zhou, L. Identifying the mechanisms of intron gain: progress and trends. Biol. Direct 7, 29 (2012)
  9. Schwartz, S., Meshorer, E. & Ast, G. Chromatin organization marks exon-intron structure. Nat. Struct. Mol. Biol. 16, 990–995 (2009)
  10. Worden, A. Z. et al. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science 324, 268–272 (2009)
  11. Verhelst, B., Van de Peer, Y. & Rouzé, P. The complex intron landscape and massive intron invasion in a picoeukaryote provides insights into intron evolution. Genome Biol. Evol. 5, 2393–2401 (2013)
  12. Huff, J. T. & Zilberman, D. Dnmt1-independent CG methylation contributes to nucleosome positioning in diverse eukaryotes. Cell 156, 1286–1297 (2014)
  13. Gobler, C. J. et al. Niche of harmful alga Aureococcus anophagefferens revealed through ecogenomics. Proc. Natl Acad. Sci. USA 108, 4352–4357 (2011)
  14. van der Burgt, A., Severing, E., de Wit, P. J. G. M. & Collemare, J. Birth of new spliceosomal introns in fungi by multiplication of introner-like elements. Curr. Biol. 22, 1260–1265 (2012)
  15. Simmons, M. P. et al. Intron invasions trace algal speciation and reveal nearly identical Arctic and Antarctic Micromonas populations. Mol. Biol. Evol. 32, 2219–2235 (2015)
  16. Lambowitz, A. M. & Zimmerly, S. Group II introns: mobile ribozymes that invade DNA. Cold Spring Harb. Perspect. Biol. 3, a003616 (2011)
  17. Calos, M. P., Johnsrud, L. & Miller, J. H. DNA sequence at the integration sites of the insertion element IS1. Cell 13, 411–418 (1978)
  18. Grindley, N. D. F. IS1 insertion generates duplication of a nine base pair sequence at its target site. Cell 13, 419–426 (1978)
  19. Wessler, S. R., Bureau, T. E. & White, S. E. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5, 814–821 (1995)
  20. van Baren, M. J. et al. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants. BMC Genomics 17, 267 (2016)
  21. Gangadharan, S., Mularoni, L., Fain-Thornton, J., Wheelan, S. J. & Craig, N. L. DNA transposon Hermes inserts into DNA in nucleosome-free regions in vivo. Proc. Natl Acad. Sci. USA 107, 21966–21972 (2010)
  22. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013)
  23. Parfrey, L. W., Lahr, D. J. G., Knoll, A. H. & Katz, L. A. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc. Natl Acad. Sci. USA 108, 13624–13629 (2011)
  24. Feschotte, C. & Pritham, E. J. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet. 41, 331–368 (2007)
  25. Qu, G. et al. RNA-RNA interactions and pre-mRNA mislocalization as drivers of group II intron loss from nuclear genomes. Proc. Natl Acad. Sci. USA 111, 6612–6617 (2014)
  26. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013)
  27. Roberts, A., Trapnell, C., Donaghey, J., Rinn, J. L. & Pachter, L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 12, R22 (2011)
  28. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004)
  29. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
  30. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)
  31. Nei, M. & Kumar, S. Molecular Evolution and Phylogenetics (Oxford Univ. Press, 2000)
  32. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)
  33. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013)
  34. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907 (2012)
  35. Fiston-Lavier, A.-S., Barrón, M. G., Petrov, D. A. & González, J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic Acids Res. 43, e22 (2015)
  36. Keane, T. M., Wong, K. & Adams, D. J. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29, 389–390 (2013)
  37. Wildschutte, J. H., Baron, A., Diroff, N. M. & Kidd, J. M. Discovery and characterization of Alu repeat sequences via precise local read assembly. Nucleic Acids Res. 43, 10292–10307 (2015)
  38. Gibbs, A. J. & McIntyre, G. A. The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1–11 (1970)


Author information

Affiliations

  1. Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA

    • Jason T. Huff &
    • Daniel Zilberman
  2. California Institute for Quantitative Biosciences, University of California, Berkeley, California 94720, USA

    • Jason T. Huff
  3. Department of Biology, San Francisco State University, San Francisco, California 94132, USA

    • Scott W. Roy

Contributions

J.T.H. and S.W.R. performed the initial search for introner elements. J.T.H. performed the remaining experiments. J.T.H., D.Z. and S.W.R. designed the project, interpreted data and wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

The A. anophagefferens introner element tree is available at TreeBASE (https://treebase.org; study 18167). Newly sequenced A. anophagefferens data are available at the SRA (SRP083781).

Reviewer Information

Nature thanks R. Chalmers, L. Hurst, D. Penny and the other anonymous reviewer(s) for their contribution to the peer review of this work.



Supplementary information

PDF files

  1. Supplementary Information

    This file contains a Supplementary Discussion regarding TSD and TIR identification and splice site orientation in introner elements.

Excel files

  1. Supplementary Data 1

    RNA sequencing-validated introns in Micromonas pusilla CCMP1545.

  2. Supplementary Data 2

    RNA sequencing-validated introns in Aureococcus anophagefferens CCMP1984.
