Unprecedented frequency of mitochondrial introns in colonial bilaterians

Animal mitogenomes are typically devoid of introns. Here, we report the largest number of mitochondrial introns ever recorded from bilaterian animals. Mitochondrial introns were identified for the first time from the phylum Bryozoa. They were found in four species from three families (Order Cheilostomatida). A total of eight introns were found in the complete mitogenome of Exechonella vieirai, and five, 17 and 18 introns were found in the partial mitogenomes of Parantropora penelope, Discoporella cookae and Cupuladria biporosa, respectively. Intron-encoded protein domains reverse transcriptase and intron maturase (RVT-IM) were identified in all species. Introns in E. vieirai and P. penelope had conserved Group II intron ribozyme domains V and VI. Conserved domains were lacking from introns in D. cookae and C. biporosa, preventing their further categorization. Putative origins of metazoan introns were explored in a phylogenetic context, using an up-to-date alignment of mitochondrial RVT-IM domains. Results confirmed previous findings of multiple origins of annelid, placozoan and sponge RVT-IM domains and provided evidence for common intron donor sources across metazoan phyla. Our results corroborate growing evidence that some metazoans with regenerative abilities (i.e. placozoans, sponges, annelids and bryozoans) are susceptible to intron integration, most likely via horizontal gene transfer.


Results and discussion
Mitogenome information. Numbers of raw Illumina reads per species, contig sizes and coverage, and GenBank accession numbers are given in Table 1. Only the mitogenome of E. vieirai could be circularised. In this mitogenome, genes nad1 and nad4 were terminated by abbreviated stop codon U (Supplementary Table S1); stop codon UAA is presumably completed by polyadenylation of the cleaved polycistronic transcript mRNA 47 . The mitogenome of C. biporosa, although complete with regards to gene contents, could not be circularised. Both these mitogenomes had the full complement of 13 PCGs, two rRNA genes and 22 tRNA genes. The mitogenomes of P. penelope and D. cookae were incomplete. Genes missing in the P. penelope contig were tRNAs trnW and trnE. Furthermore, we infer that gene nad4L starts on the non-standard initiation codon GUU, which encodes for valine (Supplementary Table S1). Codon GUG, also encoding for valine, has been shown to be a functional initiation codon for atp6 in some humans 48 and has therefore been accepted as alternative initiation codon. However, no evidence for the functionality of GUU as initiation codon has yet been shown. The mitogenome data for D. cookae was in two separate contigs: Contig A (size: 5,283 bp) and Contig B (size: 15,602 bp). Missing genes were: PCGs atp6 and nad4 and tRNAs trnA, trnC, trnE, trnH, trnR, trnT, trnV and trnW; genes nad2 and nad6 were incomplete at their 5' ends. For diagrammatic representation of gene order of mt fragments, see Fig. 2. For more detailed mitogenome information (gene boundaries/lengths, initiation/stop codons, GC content), see Supplementary Table S1.    Table S2). Furthermore, three introns were found in cytb (EV-cytb-i-iii) and there was one intron-like region situated between tRNA histidine and nad5 (EV-H-nad5) and one intron in nad5 (EV-nad5). The concatenation of EV-H-nad5 + EV-nad5 aligned well with other introns. Thus, it seems that this intron is interrupted by a short nad5 exon of 114 bp ( Fig. 2; Supplementary  Table S1). To eliminate the possibility of this being an assembly artefact, we confirmed the presence of this exon by PCR with primers annealing in flanking intron regions (Fig. 2). The PCR product was of the expected size of 607 bp ( Supplementary Fig. S1) and Sanger reads matched the existing assembly. Thus, this is a bona fide result. To confirm that the short 114 bp exon is indeed part of nad5, we examined this region in the context of a wider alignment of published and unpublished bryozoan nad5 sequences. Although the 5' end of nad5 is rather variable, we conclude that the 114 bp exon does fit into the alignment lengthwise and sequence-wise (see Supplementary  Fig. S2). This result, however, raises the question on how intron-splicing and translation of the exon can be performed effectively seeing that the intron may not be able to fold into a functional secondary structure. This issue ought to be revisited in future work. The mitogenome of P. penelope had single introns in cox2 (PP-cox2), cytb (PP-cytb) and nad5 (PP-nad5) and two introns in cox1 (PP-cox1-i-ii) ( Fig. 2; Supplementary Table S2). Furthermore, both E. vieirai and P. penelope had IEPs with RVT and IM domains in cox1 introns, EV-cox1 and PP-cox1-i, respectively ( The largest number of introns, however, was found in the two members of the family Cupuladriidae: C. biporosa and D. cookae Although there are substantial similarities in the intron distribution across these two species (two introns each in cox1, cox2, cox3, nad1; single introns in nad2, nad4L and nad6 [but note that the D. cookae fragments start on incomplete nad2 and nad6 sequences, so there may be more undiscovered introns]), D. cookae harboured three introns in nad5, whereas C. biporosa harboured only two introns in this gene. A further two introns were found in nad4 in C. biporosa, but due the partial nature of the D. cookae mitogenome, nad4 was missing from its contigs, thus, no inferences can be made about shared introns for this gene. Furthermore, both species have one intergenic region that harbours an open reading frame containing RVT and IM domains (for Pfam results, see Supplementary Fig. S3). In C. biporosa this is located between cox1 and tRNA isoleucine and in D. cookae it is found between nad3 and tRNA leucine 1 ( Fig. 2; Supplementary Tables S1 and S2; see Supplementary Fig. S3 for Pfam results).
Group II introns typically exhibit a secondary structure composed of six domains 13 . An alignment to reference sequence Nephtys sp. in which all six domains were annotated according to Vallès et al. 30 , revealed that all introns in E. vieirai and P. penelope had Group II intron ribozyme domains V and VI, as determined by Rfam searches (Supplementary Table S3), all of which aligned well with reference sequence Nephtys sp. (Fig. 3). None of the cupuladriid introns harboured conserved Group II ribozyme domains.
All but one intron of E. vieirai and P. penelope, started and ended on conserved splice motifs GUGYG and YAY 12,13 . PP-cox1-ii intron deviated from this pattern and started on GUAUG (Supplementary Table S2). In contrast, none of the cupuladriid introns started on the conserved GUGYG motif. As a result, intron starts were determined by the end of the preceding exon, which resulted in highly variable putative start motifs (Supplementary  Table S2). Similarly, the ends of introns were determined by the start of the following exons, which meant that 16 out of the 35 cupuladriid introns did not terminate on a conserved AY motif (Supplementary Table S2). The lack of these conserved motifs suggests that the cupuladriid introns might splice via alternative 5' and 3' splice sites (see 14 ; Fig. 3I). Most Group II introns with alternative splice sites contain LAGLIDADG homing endonucleases 14 , but these could not be found in the cupuladriid mitogenomes using Pfam searches.
Concerning the identification of intron types in our study species, we consider the introns of E. vieirai and P. penelope to be possible Group II introns due to the presence of a) IEP with RVT-IM domains, b) conserved Group II ribozyme domains V and VI, c) Group II intron-typical start and stop motifs (except the non-standard start motif in PP-cox1-ii). All introns in these two species finish on YAY, which is a typical Group IIA end motif, versus RAY, which is typical for Group IIB introns 13 . However, the secondary structures of the two IEP-containing introns (EV-cox1, PP-cox1-i; Fig. 4, Supplementary Fig. S4) do not adhere to the conserved motifs shown in the Group IIA consensus secondary structure given on the Zimmerly lab website (http:// webap ps2. ucalg ary. ca/ ~group ii/) 10,49,50 . Although the secondary structure drawing of EV-cox1 (Fig. 4) shows multiple domains radiating from a central wheel, only domains V and VI could be identified reliably. Furthermore, only a few tertiary interaction features that facilitate intron-exon pairings, i.e. intron-exon binding sites IBS1-EBS1 and delta (δ) and delta prime (δ') 51 could be determined reliably; the IBS2-EBS2 pairing, although indicated in Fig. 4 should www.nature.com/scientificreports/ be treated with caution. Regarding the secondary structure of intron PP-cox1-i ( Supplementary Fig. S4), only Group II domains V and VI could be reliably identified. Thus, the lack of unequivocal evidence prevents us from assigning a Group II subcategory to these introns. Concerning C. biporosa and D. cookae, the identity of the introns cannot be ascertained because of the absence of identifiable ribozyme domains and Group II intron start/stop motifs. However, the presence of RVT-IM domains in both species indicates that they might be atypical Group II introns. Furthermore, none of the reconstructed secondary structures resemble the characteristic P1-P10 stems of Group I introns 15 (Supplementary Figs  S5, S6). Moreover, none of the Pfam searches of any of the six reading frames of the cupuladriid introns provided any hits with homing endonucleases, which are characteristic of Group I introns. Thus, we conclude that they are unlikely Group I introns. We also considered the possibility of them being Group III introns, which have been described from euglenoid chloroplast genomes 52 . Group III introns are somewhat degenerate versions of Group II introns. They are typically short (91-119 bp), AU-rich with a base bias of U > A > G > C, have degenerate Group II intron-like boundaries (5': NUNNG; 3': ANNUNNNN), and lack any consistent secondary structure 52-54 , although they have been shown to have a structure resembling Group II intron domain VI 55 . Regarding the cupuladriid introns, their lengths, although shorter than introns in the non-cupuladriids, exceed the typical Group III intron size. Intron sizes range from 221-519 bp (average 272 bp) and 240-397 bp (average 273 bp) in C. biporosa and D. cookae, respectively (Supplementary Tables S1, S2). Although AU rich, the most frequent nucleotide in cupuladriid introns is adenine. In C. biporosa the average frequencies of adenine and uracil are 53.6% and 22.6%, respectively. In D. cookae the average frequencies of adenine and uracil are 49.9% and 31.9%, respectively (Supplementary Table S1). Concerning intron boundaries, only one of the 18 introns in C. biporosa had a degenerate Group III intron 5' end motif (NUNNG). This was found toward the 5' end of intron CB-cox3ii (AACUAAG). Similarly, only two of the 15 introns in D. cookae for which the 5' ends were available, had the degenerate Group III intron motif towards their 5' ends: DC-nad5-i (UGAC UUUUG ) and DC-nad4L (GUAUG ); nucleotides not emboldened are currently considered parts of the introns as they are the nucleotides immediately following the preceding exons. The degeneracy of the 3' end motif means that it is present too frequently to make any meaningful inferences. Lastly, the secondary structure drawings frequently show a stem towards the 3' end (Supplementary Figs. S5, S6) which may be a domain VI-like structure. However, at this stage, we consider the evidence for/against Group III introns too ambiguous to make any informed conclusions.

Putative intron origin(s).
In order to examine the origin(s) of bryozoan mitochondrial introns in an evolutionary context, a phylogeny of RVT-IM domains with representatives from across the tree of life was constructed. The following description of phylogenetic interrelationships of RVT-IM domains ( Fig. 5; for ML tree    32 and in Fig. 5 in the present study. This high sequence divergence makes an unambiguous alignment difficult and, combined with a different taxon sampling, including the closely related Exechonella vieirai copy, might have led to this difference in topology. In addition to Clades I-IV, there were two terminals that did not group with any other metazoan RVT-IM copies. The first, Axinella verrucosa (copy 1141 20 ), formed a weakly supported clade (40% bs) with rhodophyte genera Hildenbrandia, Ahnfeltia, Pyropia, Bangia, Neoporphyra (as Porphyra haitanensis in Huchon et al. 20 ), all of which also formed the sister group in Huchon et al. 20 . In addition, in the present analysis, this rhodophyte clade also included Paralemanea, Grateloupia. This A. verrucosa + Rhodophyta clade formed the sister group with strong support (95% bs) to a strongly supported clade (99% bs) composed of eustigmatophycean Nannochloropsis, a gamma-proteobacterium, and diatom genera Psammoneis, Pseudo-nitschia, Thalassiosira and Cylindrotheca; in Huchon et al. 20 this corresponding sister group was formed of Thalassiosira and Chattonella. Thus, there is broad agreement with our analysis and that by Huchon et al. 20 , except that our broader taxon sampling added additional genera. We also observed, judging by the associated taxa in our analysis and that by Huchon et al. 20 , the two intron copies of Axinella verrucosa (copy 966 = GenBank accession CRX66588; copy 1141 = GenBank accession CRX66589) came out in switched positions in ours and their topologies. We suspect that this could either be due to a mislabelling of Fig. 7 in Huchon et al. 20 or a mislabelling of their GenBank accession records. The position of the second terminal, Parantropora penelope, could not be unambiguously resolved as it nested on a very long branch in a poorly supported clade (16% bs) together with representatives of Fungi, Haptophyta, Bacillariophyceae, Chlorophyta, Charophyta and Marchantiophyta.
Our results show that Group II IEPs with RVT-IM domains were acquired independently numerous times amongst metazoans. Regarding our target bryozoan species, RVT-IM domains containing IEPs were likely acquired independently in Exechonella vieirai and Parantropora penelope. Furthermore, we infer a separate but possibly shared origin in the two cupuladriid taxa Cupuladria biporosa and Discoporella cookae (Fig. 5). Although it is conceivable that RVT-IM containing IEPs were acquired from multiple source organisms in a common ancestor of cheilostome bryozoans and were purged differentially during their evolution leading to today's www.nature.com/scientificreports/ distribution of copies, the more parsimonious solution is likely the independent acquisition of introns. This question lends itself to be examined further using ancestral character estimation in the context of well sampled phylogenies which are being produced ( 57 ; Jenkins & Graham et al., in preparation). Moreover, the monophyly of bryozoan and annelid RVT-IM domains (Clade I and Clade IV; Fig. 5) implies that these copies had a common evolutionary origin. Much uncertainty remains regarding the insertion and propagation mechanisms of metazoan mt introns. Vertical transmission followed by independent losses has been proposed as mechanism in cnidarians and sponges 17 , whereas others have proposed a mixture of both horizontal gene transfer (HGT) and vertical transmission 18,19,58 . However, much evidence points to intron insertion via HGT from microbial or algal donors. In sponges, putative intron donors include fungi 16,21,22 , rhodophytes, diatoms and raphidophytes 20 . Additionally, placozoans have been proposed as possible intron donors in sponges 20,21 . Our analysis confirmed a close association of RVT-IM domains in sponges and placozoans (Clade II), as well as a close relationship of both with red algae (Fig. 5). Furthermore, Clades I and II formed a larger clade with fungal representatives, thus our findings corroborate previous speculations about possible origins. However, pinpointing the exact sources remains difficult, especially in the case of RVT-IM domains in Axinella verrucosa copy 1141, whose closest relative ranged from rhodophytes, diatoms, bacteria and Eustigmatophyceae. Still, the fact that we recovered four clades, each composed of multiple species and, in the cases of Clades I, II and IV, multiple phyla, indicates that introns in each of those clades likely originated from one type of marine organism, suggesting that certain intron donors are particularly successful in penetrating metazoan tissues. A commonality of the metazoan taxa harbouring Group II introns is their ability to bud and regenerate. The idea that budding and regeneration ability may favour intron transmission via somatic cells/tissues was already formulated by Szitenberg et al. 18 in the context of sponges. In the case of the regenerative annelid Nephtys sp., 59 , Vallès et al. 30 hypothesised that introns may have entered the germ line following HGT from possible bacterial donors via undifferentiated cells. Evidence from bryozoans in the present paper further supports the idea that organisms with regenerative abilities are easy targets for intron donors. The two cupuladriid species were found to possess an unprecedented large number of introns (18 in Cupuladria biporosa; 17 in Discoporella cookae) which is consistent with their particular reproductive life history strategy. The cupuladriid family are all freeliving bryozoans that rely heavily on clonal propagation by fragmentation and regeneration of their disc-shaped colonies (e.g. 46 ). This mode of reproduction often results in zooids being split open 60 which could provide intron donors easy access to undifferentiated somatic cells. Discoporella cookae has undergone rates of clonal propagation exceeding 95% for at least 8 million years 61 and clonal propagation in free-living bryozoans extends into the Cretaceous 62 . More generally, bryozoans are a rich source of bioactive compounds, many of which are likely produced by microbial symbionts 63,64 . Although the associated microbial species are mostly unknown, detailed studies on some bryozoan species have shown their colonies to harbour symbiotic bacteria on the surfaces of rhizoids 65 , intercellularly in the pallial sinus of their larvae and across larval surfaces 65,66 , as well as in tissue strands (funicular cords) that connect colony modules to one another 65,67 . Furthermore, bryozoans are often associated with microbial films 68 and have been found to be infected by fungal species 69 . These close relationships between bryozoan hosts and microbes may have facilitated the acquisition of introns in this group.

Within species intron relationships. Phylogenetic analysis of IEP-less Group II introns from E. vieirai
and P. penelope showed that the five copies of the latter formed a monophyly with strong nodal support (Fig. 6).
Regarding the E. vieirai copies, two of them formed a clade with the P. penelope copies. However, their relationships were unresolved due to the low nodal support for one of the nodes (0.69 posterior probability, 48% bs). Thus, there is only evidence for a species-specific common ancestry of IEP-less Group II introns in P. penelope. This suggests that these introns may have either been inserted multiple times from the same type of source organism or that the introns self-propagated within the mitogenomes post initial insertion. Conversely, phylogenetic analyses of the IEP-less introns of the two cupuladriid taxa was completely unresolved and showed no grouping by species (Supplementary Fig. S8). Whether this is an indication of multiple insertion events from different sources or of a high post-insertion mutation rate cannot be inferred. In any case, intraspecific sequence divergence (uncorrected p-distances) was very high in all species and ranged from 38 to 60% in E. vieirai and 43-53% in P. penelope and from 39 to 59% in C. biporosa and 41-58% in D. cookae (Supplementary Tables S5,  S6).

Unusual intron characteristics. There are several features in the bryozoan mitogenomes investigated
here that distinguish them from other bilaterian and metazoan intron-harbouring mitogenomes. In the context of bilaterians, our observed intron frequency in the four species of Bryozoa is unprecedented: eight in E. vieirai (if considering EV-H-nad5 and EV-nad5 a separate introns) and five, 18 and 17 in the incomplete mitogenomes of P. penelope, C. biporosa and D. cookae, respectively. This is the largest number of introns ever recorded in bilaterians (Nephtys sp.-one intron 30 ; Glycera fallax-two introns, Glycera unicornis-one intron 31 ; Decemunciger sp.-three introns 32 ; Cucullaea labiate-one intron 34,35 ). Furthermore, RVT-IM domains have only ever been found in cox1 in metazoans (Placozoa 23,24,26 , Porifera 20 , Polychaeta 30,32 ). Although this was also the case in our E. vieirai and P. penelope mitogenomes, open-reading frames with RVT-IM domains were found in intergenic regions in C. biporosa and D. cookae. Thus, this is the first time that these domains have been found residing outside of cox1 introns in metazoans.
As foreseen by Richter et al. 31 , increased genome sequencing has revealed more Group II introns within the Bilateria, with more likely to be uncovered in the future. Nevertheless, for now at least, the frequency of mitochondrial introns in bryozoans is exceptional when compared to other bilaterians. This provides an unparalleled opportunity for bryozoans to perhaps become not only a model for studying introns but also bilaterian mitogenome architecture overall, challenging our understanding of their function and evolution. Initial observations www.nature.com/scientificreports/ of intron absence versus presence in a broad phylogenetic context point towards a random acquisition process, rather than one guided by shared common ancestry (unpublished data). However, future work that dissects the process of intron gain and loss amongst closely related species ought to be explored. Furthermore, seeing that HGT from microbial donors is a likely mechanism for intron integration, studying the intraspecific distribution and variability of introns could provide interesting insights into their heritability and persistence.

Intron identification and validation.
Introns were identified through the observation of fragmented exons. Exons were knitted together in light of alignments with non-intron containing reference data. In cases where exons had not been identified by MITOS, nucleotide data in between exons were translated into amino acids using EMBOSS Transeq (https:// www. ebi. ac. uk/ Tools/ st/ emboss_ trans eq) and searched using blastp (https:// blast. ncbi. nlm. nih. gov) to identify additional exons. In cases where blastp failed to identify matches, the amino acid translation was scanned for conserved motifs as gleaned from reference alignments. In order to identify intron-encoded RVT-IM domains, intron sequences were translated into amino acids using EMBOSS Transeq from all six reading frames. Translated sequences were subjected to searches using Pfam 33.1 (http:// pfam. xfam. org/ 75 ) and blastp. No RVT-IM domains were initially found for Cupuladria and Discoporella. However, blastx searches of the complete mt fragments against the ucalgary.ca database (http:// webap ps2. ucalg ary. ca/ ~group ii/ cgi-bin/ main/ blast usr. php 50 ) identified intergenic regions that were subsequently confirmed as RVT-IM domains by blastp and Pfam searches. Significant as well as insignificant Pfam results were considered. Conserved Group II ribozyme domains were identified using Rfam (https:// rfam. xfam. org 76 ). Intron boundaries were refined by searching for conserved 5' (GUGYG) and 3' terminals ([Y]AY) 12,13 .
The bona fide presence of 22 of the 48 identified introns was validated by PCR and Sanger sequencing, using specific exon-intron primers (Supplementary Table S4). PCR amplification used puRE Taq Ready-to-go PCR beads (Amersham Biosciences) with a total reaction volume of 25 μl, using 1 μl of 10 μM of each primer and 3 μl of template gDNA. PCR cycling conditions were as follows: initial denaturation at 94 °C for 3 min, followed by 35 cycles at 94 °C for 30 s, T ann for 30 s, 72 °C for 1 min (3 min if product > 1000 bp), with a final extension step at 72 °C for 10 min (see Supplementary Table S4 for primer pair-specific T ann ). Two PCRs (EV-cox1 and PP-cox1-i) failed and were repeated with Takara Long-range PCR kit (Takara Bio Inc.). Total reaction volume was 50 μl, using 0.5 μl enzyme, 5 μl buffer, 8 μl dNTPS, 2 μl of 10 μM of each primer and 4 μl gDNA. PCR cycling conditions were as follows: initial denaturation at 94 °C for 2 min, followed by 40 cycles at 94 °C for 20 s, 51 °C for 30 s, 68 °C for 3 min, with a final extension at 68 °C for 10 min. Sequencing using the PCR primers was performed using an Applied Biosystems 3730 DNA Analyser, using BigDye version 3.1. Sanger reads were edited in Geneious. Sequence identity was verified by mapping to the reference contigs in Geneious.
Secondary structure drawings. Following the excision of IEP ORFs, intron sequences of E. vieirai and P.
Initial alignments were constructed in MAFFT v.7.453 82 using the --auto setting. Alignments were examined by eye in Geneious and obvious outliers were removed. Initial neighbour-joining trees, produced in PAUP* v.4.0a 83 , identified a large clade composed of plant RVT-IM sequences. As none of the focus sequences were nesting in this clade, it was removed from further analyses. The final alignment was carried out using the MAFFT settings --amino --bl 30 --genafpair --maxiterate 100. Ambiguously aligned positions were excluded using the stand-alone version of Gblocks v. 0.91b 84 www.nature.com/scientificreports/ To assess common ancestry of introns within species, phylogenetic analyses of (a) Group II IEP-less introns of Exechonella and Parantropora, and (b) IEP-less introns of Cupuladria and Discoporella were carried out. The outgroup for analysis (a) was the well-annotated Group II intron Nephtys sp. (EU293739 30 ) from which the IEP ORF had been excised according to their secondary structure drawing ( Fig. 1 in 30 ); no suitable outgroup was available for analysis (b). Alignments were constructed in MAFFT with settings --genafpair --maxiterate 1000. Ambiguously aligned positions were excluded using the stand-alone version of Gblocks as outlined above. ModelTest-NG was used to determine the best model of nucleotide substitution. ML with 1000 fast bootstrap replicates and BI analyses were conducted in RAxML and MrBayes v.3.2.6 88 under the HKY + G model. Two parallel MrBayes runs were performed for 5 million generations. The burn-in was defined as the point at which the average standard deviation of split frequencies was < 0.01.