Introduction

The mitochondrial genome (mitogenome) is now widely used in molecular phylogenetic analysis. Although mitogenomic phylogeny is less informative for resolving higher-level classification, it often yields a robust framework for phylogenetic relationships at shallow nodes1,2. In addition to phylogenetic reconstruction based on the nucleotide sequences of mitogenomes, gene order rearrangements have been used to infer phylogenetic relationships3,4. The gene order of a mitogenome is relatively conserved; when only protein-coding genes (PCGs) are considered, the order is sometimes shared among higher taxa, e.g., across orders of annelids3. Conversely, the gene order in some marine invertebrates, including annelids, shows intra-familial variation2,5,6,7,8,9 and may shed light on the phylogenetic relationships of relatively closely related taxa.

Mitochondrial DNA (mtDNA) is a closed-circular molecule in most animals and is generally small (15–20 kb) compared with the nuclear genome. Animal mtDNA usually contains 37 genes, namely 13 PCGs (cox1–3, atp6 and atp8, cytb, nad1–6, and nad4l), 22 tRNAs, and two rRNAs10. Non-coding regions within the PCGs (i.e., introns) of mtDNA are known in many eukaryotes11. Known mitochondrial introns are mainly classified into groups I and II based on their structural features11. Group I introns predominate in fungi, whereas group II introns are most frequent in plants12. Both group I and II introns appear to be rare in metazoan mitogenomes13; indeed, reports of metazoan species possessing mitochondrial group II introns are sporadic. At least seven species in four phyla, namely Porifera13, Placozoa14, Mollusca15 (see “Discussion”), and Annelida16,17,18, possess group II introns in their mitogenomes. Group II introns are generally characterized by a secondary structure with six domains; intronic open reading frames (ORFs) encoding proteins for splicing and mobility (e.g., reverse transcriptase and RNA maturase); and boundary motifs beginning with 5′ GUGYG 3′ and ending with 5′ AY 3′11,19,20. However, these features are not necessarily present in all group II introns; for example, ORF-less introns21 and nucleotide substitutions in the characteristic motifs22 have also been reported.
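
The boundary consensus can be made concrete with a small check. The sketch below is a hypothetical helper written for illustration, not part of the original analysis: it counts mismatches between the first five and last two nucleotides of a candidate intron and the IUPAC motifs 5′ GUGYG 3′ and 5′ AY 3′ (Y = C or U/T). Reporting mismatch counts rather than a yes/no answer leaves room for motif substitutions, and a boundary match alone does not establish group II identity.

```python
# IUPAC codes needed for the consensus motifs; RNA U is treated as DNA T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def motif_mismatches(seq: str, motif: str) -> int:
    """Count positions where `seq` does not match an equal-length IUPAC motif."""
    if len(seq) != len(motif):
        raise ValueError("sequence and motif must have the same length")
    seq = seq.upper().replace("U", "T")
    return sum(base not in IUPAC[code] for base, code in zip(seq, motif.upper()))


def group2_boundary_check(intron: str, five_prime: str = "GUGYG",
                          three_prime: str = "AY") -> tuple[int, int]:
    """Mismatches of the intron's 5' end and 3' end against the consensus motifs."""
    head = intron[:len(five_prime)]
    tail = intron[-len(three_prime):]
    return motif_mismatches(head, five_prime), motif_mismatches(tail, three_prime)


# A canonical boundary (GTGCG ... AT) versus a GCGCG start with one substitution.
print(group2_boundary_check("GTGCGAAAAAT"))  # (0, 0)
print(group2_boundary_check("GCGCGAAAAAC"))  # (1, 0)
```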

The phylum Annelida has over 20,000 described species23, including polychaetes, echiurans, sipunculans, leeches, and oligochaetes. Annelids show high diversity in morphological and ecological traits and have adapted to environments ranging from terrestrial habitats to the hadal zones of the ocean. They are therefore interesting subjects for evolutionary study. The phylogenetic relationships among a wide range of annelid lineages have been well assessed using expressed sequence tags24, transcriptomic data25,26,27,28,29, and mitogenomes3,30,31,32. Currently, two major groups (Errantia and Sedentaria) and some early-branching families are recognized in Annelida. Sedentaria includes echiurans, vestimentiferans, clitellates (leeches and oligochaetes), and the sessile and tube-dwelling polychaetes. Several polychaete families have not yet been included in phylogenomic analyses of annelids, and therefore inter-familial relationships remain to be fully understood33.

The family Travisiidae includes small vermiform annelids with a single valid genus, Travisia, and at least 37 described species34,35,36,37. Species of Travisia are deposit feeders inhabiting muddy bottoms, mainly in the deep sea below 200 m depth (reviewed by Blake and Maciolek35). The presence of Travisia in sediment samples is readily noticed by its characteristic fetid odor, and members of the genus are known as “stink worms” for this smell. Although the function of the chemical substances responsible for the odor is not fully understood, Taboada et al.38 showed that a lipophilic extract of Travisia sp. deters predatory starfish (a result the authors noted requires careful interpretation), and Nara and Seike39 inferred from aggregations of the trace fossil Macaronichnus segregatis degiberti that volatile chemical substances of Travisia might act as sex pheromones. Penry and Jumars40 hypothesized that microbial fermentation may be important in the digestive strategy of T. foetida, considering the odor and the unusual gut structure of this species. Previously, Travisia and two synonymized genera (Dindymenides and Kesun41) were considered, based on morphological characters (see Rouse42), to be members of Opheliidae, which clusters with Capitellida in molecular phylogenetic analyses26 (Fig. 1). Conversely, molecular phylogenies indicate a close relationship between Travisia and scalibregmatid species rather than Opheliidae43, and Travisiidae was recognized as a distinct subgroup within Scalibregmatidae44,45,46,47. This subgroup has since been regarded as independent and raised to family level based on morphological evidence35. In recent transcriptome-based phylogenomics48, Scalibregmatidae clusters with the Terebellida + Arenicolidae clade.

Figure 1

Phylogenetic relationships among a subset of Sedentaria, modified from the metatree proposed by Struck33 as a working hypothesis for future studies. Dashed lines indicate lineages with undetermined phylogenetic positions.

In this study, we determined the mitochondrial genome sequence of Travisia sanrikuensis, the first mitogenome from the family Travisiidae, to elucidate the species’ mitogenomic features, reconstruct the phylogeny of Sedentaria, and examine the phylogenetic position of Travisiidae. The features of the mitochondrial genome, the intron in the barcoding region of cox1, and gene rearrangements are discussed. In addition, the nucleotide sequences of the mitochondrial cox1 intron of Travisia spp. were determined, and phylogenetic analysis was performed using the partial sequences of the group II intron.

Results

Assembly of the mitogenome

A total of 474,608 reads were obtained after trimming low-quality reads. A merged contig for T. sanrikuensis (12,166 bp) was obtained from an initial NOVOPlasty run using the 16S rRNA gene sequence (LC566242) as the seed. Although several assembly conditions were tested by varying the k-mer and read lengths, a merged contig longer than 500 bp was obtained only with the k-mer and read lengths set to 23 bp and 111 bp, respectively. In a BLAST search, a region of this merged contig showed moderately high similarity (785 bp, max score 250, total score 665) to the nad5 gene of Glycera cf. tridactyla (KT989327). A partial sequence (192 bp) of the predicted nad5 gene in the initial T. sanrikuensis contig, which aligned with the nad5 gene of G. cf. tridactyla at positions 6219–6410 of KT989327, was used as the seed for a subsequent assembly. The resulting merged contig was 17,390 bp long. Each end of the contig shared more than 100 bp of consensus sequence with the corresponding end of the 16S rRNA gene sequence used as the initial seed (LC566242). Although concatenating the contig and the 16S rRNA gene sequence (LC566242) recovered the circular mitogenome of T. sanrikuensis, the result contained a dubious control region (> 2000 bp) between the nad5 and trnR genes that includes tRNAs encoded on the “−” strand and a long palindrome-like sequence (a nearly perfect inverted repeat of > 600 bp). Because this region needed to be confirmed by polymerase chain reaction (PCR) and PCR failed to amplify any target including it, the nearly complete mitogenome sequence (15,854 bp), excluding the control region, was deposited (LC677172).
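
To illustrate what a “nearly perfect” inverted repeat means quantitatively, the sketch below scores how closely one arm matches the reverse complement of the other. It is a hypothetical helper, not the procedure used to detect the repeat, and it assumes the two arms have already been delimited, have equal length, and contain no indels; real data would call for an alignment-based comparison.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of positions where `left_arm` equals the reverse complement of `right_arm`.

    A value close to 1.0 indicates a nearly perfect inverted repeat, i.e. a
    palindrome-like segment whose arms could pair with each other.
    """
    if not left_arm or len(left_arm) != len(right_arm):
        raise ValueError("arms must be non-empty and of equal length")
    rc = reverse_complement(right_arm).upper()
    matches = sum(a == b for a, b in zip(left_arm.upper(), rc))
    return matches / len(left_arm)


print(inverted_repeat_identity("ACGTTG", "CAACGT"))  # 1.0 (perfect)
print(inverted_repeat_identity("ACGTTG", "CAACGA"))  # 5/6, one mismatch
```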

Mitochondrial genome organization

The mitogenome sequence includes 13 PCGs (atp6 and atp8, cox1–3, cytb, nad1–6, and nad4l), 22 tRNA genes (one for each amino acid except leucine and serine, each of which has two), and two rRNA genes [small ribosomal RNA (rrnS or 12S rRNA) and large ribosomal RNA (rrnL or 16S rRNA)] (Fig. 2 and Table 1). All determined genes were encoded on the “+” strand (Fig. 2). Both AT-skew and GC-skew of all genes, except for the AT-skew of rrnS, were negative, indicating that T and C outnumber A and G, respectively (Table 2). Predicted secondary structures of the tRNAs showed that the thymidine loops of trnD, trnM, and trnI and the dihydrouridine loop of trnK were reduced by 3 bp (Dataset S2). The dihydrouridine stem was lost in trnS1 (Dataset S2).
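
The sign convention behind the skew values follows from the formulas given in the Methods, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C): a negative AT-skew means T exceeds A, and a negative GC-skew means C exceeds G. The minimal sketch below computes both for a made-up toy sequence; it is illustrative only, not the script used for Table 2.

```python
def compositional_skews(seq: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) with AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C)."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("skew is undefined when A + T or G + C is zero")
    return (a - t) / (a + t), (g - c) / (g + c)


# Toy sequence with T > A and C > G, so both skews are negative.
at_skew, gc_skew = compositional_skews("AATTTGCC")
print(at_skew, gc_skew)  # -0.2 and about -0.333
```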

Figure 2

Gene map of the nearly complete mitochondrial genome of Travisia sanrikuensis. A photograph shows T. sanrikuensis. Red: protein-coding genes, Blue: tRNAs, Green: rRNAs, Black: intron, Light gray: undetermined positions including a putative control region.

Table 1 Summary of the nearly complete mitochondrial genome of Travisia sanrikuensis (15,854 bp).
Table 2 Nucleotide composition (%) of 13 protein-coding genes and rRNAs, and the skewness of Travisia sanrikuensis.

Figure 3 shows the gene order of T. sanrikuensis and the putative ancestral gene order of PCGs. The order of PCGs was identical to that commonly found among Errantia and Sedentaria. When the determined tRNAs were included, the gene order was almost identical to the putative ancestral gene order of Sedentaria, which is known from oligochaetes, leeches, and Siboglinidae31,32; however, the order of trnR and trnH differed between T. sanrikuensis and the putative sedentarian ancestor.
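
One simple way to quantify such a local difference is to count breakpoints, i.e., gene adjacencies present in one order but absent from the other. The sketch below uses hypothetical placeholder gene names rather than the actual T. sanrikuensis order, treats the order as linear by default (the control region is undetermined), and ignores strand because all determined genes lie on the “+” strand. It shows that swapping two neighbouring tRNAs breaks three adjacencies.

```python
def adjacencies(order: list[str], circular: bool = False) -> set[tuple[str, str]]:
    """Ordered pairs of neighbouring genes (strand/sign is not modelled)."""
    pairs = set(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.add((order[-1], order[0]))
    return pairs


def breakpoints(reference: list[str], query: list[str],
                circular: bool = False) -> list[tuple[str, str]]:
    """Adjacencies of `reference` that are absent from `query`."""
    return sorted(adjacencies(reference, circular) - adjacencies(query, circular))


# Hypothetical toy orders: the two tRNAs swap places between flanking genes.
ancestral = ["geneA", "trnR", "trnH", "geneB"]
observed = ["geneA", "trnH", "trnR", "geneB"]
print(breakpoints(ancestral, observed))
# [('geneA', 'trnR'), ('trnH', 'geneB'), ('trnR', 'trnH')]
```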

Figure 3

Mitochondrial gene order of (a) the putative ancestor of Sedentaria (known from oligochaetes, leeches, and Siboglinidae) and (b) the nearly complete mitogenome of Travisia sanrikuensis. Red: protein-coding genes, Blue: tRNAs, Green: rRNAs, Gray: not determined. Underlines indicate genes whose order differs between (a) and (b).

Features of the cox1 gene sequence in species of Travisia

The cox1 gene of T. sanrikuensis included an intron (882 bp) within the “Folmer region”, so the target sequence was longer (1540 bp) than usual (658 bp). PCR successfully amplified partial cox1 sequences from five unidentified species of Travisia, and all of them included an intron (Table 3). The introns of four Travisia spp. (T. sanrikuensis, GK623, GK625, and GK1734) varied in length (790–1386 bp), whereas the intron sequences of two Travisia spp. (GK1732 and GK1736) were only partially determined (Dataset S3). The fully determined introns of Travisia spp. are shorter than the known mitochondrial cox1 introns in annelids (1647–2468 bp)16,17,18. The introns were inserted at the same position in all specimens of Travisia. Sequence logos identified several conserved regions (Fig. S2).
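
The reported lengths are mutually consistent: removing the 882-bp intron from the 1540-bp target leaves 1540 − 882 = 658 bp, the usual length of the Folmer region. The minimal sketch below illustrates the excision with 1-based inclusive coordinates; the coordinates in the example are hypothetical placeholders, not the actual insertion site.

```python
def excise_intron(seq: str, start: int, end: int) -> str:
    """Remove an intron given 1-based, inclusive coordinates start..end."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("intron coordinates out of range")
    return seq[:start - 1] + seq[end:]


# Placeholder 1540-bp target with a hypothetical 882-bp intron at 300..1181.
amplicon = "N" * 1540
spliced = excise_intron(amplicon, 300, 300 + 882 - 1)
print(len(spliced))  # 658
```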

Table 3 Intron size of Travisia spp.

Because the nucleotide sequences obtained from the “Folmer region” of Travisia were longer than expected, they were compared with those registered in the NCBI database. A BLAST search using the cox1 sequence of T. sanrikuensis did not return any Travisia sequences (HM473706–HM473709, HQ025027, HM904906, and MF121290). A BLAST search with the Travisia pupa sequences (HM473706–HM473709) yielded only low max scores (≤ 95.3), whereas a search using T. sanrikuensis returned the mitogenome sequence of the annelid Melinna cristata (Ampharetidae; MW542504; max score = 926). Only five sequences were returned by a BLAST search with the Travisia forbesii sequences (HQ025027, HM904906, and MF121290), whereas 100 metazoan sequences were returned for T. sanrikuensis. An alignment of two scalibregmatid sequences (JN256052 and MN217515) and the sequences from Travisia species showed ambiguous indels in the T. pupa sequences (HM473706–HM473709), including indels whose lengths are not multiples of three (Dataset S4).
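
In a protein-coding gene such as cox1, indels whose lengths are not multiples of three shift the reading frame, which is why they are suspicious in barcode sequences. The sketch below flags such gap runs in one row of a nucleotide alignment. It is a hypothetical helper for illustration; by assumption it ignores terminal gaps, which usually reflect differing read lengths rather than indels.

```python
import re


def frame_breaking_indels(aligned: str, ignore_terminal: bool = True) -> list[tuple[int, int]]:
    """Return (0-based start, length) of gap runs whose length is not a multiple of 3."""
    hits = []
    for m in re.finditer(r"-+", aligned):
        if ignore_terminal and (m.start() == 0 or m.end() == len(aligned)):
            continue  # ragged alignment ends, not internal indels
        length = m.end() - m.start()
        if length % 3:
            hits.append((m.start(), length))
    return hits


print(frame_breaking_indels("ATG---AAA-CC--T"))  # [(9, 1), (12, 2)]
print(frame_breaking_indels("--ATG-CC--"))       # [(5, 1)]
```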

Introns in the cox1 gene of Travisia spp

The introns of Travisia spp. begin and end with motifs characteristic of group II introns (5′ GCGCG 3′ and 5′ AY 3′, respectively). Mfold identified secondary structures corresponding to domains V and VI of group II introns, but the other domains were not recovered. PfamScan detected ORFs for type II intron maturase, which are characteristic of group II introns, in two species, namely Travisia sp. GK625 and Travisia sp. GK1736. Phylogenetic analysis based on domain V and the subsequent sequence of the group II intron showed that the Travisia spp. introns were monophyletic (BS = 98%) (Fig. 4). This clade did not cluster with the group II introns of other annelids, i.e., Decemunciger sp., Nephtys sp., Glycera fallax (cox1 I1 and I2), and Glycera unicornis. These annelid introns, except for G. fallax cox1 I1, were closely related, and the G. unicornis and Decemunciger sp. introns were monophyletic (BS = 98%). The intron of G. fallax cox1 I1 was not related to the other annelid introns but clustered with the intron of the brown alga Pylaiella littoralis (BS = 100%). Sequence logos indicated five conserved regions in the intron dataset, roughly corresponding to positions in the stem and ζ′ in the loop of domain V, and in the stem of domain VI (Fig. S3).

Figure 4

Maximum likelihood phylogeny of group II introns based on the nucleotide sequences of domain V and subsequent sites. Maximum likelihood bootstrap values (BS) ≥ 50% are shown above branches. Scientific names are followed by the host gene and intron ID. Bacterial (red), chloroplast (green), and mitochondrial (blue) group II introns are included in the analysis. Annelid mitochondrial introns are shown in purple. OTUs with newly obtained sequences are shown in bold.

Phylogenetic relationships based on mitogenome sequences

Travisia sanrikuensis was included in the Maldanidae + Terebellida cluster with high support (nucleotide: PP = 0.99, BS = 93%; AA: PP = 1.00, BS = 100%) but did not cluster with Thalassematidae in either the nucleotide- or the AA-based analysis (Fig. 5, Fig. S4). The monophyletic Terebellida clade was recovered as follows in Newick format: (Pectinariidae, ((Terebellidae, Trichobranchidae), (Alvinellidae, Ampharetidae))). The phylogenetic positions of Thalassematidae (Capitellida) and Travisia were incongruent between the nucleotide- and AA-based analyses. In the nucleotide-based analysis, Thalassematidae clustered with oligochaetes, although support values were low (PP = 0.65, BS = 65%) (Fig. 5). Travisia sanrikuensis was sister to the clade Arenicolida (Maldanidae in the present analyses) + Terebellida (Ampharetidae, Alvinellidae, Pectinariidae, Terebellidae, and Trichobranchidae), but this relationship was weakly supported (PP = 0.89) and was not recovered by the maximum likelihood (ML) analysis. In the AA-based analysis, the monophyly of early-branching Thalassematidae and polychaetes, including the newly sequenced T. sanrikuensis, had relatively high support (PP = 0.98, BS = 94%) (Fig. S4).
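
Topologies written in Newick format, such as the Terebellida clade above, can be checked programmatically by listing the leaf set of every internal node. The parser below is a minimal hand-written sketch for illustration only: it assumes topology-only strings without branch lengths, support values, quoted labels, or internal node labels (whitespace is discarded), and a dedicated tree library would be the better choice for real trees.

```python
def parse_clades(newick: str) -> set[frozenset[str]]:
    """Leaf sets of all internal nodes in a topology-only Newick string."""
    text = "".join(newick.split()).rstrip(";")
    clades: set[frozenset[str]] = set()
    stack: list[set[str]] = []  # leaves collected for each open parenthesis
    name = ""
    for ch in text:
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if not stack:
                raise ValueError("separator outside parentheses")
            if name:
                stack[-1].add(name)
                name = ""
            if ch == ")":
                members = stack.pop()
                clades.add(frozenset(members))
                if stack:
                    stack[-1].update(members)  # propagate leaves to the parent
        else:
            name += ch
    if stack or name:
        raise ValueError("unbalanced or malformed Newick string")
    return clades


TEREBELLIDA = "(Pectinariidae, ((Terebellidae, Trichobranchidae), (Alvinellidae, Ampharetidae)))"
for clade in sorted(parse_clades(TEREBELLIDA), key=len):
    print(sorted(clade))
```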

Figure 5

Bayesian phylogeny of a subset of Sedentaria based on the concatenated dataset, including the nucleotide sequences of the 13 mitochondrial PCGs, 16S rRNA, and 12S rRNA (12,732 characters). Numbers above branches show posterior probabilities (PP) followed by maximum likelihood bootstrap values (BS) above 50%. Asterisks indicate PP = 1.00 and BS = 100%. Travisia sanrikuensis, whose nucleotide sequence was newly obtained, is shown in bold.

All leech nodes were highly supported (PP ≥ 0.99, BS ≥ 98%) (Fig. 5). Rhynchobdellida (proboscis-bearing leeches) was recovered as monophyletic (PP ≥ 0.99, BS ≥ 98%). Within Rhynchobdellida, Glossiphoniidae was sister to the monophyletic Oceanobdelliformes (Ozobranchiidae and Piscicolidae). The monophyletic Arhynchobdellida (leeches without a proboscis) (PP = 1.00, BS = 100%), including Erpobdellidae and Hirudinidae, was sister to Rhynchobdellida. Because support values within oligochaetes were largely low and this group was not the focus of the present study, relationships within oligochaetes are not discussed here.

Discussion

We determined the nearly complete mitogenome sequence of a species of Travisiidae for the first time. Unexpectedly, a relatively short intron (882 bp) was identified in the cox1 gene of T. sanrikuensis. Introns were also found in five undescribed travisiid species by Sanger sequencing. All determined travisiid introns in the mitochondrial cox1 gene (790–1386 bp) were shorter than the cox1 introns known in Annelida, i.e., 1819 bp in Nephtys sp., 2357–2468 bp in Glycera spp., and 1647 bp in Decemunciger sp. The introns of travisiid species included motifs (beginning with 5′ GCGCG 3′ and ending with 5′ AY 3′) and domains V and VI that are characteristic of group II introns. In addition, the ORFs for type II intron maturase found in two Travisia spp. (GK625 and GK1736) are characteristic of the mitochondrial group II introns found in annelids16,17. Travisiid introns were inserted at the same position across species and formed a monophyletic group, suggesting that an intron with an ORF was acquired in a common ancestor of Travisia and that the ORF was subsequently lost in some travisiid species. Based on these lines of evidence, we regard travisiid introns as degenerate group II introns. ORF-less introns have been found in bacteria21 and fungi49. Also, although the cox1 intron in the bivalve Cucullaea labiata15 is short (651 bp; positions 1184–1834 of KP091889) and lacks ORFs, it probably belongs to group II, considering the motifs at its 5′ (5′ GTGCG 3′) and 3′ ends (5′ AT 3′) and the conserved regions suggested by the sequence logos (Fig. S3).

It is noteworthy that an intron was detected in all successfully sequenced travisiids in this study, given that such introns are presumed to have a high loss rate during speciation16. Richter et al.17 showed that group II introns are absent in Glycera nicobarica, which is closely related to G. fallax and G. unicornis (G. fallax, (G. nicobarica, G. unicornis)). The group II introns were probably acquired in a common ancestor of Travisia and have since been retained (see above). Two possible scenarios could explain the retention of the introns in Travisia spp.: (1) Travisia radiated rapidly and thus had insufficient time to lose the intron from cox1; indeed, the relatively low diversity of Travisiidae, with a single genus and about 40 described species, supports recent speciation of the group; or (2) undetermined mechanisms help maintain the cox1 intron in travisiid species. Unfortunately, these hypotheses are difficult to test at this stage. A robust phylogenetic framework for travisiid species and knowledge of the function of the mitochondrial intron are needed to further discuss the evolutionary history of the degeneration of the travisiid mitochondrial intron. Nevertheless, Travisia is a promising subject for studying the loss and gain of mitochondrial introns.

The introns of Travisia spp. were inserted within the “Folmer region” of the cox1 gene, which may have prevented amplification of cox1 when short extension times were used during PCR. Only seven cox1 sequences of Travisia, obtained in DNA barcoding studies50,51, are available on GenBank: T. forbesii (HQ025027, HM904906, and MF121290) and T. pupa (HM473706–HM473709). However, the results of BLAST searches and of the alignment with scalibregmatid sequences (MN217515 and JN256052) and T. sanrikuensis suggest that the cox1 sequences registered as belonging to Travisia are unlikely to be derived from Travisia. The possibility that the Travisia cox1 sequences in GenBank result from contamination has been discussed previously (see the caption of Fig. 3 in Sun et al.52).

The phylogenetic relationships of leeches have been contentious because phylogenies based on several mitochondrial and nuclear genes were often incongruent53,54,55. Although phylogenomic studies with limited taxon sampling of annelids recovered Rhynchobdellida as paraphyletic56,57, phylogenomic analyses based on anchored hybrid enrichment58 and transcriptomes28 with broader taxon sampling revealed the monophyly of Rhynchobdellida. The high support for the relationships among leech families in our results provides further support for the monophyly of Rhynchobdellida. On the other hand, the number of families in Arhynchobdellida represented by mitogenomes remains too limited for robust phylogenomic inference. Therefore, further taxon sampling is needed to confirm the monophyly of the hirudinean orders.

The relationships between these polychaetes and clitellates ((Terebellida, Arenicolidae), clitellates) are consistent with previous phylogenomic studies25,26. The phylogenetic relationships within Terebellida are consistent with a recently published transcriptome-based tree of Terebellida59, except for Melinnidae, whose mitogenome sequence was not included in this study. We confirmed the monophyly of the clade comprising Travisia, Terebellida, and Arenicolida (Fig. 5). The close relationship among Travisia, Arenicolida, and Terebellida is similar to the relationship (Scalibregmatidae, (Arenicolida, Terebellida)) recovered in phylogenies based on 18S rRNA gene sequences60 and in phylogenomics48, considering the sister relationship between Travisia and Scalibregmatidae44,45,46,47. Close relationships between Arenicolida and Scalibregmatidae + Travisia61 and between Terebellida and Arenicolida25,26,60,62 have also been indicated previously. The morphological characters shared among the families in Arenicolida + Terebellida + Travisia (summarized in Rouse and Fauchald63, Appendix I and II) are also found in other lineages; therefore, no synapomorphy is currently known for this clade.

Within the Travisia + Arenicolida + Terebellida clade, intra-familial molecular phylogenetic analyses have been conducted for Arenicolidae64,65, Maldanidae66, and Terebellida59,67. In contrast, fewer than seven travisiid species have been included in molecular phylogenies36,37,44,47, and intra-familial relationships have not yet been sufficiently discussed. Travisia is a particularly interesting subject for evolutionary study because its species inhabit a wide range of water depths and show a variety of morphological characters, such as branchiae34,35,41. Phylogenetic analyses including more travisiid species would shed light on the evolution and diversification of this group within Annelida.

Methods

Sampling and DNA extraction

A specimen of T. sanrikuensis (GK627) was collected from 1659–1684 m depth in the northwestern Pacific (Sanriku region, Japan; 39°17′N, 142°48–49′E) with a beam trawl during cruise KS-17-12 of the R/V Shinsei-Maru. The specimen was previously used as a non-type specimen of T. sanrikuensis37. In that previous study, total DNA was extracted from body wall tissue of the specimen, fixed in 70% ethanol, using a DNeasy Blood and Tissue Kit (QIAGEN, Hilden, Germany). The extracted DNA was stored in a freezer at −30 °C.

Polymerase chain reaction and sequencing

Long PCR for the mitogenome of T. sanrikuensis was performed following the method of Wu et al.68. A primer set for long PCR (Travi16SksF/Travi16SksR) (Table 4) was designed using the 16S rRNA sequence of T. sanrikuensis (GK627, GenBank accession number: LC566242). The long PCR mixture contained 14.0 μl of MilliQ water, 25.0 μl of 2 × Gflex PCR Buffer (TaKaRa, Shiga, Japan), 1.0 μl each of 10 μM forward and reverse primers, 1.0 μl of Tks Gflex DNA Polymerase (TaKaRa), and 8.0 μl of template DNA solution. PCR amplification consisted of an initial denaturation at 94 °C for 60 s, followed by 36 cycles of 10 s at 98 °C and 10 min at 68 °C. The PCR product (> 15 kb) was checked by electrophoresis on a 1% agarose gel at 100 V for 40 min and then used for next-generation sequencing. Bioengineering Lab. Co., Ltd., Japan, performed paired-end sequencing (2 × 151 bp) of the mitogenome amplicon using an Illumina NextSeq 500 sequencer. Reads were quality-filtered with Sickle v1.3369, removing sequences with low quality scores (< 20) or short lengths (< 40).
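
The quality-filtering step can be pictured with a simplified sliding-window trimmer using the thresholds stated above (quality 20, minimum length 40). This is a sketch written for illustration, not Sickle's implementation: it assumes Phred+33 quality encoding and a fixed 10-base window, whereas Sickle's own algorithm differs in details such as how the window size is chosen.

```python
def window_trim(seq: str, qual: str, threshold: int = 20, min_len: int = 40,
                window: int = 10, offset: int = 33):
    """Simplified sliding-window quality trimmer (not Sickle's exact algorithm).

    Scans windows from the 5' end; in the first window whose mean Phred score
    falls below `threshold`, the read is cut at the first base scoring below
    `threshold`. Returns the trimmed (seq, qual), or None if fewer than
    `min_len` bases remain.
    """
    scores = [ord(ch) - offset for ch in qual]  # Phred+33 decoding assumed
    if not scores:
        return None
    w = min(window, len(scores))
    cut = len(scores)
    for i in range(len(scores) - w + 1):
        if sum(scores[i:i + w]) / w < threshold:
            # A window mean below the threshold guarantees at least one such base.
            cut = next(j for j in range(i, i + w) if scores[j] < threshold)
            break
    if cut < min_len:
        return None
    return seq[:cut], qual[:cut]


read = "A" * 60
qual = "I" * 45 + "#" * 15  # Q40 for 45 bases, then Q2
trimmed = window_trim(read, qual)
print(len(trimmed[0]))  # 45
print(window_trim("A" * 30, "I" * 30))  # None (shorter than 40 bases)
```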

Table 4 The primer sequences used in the present study.

The primer LCO-annelid, modified from LCO149070 based on the cox1 gene sequences of annelids (see Table S1), and HCO219870 were used to amplify cox1 gene sequences of five Travisia spp. The PCR protocol for cox1 amplification of Travisia spp. (see Table S2 for GenBank accession numbers) used KOD One PCR Master Mix (Toyobo, Tokyo, Japan), which has a high extension efficiency (5 s/kb for targets 1–10 kb long), and followed Kobayashi et al.7, except that 35 cycles, an annealing temperature of 50 °C, and an extension step of 20 s were used instead.
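
As a rough arithmetic check, the quoted rate of 5 s/kb implies that the 20-s extension step comfortably covers the intron-containing targets: 658 bp of Folmer region plus the longest fully determined intron (1386 bp) gives 2044 bp, which needs about 10.2 s. The one-line helper below is a hypothetical illustration of this arithmetic, assuming the rate applies uniformly across target lengths.

```python
def extension_seconds(target_bp: int, sec_per_kb: float = 5.0) -> float:
    """Extension time implied by a polymerase rate given in seconds per kilobase."""
    return target_bp / 1000 * sec_per_kb


for bp in (658, 658 + 1386):
    print(bp, round(extension_seconds(bp), 2), extension_seconds(bp) <= 20)
```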

Sequence analysis and gene annotation of the mitogenomes

Although part of the 16S rRNA gene was not amplified by long PCR and was therefore missing from the NextSeq reads, a nearly complete mitogenome of T. sanrikuensis was assembled using NOVOPlasty v4.2.171. First, a NOVOPlasty assembly was conducted using the 16S rRNA gene sequence (LC566242) as the seed, with the k-mer and read lengths set to 23 bp and 111 bp, respectively. A second assembly was then conducted with the k-mer and read lengths set to 39 bp and 151 bp, respectively, using as the seed a partial sequence of the merged contig from the first assembly. The nearly complete mitogenome of T. sanrikuensis was determined manually by concatenating the merged contig from the NOVOPlasty assembly and the 16S rRNA gene sequence (LC566242). PCGs were identified using the MITOS web server72. The positions of tRNAs were determined using the MITOS web server and ARWEN73, implemented in ARAGORN74. The secondary structures of tRNAs were predicted using ARAGORN. The annotated mitogenome sequence and the raw reads have been deposited in the DNA Data Bank of Japan (DDBJ) under accession numbers LC677172 (DDBJ/EMBL/GenBank) and DRA013124, respectively. Compositional skews were calculated as follows: AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C).
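
The manual concatenation step can be sketched as joining a contig and a seed whose ends overlap, dropping the duplicated overlaps so that the circle is written once. This minimal sketch assumes exact end overlaps of at least `min_overlap` bases (more than 100 bp in this case) with no sequencing errors inside them; the function names are hypothetical, and the toy example uses a 3-base overlap for readability.

```python
def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_overlap)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def join_to_circle(contig: str, seed: str, min_overlap: int = 100) -> str:
    """Circular sequence, written linearly from the contig start, formed by a contig
    whose 3' end overlaps the seed's 5' end and a seed whose 3' end overlaps the
    contig's 5' end."""
    left = suffix_prefix_overlap(contig, seed, min_overlap)   # contig end -> seed start
    right = suffix_prefix_overlap(seed, contig, min_overlap)  # seed end -> contig start
    if not left or not right or left + right > len(seed):
        raise ValueError("ends do not overlap as required")
    return contig + seed[left:len(seed) - right]


print(join_to_circle("GATTACAGGC", "GGCTTTTGAT", min_overlap=3))  # GATTACAGGCTTTT
```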

Phylogenetic analysis based on mitogenomes

A preliminary phylogenetic analysis comprising the various lineages of annelid mitogenome sequences (149 OTUs) available from GenBank suggested that T. sanrikuensis is closely related to the Arenicolida + Terebellida clade (Fig. S1 and Table S1). Based on this preliminary result and on the results of a previous study26, 63 mitogenome sequences from a subset of Sedentaria (Arenicolida, Terebellida, echiurans, and clitellates), as well as two outgroups (Siboglinidae), were obtained from GenBank using the R package AnnotationBustR75 (Table 5). Outgroups were selected by referring to a review of annelid phylogeny33. Erpobdella octoculata (KC688270), Hirudinaria manillensis (KC688268), and Hirudo nipponia (KC667144) are shown in double quotation marks and were excluded from the discussion of phylogenetic relationships because Ye et al.76 suggested that the source specimens of these sequences were misidentified and should be assigned to Whitmania. DNA sequences of the 13 PCGs were translated into amino acid (AA) sequences using the invertebrate mitochondrial genetic code in MEGA v7.0.2677. AA sequences and the two rRNA gene sequences were aligned using MAFFT v7 (default parameters)78. The PAL2NAL online service79 was used to generate codon alignments from the corresponding aligned AA sequences. Ambiguous positions were removed with trimAl v.1.280 using the -gappyout option.
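
The codon-alignment step rests on a simple idea: each residue in the aligned protein consumes the next codon of its coding sequence, and each alignment gap becomes a three-base gap. The sketch below shows only that threading step; unlike a full tool such as PAL2NAL, it does not check that each codon actually translates to its residue under the invertebrate mitochondrial code, and it ignores any trailing nucleotides such as a stop codon.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map a gapped protein alignment row back onto its (ungapped) coding sequence."""
    n_residues = sum(ch != "-" for ch in aligned_protein)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    codons, pos = [], 0
    for ch in aligned_protein:
        if ch == "-":
            codons.append("---")  # alignment gap -> codon-sized gap
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)  # trailing bases (e.g. a stop codon) are not used


print(thread_codons("M-K", "ATGAAATAA"))  # ATG---AAA
```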

Table 5 Mitochondrial genome sequences used in this study. Bold indicates the sequence obtained in the present study.

Phylogenetic trees were reconstructed from the concatenated dataset using Bayesian inference and ML methods. Bayesian analysis was performed using PhyloBayes 4.181. Two independent chains were run for over 15,000 cycles under the CAT + GTR model. Convergence was checked automatically, and the runs were terminated when maxdiff was < 1 and the effective sample size reached > 50, following the PhyloBayes 4.1 manual. However, the run for the AA dataset did not converge (maxdiff = 0.24 and effective sample size < 50) after > 25,000 cycles, and thus this tree was treated as supplementary data (Fig. S4). ML trees were reconstructed with IQ-TREE v1.6.1282 using 1000 ultrafast bootstrap replicates. Substitution models were selected with ModelFinder83 implemented in IQ-TREE. The resulting trees were edited using FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).
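
The maxdiff criterion compares chains through their bipartition frequencies: it is the largest difference in the frequency of any bipartition between chains. The sketch below computes it for two chains from hypothetical bipartition frequencies (the keys are placeholder labels); it is not PhyloBayes code, and in practice the frequencies would come from the trees each chain sampled after burn-in.

```python
def maxdiff(freqs_a: dict[str, float], freqs_b: dict[str, float]) -> float:
    """Largest absolute difference in bipartition frequency between two chains.

    A bipartition sampled by only one chain counts as frequency 0 in the other.
    """
    keys = set(freqs_a) | set(freqs_b)
    return max((abs(freqs_a.get(k, 0.0) - freqs_b.get(k, 0.0)) for k in keys),
               default=0.0)


chain1 = {"AB|CD": 0.90, "AC|BD": 0.10}
chain2 = {"AB|CD": 0.66, "AD|BC": 0.20}
print(round(maxdiff(chain1, chain2), 2))  # 0.24
```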

Intron analysis

To examine the phylogenetic relationships of the group II introns of Travisia and other annelids, phylogenetic analysis was conducted using a conserved region consisting of domain V and the subsequent sequence of the intron, because the introns of Travisia spp., except for GK625 and GK1736, had no ORFs for putative proteins (i.e., reverse transcriptase or intron maturase). The cox1 intron in the bivalve Cucullaea labiata15 was identified as group II in this study (see “Discussion”) and was included in the dataset. To find ORFs in the Travisia spp. introns, NCBI ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) was used; all identified ORFs were then searched for protein domains in the Pfam-A collection of protein families using PfamScan (https://www.ebi.ac.uk/Tools/pfa/pfamscan/)84. A dataset for the phylogenetic analysis was built based on previous studies16,17,18,85, as shown in Table S2 and Dataset S1. The Mfold web server application RNA Folding Form V2.3 (http://www.unafold.org/mfold/applications/rna-folding-form-v2.php)86 was used to search for the secondary structures of domains V and VI. The dataset was aligned using MAFFT with default options, resulting in 228 characters. The ML analysis was conducted using the same methods as described above. The outgroup Tetradesmus obliquus (as Scenedesmus obliquus in Richter et al.17) was selected following Richter et al.17. TreeShrink v1.3.987 identified the Clostridium difficile sequence as a long branch, so it was excluded from the final dataset, leaving 64 partial sequences of the group II intron for phylogenetic analysis.
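
ORF detection can be illustrated with a simplified forward-strand scan. This sketch is a stand-in for illustration, not NCBI ORFfinder: it assumes the invertebrate mitochondrial genetic code (NCBI table 5), in which only TAA and TAG are stop codons and TGA encodes tryptophan, and for simplicity it accepts only ATG as a start codon and reports only the longest ORF opened after each stop in a frame.

```python
# Stop codons of the invertebrate mitochondrial code (NCBI table 5); TGA is Trp here.
STOPS = {"TAA", "TAG"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """0-based, end-exclusive (start, end) of forward-strand ORFs that begin with ATG,
    end with a stop codon, and are at least `min_codons` long (stop included)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("ATGTGATAA", min_codons=1))  # [(0, 9)]; TGA is read through
```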

Sequence logos88 of the intron sequences were generated using WebLogo89, after excluding positions with ≥ 20% gaps, to visualize the frequency of nucleotides at each position in the dataset. Sequence logos were also created for the introns of Travisia, except for GK1732 and GK1734, whose introns were not fully determined.
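
The quantity a sequence logo displays at each position can be sketched as the information content of that alignment column, log2(4) − H bits, where H is the Shannon entropy of the nucleotide frequencies. The sketch below applies the ≥ 20% gap filter described above; it is an illustration rather than WebLogo's implementation, and it omits the small-sample correction used in the original sequence-logo method.

```python
import math
from collections import Counter


def column_information(column: str, max_gap_fraction: float = 0.2):
    """Information content (bits) of one nucleotide alignment column.

    Returns None when the gap fraction is >= max_gap_fraction (column excluded).
    No small-sample correction is applied.
    """
    if column.count("-") / len(column) >= max_gap_fraction:
        return None
    counts = Counter(ch for ch in column.upper() if ch != "-")
    if not counts:
        return None
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(4) - entropy


def logo_profile(alignment: list[str], max_gap_fraction: float = 0.2) -> list:
    """Per-position information content for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_information("".join(s[i] for s in alignment), max_gap_fraction)
            for i in range(length)]


toy = ["ACG-", "ACGT", "ACTT", "ACTT", "ACGT"]
print(logo_profile(toy))  # [2.0, 2.0, about 1.03, None]
```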