Mitogenome of a stink worm (Annelida: Travisiidae) includes degenerate group II intron that is also found in five congeneric species

Mitogenomes are useful for inferring phylogenetic relationships between organisms. Although the mitogenomes of Annelida, one of the most morphologically and ecologically diverse metazoan groups have been well sequenced, those of several families remain unexamined. This study determined the first mitogenome from the family Travisiidae (Travisia sanrikuensis), analyzed its mitogenomic features, and reconstructed a phylogeny of Sedentaria. The monophyly of the Terebellida + Arenicolida + Travisiidae clade is supported by molecular phylogenetic analysis. The placement of Travisiidae is unclear because of the lack of mitogenomes from closely related lineages. An unexpected intron appeared within the cox1 gene of T. sanrikuensis and in the same positions of five undescribed Travisia spp. Although the introns are shorter (790–1386 bp) than other group II introns, they can be considered degenerate group II introns due to type II intron maturase open reading frames, found in two of the examined species, and motifs characteristic of group II introns. This is likely the first known case in metazoans where mitochondrial group II introns obtained by a common ancestor are conserved in several descendants. Insufficient evolutionary time for intron loss in Travisiidae, or undetermined mechanisms may have helped maintain the degenerate introns.

. Phylogenetic relationships from a subset of Sedentaria modified from a metatree regarded as a working hypothesis for future studies by Struck 33  www.nature.com/scientificreports/ read length, a merged contig longer than 500 bp was obtained only with kmer and read length set to 23 bp and 111 bp, respectively. A region from the merged contig showed moderately high homology (785 bp, max score 250, total score 665) to the nad5 gene of Glycera cf. tridactyla (KT989327) during a BLAST homology search. A partial sequence (192 bp) from the predicted nad5 gene in the initial T. sanrikuensis contig, which aligned with the nad5 gene of G. cf. tridactyla (KT989327) from position 6219-6410, was used as a seed sequence for a subsequent assembly. The resulting merged contig was 17,390 bp in length. Both ends of the contig had a consensus sequence larger than 100 bp, with both ends of the 16S rRNA gene sequence used as the initial seed (LC566242).
Although the circular mitogenome of T. sanrikuensis was recovered by concatenating the contig and 16S rRNA gene sequence (LC566242), a dubious control region (> 2000 bp) between the nad5 and trnR genes, which includes tRNAs encoded on "−" strand and a long palindrome like sequence (a nearly perfect inverted repeat of > 600 bp), was present. This control region should be confirmed by polymerase chain reaction (PCR) but PCR failed to amplify a target including the control region and therefore the nearly complete mitogenome sequence (15,854 bp), excluding the control region, was registered (LC677172).
Mitochondrial genome organization. The mitogenome sequence includes 13 PCGs (atp6 and 8, cox1-3, cytb, nad1-6 and nad4l), 22 tRNA genes (one for each of the amino acids except for trnL and trnS), two rRNA genes [small ribosomal RNA (rrnS or 12S rRNA) and large ribosomal RNA (rrnL or 16S rRNA)] ( Fig. 2 and Table 1). All determined genes were encoded on the "+" strand ( Fig. 2). Both AT-skew and GC-skew of all genes, except for AT-skew of rrnS, were negative, indicating that T and C outnumber A and G, respectively (Table 2). Predicted secondary structures of tRNAs showed that the thymidine loops of trnD, trnM, and trnI and the dihydrouridine loop of trnK were reduced by 3 bp (Dataset S2). Dihydrouridine stem was lost in trnS1 (Dataset S2). Figure 3 shows the gene order of T. sanrikuensis and the putative ancestral gene order of PCGs. The gene order was identical to the order commonly found among Errantia and Sedentaria. The gene order, including determined tRNAs, was almost identical to the putative ancestral gene order of Sedentaria, which is known for oligochaetes, leeches, and Siboglinidae 31,32 but the order of trnR and trnH diferred between T. sanrikuensis and the ancestors of Sedentaria.   [16][17][18] . The introns were inserted at the same positions in all specimens of Travisia. Sequence logos identified several conservative regions (Fig. S2).

Discussion
We determined the nearly complete mitogenome sequence of a species from Travisiidae for the first time. Unexpectedly, an intron of a relatively short length (882 bp) was identified in the cox1 gene of T. sanrikuensis. Introns were also found in five undescribed travisiid species using Sanger sequencing. All determined travisiid introns in the mitochondrial cox1 gene (ranging from 790-1386 bp) were shorter than known cox1 introns found in www.nature.com/scientificreports/ Annelida, i.e., 1819 bp in Nephtys sp., 2357-2468 bp in Glycera spp., and 1647 bp in Decemunciger sp. The introns of travisiid species included motifs (beginning with 5′ GCGCG 3′ and ending with 5′ AY 3′) and domains V and VI that are characteristics of group II introns. Also, the ORFs for type II intron maturase, found in two Travisia spp. (GK625 and GK1736), are the characteristics of mitochondrial group II introns found in annelids 16,17 . Travisiid introns were inserted in the same position across species. They formed a monophyletic group, suggesting that an intron with an ORF was obtained in a common ancestor of Travisia and the ORF was subsequently lost in some travisiid species. We regarded travisiid introns as degenerate group II introns based on these lines of evidence. ORF-less introns have been found in bacteria 21 and fungus 49 . Also, although the cox1 intron in the bivalve Cucullaea labiata 15 is short (651 bp; positions 1184-1834 of KP091889) and lacks ORFs, it probably belongs to group II, considering the motifs at the 5′ (5′ GTGCG 3′) and 3′ ends (5′ AT 3′), and conserved regions suggested by the sequence logos (Fig. S3). It is noteworthy that an intron was detected in all successfully sequenced travisiids in this study, considering that introns presumedly possess a high loss rate during speciation 16 . Richter et al. 17 showed an absence of group II introns in Glycera nicobarica, which is closely related to G. fallax and G. unicornis (G. fallax, (G. nicobarica, G.  unicornis)). The group II introns were probably obtained in a common ancestor of Travisia and have remained conserved (see above). Two possible scenarios explain the retention of the introns in Travisia spp.: (1) Travisia radiated rapidly, and thus had insufficient time to lose the intron from cox1. Indeed, the relatively small diversity of Travisiidae, with a single genus and about 40 described species, supports recent speciation of the group; (2) undetermined mechanisms help maintain the cox1 intron travisiid species. Unfortunately, it is difficult to test these hypotheses at this stage. The robust phylogenetic framework of travisiid species and knowledge of the mitochondrial intron's function are needed to further discuss the evolutionary history of the degeneration of the travisiid mitochondrial intron. Nevertheless, Travisia is a promising subject for studying the loss and gain of mitochondrial introns.
The introns of Travisia spp. were inserted within the "Folmer region" of the cox1 gene and this may have prevented amplification of cox1 due to short amplification times during PCR. Only seven sequences of the cox1 gene, which are obtained in DNA barcoding studies 50,51 , are available on GenBank: T. forbesii (HQ025027, HM904906, and MF121290) and T. pupa (HM473706-HM473709). However, the results of BLAST and alignment with scalibregmid sequences (MN217515 and JN256052) and T. sanrikuensis suggests that the cox1 sequences registered as belonging to Travisia are not likely derived from Travisia. The possibility of contamination of the cox1 sequences of Travisia in GenBank has been previously discussed (see the caption of Fig. 3 in Sun et al. 52 ). www.nature.com/scientificreports/ The phylogenetic relationships of leeches were contentious since the phylogenies based on several mitochondrial and nuclear genes were often incongruent [53][54][55] . Although phylogenomic studies with limited taxon sampling of annelids showed Rhynchobdellida as paraphyletic 56,57 , phylogenomic analysis based on anchored hybrid enrichment 58 and transcriptomes 28 with more taxon sampling revealed the monophyly of Rhynchobdellida. The high support for relationships among families in leeches in our results provides further support for the monophyly of Rhynchobdellida. On the other hand, the number of families in Arhynchobdellida represented by mitogenomes remains limited for proper phylogenomic studies. Therefore, further taxon sampling is needed to confirm the monophyly of hirudinean orders.
The relationships of polychaetes and clitellates ((Terebellida, Arenicolidae), clitellates) are consistent with previous phylogenomic studies 25,26 . The phylogenetic relationship within Terebellida is consistent with the recently published tree based on transcriptomes on Terebellida 59 except for Melinnidae, whose mitogenome sequence is not included in this study. We confirmed the monophyly of Travisia, Terebellida, and Arenicolida (Fig. 5). The close relationship between Travisia, Arenicolida, and Terebellida was similar to the relationship (Scalibregmatidae, (Arenicolida, Terebellida)) in phylongeny based on 18S rRNA gene sequences 60 and phylogenomics 48 , considering the sister relationship between Travisia and Scalibregmatidae [44][45][46][47] . Close relationships between Arenicolida and Scalibregmatidae + Travisia 61 and Terebellida and Arenicolida 25,26,60,62 has also been indicated previously. The morphological characters shared among the families in Arenicolida + Terebellida + Travisia (summarized in Rouse and Fauchald 63 , Appendix I and II) are also found in other lineages; therefore, no synapomorphy is known at this moment for this clade.
In the Travisia + Arenicolida + Terebellida clade, intra-familial molecular phylogenetic analyses have been conducted for Arenicolidae 64,65 , Maldanidae 66 , and Terebellida 59,67 . On the other hand, fewer than seven travisiid species have been included in a molecular phylogeny 36,37,44,47 , and intra-familial relationships are not yet sufficiently discussed. Travisia is one of the most interesting subjects for evolutionary study as they inhabit a wide range of water depths and show a variety of morphological characters such as branchiae 34,35,41 . A phylogenetic analysis using more travisiid species would shed light on their evolution and diversification patterns in annelids in the future.

Methods
Sampling and DNA extraction. A specimen of T. sanrikuensis (GK627) was collected from 1659-1684 m depth in the northwestern Pacific (the Sanriku region, Japan) at 39°17′N, 142°48-49′E with a beam trawl during the cruise KS-17-12 of R/V Shinsei-Maru. The specimen was previously used as the non-type specimen of T. sanrikuensis 37 . Total DNA was extracted from body wall tissue of the fixed specimen in 70% ethanol using a DNeasy Blood and Tissue Kit (QIAGEN, Hilden, Germany) in the previous study. Extracted DNA was stored in a freezer at − 30 °C.
Polymerase chain reaction and sequencing. Long PCR for the mitogenome of T. sanrikuensis was implemented following the method of Wu et al. 68 . A primer set for long PCR (Travi16SksF/Travi16SksR) ( Table 4) was designed using the 16S rRNA sequence of T. sanrikuensis (GK627, GenBank accession number: LC566242). The PCR mixture for long PCR contained 14.0 μl of MilliQ water, 25.0 μl of 2 × Gflex PCR Buffer (TaKaRa, Shiga, Japan), 1.0 μl of 10 μM forward and reverse primers, 1.0 μl of Tks Gflex DNA Polymerase (TaKaRa), and 8.0 μl of template DNA solution. PCR amplification was performed as follows: 60 s at 94 °C for an initial denaturation, 36 cycles of 10 s at 98 °C, and 10 min at 68 °C. PCR product of > 15 kb in size was checked by electrophoresis in 1% agarose gel at 100 V for 40 min and then was used as a sample for next-generation sequencing. Bioengineering Lab. Co., Ltd., Japan, performed paired-end sequencing (2 × 151 bp) for the mitogenome ampliconusing an Illumina NextSeq 500 sequencer. Quality filtering for the sequences with a low-quality score (< 20) and short length (< 40) was performed using Sickle v1.33 69 .
A PCR primer LCO-annelid, which was modified from LCO1490 70 , was designed from the cox1 gene sequences of annelids (see Table S1) and HCO2198 70 were used to amplify cox1 gene sequences of five Travisia spp. The PCR protocols for the cox1 amplification of Travisia spp. (see Table S2 for GenBank accessions numbers) using KOD One PCR Master Mix (Toyobo, Tokyo, Japan), which is high efficiency for extension (5 s/kb for a target in 1-10 kb length), followed Kobayashi et al. 7 except that 35 cycles, an annealing temperature of 50 °C, and an extension step of 20 s were used instead.

Sequence analysis and gene annotation of the mitogenomes.
Although the partial sequence of the 16S rRNA gene, which was not amplified by long PCR, was lacking in the NextSeq reads, a nearly complete Table 4. The primer sequences used in the present study. a F forward, R reverse. b L long PCR, P PCR, S sequencing.

Locus
Primer Sequence (5′-3′) Direction a Usage b Reference Trav16SksF  CTA ATC CTC CTT AAG AGC CCA TAT TGA CAG G  F  L  This study   Trav16SksR  TTA CTT TAG AGA CAG ATG GGC CTT CGT TTA TCC R  L  This study   cox1  LCO-annelid CTC AAC WAA YCA YAA AGA YAT TGG  F  P/S  This study   HCO2198  TAA ACT TCA GGG TGA CCA AAA AAT 71 . First, NOVOPlasty assembly using the 16S rRNA gene sequence (LC566242) as a seed sequence was conducted with kmer and read length set to 23 bp and 111 bp, respectively. Then, another assembly was conducted with kmer and read length set to 39 bp and 151 bp, respectively. The seed for this second assembly was a partial sequence from the merged contig from the previous assembly. The nearly complete mitogenome of T. sanrikuensis was determined manually by concatenating the merged contig from the NOVOPlasty assembly result and the 16S rRNA gene sequence (LC566242). The PCGs were identified using the MITOS web server 72

Phylogenetic analysis based on mitogenomes.
A preliminary phylogenetic analysis comprising the various lineages of annelid mitogenome sequences (149 OTUs) available from GenBank suggested that T. sanrikuensis is closely related to the clade of Arenicolida + Terebellida ( Fig. S1 and Table S1). Based on this preliminary result and the results of a previous study 26 , 63 mitogenome sequences from a subset of Sedentaria (Arenicolida, Terebellida, echiurans, and clitellates), as well as two outgroups (Siboglinidae), were obtained from GenBank using the R package AnnotationBustR 75 (Table 5). Outgroups were selected by referring to a review of annelid phylogeny 33 . Erpobdella octoculata (KC688270), Hirudinaria manillensis (KC688268), and Hirudo nipponia (KC667144) were indicated using double quotations and were excluded from discussion on phylogenetic relationships as Ye et al. 76 suggested that species of these sequences were erroneously identified and should belong to Whitmania. DNA sequences of 13 PCGs were translated into amino acid (AA) sequences using the invertebrate mitochondrial genetic code with MEGA v7.0.26 77 . Alignment was performed using MAFFT v7 for AA sequences and two rRNA gene sequences (default parameters) 78 . PAL2NAL online service 79 was used for codon alignments based on corresponding aligned AA sequences. Ambiguous positions were deleted with trimAl v.1.2 80 with the -gappyout option. Phylogenetic trees were reconstructed based on the concatenated dataset using Bayesian inference and ML methods. Bayesian analysis was performed using Phylobayes 4.1 81 . Two parallel chains were made over 15,000 cycles using the CAT + GTR model. Convergence was automatically checked and terminated when maxdiff was < 1 and effective population size reached > 50 following the Phylobayes 4.1 manual. However, the run of AA dataset did not converge (maxdiff = 0.24 and effective population size < 50) after > 25,000 cycles and thus this tree was treated as supplementary data (Fig. S4). Phylogenetic trees using the ML method were reconstructed by IQ-TREE v1.6.12 82

Intron analysis.
In order to examine the phylogenetic relationships of the group II introns of Travisia and other annelids, phylogenetic analysis was conducted using a conserved region which consisted of domain V and subsequent sequences of the intron because the introns of Travisia spp. except for GK625 and GK1736 had no ORFs for putative proteins (i.e., reverse transcriptase or intron maturase). The cox1 intron in the bivalve Cucullaea labiata 15 was identified as group II in this study (see "Discussion") and was included in the dataset. To find ORFs in the Travisia spp. intron, NCBI ORFfinder (https:// www. ncbi. nlm. nih. gov/ orffi nder/) was used; then, all identified ORFs were used for searching protein domains in the Pfam-A collection of protein families by Pfam-  www.nature.com/scientificreports/ Scan (https:// www. ebi. ac. uk/ Tools/ pfa/ pfams can/) 84 . A dataset for the phylogenetic analysis was built based on previous studies [16][17][18]85 , as shown in Table S2 and Dataset S1. Mfold web server online application RNA Folding Form V2.3 (http:// www. unafo ld. org/ mfold/ appli catio ns/ rna-foldi ng-form-v2. php) 86 was used to search the secondary structures of domain V and VI. The dataset was aligned using MAFFT with default options (resulted in 228 characters). The ML analysis was conducted by the same methods as mentioned above. The outgroup Tetradesmus obliquus (as Scenedesmus obliquus in Richter et al. 17 ) was selected based on Richter et al. In total, 64 partial sequences of the group II intron were used for phylogenetic analysis because TreeShrink v1.3.9 87 identified the Clostridium difficile sequence as a long branch, and it was excluded from the final dataset. Sequences logos 88 of the intron sequences, whose positions with gaps ≥ 20% were excluded, were generated using WebLogo 89 to visualize the frequency of nucleotides of each position in the dataset. The sequence logos of introns of Travisia, except for GK1732 and GK1734 whose introns were not fully determined, were also created.

Data availability
Assembled mitogenome, raw reads, and cox1 sequences are available on NCBI repository (LC677172, LC670759-LC670765, DRA013124, PRJDB12599, SAMD00424219). Raw datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.