Novel gene rearrangement in the mitochondrial genome of Muraenesox cinereus and the phylogenetic relationship of Anguilliformes

The structure and gene sequence of the fish mitochondrial genome are generally considered to be conservative. However, two types of gene arrangements are found in the mitochondrial genome of Anguilliformes. In this paper, we report a complete mitogenome of Muraenesox cinereus (Anguilliformes: Muraenesocidae) with rearrangement phenomenon. The total length of the M. cinereus mitogenome was 17,673 bp, and it contained 13 protein-coding genes, two ribosomal RNAs, 22 transfer RNA genes, and two identical control regions (CRs). The mitochondrial genome of M. cinereus was obviously rearranged compared with the mitochondria of typical vertebrates. The genes ND6 and the conjoint trnE were translocated to the location between trnT and trnP, and one of the duplicated CR was translocated to the upstream of the ND6. The tandem duplication and random loss is most suitable for explaining this mitochondrial gene rearrangement. The Anguilliformes phylogenetic tree constructed based on the whole mitochondrial genome well supports Congridae non-monophyly. These results provide a basis for the future Anguilliformes mitochondrial gene arrangement characteristics and further phylogenetic research.


Results and discussion
Genome structure and composition. The complete mitochondrial genome of M. cinereus is 17,673 bp (GenBank accession number MT571331), which is within the published length range of the bony fish's mitochondrial genome (16,369 bp) ( Table 1). The structure of the moray mitochondrial genome was different from other bony fishes, it includes 13 PCGs, 22 tRNAs, two rRNAs (12S and 16S rRNA), a light chain replication source (O L ) and two control-region s (CR) (Fig. 1). And the gene rearrangement was different from some other fish mitochondria. To be precise, ND6 binding trnE was transferred between tRNAT and tRNAP, and a replicated CR was transferred upstream of the ND6 gene ( Fig. 1, Table 2). In the vertebrate mitochondrial genome, the presence of replicated CR was considered a special feature [46][47][48] . The base composition of M. inereus mitochondrial genome was: A = 32.1%, T = 27.6%, C = 23.8% and G = 16.6%, respectively (Table 3). Overall, the mitotic genome was very compact. However, 89 base pairs of 12 gene spacers were found in the mitochondrial genome of moray eel, ranging in length from 1 to 35 bp. Most of the gaps were found in the area where rearrangement occurred, including 35 bp between Cytb and trnT, 4 bp between trnT and CR1, 1 bp between trnE and trnP, and 23 bp between CR2 and trnP. The AT-skew of the mitochondrial genome was positive and the GC-skew was negative, respectively 0.076 and -0.179, indicating that As and Cs are more abundant than Ts and Gs.
PCGs and codon usage. The total length of the 13 PCGs in the M. cinereus mitochondrial genes were 11,454 bp, and they encode 3,818 amino acids. Genes encoding 13 proteins include seven NADH dehydrogenases (ND1-ND6 and ND4L), three cytochrome c oxidases (COI-COII), two ATPases (ATP6 and ATP8) and one cytochrome b (cytb). The 13 PCGs range in size from 168 bp (ATP8) to 1857 bp (ND5) ( Table 3). Like the typical mitochondrial genome of vertebrates 49,50 , there are twelve genes in the H-strand and only ND6 genes in the L-strand.
The initiation codon of the 13 protein-coding genes used the typical initiation codon ATG, except for GTG for the COI gene. Seven PCGs (ND1, COI, ATP8, ATP6, COIII, ND4L, and ND5) were terminated with a stop codon TAA, three (ND2, ND3, and ND6) were terminated with TAG, and one (Cyt b) was terminated with AGA, in addition, ND4 And COII terminate with an incomplete nucleotide T-( Table 2). This result was very similar to that of Lu et al 13  www.nature.com/scientificreports/ a common phenomenon [51][52][53] . For genes with TAA as the stop codon, one of the most common interpretations is that produced by polyadenylation after transcription 54 . In the 13 PCGS, the values of COI, ATP6, COII, ND3, and ND4 of the AT-skew and GC-skew are negative, and the rest is both positive and GC-skew is negative (Table 3). According to the codon degeneracy pattern, the amino acids serine and leucine are encoded by six synonymous codons, and the remaining amino acids are encoded by four or two codons. This result also appears in the research results of Vandana et al 55 . The most used amino acids are Leu (15.82%), Ile (8.07%) and Ala (7.96%), the few most used amino acids are Asp (2.12%), Arg (1.94%) and Cys (0.84%) (Fig. 2a). The relative synonymous www.nature.com/scientificreports/ codon usage (RSCU) value for M. cinereus for the third position is shown in Fig. 2b. The usage of both two-and four-fold degenerate codons was biased toward the use of codons abundant in A, while there was an overall bias against G.
Transfer RNAs, ribosomal RNAs and CR. There were also 22 tRNAs in the mitochondrial genes of M.
cinereus, like those of other vertebrates. Of these 22 tRNAs, 14 tRNAs were encoded on the heavy chain and the remaining eight (tRNA-Gln, tRNA-Ala, tRNA-Asn, tRNA-Cys, tRNA-Tyr, tRNA-Ser, tRNA-Glu and tRNA-Pro) were encoded on the light chain ( Fig. 1). Among the 22 tRNAs, except that tRNA-Ser UCA lacks the entire dihydrouridine arm, all other tRNAs have a typical clover structure (Fig. 3). In this case, tRNA-Ser lacks the dihydrouridine arm and is often seen in the mitochondrial of other vertebrates 13,52 . The length of M. cinereus mitochondrial tRNA is 1564 bp, and the length of 22 tRNA is between 66 and 76 bp; the base composition was A = 32.0%, T = 25.3%, C = 23.1% and G = 19.6% (Tables 1, 3). The AT and GC skew values of tRNA genes were 0.116 and − 0.081, respectively, which indicated that As and Cs were more abundant than Ts and Gs. The origin of light chain replication (OL) was usually located within a WANCY cluster, approximately two-thirds of the genomic distance from CR, and can fold into a stable stem ring secondary structure. Among the 22 tRNAs encoding 20 amino acids, two amino acids have two anti-codon sites, these two amino acids were Ser and Leu, and their anticodons were TGA/GCT and TAA/TAG, respectively. The small coding subunit (12S rRNA) and large coding subunit (16S rRNA) appeared on both sides with trnF and trnL (UUA) , which were located on the H-chain and separated by the trnV gene. The two rRNA genes are 2,668 bp in total length, and the base composition is A = 35.1%, T = 21.7%, C = 22.6%, and G = 20.6%. The AT-skew value was positive (0.236) and the GC-skew www.nature.com/scientificreports/ value was negative (− 0.047), which indicates that there were more adenine and cytosine nucleotides in rRNAs ( Table 3). The lengths of the two CRs were 902 bp and 944 bp, respectively, and the total length was 1846 bp, of which the ratio of AT was 66.3% (Tables 1, 3). The AT ratio of the CR region is higher than that of other parts of mitochondrial genes, so the CR region is also called the "AT-rich region", which was also common in the mitochondria of other fish. Both AT and GC skew values were 0.092 and − 0.066, indicating that the number of adenine and cytosine nucleotides was higher than that of thymine and guanine nucleotides. The palindrome sequence motifs "TACAT" and "ATGTA" related to the termination of heavy chain replication were found in both CRs, and had been reported in other study 56 (Fig. 4).
Gene rearrangement. Compared with the gene arrangement in the vertebrate mitochondrial genome, the gene order in the moray mitochondrial genome obviously rearranged 57 (Fig. 1). The position of the three genes (ND6, trnE and CR) in the moray M. cinereus mitochondria had changed. In general, ND6 and the bound trnE www.nature.com/scientificreports/ were located between the ND5 and Cytb genes, and only one CR region was located at the end of the mitochondrial genome. However, the position of ND6 and trnE in the mitochondrial genome of M. cinereus changed, and a CR was copied (Fig. 5). In this study, except for ND6, trnE translocation and CR repeat, the remaining gene sequence was the same as the gene sequence in the vertebrate mitochondrial genome (Fig. 5).
How did the rearrangement occur in the M. cinereus mitochondrial genome? Considering the gene rearrangement model previously described in this study, we must consider which model is best suited to explain the gene rearrangement observed in the M. cinereus mitochondrial genome. Recombination model was only suitable for the exchange and reversal of small fragments, and this model was relatively rare in the mitochondrial genome 58,59 . Therefore, this recombination model is not suitable for explaining the mitochondrial gene rearrangement of M. cinereus. Regarding the TDNL and DRRL models, these two models were most used in genome rearrangement, where genes were clustered with the same polarity (L-or H-chain coding) and their relative order has not changed 27,43 . Therefore, the two models are also not suitable to explain the phenomenon of molecular rearrangement of M. cinereus. This tandem duplication and random loss (TDRL) model explains well the rearrangement of genes with redundant genes. The TDRL model was due to the incomplete deletion of repeated genes leading to the existence of intergenic spacers or pseudogenes 23,60,61 . Therefore, in this study, the TDRL model was suggested for rearrangement events, because gene rearrangements with repeated CRs were observed in the mitochondrial genome of M. cinereus, as described previously in the mitochondrial genome of parthenogenetic lizards 62 . The evidence of the TDRL model was indicated by the presence of pseudogenes or duplicate genes and the position of the gene spacer. There are four intervals in the rearrangement region of the mitochondrial genome of moray eel, located between ND5 and Cytb, trnT and CR1, ND6 and trnE, trnE and trnP, this phenomenon provides the basis for this model (Table 2).
Based on the principle of parsimony, we made assumptions about the region of the moray mitochondrial genome rearrangement, assuming the intermediate steps of gene rearrangement as follows. First, after the normal mitochondrial gene block (ND6-trnE-Cyt b-trnT-trnP-CR) is completely replicated, a gene block (ND6-trnE-Cyt b-trnT-trnP-CR-ND6-trnE-Cyt b-trnT-trnP-CR) (Fig. 5a,b). Then, after consecutive copies, other genes (ND6, trnE, Cyt b, trnT and trnP) were randomly lost in addition to CR. After the copied genes are randomly lost, the positions of the two genes in the red dotted box and the two genes in the cyan dotted box in Fig. 5c are swapped, and then inserted after the two genes in the red dotted box CR2. Therefore, after such copying and random loss, both CRs are preserved. We speculate that both CRs retain their original functions. In other animal mitochondrial genomes, similar hypotheses have been proposed 47 .
Phylogenetic analysis. To further study the evolutionary status of M. cinereus in the Anguillaridae, we selected 14 closely related families and two outgroups (Saccopharynx lavenbergi and Eurypharynx pelecanoides 63 ) to construct evolutionary trees (BI and ML) to analyze phylogenetic relationships. After removing highly differentiated regions, a phylogenetic tree was constructed with 10,987 bp sequence. The results show that the topological structure of the ML tree and the BI tree are basically the same. Therefore, we merge two trees together to form a tree. In addition, the BI tree has a higher support value than the ML tree (Fig. 6) 8 also had mitochondrial genes lacking ND6 and trnE, which were also clustered, which was also consistent with our results. In this study, we found an interesting phenomenon. The five families with gene rearrangement at the bottom formed a www.nature.com/scientificreports/ branch, and those without gene rearrangement formed a separate branch at the top of the tree (The cyan ellipse in Fig. 6 indicates that no gene rearrangement had occurred; the red ellipse indicated that gene rearrangement www.nature.com/scientificreports/ had occurred). These results indicate that the origin of eels is a diverse evolution. If the new gene sequence originates from a single ancestral species in the Congroidei suborder, then this pattern of presence/absence may be a good phylogenetic marker for identifying monoline populations as Kumazawa and Nishida 68 and Macey et al 18 suggested, they had a higher vertebrate relationship. Therefore, more advanced methods can be considered to classify the controversial Anguilliformes species. There are still some phylogenetic mismatches based on morphological and molecular data, so more eel mitochondrial genomes should be sequenced to support this hypothesis in future research.

Conclusion
With the advancement of genetics research, many people believe that the richness of molecular information is superior to morphological data, and molecular analysis has become the most commonly used methods in the study of biological system development. Therefore, in this study, we sequenced and assembled the complete  www.nature.com/scientificreports/ mitochondrial gene of M. cinereus and described its characteristics, which contains 37 genes and two control regions. After comparing with the typical vertebrate mitochondrial genes, we found that M. cinereus mitochondrial gene rearrangement obviously occurred, the rearrangement part of the gene ND6 and trnE were transferred between trnT and trnP, accompanied by CR repeat. The most suitable model to explain this rearrangement phenomenon is the duplication-random loss model. The two phylogenetic trees (BI and ML) created using the mitochondrial genomes from 46 Anguilliformes were basically consistent with previous molecular studies on the interrelationships between the main Anguilliform lineages and different families, although some of these families were slightly related different. Both phylogenetic trees strongly support the non-monophyly of Congridae, providing a basis for the more advanced classification of Anguilliformes. In addition, our research results provide a theoretical basis for in-depth understanding of the mechanism and evolution of M. cinereus gene rearrangement and phylogenetic studies of eel.   75 . The 13 PCGs used did not include ND6 and were not used because of their heterogeneous base composition and consistent poor phylogenetic performance 33,76 . Sequences were aligned with default parameters using Clustal X 2.0 77 , and manually checked using BioEdit 78 . Use software Gblock 79 to eliminate ambiguous sequences. Substitution vs. the Tamura-Nei (TN93) genetic distance in pairwise comparisons was used to test for substitution saturation using DMABE version 7.2.3 75 . The third codon position shows significant saturation (Fig. S1), so it is only defined as purine and pyrimidine. Based on Bayesian inference (BI) and maximum likelihood (ML) two methods, using PhyML 80 and MrBayes 3.2.6 81 software for phylogenetic analyses. Based on the Akaike information criteria (AIC), parallel uses Modeltest 3.7 82 of 56 models for ML analysis, and MrModeltest 2.2 83 of 24 models for BI analysis to determine the best evolutionary model and points out that GTR + G + I was the analysis dataset of best-fit alternative models. Perform Bootstrap analysis (1,000 repetitions) to assess the relative level of support for ML analysis 84,85 . Use "Lset" and "Prset" for Bayesian phylogenetic analysis and allow the program to converge to the best estimate of model parameters. Other parameter settings are as follows: each Markov chain starts from a random tree, runs for 2 million generations, and samples a tree every 100 generations (a total of 20,000 trees are sampled) to ensure the independence of the samples. In order to improve the mixing ability of the Markov chain, the Metropolis coupling Markov chain Monte Carlo (MCMCMC) method was used, and three heating chains (temperature = 0.5) and one cold chain were operated simultaneously. To guarantee the stationarity had been reached, the average standard deviation of split frequencies was set below 0.01. www.nature.com/scientificreports/