Different gene rearrangements of the genus Dardanus (Anomura: Diogenidae) and insights into the phylogeny of Paguroidea

Complete mitochondrial genomes (mitogenomes) can provide useful information for phylogenetic relationships, gene rearrangement, and molecular evolution. In this study, the complete mitogenomes of two hermit crabs, Dardanus arrosor and Dardanus aspersus, were sequenced for the first time and compared with other published mitogenomes of Paguroidea. Each of the two mitogenomes contains an entire set of 37 genes and a putative control region, but they display different gene arrangements. The different arrangements of the two mitogenomes might be the result of transposition, reversal, and tandem duplication/random loss events from the ancestral pancrustacean pattern. Genome sequence similarity analysis reveals the gene rearrangement in 15 Paguroidea mitogenomes. After synteny analysis between the 15 Paguroidea mitogenomes, an obvious rearranged region is found in D. aspersus mitogenome. Across the 13 protein-coding genes (PCGs) tested, COI has the least and ND6 has the largest genetic distances among the 15 hermit crabs, indicating varied evolution rates of PCGs. In addition, the dN/dS ratio analysis shows that all PCGs are evolving under purifying selection. The phylogenetic analyses based on both gene order and sequence data present the monophyly of three families (Paguridae, Coenobitidae, and Pylochelidae) and the paraphyly of the family Diogenidae. Meanwhile, the phylogenetic tree based on the nucleotide sequences of 13 PCGs shows that two Dardanus species formed a sister group with five Coenobitidae species. These findings help to better understand the gene rearrangement and phylogeny of Paguroidea, as well as provide new insights into the usefulness of mitochondrial gene order as a phylogenetic marker.


Materials and methods
Sampling, DNA extraction, mitogenome sequencing, and assembly. Specimens of D. arrosor and D. aspersus were collected from Zhoushan Province, China (29° 45′ 32″ N, 121° 45′ 30″ E). Specimens were immediately preserved in 95% ethanol until DNA extraction. The SQ Tissue DNA Kit (OMEGA) was used to extract the total genomic DNA from muscle tissue following the manufacturer's instructions. The genomic DNA was sent to Shanghai Origingene Biopharm Technology Co., Ltd. for library preparation and high-throughput sequencing. The libraries were constructed by using the VAHTS Universal Plus DNA Library Prep Kit, with an insert size of 150 bp. Paired-end sequencing with a read length of 150 bp was performed on an Illumina Hiseq 6000 platform. Adapters and low-quality bases were removed using cutadapt v1. 16 33 with the following parameters: -q 20 -m 20. Trimmed reads shorter than 50 bp were discarded. Quality control of raw and trimmed reads was performed using FastQC v0.11.5 (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/). The filtered clean data were assembled and mapped to complete mitogenome sequence using NOVOPlasty v2.7.2 34 .
Mitogenome annotation and sequence analysis. The newly assembled mitogenomes of D. arrosor and D. aspersus were annotated using the software of Sequin (version 15.10, http:// www. ncbi. nlm. nih. gov/ Sequin/). The boundaries of protein-coding and ribosomal RNA genes were performed using NCBI-BLAST (http:// blast. ncbi. nlm. nih. gov). Transfer RNA genes were manually plotted, according to the secondary structure predicted by the MITOS Web Server 35 and tRNAscan-SE 1.21 36 . The control region was determined by the locations of adjacent genes. Finally, circular mitogenome maps of D. arrosor and D. aspersus were drawn with the BLAST Ring Image Generator (BRIG) v0.95 37 .
The base composition and relative synonymous codon usage (RSCU) were obtained using MEGA X 38 . The strand asymmetry was calculated using the following formulas: AT-skew = (A − T)/(A + T); GC-skew = (G − C)/ (G + C) 39 . Furthermore, we chose the complete mitogenome of Pagurus filholi as the reference genome for comparative genomic analysis. Genome sequence similarity among the 15 Paguroidea species was carried out using the BRIG tool. Synteny analysis between the genomes was performed using Mauve v2.4.0 40 . To estimate the evolutionary-selection constraints on 13 PCGs in the superfamily Paguroidea, the nonsynonymous (dN) and synonymous (dS) substitution rates were calculated using Mega X. The genetic distances of 13 PCGs were also estimated using Mega X based on the Kimura 2-parameter (K2P) substitution model. Mitochondrial gene order comparisons and phylogeny. CREx 41 was used to compare the mitochondrial gene order and infer the gene rearrangement scenarios based on common intervals. CREx considers four types of rearrangement events: reversals (R), transpositions (T), reverse transpositions (RT), and tandem-duplication-random-losses (TDRL). MLGO web server 42 was used to infer a phylogeny from gene order data.
Phylogenetic analysis. Phylogeny of the Paguroidea was inferred based on 13 available complete mitogenomes expanded with the two newly determined ones ( Table 1). The species Helicana wuana and H. latimera from Grapsoidea were used as outgroups. Fasta files with the nucleotide sequences for all 13 PCGs were extracted from the GenBank files using PhyloSuite 43 . The MAFFT program 44 integrated into PhyloSuite was executed to align multiple sequences in normal-alignment mode, and ambiguously aligned regions were identified and moved by Gblocks 45 . Subsequently, the sequences were concatenated into a single alignment and converted into input files (Phylip and Nexus format) for phylogenetic analyses. Phylogenetic trees were built under maximum likelihood (ML) and Bayesian inference (BI) methods. The ML analysis was conducted using IQ-TREE 46 (Tables 2, 3). Meanwhile, 27 overlapping nucleotides are located in seven pairs of neighboring genes for both mitogenomes. These overlapping nucleotides vary in length from 1 to 7 bp, and the longest overlap is located between ATP8 and ATP6 as well as ND4 and ND4L (Tables 2,  3). The base composition of D. arrosor is A = 33.3%, T = 34.6%, C = 15.7%, G = 16.4% and that of D. aspersus is A = 33.4%, T = 32.6%, C = 15.7%, G = 18.3%. The AT content is 67.9% in D. arrosor and 66.0% in D. aspersus, thus exhibiting a strong AT bias (Tables S1, S2).
Except for ND5 (uses GTG as the start codon) and ND3 (uses TTG as the start codon) in D. aspersus mitogenome (Tables 2, 3), the remaining PCGs initiate with typical ATN codons. As for the stop codon, the majority of PCGs stop with TAA or TAG except for ND5 (uses a single T as the stop codon) and ND6 (uses GAC as the stop codon) in the two mitogenomes (Tables 2, 3). The GC-skew values of five PCGs (ND5, ND4, ND4L, ND1, and ND3) are positive, indicating they are encoded by the L-strand, whereas the remaining eight exhibit negative values, indicating they are encoded by the H-strand (Tables S1, S2).
Twenty-two tRNAs of D. arrosor and D. aspersus mitogenomes are scattered throughout the entire mitogenome (Tables 2, 3). The total length of 22 tRNAs is 1455 bp in D. arrosor and 1460 bp in D. aspersus (Tables S1, S2). All of the tRNAs can be folded into typical cloverleaf secondary structures except for the tRNA-Ser (TCT) in the two mitogenomes (Figs. S3, S4). The lack of DHU arm in tRNA-Ser (TCT) is thought to be a common phenomenon in metazoan mitogenomes 12,53 . The 16S rRNA and 12S rRNA genes of D. arrosor and D. aspersus are located between ND1 and tRNA-Val and between tRNA-Val and CR, respectively. The AT content of the two rRNAs is 73.3% in D. arrosor, which is higher than that of D. aspersus (70.1%) (Tables S1, S2).

Codon usage bias in Paguroidea mitogenomes. Codon usage bias is a phenomenon in which specific
codons are used more frequently than other synonymous codons by certain organisms during the translation of genes to proteins. In this study, the relative synonymous codon usage (RSCU) of 15 hermit crabs is roughly identical. Except for Pagurus longicarpus and Pylocheles mortensenii, which miss codons, the other 13 species have all 62 available codons. The lost codons usually belong to GC-rich codon-families (Fig. S5, Table S3). The RSCU values for the codons NNU and NNA are usually greater than one, suggesting a strong AT bias in the third www.nature.com/scientificreports/ codon position (Fig. S5, Table S3). This result supports the hypothesis that the codon usage bias in PCGs and the AT bias of the third codon position are positively correlated 54,55 .
Comparative genomic analysis of Paguroidea species. Using the P. filholi mitogenome as the reference sequence, all available mitogenomes in the superfamily Paguroidea were compared using BRIG. The results reveal the gene rearrangement in 15 Paguroidea mitogenomes (Fig. 1). The mitogenomes of the family Paguridae are observed to be fairly conserved, with about 80% sequence identity in most regions (six innermost rings in Fig. 1). However, the mitogenomes of the species under the families Coenobitidae, Diogenidae, and Pylochelidae are quite different from the family Paguridae, as can be seen from the larger gap regions in the BRIG map (nine outermost rings in Fig. 1). By using the Mauve analysis, we identified five large genomic homologous regions (marked A-E in Fig. 2). These homologous regions are commonly presented in all 15 Paguroidea mitogenomes. For the B region, it has the maximum length diversification and is greatly contributed to the genome size variation between Paguroidea mitogenomes (Fig. 2). In addition, we found that homologous regions D and E are rearranged in D. aspersus mitogenome. The two homologous regions show an E-D order in D. aspersus mitogenome, while the other hermit crabs display a D-E order (Fig. 2). www.nature.com/scientificreports/ To estimate the evolutionary-selection constraints on 13 PCGs in the superfamily Paguroidea, we perform dN/ dS analysis for each PCG. The dN/dS ratios for all PCGs are less than 1, indicating that these genes are evolving primarily under purifying selection. Among them, the lowest dN/dS value (0.113) for COI gene indicates the strongest purifying selection, whereas the highest dN/dS value (0.707) for ATP8 gene shows a highly relaxed purifying selection (Fig. 3). In general, the dN/dS values indicate that the evolution of Paguroidea mitogenomes has been dominated by purifying selection. Besides, we conduct genetic distance analysis for 13 PCGs. COI gene possesses the least genetic distance (average 0.237), and ND6 gene captures the largest value (average 0.494), representing the most conserved and variable genes, respectively (Fig. 3).
Mitochondrial gene order and rearrangements. The gene arrangement in the mitogenomes of D.
arrosor and D. aspersus is shown in Fig. 4. The gene order of the two mitogenomes belonging to the same genus is different. Compared with the gene order in ancestral crustaceans (the pancrutacean ground pattern) mitogenomes 56 , the gene order in D. arrosor and D. aspersus mitogenomes underwent large-scale gene rearrangements. For D. arrosor, at least six gene clusters (or genes) significantly differ from the typical order, involving 12 tRNAs (L 2 , G, A, S 1 , P, L 1 , I, Q, M, W, C, and Y), and two PCGs (ND3 and ND2). Of these six gene rearrangements, a single L 2 is inverted from the downstream of COI in the H-strand to downstream of the G in the www.nature.com/scientificreports/ L-strand (Fig. 4A①). The G-ND3-A-S 1 cluster is inverted from the downstream of COIII in the H-strand to downstream of the CR in the L-strand (Fig. 4A②). A single P moves from the downstream of T to downstream of the S 2 (Fig. 4A③). A single L 1 is inverted from the downstream of ND1 in the L-strand to downstream of the COI in the H-strand (Fig. 4A④). The I-Q-M-ND2 cluster is divided into two sections, one (I, M, and ND2) is shifted to downstream of K. The other (Q) is shifted to the end of the linear mitogenome (Fig. 4A⑤). The W-C-Y cluster order is changed into Y-W-C order (Fig. 4A⑥). For D. aspersus, there are also at least six gene clusters (or genes) that differ significantly from the typical order, but the genes involved are different from D. arrosor. is shifted to downstream of ND4L (Fig. 4A③). Based on the CREx analysis, transposition, reversal, and TDRL may be involved in the large-scale gene rearrangements in D. arrosor and D. aspersus mitogenomes (Figs. S6, S7). The 15 hermit crabs exhibit six types of gene organization (Fig. 5). The mitogenomes of the family Paguridae possess three types of gene order (Type I, Type II, and Type III in Fig. 5). Relative to the remaining three types of gene order (Type IV, Type V, and Type VI in Fig. 5), these three types of gene order are more similar. The mitogenomes of the family Diogenidae possess two types of gene order (Type IV and Type V in Fig. 5). Only the gene arrangement of one gene cluster (T-ND6-Cyt b-S 2 ) is found to be different between the two gene orders. For the remaining two families, Coenobitidae and Pylochelidae, each has only one type of gene order (Type IV and Type VI in Fig. 5). Among them, Coenobitidae shares one of the two gene orders of Diogenidae (Type IV). These results are consistent with the conclusion from the gene order-based phylogenetic tree (Fig. 5). In the gene order tree, all Paguridae species cluster into a clade, showing the closest relationship (Clade I). Species of the family Coenobitidae and Diogenidae are clustered together as a group (Clade II). As the only representative of the family Pylochelidae (Clade III), P. mortensenii forms a seperate branch. Our results support that comparisons of mitochondrial gene rearrangements, to some extent, are a useful tool for phylogenetic studies. www.nature.com/scientificreports/ Phylogenetic analysis. In the present study, the phylogenetic relationships among Paguroidea were reconstructed based on the nucleotide sequences of 13 PCGs using maximum likelihood (ML) and Bayesian (BI) methods. The phylogenetic trees (ML tree and BI tree) show an identical topology; thus, only one topology (BI) with both support values is displayed (Fig. 6). The results show that D. arrosor and D. aspersus are most closely related, forming part of the family Diogenidae. Three Diogenidae species are separated into two clades, two   Besides, of the four families included in the phylogenetic tree, almost all families except Diogenidae form a monophyletic clade. However, the paraphyly of the family Pylochelidae was originally proposed by Richter and Scholtz 57 and has been confirmed by many previous researches 58,59 . Since there is only one representative of the family Pylochelidae in our study, the monophyly of this taxon should be treated with caution.

Discussion
In the present study, our phylogenetic reconstruction based on the nucleotide sequences of 13 PCGs recovered a robust tree (Fig. 6). For a long time, the phylogenetic status of Diogenidae has been controversial. Most previous researches based on morphological features considered it to be a paraphyletic clade. But Forest 60,61 suggested that the family Diogenidae is an ancient monophyletic group. In recent years, an increasing number of molecular studies, including ours, have supported the paraphyly of this taxon. For example, Tsang et al. 's used two nuclear protein genes to conduct phylogenetic inference and clearly pointed out that the family Diogenidae is a paraphyletic clade 62 . In their study, the genus Coenobita (Coenobitidae) is embedded within the branch of the family Diogenidae. Based on mitochondrial gene sequences, Landschoff and Gouws's research recovered the paraphyly of the family Diogenidae as well 63 . However, there are few studies on the phylogenetic relationships among the genera of the family Diogenidae. Previous studies mainly focused on exploring the phylogenetic relationships of the infraorder Anomura, involving only a few genera and species of the family Diogenidae. Accordingly, increased taxon sampling is required to conclusively resolve the phylogenetic relationships within the family Diogenidae and the superfamily Paguroidea. Besides, we reconstructed the phylogeny of Paguroidea based on the gene order (Fig. 5). In the family-level relationships within Paguroidea, the gene order information seems to be reliable for phylogenetic inference. A good illustration is that the monophyly of three families (Paguridae, Coenobitidae, and Pylochelidae) and the paraphyly of the family Diogenidae are reconfirmed in the gene order tree (Figs. 5, 6). Our conclusion is in accordance with previous workers, who suggested that gene rearrangements, to some extent, contain phylogenetic information. For example, Shao et al. 64 compared the mitochondrial gene arrangements of 12 anomurans and found that Munidopsis lauensis and M. verrilli are most closely related to Shinkaia crosnieri. Based on the comparative analysis of mitochondrial gene arrangement within Coleoidea, Akasaki et al. 65 concluded that order Octopoda might be the most ancestral among this subclass Coleoidea. However, the potential to resolve the phylogenetic relationships within families based on gene order alone is clearly inferior to sequence-based approaches. One example is that the monophyly of two genera (Dardanus and Coenobita) is not recovered in the gene order tree (Fig. 5). In future studies, it may be possible to resolve some long-standing phylogenetic controversies by integrating gene order and sequence data.
For most families of the order Decapoda, congeners belonging to the same family share the same gene arrangement generally. Accordingly, it is acceptable to apply gene rearrangement as a molecular marker for phylogenetic inference 6,15,16 . However, there are some exceptions. For example, the family Camptandriidae Stimpson, 1858 possess two different gene arrangements (unpublished), and the freshwater crabs Potamidae Ortmann, 1896 possess at least nine main types of gene rearrangement 66 . In the present study, we even found that two closely related species of the genus Dardanus capture different gene rearrangements. These examples challenge the www.nature.com/scientificreports/ utility of gene rearrangement as a molecular marker in phylogenetic studies. So it triggers a thought-provoking question that why the mitogenome gene arrangement differs between very closely related species? One possible hypothesis is that the mitogenome gene rearrangement is a continuous and dynamic process and may occur very recently even after speciation events. In future studies, more relevant data are essential to verify this hypothesis. www.nature.com/scientificreports/