Characterization of the mitochondrial genome of an ancient amphipod Halice sp. MT-2017 (Pardaliscidae) from 10,908 m in the Mariana Trench

Small amphipods (Halice sp. MT-2017) with body length <1 cm were collected from the Challenger Deep (~10,920 m below sea level). The divergence time of their lineage was approximately 109 Mya, making this group ancient compared to others under study. The mitochondrial genome of Halice sp. shared the usual gene components of metazoans, comprising 13 protein coding genes (PCGs), 22 transfer RNAs (tRNAs), and 2 ribosomal RNAs (rRNAs). The arrangement of these genes, however, differed greatly from that of other amphipods. Of the 15 genes that were rearranged with respect to the pancrustacean gene pattern, 12 genes (2 PCGs, 2 rRNAs, and 8 tRNAs) were both translocated and strand-reversed. In contrast, the mitochondrial genomes in other amphipods never show so many reordered genes, and in most instances, only tRNAs were involved in strand-reversion-coupled translocation. Other characteristics, including reversed strand nucleotide composition bias, relatively higher composition of non-polar amino acids, and lower evolutionary rate, were also identified. Interestingly, the latter two features were shared with another hadal amphipod, Hirondellea gigas, suggesting their possible associations with the adaptation to deep-sea extreme habitats. Overall, our data provided a useful resource for future studies on the evolutionary and adaptive mechanisms of hadal faunas.

Phylogenetic analysis and divergence time estimation. The hadal Halice sp. MT-2017 from the Mariana Trench clustered robustly with the abyssal species collected at 2567.6 m from the Iceland Basin 28 , and separated from others in shallower water (289.4-510.9 m) using Bayesian and maximum likelihood methods based on partial cytochrome c oxidase subunit I (cox1) barcodes (Fig. S1). In terms of all the taxa in Halice used in the phylogram, although none of them had been nominated a specific name, the genetic distances between them, calculated by the p-distance, were large enough to delineate them at the species level (Table S2), considering that 0.03 of the p-distance had commonly been used as a threshold for amphipod species demarcation 28 . The vertical and geographic distribution of Halice discovered in hadal trenches and Icelandic waters conformed to the "tropical submergence" hypothesis, which proposed that closely related species lived in shallower waters at high latitudes and deeper waters at low latitudes, a reflection of adaptation to cold temperature 29 . Other evidence can also be seen in hydrothermal vents, an ecosystem different from those of Icelandic waters and hadal trenches, where Halice (H. hesmonectes, with no barcodes submitted) were distributed only in the vicinity of low temperature vent openings (2-8 °C) 30 . Considering the fact that Hirondellea (which H. gigas belonged to) was a shallower-Antarctic and deep-sea genus 31 , also in line with "tropical submergence", hadal amphipods supposedly originated or derived from relatively higher latitudes. Interfamilial phylogenetic analyses were performed with a dataset of 13 amphipods as ingroup based on the concatenated nucleic acid and amino acid sequences of 13 protein coding genes (PCGs) using the maximum likelihood (ML) method. The topologies of the two phylogenetic trees were nearly congruent in our study, illustrating eight separate clades which corresponded to eight identified families or superfamilies (Fig. S2). The p-distance analysis showed that irrespective of the clade represented by Pardaliscidae, the inter-group genetic distances in the other seven superfamilies ranged from 32.43% (between Lysianassoidea and Talitroidea) to 40.28% (between Hadzioidea and Caprelloidea) (Table S3). Regarding Pardaliscidae, the genetic distance to the superfamily Talitroidea was the shortest (38.65%) and the distance to Caprelloidea was the longest (41.52%). Given the minimal inter-clade divergence for Pardaliscidae (38.65%) falling into the range of the inter-superfamily divergence The dendrogram with 95% credible intervals (CIs) for divergence time estimation was constructed from Bayesian analysis, revealing a similar topology to the ML trees with high posterior probabilities (Fig. 2). Although the credible intervals were large in some nodes due to the lack of fossil calibration, some interesting results still could be inferred. The hadal amphipods under study (Halice sp. MT-2017 and H. gigas) were polyphyletic, with members diverging at different times and belonging to different taxonomic classifications. The hadal cladogenesis for Pardaliscidae (which Halice belongs to) was approximately 109 Mya (95% CI: 76.58-143.57 Mya) during the Cretaceous period. They could belong to one of the ancient faunas that survived the catastrophe in the Late Cretaceous (Maestrichtian) 32,33 , during which uncertain factors, such as comet impacts, volcanic eruptions, acid rain, sea level transgressions, and sea level regressions, eradicated more than half of the marine species 33 . The discovery of relatively primitive relict groups in the deep sea has also occurred for many other taxa, such as starfishes, isopods, and bivalves. The great depths of the ocean are speculated to provide refuge for their continued existence 34 . Nevertheless, in more cases, the deep-sea faunas were relatively "young, " occurring no earlier than the Cenozoic period (65.5 Mya to present) 15 . H. gigas in the present study dated back to around 58 Mya (95% CI: 38.46-77.50 Mya), a divergence time earlier than that of the Halice lineage, and possibly went through the boundary of the Paleocene and Eocene periods, during which a climate-driven anoxia or dysoxyia caused extinctions in the deep sea 35,36 . Only the taxa with strong resistance would escape the radical extinction and show allopatric speciation 36,37 . Considering the significant discrepancy in body sizes (<1 cm for Halice sp. MT-2017 vs 2-5 cm for H. gigas, trapped concurrently), and the phylogenetic status of Halice sp. MT-2017 and H. gigas, the evolution of these two species could have proceeded by different biological processes and physiological mechanisms.
Mitochondrial gene rearrangement. The gene arrangements of eight superfamilies were compared to the hypothetical ancestral pancrustacean (hexapods and crustaceans) gene order 38 (Fig. 3). There were 15 genes in Halice sp. MT-2017 that showed altered locations, among which 12 were both translocated and strand-reversed. The reverse-stranded-translocation event, which has rarely been discovered in the rRNAs and PCGs of other amphipod mitochondrial genomes, resulted in the switch of transcriptional polarity in relevant genes of Halice sp. MT-2017, including two rRNAs, two PCGs (nad1, nad5), and eight tRNA genes (trnL [CUN], trnP, trnF, trnE, trnV, trnN, trnS [AGY], and trnA). Regarding PCGs, gene rearrangements in other amphipod mitochondrial genomes mostly occurred in nad6 and cytb 27 . For Halice sp. MT-2017, however, the gene orders of nad6 and cytb were identical to those of the pancrustacean ground pattern. Alternatively, the changes in the order of PCGs were focused on the rearrangements of nad1 and nad5 (Fig. 3). This pattern of change for PCGs was not exclusive to Halice sp. MT-2017, but has only been seen in some specific cases, such as the rearrangements of nad1 in Pallaseopsis kesslerii (Gammaroidea) 23 and nad5 in Caprella mutica (Caprelloidea) 39 . Regarding the altered gene order of tRNAs with respect to the ancestral pancrustacean pattern, the translocation of trnG and the typical  derived pattern of the gene string trnA, trnS (AGN), trnN, trnE, and trnR were assumed to be two apomorphic features in the extant amphipod species 39,40 (Fig. 3). However, Halice sp. MT-2017 retained the unaltered position of trnG relative to the location in the ancestral pancrustacean mitochondrial genome and the alternation of the tRNA order gave rise to a unique tRNA string (trnE, trnN, trnS [AGY], trnA, trnQ, trnC, and trnY) that was different from the above apomorphic tRNA gene block (trnA, trnS [AGN], trnN, trnE, and trnR). Moreover, this special tRNA string has not been seen in other metazoan mitochondrial genomes in the MitoZoa database 41 . A similar phenomenon also occurred in the mitochondrial genome of Parhyale hawaiiensis 42,43 , which had a specific tRNA cluster, trnI, trnA, trnN, trnR, and trnT, and also without related MitoZoa records. In summary, Halice sp. MT-2017 showed particular rearrangements and polarity alternations in its mitochondrion genes, which were seldom seen in most amphipods from other superfamilies. Further, CREx 44 was applied to deduce the gene rearrangement scenarios with reference to the ancestral pancrustacean gene pattern. The mitochondrial genome of Halice sp. MT-2017 underwent two gene transpositions (trnN and trnV), one inversion-coupled transposition (trnR), three wide-ranging reversals of strands-one involving a gene block unexpectedly composed of 20 genes and the other two including trnP and a gene string (trnS [UCN], cytb, nad6, trnP, trnT, nad4l, nad4, and trnH), respectively-and two complex tandem duplications with subsequent random gene loss (TDRLs) to its present gene order (Fig. S3). The large scale of this inversion event has not been observed in other amphipods; however, it is a characteristic usually seen in the mitochondrial genomes of echinoderms 45 and metastriate ticks 46,47 . Mechanistically, the long-range reversions in the mitochondrial genome could be well explained by intra-mitochondrial recombination 48,49 , in which both DNA breakage and reconnection are required. A hot spot for the recombination event was thus posited to be relevant to the non-coding AT-rich CR with the origin of replication [50][51][52][53] . In the Halice sp. MT-2017 mitochondrial genome, moreover, excluding the CR between the two rRNAs, there were intergenic non-coding small spacers spread over 17 locations, ranging from 1 to 100 bp, and comprising 256 bp in total, with a rich AT content of 84.38%. Both the non-coding CR and small spacers were particularly scattered adjacent to genes involved in the above deduced rearrangement courses (Figs 3 and S3) (e.g., the intergenic spacers flanking the gene string from trnH to trnS2 and the gene trnP, which were involved in reversal 2 and reversal 3, respectively; the CR between rrnS and rrnL; and the spacers abutting trnP, which were allowed for the TDRL 1). These non-coding regions probably exhibited the traced relics generated from the antecedent gene rearrangement events 38 .
For most metazoan mitochondrial DNA, there was a notable lack of genetic recombinations 20,21 . The significant mitochondrial recombination in this hadal species could be explained as a special approach to escape the consequences of Muller's ratchet 54,55 . The "ratchet mechanism" states that deleterious mutations would accumulate more easily, especially in a population without recombination. For mitochondrial DNA, this inexorable process would bring about higher and higher mutational levels, to the degree of complete dysfunctionality and even extinction of the genome. Therefore, gene recombination could be a survival strategy for the hadal Halice sp. MT-2017 to offset the high mutational rates of mitochondrial DNA 56 . Base composition bias of the mitochondrial genome. The nearly complete mitochondrial genome of Halice sp. MT-2017 had an AT content of 74.40%, which was comparable to the typical AT richness in many other amphipod species (Table 2; from 62.24% to 76.03% for complete mitochondrial genomes and from 69.79% to 74.35% for incomplete ones). A comparison of AT contents in 13 PCGs across eight superfamilies showed that there was no distinct pattern observed in both strands (Fig. 4a).
The mitochondrial genome of Halice sp. MT-2017 was skewed away from C in favor of G, resulting in a positive value (0.224) for GC skewness, which was opposite to that of most other amphipods, as shown in Table 2. To further explore this reversed direction of skewness, the skew values of 13 PCGs were compared. As shown in Fig. 4b, six superfamilies represented by Onisimus nanseni, Caprella scaura, Pseudoniphargus gorbeanus, Brachyuropus grewingkii, P. hawaiiensis, and Gondogeneia antarctica followed the common pattern of malacostracans 57 , where the genes encoded on the light strand showed negative GC skew values and, conversely, the genes on the heavy strand exhibited positive GC skew values. However, for Halice sp. MT-2017 and M. longipes (from the family Metacrangonyctidae) 24 inhabiting subterranean waters 58 , the PCGs on the light strand showed a reversed pattern in which almost all GC skew values were positive, with the exception of atp8 and nad6 in M. longipes. This GC skew inversion was also detected in nad4 of Halice sp. MT-2017 and cytb of M. longipes on the heavy strand (Fig. 4b). Regarding AT skewness (Fig. 4c), it was noted that the overall AT skewness for most superfamilies under study was more intensively inclined on the heavy strand than on the light strand. On the contrary, inclinations of AT skewness in Halice sp. MT-2017 and M. longipes showed a reversed pattern for the two strands, suggesting the reversion of stand bias for these two species. As to the other hadal species, H. gigas, which belonged to the same superfamily (Lysianassoidea) as the amphipod O. nanseni in the present study, the GC skew values for all of its 13 PCGs were positive; however, this overall pattern did not accord well with that of O. nanseni (Fig. 4b). Therefore, there was the possibility that the mitochondrial genome of H. gigas had also suffered strand bias reversion during its evolution, although the two unjointed contigs in its mitochondrial DNA impeded our reassurance of this hypothesis. An explanation for the strand bias reversion could be related to the reversal of mitochondrial CR, the inversion of which would change the mutational constraints for the two mitochondrial strands during DNA replication, transcription, or both 59 . CR inversion has been postulated in Metacrangonyctidae 24 based on the inverted polarity of trnS (UCN) and cytb near the CR. Halice sp. MT-2017 in the present study showed a similar case with inverted rrnS and rrnL flanking the CR. Based on the analysis in the present study, the taxa showing the reversed pattern did not appear to be phylogenetically clustered or have related ecological habitats; therefore, the reversal of the ordinary strand bias probably occurred independently multiple times during the evolution of amphipods, especially in the relatively ancient superfamilies represented by Halice sp. MT-2017 and M. longipes (Fig. 2).   (Fig. 5A), with Leu and Ser the most frequently used amino acids, accounting for approximately a quarter of the amino acids in total, and Cys, Arg, and Glu being rarely used. Nevertheless, there were still small variations in the frequency of each amino acid between different species. Considering that the amino acid composition and properties could have an influence on the function of proteins 60,61,62 , these two parameters were compared between the two hadal species (Halice sp. MT-2017 and H. gigas) and the other 11 amphipods. Statistical t-tests showed that the percentages of non-polar amino acids from the two hadal mitochondrial genomes (64.50 ± 0.39%) were significantly higher than those of the amphipods in shallow water (62.61 ± 0.67%) (Fig. 5B). Accordingly, the composition of the polar uncharged amino acids and charged amino acids were remarkably higher in the non-hadal amphipods than in the hadal amphipods (Table S4). Therefore, polarity seemed to play an important role in protein stability under the conditions of the hadal environment. It has been reported that only tens of atmospheres of pressure would be necessary to cause dysfunction in the protein activity of shallow water species 63 . The fauna in the hadal trench would likely have evolved special mechanisms to cope with the thousands of atmospheres of pressure in the deep sea. At denaturing pressure, membranes or related processes are among the most sensitive to hydrostatic pressure 64 . As the 13 PCGs of mitochondria were all transmembrane proteins embedded in the hydrophobic lipid chains of the membrane 65,66 , the increase in non-polar amino acid content might be conducive to the compaction interaction between the membrane proteins and the lipid chains in the mitochondrial membrane, thus ameliorating the potential of damage caused by the pressure on the membrane. The maintenance of the mitochondrial structure provided a premise for these hadal amphipods to sustain metabolism, growth, and even reproduction, and would be an adaptation agreeable with the extreme environment in the hadal trench.  Table S5; it was consistent with the canonical types of invertebrate mitochondrial codes 67 . All of the 13 PCGs were initiated with the ATN, which was the typical start codon for the metazoans 26 . Regarding the stop codons, 12 of the 13 PCGs used TAN as their termination codon, whereas only cox2 terminated with a single T. This truncated stop codon was believed to be completed by post-transcriptional polyadenylation 68 . The most frequently used codons in Halice sp. MT-2017 included TTA (Leu, 9.94%), TTT (Phe, 9.26%), ATT (Ile, 7.14%), and ATA (Met, 5.57%). Moreover, a preference for these four codons was also observed in other non-hadal species (Table S4). In H. gigas, apart from the TTT (Phe, 9.04%), TTA (Leu, 7.98%), and ATT (Ile, 7.20%) mentioned above, TTG (4.75%) encoding Leu was the codon also used more frequently, which was not found in other amphipods. Overall, there was a bias in favor of AT-rich codons in all of the currently studied amphipods, which was still notable for other arthropods 69 .
Nonrandom usage of synonymous codons is a common phenomenon in nature 70 . The effective number of codons (ENC) is a simple metric of the synonymous codon usage bias, ranging from 20, if all the amino acids are encoded by only one codon, to 62, when all the synonymous codons are equally used 71 . For all amphipods in the present study, the ENC values ranged from 34.32 (H. gigas in Lysianassoidea) to 52.97 (B. grewingkii in Gammaroidea). The ENC of Halice sp. MT-2017 was 41.83, indicating that approximately two-thirds of the possible codons were employed effectively in its mitochondrial genome. Although the ENCs were species-specific among all amphipods under study, there were two codon usage patterns when referring to their relationship with the GC contents at the third synonymous codon position (GC3) (Fig. 5C). The first pattern showed a linear correlation between the ENC values and the GC3s (R 2 = 0.834, p < 0.01). In this pattern, the synonymous codon usage was associated with the G + C content of the mitochondrial DNA and the ENCs reflected the species-specific mutational bias around different mitochondrial genomes 72 . Most amphipods under study displayed this pattern. However, some species showed a significant deviation from the linear association, such as P. gorbeanus and H. gigas in Fig. 5C. The number of the effectively used codons (34.32) in H. gigas was lower than that estimated by the regression formula (39.41), indicating a relatively strong bias of codon usage in H. gigas. The codon usage bias of H. gigas reflected the natural selection for certain codons, through which highly expressed genes exhibited  (Table 2), which was the highest value compared to other complete rRNAs of the amphipods under study. The AT and GC skew values were negative (−0.006) and positive (0.340), respectively, which was analogous to that observed for other amphipods ( Table 2). The rrnS and rrnL were 735 bp and 1,061 bp in length, respectively, with 1,796 bp in total (Table 1), and the two rRNA genes were located on the light strand, which was different from the positions of rRNAs observed for other amphipods on the heavy strand.
Evolutionary rate estimation. The mitochondrial PCGs of amphipods from different taxonomic classifications and ecological environments were used to estimate the non-synonymous/synonymous substitution (dN/dS) ratios under the branch model, assuming the rate consistency along each codon site in the branch 80 . The results showed that all of the evolutionary rates (dN/dS ratios) referring to the whole mitochondrial genome under study were less than one, indicating that the function of the mitochondrial DNA was well-maintained during evolution. Moreover, it was noteworthy that the hadal species (Halice sp. MT-2017 and H. gigas) demonstrated smallest evolutionary rate values than did the amphipods in other habitats ( Table 3). The slower evolutionary rate of mitochondrial genomes in deep-sea species has also been discovered in isopods 81 , which may be related to the relatively constant environment in the deep sea. To explore the genes making contributions to the overall slower evolutionary rates of these two hadal mitochondrial genomes, the dN/dS ratios were calculated in 13 PCGs separately. The results showed that nad4 and cox2 had lower evolutionary rates for both hadal amphipods; nad6 exhibited a lower evolutionary rate for Halice sp. MT-2017, and cox3, nad4l, and nad5 showed lower evolutionary rates for H. gigas (Table S6). The slower evolutionary rates for these genes indicated that they were under a stronger purification selection 80 , which was critical in removing the disadvantageous mutations and maintaining mitochondrial gene functions 22 .

Materials and Methods
Sampling and DNA extraction. Individuals   AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), in which A, T, G, and C were the contents of four bases 92 . The codon usage was analyzed using the Sequence Manipulation Suite 93 . The relative synonymous codon usage was calculated with MEGA 6.0 94 . The ENC 71 was determined using the INCA 2.1 software program 95 . The percentage of each amino acid was calculated by summarizing all of its corresponding codons. A two-tailed t-test was performed using the 't.test' function in R software (3.5.1) to calculate differences and the significant levels (p-value) for amino acid contents, as well as their corresponding property groups, between the deep-sea species (Halice sp. MT-2017 and H. gigas) and the amphipods from shallower waters (listed in Table 3).

Substitution rate estimation.
To estimate the dN/dS ratios, standard branch models were performed on the 13 concatenated mitochondrial PCGs and 13 separate PCGs, respectively, with the 'codeml' program in the PAML 4.7 software package. A "free-ratio" model was set and the ambiguous characters and the alignment gaps were removed 80 . phylogenetic inference. In the aspect of DNA barcoding, eight available partial cox1 (cytochrome c oxidase subunit I) sequences of Pardaliscidae in Genbank were used to illustrate the taxonomic placement of the hadal Halice sp. MT-2017 specimens collected below 10,000 m. Eight cox1 sequences from other closely related families were also included in the tree construction. Related accession numbers are listed in Table S8. All of the sequences were aligned using MUSCLE 3.8.31 96 . Based on the well-aligned 603 nt sequence, Bayesian and maximum likelihood methods were used to construct phylogenetic trees with MrBayes 3.2.6 97 and RaxmlGUI 1.3 98 respectively, using the GTR + G + I model as recommended by jModelTest 2 99 . Four independent runs of four MCMC chains were performed for Bayesian analysis. Chains were run for five million generations, and the first 25% of generations were discarded as burn-in. The node stability of the maximum likelihood tree was assessed with 1,000 bootstrap replicates. Phylogenetic analyses were also performed based on the mitochondrial genomes of Halice sp. MT-2017 and those of other 12 amphipods belonging to the seven distinct superfamilies available in GenBank, with four species of isopods used as outgroups. Detailed information of the sequences used was shown in Table S8. Both nucleotide and amino acid sequences of the 13 PCGs were aligned using MUSCLE 3.8.31 96 separately. Removing the poorly aligned regions and concatenated conserved sequences were performed using Gblocks 0.91b 100 with default stringent parameters. After being trimmed by Gblocks, the remaining nucleotide and amino acid datasets consisted of 9,732 nt and 2,816 aa, respectively. Phylogenetic analysis for each dataset was carried out using the ML method. Regarding the nucleotide sequences, the GTR + G + I model was selected by jModelTest 2 99 . For the amino acid dataset, the MtArt + G + I + F model was selected by ProtTest 3.4 101 . ML analysis was carried out using RaxmlGUI 1.3 98 for the nucleotide dataset and PhyML 3.0 102 for the amino acid dataset, both of which were conducted with 1,000 bootstrap replicates. Genetic distances between clades were computed using Mega 6.0 94 with the p-distance mode for both cox1 and 13 concatenated PCGs sequences.
There was a lack of an appropriate fossil record for the calibration of a molecular clock with regard to amphipods 103 ; instead, geological events for molecular calibration were used. The formation of the shallow lake, Lake Baikal, occurred approximately 27-35 Mya, and the molecular study revealed that the main Baikal amphipods (Eulimnogammarus vittatus and B. grewingkii in the present study) occurred at a comparable time to the formation of Lake Baikal 104,105 . BEAST v1.8.4 106 was implemented to estimate divergence times. An uncorrelated relaxed lognormal clock with the GTR + I + G substitution model was used, and a Yule process was set to the tree prior. A normal distribution was applied to the tree calibration node and the most recent common ancestor of Baikalian Scientific RepoRts | (2019) 9:2610 | https://doi.org/10.1038/s41598-019-38735-z amphipods was set at 30 ± 2 Mya. Following a burn-in of the initial 50% of cycles, divergence times were sampled once every 1,000 generations from 600 million MCMC iterations. The sampled trees and the associated 95% highest posterior density distributions around the estimated node ages were annotated in TreeAnnotator v1.8.4 (BEAST software). Visualization of the tree was realized in FigTree v1.4.3. The effective sample sizes (ESSs) were used for determining the Bayesian statistical significance of each parameter in TRACER v1.5 (ESS > 200) 107 .