Comparative mitogenomics of the Decapoda reveals evolutionary heterogeneity in architecture and composition

The emergence of cost-effective and rapid sequencing approaches has resulted in an exponential rise in the number of mitogenomes on public databases in recent years, providing greater opportunity for undertaking large-scale comparative genomic and systematic research. Nonetheless, current datasets predominately come from small and disconnected studies on a limited number of related species, introducing sampling biases and impeding research of broad taxonomic relevance. This study contributes 21 crustacean mitogenomes from several under-represented decapod infraorders including Polychelida and Stenopodidea, which are used in combination with 225 mitogenomes available on NCBI to investigate decapod mitogenome diversity and phylogeny. An overview of mitochondrial gene orders (MGOs) reveals a high level of genomic variability within the Decapoda, with a large number of MGOs deviating from the ancestral arthropod ground pattern and unevenly distributed among infraorders. Despite the substantial morphological and ecological variation among decapods, there was limited evidence for correlations between gene rearrangement events and species ecology or lineage specific nucleotide substitution rates. Within a phylogenetic context, predicted scenarios of rearrangements show some MGOs to be informative synapomorphies for some taxonomic groups providing strong independent support for phylogenetic relationships. Additional comparisons for a range of mitogenomic features including nucleotide composition, strand asymmetry, unassigned regions and codon usage indicate several clade-specific trends that are of evolutionary and ecological interest.

Traditional molecular systematic research has been largely dependent on phylogenetic reconstructions based on few molecular markers obtained via individual targeted polymerase chain reaction (PCR) and Sanger sequencing approaches [1][2][3][4][5][6][7] . More recently, the advent of second-generation sequencing (SGS), coupled with the development of simplified and streamlined laboratory and bioinformatic pipelines [8][9][10][11][12] , has seen a proliferation of eukaryotic and prokaryotic genomic resources. Notably, advances in genomic approaches have greatly increased the efficiency of recovering complete and near-complete mitochondrial genomes from species across the animal and plant kingdoms, making it a popular candidate for phylogenetic and comparative genomic studies, in addition to its role as a useful barcoding locus for species identification 13 . Despite the rapid growth in mitogenome resources, current datasets predominately come from small and disconnected studies on single or a limited number of related species, introducing sampling biases and impeding research of broad taxonomic relevance [14][15][16] . While there is a general consensus that new mitogenome resources remain valuable for comparative genomic and systematic research, improved representation of mitogenome resources for poorly-represented biological lineages is still required for studies aimed at addressing fundamental evolutionary, genomic and systematic questions 14 . Table 1. Specimen sources and mitogenome accession numbers for 21 decapod species (from 7 infraorders) contributed in this study.
www.nature.com/scientificreports www.nature.com/scientificreports/ Established as a gene order common to insects and crustaceans 67 , the pancrustacean ground pattern (coloured orange and labelled Gr) is predominantly observed for basal decapod groups such as the suborder Dendrobranchiata and infraorders Caridea, Achelata and Glypheidea, appears in a limited number of species within the Astacidea and Gebiidea, and is absent in all species sequenced so far in the other infraorders. Within the context of the inferred phylogeny, gene rearrangement events shared by members of specific clades provide examples of unifying evolutionary signatures (synapomorphies) from higher levels such as the infraorder-specific St1 pattern shared by the four analysed Stenopodidea species, through the superfamily-specific As1 for species in the Astacoidea, and family-specific rearrangements at the base of the Upogebiidae clade illustrated by the Ge1 pattern.

Lack of correlation between MGO variation and substitution rates.
Overall, evidence of episodic positive selection is observed in 10 out of 13 mitochondrial PCGs (ATP6, COX1, COX2, COX3, CYTB, ND1, ND2, ND4, ND5, ND6) and for main branches of most infraorders/suborders, with the exception of Dendrobranchiata, Glypheidea and Axiidea (Supplementary Data S4). Notably, a higher frequency of statistically significant selection is reported in Caridea mostly detected along branches leading to the genera Alpheus, Palaemon and Macrobrachium. On the other hand, accelerated evolutionary rates are observed for multiple decapod lineages across all infraorders (Supplementary Data S4). Based on observation, while there appears to be a link between substitution rates and heightened gene rearrangements in some taxonomic clades, for example in Parastacoidea (southern hemisphere crayfish) or Caridea, there are also instances where accelerated rates are  Decapod phylogenetic tree. This cladogram was inferred using the maximum-likelihood method based on Dataset I (13 mitochondrial PCGs, 10 359 nucleotide alignment). Clades are coloured according to the different infraorders. The outer colour strip in the phylogenetic tree represents the distribution of mitochondrial gene orders (MGO) in various infraorders and summarises a total of 59 different MGOs across the 246 different decapod species analysed, labelled for each infraorder in the panels below. Orange-coloured MGO labelled with 'Gr' refers to the pancrustacean ground pattern; other derived MGOs are numerically labelled and attached with a 2-letter infraorder prefix. MGOs that differ from the ground pattern are a result of a series of CREx-predicted gene rearrangement events: transposition (T), reversal (r), reverse transposition (rT), duplication (d), deletion (x) and tandem duplication-random loss (tdrl). Yellow-or red-coloured circles on some nodes reflect the level of uncertainty for the TreeREx reconstruction of each ancestral MGO, with red exhibiting highest level of uncertainty, yellow for mid-level and no circle for consistent reconstruction (see Babbucci,et al. 42 for details). Subsequent outer rings indicate, to the best of our knowledge, the possible environments (terrestrial, freshwater, marine, vents/seeps) inhabited by each decapod species.
www.nature.com/scientificreports www.nature.com/scientificreports/ observed for lineages with highly conserved MGOs as well. However, across Decapoda as a whole, the Spearman correlation analysis does not suggest a link between variable MGOs and nucleotide/amino acid substitution rates (Supplementary Data S4).

Mitogenome characteristics vary within and between decapod infraorders. Applying the
Hyper-Empirical Relative Mitochondrial Evolutionary Speed (HERMES) index developed by Plazzi, et al. 41 , the amount of mitochondrial evolution was estimated based on several genomic features found to be phylogenetically congruent in their tested datasets. These genomic features include the root-to-tip distance for each species (RtoTdist), ML distance from the E. pacifica outgroup (MLdist), percentage of Unassigned Regions (URs), amount of mitochondrial identical gene arrangements (AMIGA) for PCGs and strand usage skew (SUskew). The HERMES index appears to be informative for our dataset with reasonable goodness-of-fit statistics: Tucker-Lewis Index (TLI) 70 = 0.939; root mean square of the residuals (SRMR) = 0.053; root mean squared error of approximation (RMSEA) = 0.058, all close to boundaries suggested by Hu and Bentler 71 . The highest communality was for AMIGA at 85.5% whereas other variables scored at only 10 to 20%, resulting in a mean of 29.1% (i.e. the HERMES index accounts for 29.1% of the total variability of the source matrix). Since gene order is the key parameter in this system, the HERMES factor analysis in Fig. 3 separates individuals into two distinct groups. The first group exhibits low HERMES index and consists of individuals with MGO that is identical to or highly similar to the pancrustacean ground pattern (Gr) while the second group comprises mostly of individuals that have undergone tdrl events resulting in large rearrangements of mitochondrial PCGs (Fig. 2).
Several notable observations can be made on the clustering of data points based on various mitogenomic features, as observed in principal component analysis (PCA) plots in Fig. 4. The plot in Fig. 4a is based on the same five variables in the previous HERMES analysis. While most individuals with MGOs identical or similar to the pancrustacean ground pattern tend to form a tight cluster, some of these are distinctly separated from others based on the proportion of their unassigned regions, most of them either containing more than one control region (e.g. Metanephrops, Munida) or long intergenic regions (e.g. Geothelphusa, Longpotamon) (Fig. 4a). Additionally, Fig. 4b-d summarise nucleotide composition, asymmetry (skew) information and amino acid composition respectively. The AT content separates most of the brachyuran crabs from caridean shrimps, whereas GC-skew distinguishes the two Astacidea superfamilies (Astacoidea, Parastacoidea) (Fig. 4b). The two burrowing mud shrimp infraorders, Axiidea and Gebiidea, are often clearly split into different quadrants for plots based on nucleotide features, whereas data points representing anomuran species are generally widely scattered across all PCA plots.

Discussion
This study adds to the growing list that has benefited from the use of museum-preserved specimens for mitogenomic studies 37,72-77 , with mitogenomes for 21 species strategically sampled to fill important taxonomic gaps, particularly in under-represented decapod infraorders. The higher level topologies of all three Maximum likelihood trees reconstructed in this study are congruent at the base and top of the trees (Fig. 1), consistently recovering: [1] The suborder Dendrobranchiata as sister group to the rest of egg-brooding Pleocyemata, [2] Caridea as basal to Pleocyemata, [3] Stenopodidea as sister to all of Reptantia, [4] Axiidea and Gebiidea as separate www.nature.com/scientificreports www.nature.com/scientificreports/ infraorders, and [5] Anomura and Brachyura placed as sister taxa high in the tree, generally in agreement with previous reports 1,11,36,37 though with points of distinction with others 2,3,6,78,79 . While the higher level topology of the Bayesian-inferred tree generally exhibits similar observations, there are deviations with respect to the relationship between Caridea and Stenopodidea (recovered as sister clades) and the shuffle in positions of Axiidea, Anomura and Brachyura. Generally, incongruent topologies show that inter-relationships among lobster and crayfish infraorders still remain unresolved 1,2,6,7,78,80,81 , requiring further taxonomic sampling and/or additional nuclear gene information. It is also noteworthy that the taxonomically-impoverished infraorder Procarididea 5,82 is absent from our study, but is considered by most to likely be one of the basal lineages within the Decapoda 5 .
The phylogenetic analysis presented in this study represents one of the most comprehensive samplings of decapod species, based on mitochondrial data. While genome-based phylogenies are still rare for the Decapoda due to the generally large sizes of its genomes (1 to 40 Gbp), the increasingly lower costs of sequencing are enabling studies to target more genomic loci through genome skimming or anchored hybrid enrichment methods [83][84][85] . However, these still lack the fine-grained resolution obtained in this study. Two noteworthy studies that have been recently published have attempted to elucidate decapod inter-relationships based on genomic (Wolfe,et al. 84 ; 94 decapod species, 410 loci, greater than 86 000 bp) or transcriptomic data (Schwentner,et al. 86 ; 16 decapod species, 81 to 684 orthogroups, 17 690 to 242 530 amino acid positions). Mitogenome-based maximum likelihood phylogenies in this study share similarities in parts of their topologies with that obtained from these studies, specifically in the recovery of shrimp and prawn groups as basal clades and the traditional Meiura (Brachyura + Anomura) as more derived lineages, increasing the confidence of these nodes based on support from different underlying genetic data. Nevertheless, several nodes are still in contention. This study recovers a paraphyletic Caridea and Stenopodidea clade whereas both Wolfe, et al. 84 and Schwentner, et al. 86 recovered these infraorders as sister clades. The position of Achelata also remains unresolved in most trees as well as the interrelationships among crayfish and lobster groups, as well as the positions of the traditional thalassinids (Axiidea, Gebiidea). Overall, topologies inferred in this study are generally more similar to that obtained by Wolfe, et al. 84 .
The contrasting MGO patterns across all available decapod mitogenomes have revealed some interesting findings. Most notably, a number of highly-rearranged gene orders are observed, occurring in unequal frequencies across various decapod infraorders or at lower taxonomic levels, with a level of diversity higher than or at least comparable to the levels observed in other metazoan groups 42,47,49,63,87,88 . We see the prevalence of the pancrustacean ground pattern (Gr) 67 , and/or other highly-similar MGOs in infraorders at both the basal and more derived positions, indicating that this highly-conserved ground pattern is likely to be ancestral for decapods, with modified MGOs within each infraorder being more derived traits 35 . Hence, instances of clade-specific MGO patterns or MGO "hot spots" suggest that the utility of MGOs as synapomorphies is amplified for certain decapod groups, acting as unifying evolutionary signatures that can be used to either support or question existing classifications at a range of taxonomic levels 31,32,34,35,62,66,67,69 . For example, brachyuran species associated with the Varunidae (superfamily Grapsoidea) and Macrophthalmidae (superfamily Ocypodoidea) that form sister clades 31,66,89,90 also share the Br2 MGO pattern, which provides complementary support to other phylogenetic studies that call for a re-evaluation of classifications for the two large paraphyletic superfamilies 33,[89][90][91] . Nonetheless, it is equally important that we continue to be cautious when evaluating MGO information within a phylogenetic context 42,66,92 , keeping in mind the caveats related to potential homoplastic or convergent arrangements 42,87,93,94 , a saturation of signals from reversible rearrangement events 95,96 or possibly inaccurate inferences from unresolved phylogenies 66 .
In contrast to findings that found a positive correlation between gene rearrangements and elevated nucleotide substitution rates in insect and bivalve mitogenomes 41,45 , our correlation analysis indicates no such association for the Decapoda (Spearman correlation in Supplementary Data S4). Although the exact drivers of MGO rearrangements remain unclear, other studies have noted broad associations with adaptations to extreme ecological niches and lifestyles 34,62,66,68,97,98 . Similarly, accelerated rates of MGO rearrangements occur in various decapod lineages such as burrowing crayfish 34 , freshwater crabs 66 , and other species adapted to extreme deep-sea environments or temperatures 66,98 . However, again, our study indicates no such correlation between MGOs and ecologically-similar decapod groups. For example, freshwater crabs and crayfish (southern hemisphere parastacids) display highly variable MGOs, while other freshwater groups such as the shrimps and northern hemisphere crayfish maintain the same gene order among all sampled species (although the members of northern hemisphere crayfish group have a large distinctive rearrangement that acts as a synapomorphy for the superfamily). The same observation applies to other crab and shrimp species inhabiting deep sea vent niches, or burrowing axiid mud shrimps, which exhibit only a few rearranged MGOs.
However, ecological transitions and the evolution of a distinct Bauplan or the adoption of different lifestyles (e.g. parasitism or adaptation to freshwater) may be achieved via different evolutionary pathways 99-101 and the consequences for energy demands for an organism may be just one of a plethora of factors including thermal tolerance 102 , aerial exposure 103 , sensitivity to salinity 104 , light 105 or metal concentrations 106 . In addition to the regulation of their transcription and translation 107,108 , mitochondria are very dynamic structures often undergoing continuous fusion, fission and motility, events that can potentially affect their bioenergetic capacities 109 . However, the extent of these rearrangements and the cues that trigger them are still poorly understood. Thus, we do not rule out an association of heightened MGO rearrangements and profound changes at deep taxonomic levels (e.g. the two groups of crayfish) to clade-specific adaptations and ecological/life history transitions, but these cannot be confidently determined until we achieve a better understanding of the stressors involved at the cellular and physiological level and the associated adaptive responses.
Previous studies have compared decapod MGO patterns at lower taxonomic levels across the Decapoda 31,32,34,35,60,66,110,111 . However, this study is one of the few to investigate the extent of MGO diversity at the ordinal level and a large number of species (246 species), the other being a phylomitogenomic study on 86 malacostracan mitogenomes 38 . Other large-scale comparative MGO analyses for various metazoan groups have also reported on the distribution of MGOs, some showing relatively high rates of MGO evolution from www.nature.com/scientificreports www.nature.com/scientificreports/ across major taxonomic groups such as beetles, echinoids, batoids and elasmobranchs 64,65,119,120 . While decapod mitogenomes clearly showcase a high diversity of MGO patterns as a whole (59 MGOs across 246 species), it is premature to claim that this degree of diversity is unusual or comparable to that of other metazoan groups since results are influenced by the number of mitogenomes compared (i.e. scale of comparison), the taxonomic level of interest or as a result of unbalanced or biased sampling. We therefore highly recommend interpreting the findings from any comparative analyses with caution especially those still lacking representation for major groups and again, urge for more consideration to be given to better sampling strategies.
Comprehensive studies on the evolutionary trends of mitogenomes (both small and large scale) are also emerging for various animal groups, following the increased availability of mitogenomic resources 41,119,[121][122][123][124][125][126] . To determine if there are other aspects of the decapod mitogenome that may show clade-specific associations, aside from the arrangement of genes, we compared additional aspects of the molecular architecture and composition of mitogenomes across the major decapod groups. Mitogenomes analysed in this study are generally around 16 kbp in length, though some outliers were identified (17-20 kbp) with lengths inflated by a higher proportion of unassigned regions, i.e. containing more than one control region (e.g. Metanephrops species 61 ) or relatively long stretches of intergenic spaces such as in most Potamoidea species 127,128 . Though proportions of unassigned regions may potentially be the discerning feature for some taxonomic groups 129 , this notion should also be treated with caution as the sizes of these regions may be influenced by the assembly or annotation methods used to predict genes, exemplified by the identification of three trnL genes by Segawa and Aotsuka 127 as opposed to only two copies reported following re-annotations by MITOS in this study 10 . But generally, decapod mitogenome sizes are relatively stable compared to those of other metazoan groups such as bivalves, nematodes and sponges that exhibit strong heterogeneity in size 88 , with some larger molecules spanning lengths of over 20 kbp and even up to 48 kbp 130 .
Strand and nucleotide composition asymmetries are other mitochondrial characteristics that have been noted to vary across animal groups 49,50,[131][132][133] . In this study, we point out observed trends from PCA plots that suggest molecular signatures for certain decapod groups. Similar to insects, annelids and arachnids, decapod mitogenomes are generally AT-rich, as opposed to lower compositional AT bias often observed in chordates (e.g. fishes, birds, reptiles) 134 . Within the Decapoda itself, recognizable clusters were observed for Caridea (average: 65%, range: 59% to 70%) and Brachyura (average: 71%, range: 65% to 77%) with minor overlaps. Other noteworthy features include positive GC-skew values that appear to be a unifying characteristic for all northern hemisphere freshwater crayfish (Astacoidea) as well as for the four Coenobita anomuran species included in this study, as opposed to negative GC-skew values for most other decapods. Further, species from infraorders Axiidea and Gebiidea are well separated due to substantial differences in compositional bias and asymmetry, which, in addition to previous reports from phylogenetic and MGO analyses 6,35,135,136 , is consistent with their status as separate infraorders instead of united in what was once the Thalassinidea 1,78,137,138 . On the other hand, data points for anomuran species are dispersed across all plots, highlighting their higher plasticity and mitogenomic variability within this infraorder notorious for its morphological and ecological diversity, including the convergent evolution of the "crab" form 66,69,101,139,140 and taxonomic controversies [141][142][143][144] .
For all discussed groups with their own unique signatures, it is of interest to see if these patterns still hold when more mitogenomes are included in future analyses. Though this study is limited in scope to comparisons across whole mitogenomes, we recognise that this only skims the surface of the range of possible compositional comparisons (e.g. at the gene level) and that there are a myriad of other factors that can be included in further detailed investigations, e.g. testing different codon sites 43,44 or the effects of these asymmetries on nucleotide versus amino acid compositions 50,145 .
All Maximum-likelihood trees reconstructed from the three mitochondrial-based datasets (Datasets I to III) were rooted with E. pacifica (Euphausiacea), which is often treated as sister clade to the Decapoda 159 . Support at each node is evaluated with 1000 ultrafast bootstrap replicates (UFBoot) 160 and 1000 SH-aLRT replicates (SH) 161 .

Mitochondrial gene order (MGO) analysis.
Prior to gene order analyses, public mitogenome entries downloaded from GenBank were manually inspected for inaccuracies via cross-checks against results from the MITOS webserver 10 . Any discrepancies found (e.g. incorrect strands specified, missing genes, mislabelling, etc) were corrected and new GenBank files were generated (Supplementary Data S5). Mitogenomes were then processed through the MitoPhAST v3.0 pipeline, which compares and clusters MGOs into groups according to gene order patterns 66 . Similar to Tan, et al. 66 , the output Fasta format file containing MGO for every mitogenome was pre-processed to remove missing genes in one or more species and also to retain only a single copy of duplicated genes. Using each of the generated Maximum-likelihood trees (nucleotide-and amino acid-based) as a phylogenetic guide, gene re-arrangement pathways and putative ancestral MGOs were reconstructed with TreeREx v.1.85 54 . To obtain more accurate inferences, TreeREx results were further used to guide pairwise comparisons using web-based CREx 55 to enable the inclusion of fuller sets of genes that would have been excluded in the TreeREx analysis. The interactive Tree of Life (iTOL) online phylogenetic tool 162 was used to illustrate the distribution of MGOs across various infraorders in the phylogenetic tree.

Comparison analyses of other mitochondrial features.
Excluding individuals with partial mitogenomes, Hyper-Empirical Relative Mitochondrial Evolutionary Speed (HERMES) index is generated for each of 239 individuals using HERMES v.1.0 41 to estimate the amount of mitochondrial evolution. Given a maximum likelihood tree, the program computes the root-to-tip distance for each species (RtoTdist) and ML distance from the E. pacifica outgroup (MLdist) then merges these with other mitochondrial characteristics including the percentage of Unassigned Regions (URs), amount of mitochondrial identical gene arrangements (AMIGA) for PCGs and strand usage skew (SUskew). Also, codon usage counts in each individual were obtained with EMBOSS v.6.6.0 163 with minor adjustments for the Invertebrate Mitochondrial Code, followed by the calculation of relative synonymous codon usage (RSCU) values by taking the ratio of the actual number of times a codon appears to the expected frequency of the codon if all synonymous codons for the same amino acid are used equally 164 . Various mitogenome characteristics were then summarised by Principal Component Analysis (PCA) using Pearson correlation carried out with XLSTAT v.2018.5.52460 165 . Episodic selection, evolutionary rates and spearman correlation test. Without making an a priori assumption on the likelihood of specific lineages undergoing episodic positive selection, the aBSREL method 166 implemented in the command-line version of the HyPhy v.2.3.11 package 167 was used on the codon alignment of each mitochondrial protein-coding gene to test each branch in the maximum -likelihood phylogeny inferred from Dataset I, to inspect whether a proportion of sites have evolved under positive selection (ω > 1). Further, evolutionary rates were estimated from the same codon alignments using BEAST v.2.5.0 168 . An uncorrelated, lognormal relaxed clock model was applied to each partition and the Yule Model was used as the tree prior since taxa consist of individuals from different species, with Euphausia pacifica set as outgroup. Two BEAST MCMC runs of 6 × 10 9 were performed and convergence was checked with Tracer v.1.7.1 169 , applying a burn-in of 20% and checking for sufficient Effective Sample Size (ESS) (>200). Finally, trees from both MCMC runs were combined and a maximum clade credibility tree was constructed. Rates were visualised with Figtree v.1.4.3 170 . Correlations among different MGOs, habitat and substitution rates were measured by the Spearman correlation coefficient carried out with XLSTAT v.2018.5.52460 165 , details in Supplementary Data S4.

Conclusion
Moving forward, further research is needed to contrast the composition and architecture of mitogenomes across other metazoan groups to determine if our observed trends are common or specific to the Decapoda. As more mitogenomic data is made available, coupled with bioinformatics tools such as the MGO analysis feature in the MitoPhAST pipeline, it becomes increasingly feasible to conduct more complete and larger-scale comparisons for any animal group of interest in the future, the bottleneck now being to ensure that annotations in public database entries are accurate, which in this study was still a manual process. We have been fortunate to benefit from mitogenomes generated by various research groups that have contributed to enriching these resources for the order Decapoda but also realise that there is still need for new mitogenomes, carefully sampled to achieve a more balanced representation of infraorders. More importantly, by demonstrating what is possible for large-scale comparative MGO analyses in this study, it is our hope to inspire the undertaking of future research equivalent to or surpassing the extent of this study, contributing as a collective to the eventual reformation of the field of mitochondrial genomics.

Data Availability
Assembled mitogenomes are available on NCBI repository (See Table 1 for GenBank SRA accession numbers). Raw datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.