First mitochondrial genome for the red crab (Charybdis feriata) with implication of phylogenomics and population genetics

In this study, we first described the complete mitochondrial genome for the red crab (Charybdis feriata), elucidated its phylogenetic relationship among 20 species within Decapoda, and estimated the population genetic diversity. The mitochondrial genome was 15,660 bp in size and encoded 13 protein-coding genes, 22 transfer RNA (tRNA) genes, and two ribosomal RNA genes. The gene arrangement of the mitochondrial genome was the same as that of its sister species, C. japonica. Phylogenomic analysis suggested that genus Charybdis should be classified into subfamily Portuninae but not into subfamily Thalamitinae. Moreover, a total of 33 haplotypes of complete cytochrome c oxidase subunit I gene were defined in 70 individuals of C. feriata derived from three localities. Haplotype diversity and nucleotide diversity values among three localities indicated a high level of genetic diversity in C. feriata. AMOVA analysis suggested a low level of genetic differentiation among the three localities (FST = 0.0023, P > 0.05). Neutrality tests and mismatch analysis revealed that C. feriata might have undergone a population expansion event that possibly occurred in the last 61,498 to 43,814 years. This study should be helpful to better understand the evolutionary status, and population genetic diversity of C. feriata and related species.

The red crab, Charybdis feriata (Crustacea: Decapoda: Portunidae) (Linnaeus, 1758), also known as the crucifix crab, is a large swimming crab species that is broadly distributed in the Indo-Pacific sea areas, including Japan, China, Indonesia, Australia, India, Pakistan, Oman, and South Africa 1,2 . C. feriata can be easily distinguished by its striking red and white colour pattern and the distinct cross on the median surface of its carapace 3 . Young crabs usually live in the sandy shore, whereas adults prefer to inhabit the areas of sandy and muddy bottoms at depths from 30 m to 60 m 4 . Given its fast growth speed, large size, good flavor, and high market demand, this species is considered as one of most valuable fisheries resources and has become a potentially important target for aquaculture, domestication, and stock enhancement 5,6 . The wild females usually weigh 200 -350 g, but the males can grow up to 1 kg 7 . In the last few decades, the catching production and the wild resources of C. feriata have been decreasing on a yearly basis 8 due to over-exploitation and environmental deterioration. Despite its economic importance, studies have been limited to reproductive biology 7,8 , larvae characteristics 9,10 , and fishery biology 11 . Little information could be available about the germplasm resource and population genetic structure for this species by now, except that moderate variation was reported for two wild populations sampled from Shanwei City and Zhoushan City, China based on the cytochrome c oxidase subunit I (COI) gene and microsatellite markers, respectively 12,13 . To estimate and conserve this important crab resource, genetic studies such as population genetic diversity and evolutionary history should be carried out.
The genus Charybdis (De Haan, 1833) is an important group in family Portunidae. For a long time, the taxonomic status of this genus has remained controversial. Several studies have recommended to classify it into subfamily Portuninae of family Portunidae [14][15][16] . However, some other studies suggested to assign it to subfamily Thalamitinae of family Portunidae 17,18 . Molecular studies based on COI, 16S rRNA, and RAPD 19,20 supported the latter opinion. However, the phylogenetic analysis based on 13 protein-coding genes from the mitochondrial genome 21 supported the former opinion. Thus, in order to better solve this problem, more studies need to be carried out in the future.
Mitochondrial genome is a typically closed-circular molecule ranging approximately from 14 to 18 kb in size, and it consists of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, 2 rRNA genes, and a control region. It is thought to be an ideal marker for studies on population genetic diversity, molecular phylogeny, and species identification because of its high mutation rate, simple structure, abundant distribution, and maternal inheritance [22][23][24] . Thus far, complete mitochondrial genomes have been reported in many crustacean species, such as Litopenaeus vannamei 25 , C. japonica 21 , Scylla serrata 26 , and S. paramamosain 23 . Although several mitochondrial gene sequences of C. feriata are present in the GenBank database, the complete mitochondrial genome information is still not available by now. The lack of complete mitochondrial genome has limited the development of population genetic diversity and molecular evolution for this species.
The purpose of this study is to report the complete mitochondrial genome for C. feriata, elucidate its evolutionary status, and estimate population genetic diversity and differentiation. This work should be helpful to better understand the evolutionary status and population genetic diversity of C. feriata and other related crustacean species.

Materials and methods
Sampling and DNA extraction. A total of 70 wild individuals of C. feriata were sampled from the southeastern coasts of China, with 21 from Hainan Island (named HN), 24 from the city of Xiamen (named XM), and 25 from the city of Zhoushan (named ZS) (Fig. 1). Animals were killed by a lethal dose of MS-222. Muscle tissues were collected and fixed in 99% ethanol at room temperature. Genomic DNA was extracted using the traditional proteinase K and phenol-chloroform extraction protocol as described by Ma 27 . Primers, PCR, and sequencing. First, partial sequences of three genes (12 S rRNA, 16 S rRNA, and COI) of C. feriata and the complete mitochondrial genomes of three closely related crab species (C. japonica, S. paramamosain, and Portunus trituberculatus) were downloaded from the GenBank database. Then the three genes were confirmed by resequencing. Next, the complete mitochondrial genome of C. feriata was generated by overlapping PCR with specific or degenerate primers (Supplementary Table 1), and sequencing. Furthermore, the complete COI gene sequence was employed to evaluate the population genetic diversity and genetic differentiation of C. feriata population. A pair of primers (COI-f: 5′ -AATAAGAAAGTTAATAACTTGTGTT-3′ and COI-r 5′ -GAAGAAAAGTATCTTCCTAGTAGG-3′ ) with an anealing temperature of 52 °C were successfully designed. Seventy individuals collected from three localities (HN, XM, and ZS) were evaluated in this study.
PCRs were carried out in a 25 μ L volume that included 0.4 μ M each primer, 0.2 mM each dNTP, 1 × PCR buffer, 1.5 mM MgCl 2 , 0.75 unit Taq polymerase, and approximately 100 ng template DNA at the following conditions: one cycle of denaturation at 94 °C for 4 min; 37 cycles of 30 s at 94 °C, 50 s at a primer-specific annealing temperature (Supplementary Table 1), and 50 s at 72 °C. As a final step, the products were extended for 7 min at 72 °C. The PCR products were separated on 1.5% agarose gels and directly sequenced in both directions by using an ABI Prism 3730 automated DNA sequencer (PE Corporation). The sequences were edited and assembled using two softwares, EditSeq and SeqMan (DNASTAR).
Complete mitochondrial genome analysis. The graphical map of the complete mitochondrial genome (Fig. 2) was drawn using the online software OrganellarGenomeDRAW (http://ogdraw. mpimp-golm.mpg.de/) 28 . The genome structure was determined by sequence comparisons with the known complete mitochondrial genomes of the closely related species, including S. paramamosain 23 and C. japonica 21 . tRNAs were identified by their proposed clover-leaf secondary structure and anticodons by using the web-based tRNA-scan SE 1.21 program (http://lowelab.ucsc.edu/tRNAscan-SE/) 29 with default search mode. Protein-coding genes were translated into amino acids by using the software MEGA 4.0 30 . The codon usage of protein-coding genes and the nucleotide composition of the mitochondrial genome were also determined using MEGA 4.0. Finally, the complete mitochondrial genome DNA sequence was deposited into the GenBank database by using the software Sequin 12.30 (http://www.ncbi.nlm.nih.gov/ Sequin/). Protein-coding genes were aligned using Clustal W in MEGA 4.0 with default settings. As a result, gene ND6 showed high heterogeneity that consistently causes poor phylogenetic performance 31 . Thus, the remaining 12 protein-coding genes alignments were concatenated to a single multiple sequence alignment. Then the multiple sequence alignment was formatted and analyzed using RAxML web-servers (http://embnet.vital-it.ch/raxml-bb/index.php) 32 . The CAT model was used to estimate the evolutionary rate of the 12 protein-coding genes. Maximum likelihood (ML) search was carried out after bootstraps. The phylogenetic tree was drawn by the software FigTree v1.4.2.
Population genetic analysis. Haplotypes were identified using software Dna SP version 5.0 33 and deposited into the GenBank database. For each locality and overall locality, haplotype diversity (h) and nucleotide diversity (π) were calculated using Dna SP version 5.0. Molecular variance (AMOVA) analysis was carried out using software Arlequin version 3.11 34 to explain the genetic structure and differentiation among these three localities. Significant level of the test was assessed using 1000 permutations of each pairwise comparison. Neutrality tests including Ewens-Watterson 35  with 10000 bootstrap replicates were also performed using Arlequin 3.11. The histogram of mismatch distribution was constructed using the software Network version 4.6.1.2 (http://www.fluxus-engineering. com/) 41 . The median-joining network of haplotypes was also constructed using software Network version 4.6.1.2. The rough time of population expansion was estimated using the following equation t = τ/2u 42 , where t is the time since population expansion, τ is the mutational time scale, which is calculated using software Arlequin version 3.11, and 2u is calculated using the equation 2u = μ × length of sequence × generation time, where μ is the mutation rate. Given the lack of a calibrated mutation rate of COI gene of C. feriata, the mutation rates of COI gene ranging from 1.66% to 2.33% per million years of Sesarma 43 were used. In addition, a generation time of one year was also used.

Results and Discussion
Genome organization. The mitochondrial genome of C. feriata was a typically circular molecule with 15,660 bp in size (GenBank accession no. KF386147) and consisted of 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and a control region (Fig. 2) as found in most metazoan species such as Lutra lutra 44 , C. Japonica 21 , and S. paramamosain 23 . This genome was slightly smaller than those of most sequenced crab species under Decapoda but larger than that of P. gigas, whose size was 15,515 bp 21,23,26,45 . Such a small genome was mainly due to the short size of the control region (762 bp). Moreover, the lengths of other regions of the genome were approximately equal among these species. Heavy strand (H-strand)  (Table 1) of the 37 genes were completely identical to those of reported species of Portunids, such as C. japonica 21 and S. paramamosain 23 . However, the location of tRNA His (Table 1) was different from that of most arthropods. In most arthropods, tRNA His was between NAD4 and NAD5, whereas it was found between tRNA Glu and tRNA Phe in C. feriata. The phenomenon of gene rearrangements in mitochondrial genome was a relatively common event in crustacean species 25 . The tandem duplication of gene regions was thought to be the most acceptable mechnism of mitochondrial gene rearrangement, in this case, the slipped-strand mispairing took place first, and then followed by gene deletions 46 .
Ten gene overlaps and ten intergenic spacers were found in mitochondrial genome of C. feriata, most of them have been reported in many other species mitochondrial genomes 23,24,47 . The total length of overlaps and intergenic spacers were 24 and 105 bp, with ranges from 1 to 7 and from 2 to 27 bp, respectively. The overall A + T content of C. feriata mitochondrial genome was 70.15%, which was similar to those of C. japonica, S. olivacea, S. serrata, and P. trituberculatus (Table 2). However, different regions had different A + T contents. The control region had the highest A + T content (78.74%), whereas the protein-coding region had the lowest A + T content (68.60%).
Protein-coding genes. A total of 13 protein-coding genes were identified, of which 9 (COI, COII, COIII, APT6, ATP8, ND2, ND3, ND6, and Cyt b) were encoded by H-strand, and 4 (ND1, ND4, ND4L, and ND5) were encoded by L-strand. These genes consisted of 11,182 bp in length and coded 3716 amino acids in total. All 13 genes were initiated by the start codon ATN (ATG, ATA, and ATT), with an exception (GTG) in ATP8 ( Table 1). The typical stop codon (TAA or TAG) was detected in nine protein-coding genes (ATP6, ATP8, ND1, ND2, ND3, ND4, ND4L, ND5, and ND6), whereas the remaining four genes (COI, COII, COIII, and Cyt b) were ended by incomplete stop codons (T--or TA-). Variable start codons and incomplete stop codons have been reported in many other mitochondrial genomes. For example, four types of start codons (ATT, ATG, ATA, and ACG) were detected in mitochondrial genome of Myrmeleon immanis 48 . Two and one incomplete stop codons were found in Lutjanus russellii and S. paramamosain mitochondrial genomes 23,49 . For the incomplete stop codon, the missed nucleotides may be produced by post-transcriptional polyadenylation 50 . In addition, ND6 had the highest A + T content (72.78%), whereas COIII had the lowest A + T content (64.85%) ( Table 3).
Transfer and ribosomal RNA genes. A total of 22 tRNA genes ranging from 64 to 74 bp in length were identified from the mitochondrial genome of C. feriata. All of them were capable of folding into a typically clover-leaf secondary structure (Fig. 3). In the closely related crabs C. japonica and S. paramamosain, tRNA Ser (AGN) could not form a secondary structure because it lacked the dihydrouracil (DHU) arms 21,23 . Fourteen tRNA genes were located on H-strand, whereas the remaining eight were located on L-strand. All tRNA genes had a common length of 7 bp for the aminoacyl stem and an invariable size of 7 bp for the anticodon loop. Variable nucleotide lengths of tRNAs were found at the DHU, TΨC, and anticodon arms. All these 22 tRNA genes possessed the common anticodons of Decapods mitochondrial genomes, except that tRNA Lys and tRNA Ser (AGN) possessed TTT and TCT anticodons rather than CTT and GCT, respectively. Seven unmatched base pairs were found in 22 tRNA genes, which was lower than the number detected from tRNA genes of S. paramamosain 23 . The overall A + T content of 22 tRNA was 71.76%, with the highest content (84.62%) in tRNA Asp and the lowest content (57.35%) in tRNA Lys . Both 16S and 12S rRNA genes were found on the L-strand of the mitochondrial genome. They were located between tRNA Leu (CUN) and the putative control region and were separated by tRNA Val . The sizes of 16S rRNA and 12S rRNA genes were 1321 and 843 bp, and the A + T contents were 74.26% and 71.89%, respectively.

Length (bp)
A + T (%)   Non-coding regions. A total of 11 non-coding regions were identified in the mitochondrial genome of C. feriata. The major non-coding region (762 bp in length) was found between 12S rRNA and tRNA Ile , which was considered to be the putative control region. The other 10 non-coding regions were small, ranging from 2 to 27 bp in length. The A + T content of control region was higher (78.74%) than that of other regions in mitochondrial genome. The high rate of A + T content was due to the existence of A/T repeated motifs. In control region, TA, AT, TAA, AAT, and TTA were found to be the most abundant motifs. Additionally, microsatellite sequences were detected, such as (TA) 3 , (AT) 3 , (TA) 4 , (AT) 4 , and (TA) 11 . Microsatellites were also identified from control region of the mitochondrial genome in M. immanis and Nymphes myrmeleonoides 48 . In addition, the nucleotide composition of control region in this study was 41.99% for A, 7.87% for G, 36.75% for T, and 13.39% for C (Table 3).

Length (bp) A + T (%) Reference
Phylogenetic relationship. The taxonomic status of genus Charybdis within Portunidae has been a highly contentious issue for a long time. In this study, we estimated the evolutionary relationship of C. feriata within Decapoda by reconstructing a phylogenetic tree. This tree was created based on 12 concatenated protein-coding genes from the mitochondrial genome of 21 species. From the tree topologies (Fig. 4), we found that C. feriata and C. japonica first formed a monophyletic group and showed the closest relationship to each other. Together with C. sapidus and P. trituberculatus, they then formed another monophyletic group. Furthermore, these four species showed a sister relationship with another four species of genus Scylla. Thus, our results supported the opinion of classifying genus Charybdis into subfamily Portuninae of family Portunidae. The same suggestion was also proposed by Liu and Cui 21 .
Population genetic diversity and differentiation. The complete COI gene sequence (1534 bp) was employed to estimate the genetic diversity and differentiation of C. feriata population. A total of 33 haplotypes (Table 4) were identified from 70 individuals, of which 14 were from HN locality, 11 from XM locality, and 16 from ZS locality. H9 was the most abundant haplotype, which was present in each locality. A high level of genetic diversity (Table 5) was found with h ranging from 0.819 to 0.867 and π ranging from 0.0011 to 0.0013 per locality. The h value was slightly higher than that (0.787) reported by Huang 12 . In our previous study, moderate genetic variation of C. feriata was detected by microsatellite markers 13 . Moreover, a high genetic diversity was also observed in other marine animals, such as S. serrata 51 and Salmo salar 52 . The following factors, including life history charateristics, environmental heterogeneity, and large population sizes, may help in maintaining a high level of genetic diversity among marine animals 53,54 . AMOVA analysis indicated that 99.77% of the total genetic variation was contributed by within-localities variation, whereas only 0.23% was caused by among-localities variation (F ST = 0.0023, P > 0.05) ( Table 6). In addition, no significant genetic differentiation was found among three localities ( Table 7). The above analysis showed that the genetic differentiation among three localities of C. feriata was at a low level, suggesting a single population in East China Sea and South China Sea. The summer southwest monsoonal wind and winter northeast monsoonal wind drive seasonal ocean current   (Table 8). Mismatch distributions analysis (Table 9) indicated that the estimated effective population size after population growth was significantly larger than that before population growth. In addition, a star-like topology was produced based on 33 haplotypes (Fig. 5). Of these haplotypes, H9 was located in the center of this topology, and it was closely linked with the majority of other haplotypes, thereby suggesting that it is the ancestral haplotype in C. feriata. All above analysis indicated that C. feriata might have undergone a population expansion event. Meanwhile, the high h (between 0.819 and 0.867) and low π (between 0.0011 and 0.0013) also suggested that C. feriata underwent population    Table 9. Mismatch distributions analysis of Charybdis feriata. τ, units of mutational time; θ 0 , θ before population growth; θ 1 , θ after population growth; SSD, sum of the square deviations between the observed and expected mismatch; P SSD , the probability of SSD; Rag, raggedness index; P Rag , the probability of raggedness; HN, Hainan locality; XM, Xiamen locality; ZS, Zhoushan locality.
expansion after a period of low effective population size. Sudden population expansion can affect population genetic diversity and haplotypes, and in this process more haplotypes were generated by mutation than were removed by genetic drift 57 . We further deduced that the population expansion event of C. feriata might have occurred between 61,498 and 43,814 years ago. This period of population expansion is a little bit later than the Last Interglacial complex (140-75 kya) 58 . This findings showed that the Last Interglacial complex might have played an important role in demographic history of C. feriata. Further, a big changes of nutrient concentrations and sea water temperature could affect the population distribution of marine orgnisms too 59 .

Conclusion
This study first described the complete mitochondrial genome of C. feriata, which was 15,660 bp in length, including a typical set of 37 genes and a control region. Phylogenomic analysis results supported that genus Charybdis should be classified into subfamily Portuninae rather than into subfamily Thalamitinae. Furthermore, a high level of genetic diversity and a low level of differentiation of C. feriata were found, and a population expansion event was deduced to have occurred between 61,498 and 43,814 years ago. This study should be helpful for studies on evolution and phylogeny, population genetic structure, and conservaton genetics for C. feriata and related species [60][61][62] .