Mitochondrial genome of Chinese grass shrimp, Palaemonetes sinensis and comparison with other Palaemoninae species

The mitogenome of the Chinese grass shrimp, Palaemonetes sinensis, was determined by Illumina sequencing, and its basic characteristics and gene arrangement were analyzed. The P. sinensis mitogenome was 15,955 bp in length and tightly packed, consisting of 13 protein-coding genes (PCGs), 22 tRNA genes, 2 rRNA genes and one control region. Thirty-three of these genes were encoded on the heavy strand, and the remainder on the light strand. The mitogenome showed a strong A + T bias, with A + T accounting for 66.7% of the nucleotide composition. All PCGs were initiated by a canonical ATN codon, except nad5, which was initiated by GTG. The termination codons of the PCGs were TAA, TAG and T–. All 22 tRNAs of P. sinensis had the typical cloverleaf secondary structure, except trnS1, which lacked the dihydroxyuridine (DHU) arm. Gene order comparison of P. sinensis with previously sequenced Palaemoninae revealed a unique translocation between trnT and trnP in Macrobrachium. Phylogenetic analyses showed that the three Exopalaemon species formed a monophyletic group that then clustered successively with two Palaemon species and P. sinensis, whereas Macrobrachium clustered with Palaemon capensis in the other clade.

Figure 1. Graphical map of the mitogenome of P. sinensis. PCGs and ribosomal RNA genes are shown using standard abbreviations. Genes for transfer RNAs are abbreviated using a single letter. S1 = AGN, S2 = UCN, L1 = CUN, L2 = UUR. CR = control region. PCGs are green, tRNAs are yellow, rRNAs are blue, and the CR is grey. The outer and inner lines indicate the heavy and light strands, respectively. The bold line represents the transcribed strand.
Protein-coding genes. The PCG region accounted for 69.98% of the P. sinensis mitogenome, with a total length of 11,166 bp. Among the 13 PCGs, nine (cox1, cox2, atp8, atp6, cox3, nad3, nad6, cyt b and nad2) were encoded on the H strand, while the remaining four (nad5, nad4, nad4l and nad1) were encoded on the L strand. The 13 PCGs ranged in size from 159 to 1,725 bp (Table 2). Each PCG was initiated by a canonical ATN codon, except nad5, which was initiated by a GTG codon. The termination codons of the PCGs were TAA, TAG and T. Eight of the 13 PCGs (cox1, cox2, atp6, cox3, nad3, nad5, nad4l and nad6) used the typical TAA termination codon, four (atp8, nad4, nad1 and nad2) terminated with TAG, and cyt b had an incomplete termination codon, a single T (Table 1).
The base composition of the 13 PCGs followed the order A > T > C > G. The A + T content of the 13 PCGs was 63.7%, indicating a strong A + T bias (Table 3). The PCGs also showed a strong A and C bias, with an AT-skew of 0.093 and a GC-skew of −0.289. The slightly positive AT-skew of P. sinensis indicated a higher occurrence of A than T nucleotides, whereas the AT-skew values of the other mitogenomes were all negative. In addition, the GC-skew of P. sinensis was the most negative among the compared mitogenomes (−0.012 to −0.080 in the other species). Apart from P. sinensis, species of Macrobrachium showed slightly smaller negative AT-skew values (−0.138 to −0.150) and larger negative GC-skew values (−0.052 to −0.080) (Table 3).
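The skew values reported above can be reproduced from raw base counts. The following is a minimal sketch, assuming the conventional definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the function name and the toy sequence are hypothetical and are not taken from the P. sinensis data.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Return A+T content, AT-skew and GC-skew for a nucleotide sequence.

    Assumes the conventional skew definitions:
    AT-skew = (A - T) / (A + T), GC-skew = (G - C) / (G + C).
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous bases such as N are ignored
    if (a + t) == 0 or (g + c) == 0:
        raise ValueError("sequence must contain both A/T and G/C bases")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence (hypothetical, not from the P. sinensis mitogenome)
example = composition_stats("AAAATTTGCC")
print(example)
```

A positive AT-skew means more A than T on the analyzed strand, and a negative GC-skew means more C than G, which is the pattern described for the P. sinensis PCGs.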
The average codon frequency of the protein-coding genes was calculated and is shown in Table 4. The preferred codons (those most frequently used to encode a given amino acid) are shown in bold, and their relative synonymous codon usage (RSCU) values were all greater than 1. RSCU is an important index that directly reflects the degree of codon usage preference 27. The results showed that the codons of all protein-coding genes had a strong usage preference: most NNU and NNA codons (i.e. codons with U or A at the third position) had RSCU values greater than 1 and were used more frequently. This result is consistent with that reported for E. carinicauda 10.
Transfer RNAs, ribosomal RNAs, and CR region. As in most Palaemoninae, the P. sinensis mitogenome contained a set of 22 tRNA genes (Figure 1). The tRNA sequences ranged from 63 to 69 bp in length and exhibited a strong A + T bias (64.8%). They also showed a positive AT-skew (0.062) (Table 3). Fourteen tRNA genes were located on the H strand and eight on the L strand. The cloverleaf secondary structure of 15 tRNAs was predicted using tRNAscan-SE 28, while that of all 22 tRNAs could be predicted using MITOS 29. All tRNA genes had the typical cloverleaf structure except trnS1, whose dihydroxyuridine (DHU) arm was replaced by a simple loop (Figure 2), a common feature of most Palaemonidae mitogenomes 10.
The rrnL and rrnS genes were located between trnL1 and trnV and between trnV and the CR, respectively. The rrnL gene was 1,298 bp and the rrnS gene 790 bp in length. The CR was 1,159 bp long and was situated between rrnS and trnI. This region had an A + T content of 84.8%, a positive AT-skew (0.101) and a negative GC-skew (−0.145) (Table 3).
Compared with other Palaemoninae 10,23-26, the CR of the P. sinensis mitogenome differed in size. This is a common phenomenon, because the control region is generally believed to show the largest length variation in the mitogenome 16. The CR of P. sinensis had the highest proportion of A nucleotides and the lowest proportion of G nucleotides, as well as the highest AT-skew value (Table 3).
Mitochondrial gene order rearrangement is common in Malacostraca 32-34. Shen et al. identified nine different rearrangements in a comparison of 23 Pancrustacea mitogenomes archived in GenBank, and reported the same translocation between E. carinicauda and M. rosenbergii that was found in this study 10. Wang et al. inferred that this rearrangement between trnP and trnT might be a mitochondrial character unique to the genus Exopalaemon 23. However, in the present study the other three genera all shared the same gene order, so the rearrangement instead appears to be unique to Macrobrachium (Figure 3).
Phylogenetic analysis. Although Palaemonidae is the second most species-rich shrimp family, comprising 118 genera and 981 species 1, only ten complete mitogenomes (excluding P. sinensis) have been archived in GenBank so far. Phylogenetic analyses were based on the concatenated PCGs of 11 Palaemoninae mitogenomes belonging to four genera (Palaemonetes, Palaemon, Exopalaemon and Macrobrachium) (Table 2). The ML and BI analyses yielded the same tree topology, with high nodal support values for each cluster (Figure 4). Apart from the out-group, four species of Macrobrachium clustered with P. capensis in one main clade. In the other main clade, the three Exopalaemon species formed a monophyletic group that then clustered successively with the other two Palaemon species and P. sinensis.
The phylogenetic relationships within the subfamily Palaemoninae Rafinesque, 1815 have long been debated in both morphological cladistic studies and molecular phylogenies. Pereira demonstrated paraphyly in Palaemon, Palaemonetes, and Macrobrachium from an analysis of a matrix of 81 morphological characters in 172 species 3. Ashelby et al. strongly supported the view that Palaemonetes, Exopalaemon, Coutierella, and certain Palaemon belong to a single monophyletic clade, based on analyses of mitochondrial 16S rDNA and nuclear histone (H3) genes in Palaemoninae 6. Ashelby et al. therefore suggested that a re-appraisal of morphological characters, combined with further genetic work at the generic level, was needed to establish a reliable classification of Palaemoninae 6.
In this study, the apparent heterogeneity of Macrobrachium was supported by both the tree topology and the mitogenome gene order rearrangement. This result agrees with the previous studies of Kim et al. 24 and Shen et al. 10. The genus Exopalaemon was also recovered as monophyletic with high support values. Interestingly, P. capensis, the only species in this study that is not distributed in Asia-Australia, grouped with the Macrobrachium clade with comparatively low support. Adult P. capensis inhabit freshwater after a more saline planktonic larval phase 35. Its life cycle reflects the evolutionary history of freshwater palaemonid shrimp. Among the species in this study, P. sinensis, M. bullatum and P. capensis are characterized by abbreviated larval development, which has been considered a primitive trait that arose early in the origin of the family Palaemonidae 36. The Palaemon and Palaemonetes clade is consistent with the morphological similarity of the two genera, as demonstrated by the merging of species of both genera 37, while Cuesta et al. 5 and Botello & Alvarez 38 suggested, from analyses of mitochondrial 16S rDNA, that Palaemon and Palaemonetes were more similar to each other and both differed from Macrobrachium. However, given the tiny proportion of archived mitogenomes (11 species from 3 genera out of 372 species from 26 genera), mitogenomes from a more complete taxon sampling are indispensable for resolving the phylogenetic relationships within Palaemoninae.

The first complete mitogenome of the genus Palaemonetes, that of P. sinensis, was determined in this study. This result helps to clarify the basic features and gene arrangement of this species. In the PCGs, compared with the other known mitogenomes of Palaemoninae, P. sinensis is characterized by the highest proportions of A and C nucleotides and the lowest proportions of T and G nucleotides. Additionally, P. sinensis has a slightly positive AT-skew value and the most negative GC-skew value, whereas the other species all have negative AT-skew values. In the control region, P. sinensis has the highest proportion of A nucleotides, the lowest proportion of G nucleotides, and the highest AT-skew value. Gene order comparison of P. sinensis with previously sequenced Palaemoninae revealed a conserved order among the genera Palaemonetes, Palaemon and Exopalaemon, and a unique translocation between trnT and trnP in Macrobrachium. The phylogenetic analyses using Bayesian inference (BI) and maximum likelihood (ML), based on the concatenated nucleotide sequences of the 13 PCGs, indicated that Exopalaemon formed a monophyletic group that then clustered successively with two Palaemon species and P. sinensis, whereas Macrobrachium formed a monophyletic group that then clustered with P. capensis in the other clade.

Materials and Methods
Sample collection and DNA extraction. Specimens of P. sinensis were collected from Shenyang Longwei Lake, Liaoning, China (41°50′33.7″N; 123°35′22.3″E). The whole body of one individual shrimp was immediately preserved in liquid nitrogen until DNA extraction. Total genomic DNA was extracted using the TIANamp Marine Animals DNA Kit (TIANGEN, Beijing, China), and the quality of the extracted DNA was assessed by electrophoresis on a 1% agarose gel and with a Thermo Scientific NanoDrop 2000.
Genome assembly and annotation. The DNA was randomly fragmented with a Covaris ultrasonicator to construct a genomic DNA library using the whole genome shotgun (WGS) strategy, and the library was sequenced on an Illumina MiSeq instrument using next-generation sequencing (NGS) technology. The mitochondrial sequences assembled with A5-miseq v20150522 39, SPAdes v3.9.0 40 and BLAST v2.2.31 (https://blast.ncbi.nlm.nih.gov/Blast.cgi) were subjected to collinearity analysis using MUMmer v3.1 41 to determine the positional relationships of the contigs. The complete mitogenome sequence was corrected and confirmed with Pilon v1.18 42. All these procedures were performed by Shanghai Personal Biotechnology Co., Ltd., China.
The locations of putative protein-coding genes and rRNA genes were preliminarily predicted with DOGMA 43 and MITOS 29, and their precise locations were determined by comparison with the mitogenomes of related Palaemoninae species archived in GenBank. Initiation and termination codons were identified from an alignment generated with ClustalX version 2.0 44, using sequences of related species as references, and verified with ORF Finder and BLASTn at NCBI. The locations and secondary structures of tRNA genes were predicted and annotated using MITOS 29 and tRNAscan-SE with default settings 28. Nucleotide composition and relative synonymous codon usage (RSCU) were determined using MEGA 7 45.
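To make the RSCU index concrete, the following minimal sketch uses the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is an illustration only, not the MEGA 7 implementation; the function names, the toy coding sequence and the single alanine codon family are assumptions chosen for the example.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, written in RNA letters."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # drop a trailing partial codon, if any
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def rscu(codon_counts: dict, families: dict) -> dict:
    """RSCU = observed count / mean count across the synonymous codon family."""
    values = {}
    for amino_acid, codons in families.items():
        total = sum(codon_counts.get(codon, 0) for codon in codons)
        if total == 0:
            continue  # amino acid absent from the data set
        expected = total / len(codons)
        for codon in codons:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Hypothetical input: one four-fold degenerate family (alanine, GCN)
# and a toy coding sequence that is not from P. sinensis.
families = {"Ala": ["GCU", "GCC", "GCA", "GCG"]}
counts = count_codons("GCTGCTGCA")
print(rscu(counts, families))
```

An RSCU above 1 marks a codon used more often than expected under equal usage. In the toy output the U- and A-ending codons exceed 1, the same kind of pattern reported for the NNU and NNA codons of the PCGs.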
To  46 . The online mitochondrial visualization tool mtviz was used to draw the graphical map of the complete mitogenome (http://pacosy.informatik.uni-leipzig.de/mtviz/mtviz). Finally, the complete mitochondrial DNA sequence was deposited in the GenBank database under the accession number MH880828.
Phylogenetic analysis. Eleven other complete mitogenome sequences of the subfamily Palaemoninae (ten species) were obtained from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) for phylogenetic analysis within Palaemoninae. The GenBank sequence information for the eleven species is shown in Table 2. In addition, the mitogenomes of Panulirus stimpsoni (GQ292768.1) and Panulirus ornatus (GQ223286.1), obtained from GenBank, were used as out-group taxa. Nucleotide sequences of the 13 mitogenome PCGs were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). Gblocks was then used to remove poorly aligned regions and divergent sites 47.
The optimal nucleotide substitution models were selected with jModelTest (v2.0) 48,49, run through the online server Phylemon 2 (http://phylemon.bioinfo.cipf.es/evolutionary.html), and with MEGA 7 45, based on the Akaike Information Criterion (AIC) for the maximum likelihood (ML) method and the Bayesian Information Criterion (BIC) for Bayesian inference (BI). GTR + I + G was selected as the best-fit evolutionary model for the ML analysis by both MEGA 7 45 and jModelTest 48,49, whilst GTR + I + G and TPM3uf + I + G were identified as the best models for the BI analysis by MEGA 7 45 and jModelTest 48,49, respectively. Because the TPM3uf model is not implemented in MrBayes v3.2.1 50, it was replaced by the closest over-parameterized model (GTR) 51,52. The GTR + I + G model was therefore used for all further phylogenetic analyses.
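The two selection criteria can be written out explicitly. The sketch below assumes the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of free parameters, L the maximized likelihood and n the number of alignment sites. The log-likelihoods and the alignment length are hypothetical placeholders, not values from this study, and branch-length parameters are left out of k for simplicity.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical fits: model -> (log-likelihood, free substitution parameters).
# HKY+G: kappa + 3 base frequencies + gamma shape = 5
# GTR+I+G: 5 rates + 3 base frequencies + invariant sites + gamma shape = 10
fits = {"HKY+G": (-50100.0, 5), "GTR+I+G": (-50000.0, 10)}
n_sites = 10000  # hypothetical alignment length

best_by_aic = min(fits, key=lambda model: aic(*fits[model]))
best_by_bic = min(fits, key=lambda model: bic(*fits[model], n_sites))
print(best_by_aic, best_by_bic)
```

BIC penalizes extra parameters more heavily than AIC once n is moderately large, which is one reason the two criteria can favor different models, as they did for the BI analysis here.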
ML analysis was then performed in MEGA 7 45 with 1000 bootstrap replicates. The BI analysis was carried out in MrBayes v3.2.1 50 with four simultaneous Markov chain Monte Carlo (MCMC) chains run for 100,000 generations and sampled every 100 generations, by which point the average standard deviation of split frequencies was below 0.01. The consensus tree topology and the Bayesian posterior probabilities (PP) were derived after the first 250 trees were discarded as burn-in.
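The sampling scheme above fixes how many trees enter the consensus. A minimal sketch of the arithmetic follows, assuming that the starting state (generation 0) is recorded in addition to one sample every 100 generations; the function name and the `include_start` flag are hypothetical.

```python
def mcmc_sample_summary(generations: int, sample_freq: int,
                        burnin_samples: int, include_start: bool = True):
    """Return (total samples, samples kept, burn-in fraction) for one run."""
    total = generations // sample_freq + (1 if include_start else 0)
    if burnin_samples >= total:
        raise ValueError("burn-in must be smaller than the number of samples")
    kept = total - burnin_samples
    return total, kept, burnin_samples / total


# Settings stated in the text: 100,000 generations, sampled every 100,
# with the first 250 sampled trees discarded as burn-in.
total, kept, fraction = mcmc_sample_summary(100_000, 100, 250)
print(total, kept, round(fraction, 3))
```

Under this assumption roughly a quarter of the sampled trees are discarded, and the posterior probabilities are summarized from the remainder.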

Data availability
The data set supporting the results of this article is available at NCBI (GenBank No. MH880828).