Introduction

Palaemonidae, the second most species-rich family in Caridea, comprises 134 genera and 934 extant species1 and is widely distributed in almost every type of aquatic habitat. The family is divided into two subfamilies: Pontoniinae Kingsley, 1879 (108 genera, 562 species) and Palaemoninae Rafinesque, 1815 (26 genera, 372 extant species). Despite the numerical dominance of Pontoniinae, more research has been done on Palaemoninae because of its wide distribution, economic value and ecological importance. Nevertheless, the phylogenetic relationships within this subfamily are still disputed because the current classification system fails to reflect the underlying evolutionary relationships2,3,4. For example, Pereira pointed out paraphyly at the generic level based on a cladistic analysis of morphological characters3. Murphy & Austin found that species belonging to three different genera, Macrobrachium intermedium, Palaemon serenus and Palaemonetes australis, formed a monophyletic assemblage instead of grouping with their congeners4. The topology given by Cuesta et al. showed that species of Palaemon and Palaemonetes clustered according to global geographical distribution and by genus, with the exception of the Australian palaemonid shrimps, which indicated that the dichotomy between Palaemon and Palaemonetes (absence/presence of the mandibular palp) was phylogenetically questionable5. Ashelby et al. suggested re-evaluating the morphological traits used to separate the genera Palaemon, Palaemonetes, Exopalaemon and Coutierella, because some species from those genera form a monophyletic group6. However, most molecular studies on Palaemoninae are based on partial sequences of the 16S rRNA gene and a fragment of the nuclear histone 3 (H3) gene6,7. Analysis of other gene sequences (from both the mitogenome and the nuclear genome) is therefore necessary to improve the understanding of phylogenetic relationships among palaemonid shrimps.

Animal mitochondrial DNAs are typically circular molecules of approximately 14 to 18 kb that normally contain 13 protein-coding genes (PCGs), two ribosomal RNA genes (rrnL and rrnS), 22 transfer RNA (tRNA) genes and one control region (CR)8,9. It is widely accepted that the mitogenome has a rapid evolutionary rate and lacks genetic recombination8.

It has become increasingly popular to use entire mitogenomes for phylogenetic analyses10,11,12, for the following reasons. Firstly, complete mitogenomes reveal more genetic information, because single genes or partial DNA sequences are often too short to provide adequate phylogenetic information13. Secondly, combining mitochondrial and nuclear genes makes model selection difficult14, and the addition of rRNA makes alignment ambiguous15. Thirdly, some genome-level characters that are important for phylogeny, such as gene order rearrangements, can only be detected by comparing entire mitogenomes16,17,18. Lastly, next-generation sequencing (NGS) has made obtaining complete mitogenomes cheaper and less time-consuming than before19. So far, ten mitogenomes of the subfamily Palaemoninae, belonging to three genera (Palaemon, Exopalaemon and Macrobrachium), have been determined, whereas none has been reported for Palaemonetes.

The Chinese grass shrimp, Palaemonetes sinensis (Sollaud, 1911), is an important species of Palaemoninae that is widely distributed in China, Myanmar, Vietnam, Japan, southeastern Siberia and Sakhalin, and has crucial ecological value as well as some ornamental and economic value20,21,22. Apart from mitochondrial 16S rDNA and nuclear histone (H3) gene sequences5,6, no mitogenome data have been reported for P. sinensis. In this study, the complete mitogenome of P. sinensis was obtained through NGS. As the first mitogenome of Palaemonetes, it should contribute to a better understanding of phylogenetic relationships within Palaemoninae, particularly between Palaemonetes and the other genera mentioned above. It should also be valuable for future studies of the genetic diversity of P. sinensis.

Results and Discussion

Genome composition

Approximately 5.2 million clean reads were obtained from the raw sequence data, yielding 4,169,064 high-quality reads. After assembly, a circular mitogenome of 15,955 bp was generated (Figure 1). The P. sinensis mitogenome was slightly smaller than that of Palaemon serenus (15,967 bp) and fell within the range of known Palaemoninae mitogenomes (15,694–15,967 bp). Nucleotide BLAST (blastn) comparison of the entire mitogenome of P. sinensis with those of closely related species showed high similarity (77% with Exopalaemon annandalei, and 76% with both Palaemon gravieri and Exopalaemon modestus).

Figure 1

Graphical map of the mitogenome of P. sinensis. PCGs and ribosomal RNA genes are shown using standard abbreviations. Genes for transfer RNAs are abbreviated using a single letter. S1 = AGN, S2 = UCN, L1 = CUN, L2 = UUR. CR = control region. PCGs are green, tRNAs are yellow, rRNAs are blue, and CR is grey. Outside line and inside line indicate heavy strand and light strand, respectively. Bold line represents transcribed strand.

The gene content of the P. sinensis mitogenome was the same as that of all known Palaemoninae, comprising 13 PCGs, 2 rRNA genes and 22 tRNA genes plus a putative control region (Table 1 and Figure 1). As in Exopalaemon10,23 and Palaemon24,25,26, 23 of the 37 genes were encoded on the H strand, whereas the remaining 14 were encoded on the L strand. As in most Caridea, the genes of the P. sinensis mitogenome were closely packed, with only a small number of overlapping bases between adjacent genes (Table 1), suggesting that RNA transcription and protein translation were more efficient.

Table 1 Annotation of P. sinensis mitogenome.

The genome composition (A: 36.2%, G: 12.1%, T: 30.5%, C: 21.3%) showed a strong A + T bias, with A + T accounting for 66.7% of the bases, a positive AT skew ([A − T]/[A + T] = 0.085) and a negative GC skew ([G − C]/[G + C] = −0.275). The AT skew was similar to that of E. annandalei (0.086), higher than those of the other Palaemon and Exopalaemon species (−0.049 in E. modestus to 0.057 in Exopalaemon carinicauda), and lower than those of Macrobrachium (0.100 in M. lanchesteri to 0.157 in M. bullatum) (Table 2). The GC skew of P. sinensis was similar to those of most previously sequenced Palaemoninae mitogenomes (Table 2). The A + T content differed among regions of the mitogenome: the CR had the highest A + T content (84.8%), whereas the PCG region had the lowest (63.7%) (Table 3).
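The skew values quoted above follow directly from the reported base composition. The short Python sketch below simply re-computes them as a worked check; the function and variable names are illustrative and not part of the original analysis, which used MEGA 7.

```python
# Whole-genome base composition of P. sinensis (percent), as reported in the text.
composition = {"A": 36.2, "G": 12.1, "T": 30.5, "C": 21.3}


def at_skew(comp):
    """AT skew = (A - T) / (A + T)."""
    return (comp["A"] - comp["T"]) / (comp["A"] + comp["T"])


def gc_skew(comp):
    """GC skew = (G - C) / (G + C)."""
    return (comp["G"] - comp["C"]) / (comp["G"] + comp["C"])


at_content = composition["A"] + composition["T"]

print(f"A+T content: {at_content:.1f}%")          # 66.7%
print(f"AT skew: {at_skew(composition):.3f}")     # 0.085
print(f"GC skew: {gc_skew(composition):.3f}")     # -0.275
```

The same two functions can be applied to the composition of any sub-region (PCGs, tRNAs, rRNAs or CR) to obtain region-specific skews.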

Table 2 Genomic characteristics of Palaemoninae mitogenome acquired from GenBank.
Table 3 Composition and skewness in PCGs, tRNAs, rRNAs, and CR Region of different Palaemoninae mitogenomes.

Protein-coding genes

The PCG region accounted for 69.98% of the P. sinensis mitogenome, with a total length of 11,166 bp. Among the 13 PCGs, nine (cox1, cox2, atp8, atp6, cox3, nad3, nad6, cyt b and nad2) were encoded on the H strand, while the remaining four (nad5, nad4, nad4l and nad1) were on the L strand. The 13 PCGs ranged in size from 159 to 1725 bp (Table 2). Each PCG was initiated by a canonical ATN codon, except nad5, which was initiated by GTG. The termination codons were TAA, TAG and an incomplete T. Eight of the 13 PCGs (cox1, cox2, atp6, cox3, nad3, nad5, nad4l and nad6) used the typical TAA termination codon, four (atp8, nad4, nad1 and nad2) terminated with TAG, and cyt b had an incomplete termination codon consisting of a single T (Table 1).

The base frequencies in the 13 PCGs followed the order A > T > C > G. The A + T content of the 13 PCGs was 63.7%, showing a strong A + T bias (Table 3), and the AT skew (0.093) and GC skew (−0.289) indicated a bias towards A and C. The slightly positive AT skew of P. sinensis indicated a higher occurrence of A than T nucleotides, whereas the AT skews of the other mitogenomes were all negative. In addition, the GC skew of P. sinensis was the most negative among the compared mitogenomes (−0.012 to −0.080 in the others). Apart from P. sinensis, the species of Macrobrachium showed slightly smaller negative AT skews (−0.138 to −0.150) and larger negative GC skews (−0.052 to −0.080) (Table 3).

The average codon frequencies of the PCGs were calculated and are shown in Table 4. The preferred codon for each amino acid (the codon most frequently used to encode it) is shown in bold, and the relative synonymous codon usage (RSCU) values of these codons were all greater than 1. RSCU is an important index that directly reflects the degree of codon usage preference27. The results showed that codon usage in all PCGs was strongly biased, and most codons of the form NNU and NNA (i.e. codons with U or A at the third position) had RSCU values greater than 1, indicating more frequent use. This result is consistent with that reported for E. carinicauda10.
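For readers unfamiliar with the index, RSCU is conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so that a value above 1 marks a codon used more often than expected under equal usage. The Python sketch below illustrates this definition only; the codon counts are hypothetical and are not taken from Table 4, and the function name is an assumption made for illustration.

```python
from collections import defaultdict


def rscu(codon_counts, codon_to_aa):
    """Return RSCU per codon: count * (family size) / (family total)."""
    totals = defaultdict(int)       # total codon count per amino acid
    family_size = defaultdict(int)  # number of synonymous codons per amino acid
    for codon, aa in codon_to_aa.items():
        totals[aa] += codon_counts.get(codon, 0)
        family_size[aa] += 1

    values = {}
    for codon, aa in codon_to_aa.items():
        if totals[aa] == 0:
            values[codon] = 0.0  # amino acid never observed
        else:
            values[codon] = codon_counts.get(codon, 0) * family_size[aa] / totals[aa]
    return values


# Hypothetical counts for the four-fold degenerate alanine family (GCN).
alanine_family = {"GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala"}
hypothetical_counts = {"GCU": 40, "GCC": 20, "GCA": 35, "GCG": 5}

values = rscu(hypothetical_counts, alanine_family)
for codon, value in values.items():
    print(codon, round(value, 2))
```

In this made-up example the U- and A-ending codons have RSCU values above 1 and the C- and G-ending codons below 1, which is the kind of pattern the paragraph above describes.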

Table 4 The codon number and relative synonymous codon usage in P. sinensis mitochondrial protein coding genes.

Transfer RNAs, ribosomal RNAs, and CR region

As in most Palaemoninae, the P. sinensis mitogenome contained a set of 22 tRNA genes (Figure 1). The tRNA genes ranged from 63 to 69 bp in length and exhibited a strong A + T bias (64.8%) and a positive AT skew (0.062) (Table 3). Fourteen tRNA genes were located on the H strand and eight on the L strand. The cloverleaf secondary structures of 15 tRNAs were identified using tRNAscan-SE28, while those of all 22 tRNAs could be predicted using MITOS29. All tRNA genes had the typical cloverleaf structure except trnS1, whose dihydrouridine (DHU) arm was replaced by a simple loop (Figure 2), a common feature of most Palaemonidae mitogenomes10.

Figure 2

Predicted secondary structures of the 22 tRNA genes of the P. sinensis mitogenome.

The rrnL and rrnS genes were located between trnL1 and trnV and between trnV and the CR, respectively. The rrnL gene was 1298 bp and the rrnS gene 790 bp in length. The CR was 1159 bp long and situated between rrnS and trnI. This region had an A + T content of 84.8%, a positive AT skew (0.101) and a negative GC skew (−0.145) (Table 3).

The CR of the P. sinensis mitogenome differed in size from those of other Palaemoninae10,23,24,25,26. This is common, because the control region is generally believed to be the most length-variable part of the mitogenome16. Among the compared species, the CR of P. sinensis had the highest proportion of A nucleotides, the lowest proportion of G nucleotides and the highest AT skew (Table 3).

Gene arrangement

The gene order and orientation of the complete mitogenome of P. sinensis were identical to those of several previously sequenced Palaemoninae, including three species of Palaemon (P. serenus, P. gravieri and P. capensis) and three species of Exopalaemon (E. annandalei, E. modestus and E. carinicauda)10,23,24,25,26, all of which share the gene order 5′-nad4L-trnP-trnT-nad6-3′. However, a translocation between trnP and trnT (gene order: 5′-nad4L-trnT-trnP-nad6-3′) was identified in all four Macrobrachium species (Macrobrachium bullatum, Macrobrachium lanchesteri, Macrobrachium nipponense and Macrobrachium rosenbergii), matching the arrangement in the out-group (Panulirus stimpsoni30 and Panulirus ornatus31) used in this study.
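The difference between the two arrangements can be stated programmatically as a swap of two adjacent genes. The following Python sketch compares only the four-gene segment named above; the helper names are hypothetical, and gene orientation (strand) is ignored for simplicity.

```python
def differing_positions(order_a, order_b):
    """Indices at which two equal-length gene orders disagree."""
    return [i for i, (a, b) in enumerate(zip(order_a, order_b)) if a != b]


def is_adjacent_swap(order_a, order_b):
    """True if the two orders differ only by exchanging two neighbouring genes."""
    if len(order_a) != len(order_b):
        return False
    diff = differing_positions(order_a, order_b)
    if len(diff) != 2 or diff[1] - diff[0] != 1:
        return False
    i, j = diff
    return order_a[i] == order_b[j] and order_a[j] == order_b[i]


# Segment shared by P. sinensis, Palaemon and Exopalaemon (from the text).
conserved = ["nad4L", "trnP", "trnT", "nad6"]
# Segment found in the four Macrobrachium species and the out-group (from the text).
macrobrachium = ["nad4L", "trnT", "trnP", "nad6"]

print(differing_positions(conserved, macrobrachium))  # [1, 2]
print(is_adjacent_swap(conserved, macrobrachium))     # True
```

A full comparison would use all 37 genes plus the CR and would also record the coding strand of each gene.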

Figure 3

Two gene order arrangement patterns in subfamily Palaemoninae. Genes are not drawn to scale, and they are transcribed from left to right except for those indicated by underlining.

Mitochondrial gene order rearrangements are common in Malacostraca32,33,34. Shen et al. identified nine different rearrangements in a comparison of 23 Pancrustacea mitogenomes archived in GenBank, and found the same translocation between E. carinicauda and M. rosenbergii as in this study10. Wang et al. inferred that this rearrangement between trnP and trnT might be a unique mitochondrial character of the genus Exopalaemon23. However, the present results show that the other three genera all share the same gene order pattern, so it is Macrobrachium that appears to be the distinct genus with respect to this rearrangement (Figure 3).

Phylogenetic analysis

Although Palaemonidae is the second most species-rich shrimp family, including 118 genera and 981 species1, only ten complete mitogenomes (excluding that of P. sinensis) have been archived in GenBank so far. Phylogenetic analyses were based on the concatenated PCGs of 11 Palaemoninae mitogenomes belonging to four genera (Palaemonetes, Palaemon, Exopalaemon and Macrobrachium) (Table 2). Both ML and BI analyses yielded the same tree topology, with high nodal support values for each cluster (Figure 4). Apart from the out-group, the four species of Macrobrachium clustered with P. capensis in one main clade. In the other main clade, the three Exopalaemon species formed a monophyletic group, which then clustered successively with the other two Palaemon species and P. sinensis.

Figure 4

Topology derived from BI and ML analyses of 13 concatenated mitochondrial PCGs from 14 mitogenomes. Numbers beside the nodes indicate Bayesian posterior probabilities (BPP)/ML bootstrap support.

Phylogenetic relationships within the subfamily Palaemoninae Rafinesque, 1815 have long been debated in both morphological cladistic and molecular phylogenetic studies. Pereira demonstrated paraphyly in Palaemon, Palaemonetes and Macrobrachium from the analysis of a matrix of 81 morphological characters in 172 species3. Ashelby et al. strongly supported the view that Palaemonetes, Exopalaemon, Coutierella and certain Palaemon species belong to a single monophyletic clade, based on analyses of mitochondrial 16S rDNA and nuclear histone (H3) genes in Palaemoninae6. They therefore suggested that a re-appraisal of morphological characters, combined with further genetic work at the generic level, is needed to establish a reliable classification of Palaemoninae6.

In this study, the apparent heterogeneity of Macrobrachium was supported by both the tree topology and the mitogenome gene order rearrangement. This result is consistent with previous studies by Kim et al.24 and Shen et al.10. The genus Exopalaemon was also monophyletic with high support values. Interestingly, P. capensis, the only species in this study that is not distributed in Asia-Australia, grouped with the Macrobrachium clade with comparatively low support. Adult P. capensis inhabit fresh water after a more saline planktonic larval phase35, a life cycle that reflects the evolutionary history of freshwater palaemonid shrimps. In this study, P. sinensis, M. bullatum and P. capensis are characterized by abbreviated larval development, which has been considered a primitive trait that arose early in the origin of the family Palaemonidae36. The Palaemon and Palaemonetes clade confirmed the morphological similarity indicated by the merging of species of both genera37, and Cuesta et al.5 and Botello & Alvarez38 suggested, from analyses of mitochondrial 16S rDNA, that Palaemon and Palaemonetes were more similar to each other and both different from Macrobrachium. However, given the tiny proportion of species with archived mitogenomes (11 species from 3 genera out of 372 species in 26 genera), mitogenomes from more taxa are indispensable for resolving the phylogenetic relationships within Palaemoninae.

The first complete mitogenome of the genus Palaemonetes, that of P. sinensis, was determined in this study. This result helps clarify the basic features and gene arrangement of the mitogenome of this species. For the PCGs, compared with other known Palaemoninae mitogenomes, P. sinensis was characterized by the highest proportions of A and C nucleotides and the lowest proportions of T and G nucleotides. In addition, P. sinensis had a slightly positive AT skew and the most negative GC skew, whereas the other species all had negative AT skews. For the control region, P. sinensis featured the highest proportion of A nucleotides, the lowest proportion of G nucleotides and the highest AT skew. Gene order comparison of P. sinensis with previously sequenced Palaemoninae revealed a conserved order among the genera Palaemonetes, Palaemon and Exopalaemon, and a unique translocation between trnT and trnP in Macrobrachium. Phylogenetic analysis using Bayesian inference (BI) and maximum likelihood (ML), based on the concatenated nucleotide sequences of 13 PCGs, indicated that Exopalaemon formed a monophyletic group that clustered successively with two Palaemon species and P. sinensis, whereas Macrobrachium formed a monophyletic group that clustered with P. capensis in the other clade.

Materials and Methods

Sample collection and DNA extraction

Specimens of P. sinensis were collected from Longwei Lake, Shenyang, Liaoning, China (41°50′33.7″N; 123°35′22.3″E). The whole body of one individual was immediately preserved in liquid nitrogen until DNA extraction. Total genomic DNA was extracted using the TIANamp Marine Animals DNA Kit (TIANGEN, Beijing, China), and the quality of the extracted DNA was assessed by electrophoresis on a 1% agarose gel and with a NanoDrop 2000 spectrophotometer (Thermo Scientific).

Genome assembly and annotation

Genomic DNA was randomly fragmented with a Covaris ultrasonicator and used to construct a genomic DNA library following a whole-genome shotgun (WGS) strategy; the library was sequenced on an Illumina MiSeq instrument. Mitochondrial contig sequences obtained with A5-miseq39 (v20150522), SPAdes40 (v3.9.0) and BLAST v2.2.31 (https://blast.ncbi.nlm.nih.gov/Blast.cgi) were subjected to collinearity analysis using MUMmer41 (v3.1) to determine the positional relationships among the contigs. The complete mitogenome sequence was then corrected and confirmed with Pilon42 (v1.18). All these procedures were performed by Shanghai Personal Biotechnology Co., Ltd., China.

The locations of putative PCGs and rRNA genes were preliminarily predicted with DOGMA43 and MITOS29, and their precise boundaries were determined by comparison with the mitogenomes of related Palaemoninae species archived in GenBank. Initiation and termination codons were identified from an alignment generated with ClustalX44 (version 2.0), using sequences of related species as references, and verified with ORF Finder and blastn at NCBI. The locations and secondary structures of tRNA genes were predicted and annotated using MITOS29 and tRNAscan-SE28 with default settings. Nucleotide composition and relative synonymous codon usage (RSCU) were determined using MEGA 7 (ref. 45).

Base composition skew was calculated as AT skew = [A − T]/[A + T] and GC skew = [G − C]/[G + C], as described by Perna & Kocher46. The online mitochondrial visualization tool mtviz (http://pacosy.informatik.uni-leipzig.de/mtviz/mtviz) was used to draw the graphical map of the complete mitogenome. Finally, the complete mitogenome sequence was deposited in GenBank under accession number MH880828.

Phylogenetic analysis

Eleven other complete mitogenome sequences of the subfamily Palaemoninae (ten species) were obtained from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) for phylogenetic analysis. GenBank information for the eleven species is shown in Table 2. In addition, the mitogenomes of Panulirus stimpsoni (GQ292768.1) and Panulirus ornatus (GQ223286.1), also obtained from GenBank, were used as out-group taxa. Nucleotide sequences of the 13 mitochondrial PCGs were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), and Gblocks was used to remove poorly aligned regions and divergent sites47.

The optimal nucleotide substitution models were selected with jModelTest (v2.0)48,49, run through the online server Phylemon 2 (http://phylemon.bioinfo.cipf.es/evolutionary.html), and with MEGA 7 (ref. 45), based on the Akaike information criterion (AIC) for the maximum likelihood (ML) method and the Bayesian information criterion (BIC) for Bayesian inference (BI). GTR + I + G was selected as the best-fit model for the ML analysis by both MEGA 7 and jModelTest, whereas GTR + I + G and TPM3uf + I + G were identified as the best models for the BI analysis by MEGA 7 and jModelTest, respectively. Because the TPM3uf model is not implemented in MrBayes50 (v3.2.1), it was replaced by the closest over-parameterized model (GTR)51,52. The GTR + I + G model was therefore used for all subsequent phylogenetic analyses.
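As background to the two criteria named above, AIC is defined as 2k − 2 ln L and BIC as k ln(n) − 2 ln L, where k is the number of free parameters, L the maximized likelihood and n the sample size (here taken as the number of alignment sites, which is an assumption of this sketch). The Python example below uses entirely hypothetical models, log-likelihoods and alignment length to show how the two criteria can rank candidates differently; it does not reproduce the jModelTest or MEGA 7 output.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical candidates: name -> (log-likelihood, number of free parameters).
candidates = {
    "model_A": (-52000.0, 10),
    "model_B": (-51990.0, 14),
}
n_sites = 10_000  # hypothetical alignment length

# Lower scores are better under both criteria.
best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))

print("Best by AIC:", best_aic)
print("Best by BIC:", best_bic)
```

Because BIC penalizes extra parameters more heavily on long alignments, it tends to favour simpler models than AIC, as in this made-up case.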

ML analysis was then performed with 1000 bootstrap replicates in MEGA 7 (ref. 45). The BI analysis was carried out in MrBayes50 (v3.2.1) with four simultaneous Markov chain Monte Carlo (MCMC) chains run for 100,000 generations and sampled every 100 generations, at which point the average standard deviation of split frequencies was less than 0.01. The consensus tree topology and the Bayesian posterior probabilities (PP) were derived after the first 250 trees had been discarded as burn-in.
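The sampling settings above imply a particular burn-in proportion, which the following Python arithmetic sketch makes explicit. It assumes one sampled tree per sampling interval; if the starting tree at generation 0 is also recorded, the totals increase by one and the fraction changes only marginally.

```python
generations = 100_000   # MCMC generations (from the text)
sample_every = 100      # sampling frequency (from the text)
burn_in_trees = 250     # trees discarded as burn-in (from the text)

# Assumption: one sample per interval, not counting the starting tree.
sampled_trees = generations // sample_every
retained_trees = sampled_trees - burn_in_trees
burn_in_fraction = burn_in_trees / sampled_trees

print(f"Sampled trees: {sampled_trees}")             # 1000
print(f"Retained trees: {retained_trees}")           # 750
print(f"Burn-in fraction: {burn_in_fraction:.0%}")   # 25%
```

Under this assumption the posterior probabilities are summarized from 750 trees, corresponding to a 25% burn-in.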