Introduction

Sponge-dwelling snapping shrimps in the genus Synalpheus (Decapoda: Alpheidae) are the only known clade of marine animals to have evolved eusociality1,2, a complex social organization that is best known in terrestrial insects such as ants, bees and termites3. Eusocial Synalpheus colonies typically consist of a single or a few queens1 and up to several hundred non-sterile workers of the two sexes4,5. At least nine described species of Synalpheus in the West Atlantic ‘gambarelloides’ group are known to be eusocial1. These species are characterized by their high reproductive skew, overlapping generations, and, in a few representatives in which behavioral observations have been made, cooperative defense of the host sponge2,6,7. Synalpheus belonging to the ‘gambarelloides’ clade represent a relatively young lineage that radiated between ~5 and 7 Mya8, yet eusociality has independently evolved at least four times in this genus1. Communal living, where multiple mating pairs live in the same sponge, has also evolved multiple times in this clade from pair-forming ancestors9. The social diversity, short evolutionary history, and similar ecology among Synalpheus shrimps make them an ideal group of marine animals to study the evolution of sociality. Moreover, despite a decade-long period of ecological dominance10, eusocial Synalpheus shrimps have experienced recent population declines11. Our knowledge of the biology of eusocial shrimps has increased substantially over the past decades. Yet, genomic resources are scarce in this group12, especially when compared to that of social insects. Such a lack of genomic knowledge is limiting our understanding of behavioral innovations in sponge-dwelling snapping shrimps. Therefore, this study focuses on the development of genomic resources that are pivotal to improve our understanding of evolutionary innovations in this and other groups of crustaceans.

Synalpheus microneptunus is found only in reefs along the west coast of Barbados in the eastern Caribbean Sea and is the only known eusocial species in Barbados13. Their colonies typically consist of <10 individuals with a single ovigerous female (i.e., the queen). They live in sponges belonging to Neopetrosia proxima and N. subtriangularis (previously Xestospongia) and may share the sponge host with S. belizensis, a pair-living species13. S. microneptunus comprised 25% of the total abundance of Synalpheus in Barbados, despite being one out of 14 Synalpheus species in the region. This observation mirrors the ecological dominance of eusocial species observed in Belizean coral reefs documented by a decade of field survey10. Synalpheus microneptunus is part of the S. paraneptunus species complex that shares several synapomorphic features that distinguish them from the rest of the S. gambarelloides group, including sparse, unorganized setae on the non-snapping minor chela (while most other species have organized rows of setae) and the “excavated” inner surface of the fingers of the minor chela14. Phylogenetically, S. microneptunus is closely related to S. duffyi, a eusocial species that have larger colony sizes and is geographically widespread (Cuba, Florida, Jamaica, and Panama).

In this study, we describe the complete mitochondrial genome of the eusocial sponge-dwelling snapping shrimp S. microneptunus. Specifically, we analyze the nucleotide composition and codon usage profiles of protein coding genes (PCGs), and examine selective constraints in PCGs. We also describe the secondary structure of each identified tRNA gene, and examine the putative D-loop/control region (CR). In addition, we examine the phylogenetic position of S. microneptunus among other caridean shrimps based on mitochondrial PCGs.

Methods

Field collection and sequencing

A single specimen collected from Barbados in 200813 was used for DNA extraction and low-coverage whole genome sequencing (LC-WGS). A detailed field collection protocol has been reported previously15. We extracted genomic DNA using several walking legs from this alcohol-preserved specimen using the Qiagen DNeasy Tissue Kit (Qiagen). Extracted DNA was quantified using a Qubit 3.0 Fluorometer with the dsDNA HS assay (ThermoFisher Scientific) and visualized on 2% agarose gels. For LC-WGS, we provided 1,500 ng of genomic DNA to Novogene (Chula Vista, CA) for TruSeq PCR-free library preparation (Illumina) and 150 bp pair-end sequencing on an Illumina NovaSeq to obtain at least 1X coverage according to published genome size16. LC-WGS reads from whole-cell extraction contain a high copy number of extranuclear sequences, and it has been shown to be an efficient and economical approach to assemble complete mitochondrial genomes17.

Mitochondrial genome assembly of Synalpheus microneptunus

The mitochondrial genome of S. microneptunus was de novo-assembled using the NOVOPlasty pipeline v. 1.2.317. NOVOPlasty uses a seed-and-extend algorithm that assembles organelle genomes from WGS data, starting from a related or distant single ‘seed’ sequence and an optional ‘bait’ reference mitochondrial genome17. For assembly, we used a previously published fragment of the COI gene from S. microneptunus (GenBank accession number KJ595111) as a seed and a kmer size of 39. We did not use a bait reference mitochondrial genome considering that there are no mitochondrial genomes from closely related (congeneric) species published and available in GenBank. Nuclear mitochondrial pseudogenes are abundant in the closely related genus Alpheus18 and may affect the assembly quality, resulting in many contigs. However, the adverse effect of mitochondrial pseudogenes is likely minimal when the contigs are being circularized.

Mitochondrial genome annotation and analysis

The newly assembled mitochondrial genome was first annotated in the MITOS web server (http://mitos.bioinf.uni-leipzig.de)19 using the invertebrate genetic code. Annotation curation, including start and stop codons corrections, were conducted using Expasy (https://web.expasy.org/)20 and MEGA X21. Genome visualization was conducted with OrganellarGenomeDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html)22. Nucleotide composition and codon usage profiles of PCGs were analyzed. Nucleotide composition was estimated in MEGA X. Codon usage for each PCG was predicted using the invertebrate mitochondrial code in the Codon Usage web server (http://www.bioinformatics.org/sms2/codon_usage.html)23. tRNA genes were identified and their secondary structures were predicted in the software MITFI24 as implemented in the MITOS web server. tRNA secondary structure was visualized in the Forna web server (http://rna.tbi.univie.ac.at/forna)25.

We explored the selective constraints in all mitochondrial PCGs of S. microneptunus. Overall values of KA (the number of nonsynonymous substitutions per nonsynonymous site: KA = dN = SA/LA), KS (the number of synonymous substitutions per synonymous site: KS = dS = SS/LS), and the KA/KS ratio (or ω or dN/dS) were estimated for each PCG in the software KaKs_calculator 2.026. KA and KS values were based on a pairwise comparison between S. microneptunus and Alpheus lobidens (GenBank accession number KP276147), a species belonging to a genus sister to Synalpheus27. We chose to use A. lobidens for the comparison because the mitochondrial genome of this species28 is best described among other Alpheus. Next, to identify positively selected sites along the length of each examined sequence, we also calculated the values of KA, Ks, and KA/KS along sliding windows of 57 bp that ‘slipped’ every 6 bp along each PCG. The γ-MYN model29 was used during calculations to account for variable mutation rates across sequence sites26. If PCGs are under no selection, positive selective constraint (purifying selection), or diversifying selection, the KA/KS ratio is expected to be equal to 1, >1, or <1, respectively26. To confirm that the observed ratios of KA/KS were not affected by the choice of the outgroup Alpheus species, we repeated the above analyses with all other species of Alpheus for which mitochondrial genomes were available in GenBank: A. bellulus, A. distinguendus, A. inopinatus, and A. randalli (GenBank accession numbers: MH796167, NC_014883, MG551491, and MH796168, respectively).

The presence of inverted repeats in the putative D-loop/CR of S. microneptunus was explored with the ‘EMBOSS:einverted’ web server (http://www.bioinformatics.nl/cgi-bin/emboss/einverted) using the default options30. The presence and number of microsatellites (Simple Sequence Repeats, SSRs) were investigated with the 'Microsatellite repeats finder’ web server using the default options (http://insilico.ehu.es/mini_tools/microsatellites)31. The RNAstructure web server (http://rna.urmc.rochester.edu/RNAstructureWeb/Servers/Predict1/Predict1.html)32 was used to predict the lowest free energy secondary structure of the putative control region with particular attention to the presence of stem-loops.

Lastly, we examined the phylogenetic position of S. microneptunus among other species of caridean shrimps (Decapoda: Caridea). The newly assembled and annotated mitochondrial genome of S. microneptunus and those of a total of 63 other species of carideans available in the GenBank database were used for phylogenetic analyses conducted using the MitoPhAST pipeline v2.033. Phylogenetic analyses included a total of 27 different genera in the infraorder Caridea. Outgroups included one species of lobster (Stereomastis sculpta [Polychelida]), two species of stenopodid shrimps (Stenopus hispidus and Spongicola levigatus [Stenopodidea]) and two species of prawns (Penaeus vannamei and P. monodon [Penaeoidea]). MitoPhAST extracts all 13 PCG nucleotide sequences from species available in GenBank and others provided by the user (i.e., S. microneptunus), translates each PCG nucleotide sequence to amino acids, conducts alignments for each PCG amino acid sequence using Clustal Omega34, removes poorly aligned regions with trimAl35, partitions the dataset and select best fitting models of sequence evolution for each PCG with ProtTest36 and uses the concatenated and partitioned PCG amino acid alignments to perform a maximum likelihood phylogenetic analysis in the software IQ-TREE37. The robustness of the ML tree topology was assessed by 1,000 bootstrap reiterations of the observed data. We also optimized the resulting amino acid sequence matrix under Bayesian Inference in MrBayes v3.2.7a38 with the same partitioning and model scheme as the ML search in IQ-TREE. Because MrBayes does not natively support one of the best fit models of molecular evolution identified with ProtTest (MtZoa), we manually implemented the substitution rate and state frequency priors from Rota-Stabelli et al.39. Our search spanned 5 million generations with four chains set to default temperature. We disregarded 25% of sampled trees as burn-in and sampled every 500 cycles. We assessed effective sample size in Tracer v 1.740 and convergence of runs through the average standard deviation of split frequencies.

Results and Discussion

Using 51,305,421 paired-end sequences (SRA: SRX6711388), we completely assembled and circularized the mitochondrial genome of S. microneptunus with a coverage of 22×(GenBank accession number MN750781). The complete mitochondrial genome of S. microneptunus was 15,603 bp in length and comprised 13 protein-coding genes (PCGs), two ribosomal RNA genes (rrnS [12 S ribosomal RNA] and rrnL [16 S ribosomal RNA]), and 22 transfer RNA (tRNA) genes. The mitochondrial genome of S. microneptunus was compact with only a few intergenic spaces and overlaps among gene junctions (Fig. 1, Table 1). Most of the PCGs and tRNA genes were encoded on the heavy strand, while only four PCGs (in order from 5′ to 3′: nad5, nad4, nad4l, and nad1), two ribosomal RNA genes and 8 tRNA genes (trnF, trnH, trnP, trnL1, trnV, trnQ, trnC, and trnY) were encoded in the light strand (Fig. 1, Table 1). A single, long intergenic space of 834 bp was assumed to be the D-loop/control region (Fig. 1, Table 1). The gene order observed in S. microneptunus is identical to that reported for most caridean shrimps33 and corresponds to the presumed Pancrustacean (Hexapoda + Crustacea) ground pattern28. Interestingly, the gene order observed in S. microneptunus is different from that reported in the closely related genus Alpheus28,41,42,43,44,45. However, whether or not mitochondrial gene synteny is useful to reveal genealogical relationships within the Caridea and other decapod crustaceans remains to be addressed.

Figure 1
figure 1

Circular genome map of Synalpheus microneptunus mitochondrial DNA. The annotated map depicts 13 protein-coding genes (PCGs), two ribosomal RNA genes (rrnS: 12 S ribosomal RNA and rrnL: 16 S ribosomal RNA), 22 transfer RNA (tRNA) genes, and the putative control region. The inner circle depicts GC content along the genome. The putative D-Loop/control region is not annotated. Genome assembly and initial annotation were done with NOVOPlasty17 and MITOS19, and visualized with OrganellarGenomeDRAW22.

Table 1 Arrangement and annotation of the mitochondrial genome of Synalpheus microneptunus.

In the mitochondrial genome of S. microneptunus, 12 out of the 13 PCGs exhibited conventional invertebrate and Pancrustacean mitochondrial start codons (ATA, ATG, and ATC) (Table 1). Cox1 featured an alternative putative start codon (ACG) that was previously reported for other decapod crustaceans and references therein46, but was found only in a few caridean shrimps (i.e., Nautilocaris saintlaurentae47; Macrobrachium rosenbergii48). Eleven PCGs ended with a complete and conventional stop codon (TAA or TAG) (Table 1). The genes cox2, cox3, and cob each terminated with an incomplete stop codon T. Truncated stop codons are often observed in crustacean mitochondrial genomes43,49 and are hypothesized to be completed via post-transcriptional poly-adenylation50.

The mitochondrial genome of S. microneptunus contained an A  +  T bias with an overall base composition of A = 36.6%, T = 28.0%, C = 23.9%, and G = 11.4% at the heavy strand. This A  +  T bias is within the known range reported for mitochondrial genomes in caridean shrimps43. The most frequently used codons found in the PCGs of S. microneptunus were UUA (Leu, N = 223 times used), UUU (Phe, N = 186), and AUU (Ile, N = 182). Least frequently used codons (excluding termination codons) included CCG (Pro, N = 9), CGG (Arg, N = 8), and AGC (Ser, N = 8) (Suppl. Mat. Table S1).

All PCGs in the mitochondrial genome of S. microneptunus exhibited overall KA/KS ratios <<1 (Fig. 2). This indicates that these PCGs are generally evolving under purifying selection. Examination of KA/KS ratios in sliding windows across the length of each PCG further indicated that purifying selection is acting along most of the length of each PCG (Suppl. Mat. Fig. S1). The results were very similar when the above analyses were performed using other species of Alpheus as outgroups (Suppl. Mat. Figs. S2S6), confirming the general pattern of purifying selection in PCGs. Selective pressure in mitochondrial PCG has been poorly studied in crustaceans but a similar pattern of widespread purifying selection in mitochondrial PCGs has been observed in other arthropods, including decapod crustaceans, and references therein46,51. Interestingly, regardless of Alpheus outgroup, the genes atp8 and nad6 exhibited higher KA/KS ratios than other genes, but with values lower than 1. These two genes were also found to have higher KA/KS ratios than other mitochondrial genes between two Alpheus species42. This suggests that selective pressures may tend to be relaxed in these genes across Alpheidae. However, broad investigations of the selective pressures in mitochondrial genes across Caridean species, or crustaceans in general, are sparse42,47. It is possible that eusociality may drive changes in the rate of evolution in mitochondrial genes due to prolonged longevity in the queens52,53, longer generation time54,55, and reduced effective population size56,57. Whether the higher KA/KS ratios observed in a few PCGs in S. microneptunus are driven by eusociality or result from other unknown (e.g., ecological) differences between Synalpheus and Alpheus remains to be investigated in further comparative analyses.

Figure 2
figure 2

Selective pressure analysis in the protein coding genes (PCGs) of Synalpheus microneptunus. (a) Shows the KA/KS ratios calculated using the γ-MYN model for each of the 13 PCGs. (b) Shows the estimate of KA, KS and KA/KS using a sliding window of length 57 bp and a step length of 6 bp for cox1. See methods and results for further details. Results of sliding-window analyses of all PCGs are shown in Suppl. Mat. Fig. S1.

The mitochondrial genome of S. microneptunus encoded tRNA genes that ranged in length from 58 (tRNA-Ser1) to 71 (tRNA-Ser2) bp. All tRNA genes, except tRNA-Ser, exhibited a standard ‘cloverleaf’ secondary structure as predicted by MIFTI (Fig. 3). In the tRNA-Ser1 gene, the stem and loop of the pseudouridine arm (T-arm) was missing. Complete (stem and loop) or partial (loop only) tRNA arm deletions are known to occur in other decapod crustaceans46,51, and references therein and the function of these tRNA may be complemented by elongation factors58.

Figure 3
figure 3

Secondary structure of tRNAs in the mitochondrial genome of Synalpheus microneptunus visualized in the Forna web server25.

The rrnS and rrnL genes identified in the mitochondrial genome of S. microneptunus were 800 and 1371 nucleotides long, respectively. These genes were located close to each other between tRNA-L1 and the putative D-loop/CR, but separated by tRNA-V (Fig. 1, Table 1). As shown to occur in other crustaceans, including caridean shrimps, the two genes were highly A + T biased. The overall base composition of the rrnL gene was A = 29.4%, T = 39.9%, C = 8.2%, and G = 22.5%. In turn, that of the rrnS gene was A = 29.2%, T = 38.8%, C = 8.8%, and G = 23.1%.

The 834 bp long intergenic region assumed to be the D-loop/CR was located between the 12 S ribosomal RNA and tRNA-I (Fig. 1) in S. microneptunus. The region was heavily A + T rich with an overall base composition: A = 42.0%, T = 37.5%, C = 15.2%, and G = 5.3%. Visual examination of this non-coding region revealed multiple mononucleotide adenine and thymine repeats along the entire stretch of this intergenic region. The region has an imperfect inverted repeat located in positions 176–221 and 227–275 (detected by EMBOSS:einverted) and multiple AT-rich dinucleotide and trinucleotide microsatellites along the entire stretch of the CR (detected by microsatellite Repeat Finder) (Suppl. Mat. Fig. S3). The secondary structure prediction analysis in RNAStructure (assuming 27 °C and other default options) resulted in a lowest free energy configuration (change in Gibbs free energy [ΔG] = − 104 kcal/mol) that featured various stem-loop structures interspersed along the length of the region (Suppl. Mat. Fig. S7). Only a few studies have characterized the putative D-Loop/CR in crustaceans46 and references therein. In some species this long non-coding region appears to be relatively well organized (i.e., in the non-decapod branchiopod genus Daphnia59 and in the decapod Chinese spiny lobster Panulirus stimpsoni60). While in other species (e.g., the Caribbean spiny lobster Panulirus argus46) and here in S. microneptunus, there is no obvious organization in the D-Loop/CR.

The ML and BI phylogenetic trees (68 terminals, 3636 amino acid characters, and 1864 parsimony informative sites) support the monophyly of the Caridea and placed S. microneptunus in a monophyletic clade (family Alpheidae) sister to representatives from the genus Alpheus. The above relationship supports the monophyly of the family Alpheidae in agreement with results from previous phylogenetic studies using a combination of partial mitochondrial and nuclear genes61 (Figs. 45) or using mitochondrial PCGs but with a more limited sample of caridean shrimps42,43,44. Additional well supported clades within the Caridea included the families Alvinocaridae, Atyidae, Palaemonidae, and Pandalidae. While the monophyly of these caridean families was supported in both ML and BI analyses, the relationships among families was found to be sensitive to optimality criteria. These differences are reflected in the low support we recovered for the node leading to Atyidae + (Palaemonidae + Alpheidae) + Pandalidae in the ML tree. Our BI analysis recovered a monophyletic (Palaemonidae + Alpheidae) clade; however all other inter-familial relationships are recovered as distinct from the ML topology, and with high support. The sister relationship herein observed between the families Palaemonidae and Alpheidae was also supported by a recent phylogenomic study62. The same phylogenomic analysis62 resolved Atyidae as the sister group to all other caridean taxa, however, none of our analyses recover this relationship. Our results suggest that mitochondrial genomes contain enough phylogenetic information to delineate monophyly of higher taxa within the Caridea (at superfamily and family levels), but the relationships among monophyletic clades may be sensitive to marker choice and reconstruction methodology.

Figure 4
figure 4

Maximum-likelihood phylogenetic tree based on the amino acid alignments of 13 protein coding genes in the mitochondrial genome of Synalpheus microneptunus and 43 caridean shrimps. Outgroups included one species of lobster (Stereomastis sculpta), two species of stenopodid shrimps (Stenopus hispidus and Spongicola levigatus) and two species of prawns (Penaeus vannamei and P. monodon). Number at each node represents bootstrap values. The analysis included a total of 68 terminals, 3636 amino acid characters, and 1864 parsimony informative sites. The tree was drawn using Mesquite v3.663.

Figure 5
figure 5

Bayesian inference phylogeny from optimization of amino acid sequences for 13 protein coding mitochondrial genes across 68 terminals. Posterior probability support indicated at nodes where values are not equal to 1.0. The tree was drawn using Mesquite v3.663.

Conclusions

This study assembled and analyzed the first mitochondrial genome of a eusocial marine invertebrate, the caridean shrimp S. microneptunus. The complete mitochondrial genome of S. microneptunus will enhance the genomic resources in the only known group of eusocial animals in the sea and allow further investigations of the relationship between complex social behaviors (i.e., eusociality and communal living)9 and selective pressures and rates of molecular evolution in mitochondrial genomes.