The complete mitochondrial genome of the eusocial sponge-dwelling snapping shrimp Synalpheus microneptunus

In the marine realm, eusociality is only known to have evolved within a clade of sponge-dwelling snapping shrimps in the genus Synalpheus. Deciphering the genomic underpinnings of eusociality in these marine shrimps has been limited by the sparse genomic resources in this genus. Here, we report, for a eusocial shrimp Synalpheus microneptunus, a complete mitochondrial genome (22X coverage) assembled from short Illumina 150 bp paired-end reads. The 15,603 bp long mitochondrial genome of S. microneptunus is AT-rich and includes 13 protein-coding genes (PCGs), 2 ribosomal RNA genes, 22 transfer RNA genes and an 834 bp intergenic region assumed to be the D-loop. The gene order is identical to that reported for most caridean shrimps and corresponds to the presumed Pancrustacean ground pattern. All PCGs showed signs of purifying selection, with KA/KS <<1 across the whole PCGs and most sliding windows within PCGs. Maximum-likelihood and Bayesian inference phylogenetic analyses of 13 PCGs and 68 terminals supports the monophyly of the Caridea and the family Alpheidae. The complete mitochondrial genome of the eusocial shrimp Synalpheus microneptunus will contribute to a better understanding of the selective pressures and rates of molecular evolution in marine eusocial animals.

Sponge-dwelling snapping shrimps in the genus Synalpheus (Decapoda: Alpheidae) are the only known clade of marine animals to have evolved eusociality 1,2 , a complex social organization that is best known in terrestrial insects such as ants, bees and termites 3 . Eusocial Synalpheus colonies typically consist of a single or a few queens 1 and up to several hundred non-sterile workers of the two sexes 4,5 . At least nine described species of Synalpheus in the West Atlantic 'gambarelloides' group are known to be eusocial 1 . These species are characterized by their high reproductive skew, overlapping generations, and, in a few representatives in which behavioral observations have been made, cooperative defense of the host sponge 2,6,7 . Synalpheus belonging to the 'gambarelloides' clade represent a relatively young lineage that radiated between ~5 and 7 Mya 8 , yet eusociality has independently evolved at least four times in this genus 1 . Communal living, where multiple mating pairs live in the same sponge, has also evolved multiple times in this clade from pair-forming ancestors 9 . The social diversity, short evolutionary history, and similar ecology among Synalpheus shrimps make them an ideal group of marine animals to study the evolution of sociality. Moreover, despite a decade-long period of ecological dominance 10 , eusocial Synalpheus shrimps have experienced recent population declines 11 . Our knowledge of the biology of eusocial shrimps has increased substantially over the past decades. Yet, genomic resources are scarce in this group 12 , especially when compared to that of social insects. Such a lack of genomic knowledge is limiting our understanding of behavioral innovations in sponge-dwelling snapping shrimps. Therefore, this study focuses on the development of genomic resources that are pivotal to improve our understanding of evolutionary innovations in this and other groups of crustaceans.
Synalpheus microneptunus is found only in reefs along the west coast of Barbados in the eastern Caribbean Sea and is the only known eusocial species in Barbados 13 . Their colonies typically consist of <10 individuals with a single ovigerous female (i.e., the queen). They live in sponges belonging to Neopetrosia proxima and N. subtriangularis (previously Xestospongia) and may share the sponge host with S. belizensis, a pair-living species 13 . S. microneptunus comprised 25% of the total abundance of Synalpheus in Barbados, despite being one out of 14 Synalpheus species in the region. This observation mirrors the ecological dominance of eusocial species observed in Belizean coral reefs documented by a decade of field survey 10 . Synalpheus microneptunus is part of the S. paraneptunus species complex that shares several synapomorphic features that distinguish them from the rest of the S. gambarelloides group, including sparse, unorganized setae on the non-snapping minor chela (while most other species have organized rows of setae) and the "excavated" inner surface of the fingers of the minor chela 14 . Phylogenetically, S. microneptunus is closely related to S. duffyi, a eusocial species that have larger colony sizes and is geographically widespread (Cuba, Florida, Jamaica, and Panama).
In this study, we describe the complete mitochondrial genome of the eusocial sponge-dwelling snapping shrimp S. microneptunus. Specifically, we analyze the nucleotide composition and codon usage profiles of protein coding genes (PCGs), and examine selective constraints in PCGs. We also describe the secondary structure of each identified tRNA gene, and examine the putative D-loop/control region (CR). In addition, we examine the phylogenetic position of S. microneptunus among other caridean shrimps based on mitochondrial PCGs.

Methods
field collection and sequencing. A single specimen collected from Barbados in 2008 13 was used for DNA extraction and low-coverage whole genome sequencing (LC-WGS). A detailed field collection protocol has been reported previously 15 . We extracted genomic DNA using several walking legs from this alcohol-preserved specimen using the Qiagen DNeasy Tissue Kit (Qiagen). Extracted DNA was quantified using a Qubit 3.0 Fluorometer with the dsDNA HS assay (ThermoFisher Scientific) and visualized on 2% agarose gels. For LC-WGS, we provided 1,500 ng of genomic DNA to Novogene (Chula Vista, CA) for TruSeq PCR-free library preparation (Illumina) and 150 bp pair-end sequencing on an Illumina NovaSeq to obtain at least 1X coverage according to published genome size 16 . LC-WGS reads from whole-cell extraction contain a high copy number of extranuclear sequences, and it has been shown to be an efficient and economical approach to assemble complete mitochondrial genomes 17 . Mitochondrial genome assembly of Synalpheus microneptunus. The mitochondrial genome of S. microneptunus was de novo-assembled using the NOVOPlasty pipeline v. 1.2.3 17 . NOVOPlasty uses a seed-and-extend algorithm that assembles organelle genomes from WGS data, starting from a related or distant single 'seed' sequence and an optional 'bait' reference mitochondrial genome 17 . For assembly, we used a previously published fragment of the COI gene from S. microneptunus (GenBank accession number KJ595111) as a seed and a kmer size of 39. We did not use a bait reference mitochondrial genome considering that there are no mitochondrial genomes from closely related (congeneric) species published and available in GenBank. Nuclear mitochondrial pseudogenes are abundant in the closely related genus Alpheus 18 and may affect the assembly quality, resulting in many contigs. However, the adverse effect of mitochondrial pseudogenes is likely minimal when the contigs are being circularized.
Mitochondrial genome annotation and analysis. The newly assembled mitochondrial genome was first annotated in the MITOS web server (http://mitos.bioinf.uni-leipzig.de) 19 using the invertebrate genetic code. Annotation curation, including start and stop codons corrections, were conducted using Expasy (https://web. expasy.org/) 20 and MEGA X 21 . Genome visualization was conducted with OrganellarGenomeDRAW (https:// chlorobox.mpimp-golm.mpg.de/OGDraw.html) 22 . Nucleotide composition and codon usage profiles of PCGs were analyzed. Nucleotide composition was estimated in MEGA X. Codon usage for each PCG was predicted using the invertebrate mitochondrial code in the Codon Usage web server (http://www.bioinformatics.org/sms2/ codon_usage.html) 23 . tRNA genes were identified and their secondary structures were predicted in the software  25 .
We explored the selective constraints in all mitochondrial PCGs of S. microneptunus. Overall values of K A (the number of nonsynonymous substitutions per nonsynonymous site: K A = d N = S A /L A ), K S (the number of synonymous substitutions per synonymous site: K S = d S = S S /L S ), and the K A /K S ratio (or ω or d N /d S ) were estimated for each PCG in the software KaKs_calculator 2.0 26 . K A and K S values were based on a pairwise comparison between S. microneptunus and Alpheus lobidens (GenBank accession number KP276147), a species belonging to a genus sister to Synalpheus 27 . We chose to use A. lobidens for the comparison because the mitochondrial genome of this species 28 is best described among other Alpheus. Next, to identify positively selected sites along the length of each examined sequence, we also calculated the values of K A , K s , and K A /K S along sliding windows of 57 bp that 'slipped' every 6 bp along each PCG. The γ-MYN model 29 was used during calculations to account for variable mutation rates across sequence sites 26 . If PCGs are under no selection, positive selective constraint (purifying selection), or diversifying selection, the K A /K S ratio is expected to be equal to 1, >1, or <1, respectively 26 . To confirm that the observed ratios of K A /K S were not affected by the choice of the outgroup Alpheus species, we repeated the above analyses with all other species of Alpheus for which mitochondrial genomes were available in GenBank: A. bellulus, A. distinguendus, A. inopinatus, and A. randalli (GenBank accession numbers: MH796167, NC_014883, MG551491, and MH796168, respectively).
The presence of inverted repeats in the putative D-loop/CR of S. microneptunus was explored with the 'EMBOSS:einverted' web server (http://www.bioinformatics.nl/cgi-bin/emboss/einverted) using the default options 30 . The presence and number of microsatellites (Simple Sequence Repeats, SSRs) were investigated with the 'Microsatellite repeats finder' web server using the default options (http://insilico.ehu.es/mini_tools/ microsatellites) 31 . The RNAstructure web server (http://rna.urmc.rochester.edu/RNAstructureWeb/Servers/ Predict1/Predict1.html) 32   ). MitoPhAST extracts all 13 PCG nucleotide sequences from species available in GenBank and others provided by the user (i.e., S. microneptunus), translates each PCG nucleotide sequence to amino acids, conducts alignments for each PCG amino acid sequence using Clustal Omega 34 , removes poorly aligned regions with trimAl 35 , partitions the dataset and select best fitting models of sequence evolution for each PCG with ProtTest 36 and uses the concatenated and partitioned PCG amino acid alignments to perform a maximum likelihood phylogenetic analysis in the software IQ-TREE 37 . The robustness of the ML tree topology was assessed by 1,000 bootstrap reiterations of the observed data. We also optimized the resulting amino acid sequence matrix under Bayesian Inference in MrBayes v3.2.7a 38 with the same partitioning and model scheme as the ML search in IQ-TREE. Because MrBayes does not natively support one of the best fit models of molecular evolution identified with ProtTest (MtZoa), we manually implemented the substitution rate and state frequency priors from Rota-Stabelli et al. 39 . Our search spanned 5 million generations with four chains set to default temperature. We disregarded 25% of sampled trees as burn-in and sampled every 500 cycles. We assessed effective sample size in Tracer v 1.7 40 and convergence of runs through the average standard deviation of split frequencies.

Results and Discussion
Using 51,305,421 paired-end sequences (SRA: SRX6711388), we completely assembled and circularized the mitochondrial genome of S. microneptunus with a coverage of 22×(GenBank accession number MN750781). The complete mitochondrial genome of S. microneptunus was 15,603 bp in length and comprised 13 protein-coding genes (PCGs), two ribosomal RNA genes (rrnS [12 S ribosomal RNA] and rrnL [16 S ribosomal RNA]), and 22 transfer RNA (tRNA) genes. The mitochondrial genome of S. microneptunus was compact with only a few intergenic spaces and overlaps among gene junctions (Fig. 1, Table 1). Most of the PCGs and tRNA genes were encoded on the heavy strand, while only four PCGs (in order from 5′ to 3′: nad5, nad4, nad4l, and nad1), two ribosomal RNA genes and 8 tRNA genes (trnF, trnH, trnP, trnL1, trnV, trnQ, trnC, and trnY) were encoded in the light strand (Fig. 1, Table 1). A single, long intergenic space of 834 bp was assumed to be the D-loop/control region (Fig. 1, Table 1). The gene order observed in S. microneptunus is identical to that reported for most caridean shrimps 33 and corresponds to the presumed Pancrustacean (Hexapoda + Crustacea) ground pattern 28 . Interestingly, the gene order observed in S. microneptunus is different from that reported in the closely related genus Alpheus 28,41-45 . However, whether or not mitochondrial gene synteny is useful to reveal genealogical relationships within the Caridea and other decapod crustaceans remains to be addressed.  Table 1). Cox1 featured an alternative putative start codon (ACG) that was previously reported for other decapod crustaceans and references therein 46 , but was found only in a few caridean shrimps (i.e., Nautilocaris saintlaurentae 47 ; Macrobrachium rosenbergii 48 ). Eleven PCGs ended with a complete and conventional stop codon (TAA or TAG) ( Table 1). The genes cox2, cox3, and cob each terminated with an incomplete stop codon T. Truncated stop codons are often observed in crustacean mitochondrial genomes 43,49 and are hypothesized to be completed via post-transcriptional poly-adenylation 50 .
The mitochondrial genome of S. microneptunus contained an A + T bias with an overall base composition of A = 36.6%, T = 28.0%, C = 23.9%, and G = 11.4% at the heavy strand. This A + T bias is within the known range reported for mitochondrial genomes in caridean shrimps 43 . The most frequently used codons found in the PCGs  Table S1).
All PCGs in the mitochondrial genome of S. microneptunus exhibited overall K A /K S ratios <<1 (Fig. 2). This indicates that these PCGs are generally evolving under purifying selection. Examination of K A /K S ratios in sliding windows across the length of each PCG further indicated that purifying selection is acting along most of the length of each PCG (Suppl. Mat. Fig. S1). The results were very similar when the above analyses were performed using other species of Alpheus as outgroups (Suppl. Mat. Figs. S2-S6), confirming the general pattern of purifying selection in PCGs. Selective pressure in mitochondrial PCG has been poorly studied in crustaceans but a similar pattern of widespread purifying selection in mitochondrial PCGs has been observed in other arthropods, including decapod crustaceans, and references therein 46,51 . Interestingly, regardless of Alpheus outgroup, the genes atp8 and nad6 exhibited higher K A /K S ratios than other genes, but with values lower than 1. These two genes were also found to have higher K A /K S ratios than other mitochondrial genes between two Alpheus species 42   www.nature.com/scientificreports www.nature.com/scientificreports/ the selective pressures in mitochondrial genes across Caridean species, or crustaceans in general, are sparse 42,47 . It is possible that eusociality may drive changes in the rate of evolution in mitochondrial genes due to prolonged longevity in the queens 52,53 , longer generation time 54,55 , and reduced effective population size 56,57 . Whether the higher K A /K S ratios observed in a few PCGs in S. microneptunus are driven by eusociality or result from other unknown (e.g., ecological) differences between Synalpheus and Alpheus remains to be investigated in further comparative analyses.
The mitochondrial genome of S. microneptunus encoded tRNA genes that ranged in length from 58 (tRNA-Ser1) to 71 (tRNA-Ser2) bp. All tRNA genes, except tRNA-Ser, exhibited a standard 'cloverleaf ' secondary structure as predicted by MIFTI (Fig. 3). In the tRNA-Ser1 gene, the stem and loop of the pseudouridine arm (T-arm) was missing. Complete (stem and loop) or partial (loop only) tRNA arm deletions are known to occur in other decapod crustaceans 46,51 , and references therein and the function of these tRNA may be complemented by elongation factors 58 .
The rrnS and rrnL genes identified in the mitochondrial genome of S. microneptunus were 800 and 1371 nucleotides long, respectively. These genes were located close to each other between tRNA-L1 and the putative D-loop/CR, but separated by tRNA-V (Fig. 1, Table 1). As shown to occur in other crustaceans, including caridean shrimps, the two genes were highly A + T biased. The overall base composition of the rrnL gene was A = 29.4%, T = 39.9%, C = 8.2%, and G = 22.5%. In turn, that of the rrnS gene was A = 29.2%, T = 38.8%, C = 8.8%, and G = 23.1%.
The 834 bp long intergenic region assumed to be the D-loop/CR was located between the 12 S ribosomal RNA and tRNA-I (Fig. 1) in S. microneptunus. The region was heavily A + T rich with an overall base composition: A = 42.0%, T = 37.5%, C = 15.2%, and G = 5.3%. Visual examination of this non-coding region revealed multiple mononucleotide adenine and thymine repeats along the entire stretch of this intergenic region. The region has an imperfect inverted repeat located in positions 176-221 and 227-275 (detected by EMBOSS:einverted) and multiple AT-rich dinucleotide and trinucleotide microsatellites along the entire stretch of the CR (detected by microsatellite Repeat Finder) (Suppl. Mat. Fig. S3). The secondary structure prediction analysis in RNAStructure (assuming 27 °C and other default options) resulted in a lowest free energy configuration (change in Gibbs free energy [ΔG] = − 104 kcal/mol) that featured various stem-loop structures interspersed along the length of the region (Suppl. Mat. Fig. S7). Only a few studies have characterized the putative D-Loop/CR in crustaceans 46 and references therein. In some species this long non-coding region appears to be relatively well organized (i.e., in the non-decapod branchiopod genus Daphnia 59 and in the decapod Chinese spiny lobster Panulirus stimpsoni 60 ). While in other species (e.g., the Caribbean spiny lobster Panulirus argus 46 ) and here in S. microneptunus, there is no obvious organization in the D-Loop/CR.
The ML and BI phylogenetic trees (68 terminals, 3636 amino acid characters, and 1864 parsimony informative sites) support the monophyly of the Caridea and placed S. microneptunus in a monophyletic clade (family Alpheidae) sister to representatives from the genus Alpheus. The above relationship supports the monophyly of the family Alpheidae in agreement with results from previous phylogenetic studies using a combination of partial mitochondrial and nuclear genes  or using mitochondrial PCGs but with a more limited sample of caridean shrimps [42][43][44] . Additional well supported clades within the Caridea included the families Alvinocaridae, Atyidae, Palaemonidae, and Pandalidae. While the monophyly of these caridean families was supported in both ML and BI analyses, the relationships among families was found to be sensitive to optimality criteria. These differences are reflected in the low support we recovered for the node leading to Atyidae + (Palaemonidae + Alpheidae) + Pandalidae in the ML tree. Our BI analysis recovered a monophyletic (Palaemonidae + Alpheidae) clade; however all other inter-familial relationships are recovered as distinct from the ML topology, and with high support. The sister relationship herein observed between the families Palaemonidae and Alpheidae was also supported by a recent phylogenomic study 62 . The same phylogenomic analysis 62 resolved Atyidae as the sister group to all other caridean taxa, however, none of our analyses recover this relationship. Our results suggest that mitochondrial genomes contain enough phylogenetic information to delineate monophyly of higher taxa within the Caridea (at superfamily and family levels), but the relationships among monophyletic clades may be sensitive to marker choice and reconstruction methodology.

conclusions
This study assembled and analyzed the first mitochondrial genome of a eusocial marine invertebrate, the caridean shrimp S. microneptunus. The complete mitochondrial genome of S. microneptunus will enhance the genomic resources in the only known group of eusocial animals in the sea and allow further investigations of the relationship between complex social behaviors (i.e., eusociality and communal living) 9 and selective pressures and rates of molecular evolution in mitochondrial genomes.

Data availability
Data is available at GenBank (MN750781 and SRX6711388).