Complete mitochondrial genomes of Trisidos kiyoni and Potiarca pilula: Varied mitochondrial genome size and highly rearranged gene order in Arcidae

We present the complete mitochondrial genomes (mitogenomes) of Trisidos kiyoni and Potiarca pilula, both important species from the family Arcidae (Arcoida: Arcacea). Typical bivalve mtDNA features were described, such as the relatively conserved gene number (36 and 37), a high A + T content (62.73% and 61.16%), the preference for A + T-rich codons, and the evidence of non-optimal codon usage. The mitogenomes of Arcidae species are exceptional for their extraordinarily large and variable sizes and substantial gene rearrangements. The mitogenome of T. kiyoni (19,614 bp) and P. pilula (28,470 bp) are the two smallest Arcidae mitogenomes. The compact mitogenomes are weakly associated with gene number and primarily reflect shrinkage of the non-coding regions. The varied size in Arcidae mitogenomes reflect a dynamic history of expansion. A significant positive correlation is observed between mitogenome size and the combined length of cox1-3, the lengths of Cytb, and the combined length of rRNAs (rrnS and rrnL) (P < 0.001). Both protein coding genes (PCGs) and tRNA rearrangements is observed in P. pilula and T. kiyoni mitogenomes. This analysis imply that the complicated gene rearrangement in mitochondrial genome could be considered as one of key characters in inferring higher-level phylogenetic relationship of Arcidae.

34 non-coding regions with a total of 13,642 bp with various lengths of 0-7,408 bp. The two long non-coding regions are located between cox3 and trnR2 (7,408 bp), trnR2 and cox1 (2,435 bp), respectively. In P. pilula, the overlaps occur two times and involve a total of 45 bp, which located between cox1 and nad5 (31 bp), trnS2 and cox3 (14 bp). There is no overlapping gene in the mitochondrial genome of T. kiyoni. The overlap between cox1 and nad5 was also observed in the mitogenome of S. broughtonii and A. vellicata 3,15 .
The A + T content, AT-skew, and GC-skew are three parameters, which were usually used in the investigation of the nucleotide-compositional behavior of mitochondrial genomes 27,28 . The nucleotide compositions of the complete mtDNA sequence for both of the Arcidae species are biased toward A and T ( Table 3). The A + T content is 62.73% in T. kiyoni and 61.16% in P. pilula. The non-coding region (NCR) show the highest A + T content (67.15% and 63.09%, respectively). In order to evaluate the base bias in the mitogenomes, we measured skewness in different gene regions of T. kiyoni and P. pilula mitochondrial genomes, and found the values of the AT-skew were mostly negative, as well as values of the GC-skew were all positive (Table 3).
Protein-coding genes and ribosomal RNA genes. The entire length of the PCGs of T. kiyoni was 10,545 bp, while that of P. pilula was 11,151 bp. The overall A + T content of the 12 PCGs was 61.63% in the T. kiyoni mitogenome, ranging from 59.91% (nad3) to 64.72% (atp6). In P. pilula mitogenome, the A + T content of the 12 PCGs was 60.36%, ranging from 57.01% (co2) to 63.58% (nad2).
In the mitochondrial genomes of T. kiyoni and P. pilula, all of the 12 PCGs have complete start codons e.g. ATG and ATA (Table 1). In the T. kiyoni mitochondrial genome, eight and four PCGs started with ATG and ATA, respec- tively, while in the P. pilula mitochondrial genome, five and seven PCGs started with ATG and ATA, respectively. All the 12 PCGs genes have complete stop codons, e.g. TAA, TAG. A total of 3515 and 3717 amino acids are encoded in T. kiyoni and P. pilula mitogenomes, respectively. The codon usage of T. kiyoni and P. pilula mitochondrial genomes (Table 4) are similar to that of other Arcidae species. All codons are used in both of the mitogenomes but with different frequencies. Amino acids coded by A + T-rich codon families (e.g. Phe, Tyr and Lys) are more frequent than amino acids coded by G + C-rich codon families (e.g. Pro and Arg). The ratio G + C/A + T-rich codons was 0.43 in T. kiyoni mitogenome, which is lower than that of P. pilula mitogenome. In both of the mitogenomes, G-ending codons are most abundent in NNY codon families and T-ending codons are most abundent in NNR and NNN codon families, and consequently, the (+ ) strands are T and G-rich, outlines another bias of Arcidae codon usage. There are 1.7 times more G than C and 2.7 times more T than A in the neutral sites of T. kiyoni. In the case of P. pilula, 3.4 times more G than C and 1.6 times more T than A were found at at the strand neutral sites. Codon usage bias was also observed in the vertebrate mitochondrial genomes, in which the two strands are exposed to different mutational pressures during replication, leading to an increased frequency of A and C in the (+ ) strand (or L-strand, in case of vertebrates) 27,29,30 . However, the Arcidae mtDNA showed the accumulation of T and G in the (+ ) strand, suggesting that a reversal of strand asymmetry have occurred in the members of these taxa.
The nonsynonymous (Ka) and synonymous (Ks) substitution rates reflect the evolutionary dynamics of protein-coding sequences across closely related species 31,32 . In order to detect the influence of selection pressure in Arcidae mitochondrial genomes, the number of Ka and Ks, and the ratio of Ka/Ks, were calculated for all  PCGs were all less than 1, indicating the existence of purifying (negative) selection in these species. Overall, the NADH dehydrogenase complex genes harbor more nonsynonymous substitutions than the cytochrome c oxidase subunit (cox1-cox3) genes and cytochrome b. This tendency was consistent with the hypothesis that the genes coding for the three subunits of the cytochrome c oxidase and cytochrome b had a higher degree of conservation than the NADH dehydrogenase genes 33 . Interestingly, nad2 showed an exceptionally high relative proportion of nonsynonymous changes and higher Ka/Ks ratio compared to the other mitochondrial coding genes. This pattern was also observed in vertebrate mitochondrial genomes, e.g. fishes, and may be associated with the distance from the origin of replication 34 . nad2 are found immediately upstream the major non-coding region (MNR), suggesting a minimum distance from the origin of replication. Thus, during replication, it will exposed as single-stranded for longer time compared to the other genes, rendering it more likely to accumulate mutations in the highly mutagenic environment of the mitochondrion 34,35 . Although all ratios less than 1 is consistent with purifying selection, the Ka/Ks ratio close to 1 is unusual for mt genes, positive selection cannot be ruled out entirely in nad2 gene. Identification of the rrnL genes in T. kiyoni and P. pilula were accomplished by comparison with other Arcidae rrnL genes. A conserved 23 bp-long sequence ' AGGAGTACGGGAACGTGCCTCCT' was used to identify the 3′ end of rrnS gene in T. kiyoni and P. pilula. This motif was conserved and also reported as the basis to infer the 3′ end of rrnS in other Arcidae mitogenomes 15 . The length of rrnL is 1,479 bp, and the rrnS is 710 bp in T. kiyoni. They are the largest rRNA genes yet reported in the family Arcidae. In P. pilula, the rrnL and rrnS are 1,344 bp and 673 bp in length, respectively (Table 1).
Transfer RNA genes and anticodons. The complete set of 22 tRNA genes typical of metazoan mitogenomes were present in T. kiyoni: two tRNAs for each of serine and leucine, and one tRNA for each of the other 18 amino acids. The P. pilula mitogenome contained 23 tRNAs, including the standard 22 tRNAs set and an extra trnR. All tRNAs were interspersed between the rRNAs and the protein-coding genes with the ranges from 64 bp (trnS UCA ) to 74 bp (trnI) in T. kiyoni and ranged from 64 bp (trnR2) to 73 bp (trnW) in P. pilula. The predicted secondary structures of tRNAs in two Arcidae mitogenomes were shown in Supplementary Figs 1 and 2. Most of them can fold into canonical clover-leaf secondary structures except trnE and trnS AGA in T. kiyoni, trnR CGA and trnS AGA in P. pilula, whose paired "DHU" arm were missing, simplifing down to a loop. A modified DHU-arms of trnE in T. kiyoni is unique among molluscs. The modified DHU-arms of trnR CGA is present in only few mitochondrial genomes 36,37 . However, missing of the "DHU" arm in the secondary structure of the trnS gene-trnS UCN and trnS AGN is common for molluscs [37][38][39][40] . To work in a similar way as usual tRNAs, these aberrant tRNA genes may require coevolved interacting factors or post-transcriptional RNA editing [41][42][43] .
In vertebrate mtDNAs, the most used codon in a degenerate codon family perfectly matches the anticodon of the corresponding tRNA, which is called codon-anticodon adaptation (also known as optimal codon usage) 44 . Different from the vertebrate mitochondrial genomes, non-optimal codon usage was the characteristic of Arcidae mtDNAs, and presumably other bivalves, where the most used codon does not perfectly match the corresponding tRNA anticodon in the 22 degenerate codon families (Table 4). This codon usage bias may disrupted by the A + T mutation pressure of the mitogenomes. In addition, the mitogenomes of T. kiyoni and P. pilula shared the same tRNA anticodons with vertebrate (Table 4) suggesting the anticodon evolution in metazoan mitochondrial genomes could be under the same operational forces 45 . This result is not consistent with the hypothesis that the biased codon usage drives the evolution of tRNA anticodons in the vertebrate mitogenome 30 .     The largest non-coding region with increased A + T composition is considered as the control region as it usually contain the signals for replication and transcription 2 . It shows a higher size variation than the other regions of the mitogenome due to both length variation with tandem repeat units (TRs) and differences in their copy numbers 46 . In the T. kiyoni mitogenome, one 770-bp tandem repeat (10,333-11,102), comprising three nearly identical motifs was found in the largest non-coding region (1,009 bp) between trnF and trnS UCU . Most of the non-coding sequences (9,843 bp) were observed within one segment in P. pilula mitogenome, within this segment all of the sequence, except trnR CGA , were predicted to be non-coding DNA. The large concentrated non-coding region of P. pilula, contained two distinct tandem repeat units (19,936 and 25,978-26,585), which were 1,133 bp and 668 bp in length, respectively. The first repeat family contained four nearly identical motifs. The second one had a three identical copies and a third copy with a 40% length of a 180-base sequence. Tandem repeat units within non-coding regions seem a common feature in Arcidae mitogenomes, despite different length and copy number in the repeat units 3,4,14 . The tandem repeat region was also found in other molluscs 38,[47][48][49][50] . The occurrence of tandem repeats could be explained by mtDNA replication through slipped-strand mispairing 51 . Stem-loop structures were detected in the tandem repeat region of T. kiyoni and P. pilula ( Supplementary Fig. 3). It has been demonstrated that the potential stem-loop structures in repeated unita and its flanking part may cause an increase in slipped-strand mispairing frequency 51,52 . Varied genome size of Arcidae species. Arcidae mitochondrial genomes are exceptional for their extraordinarily large and highly variable sizes. They house by far the largest known metazoan mitochondrial genomes, with sizes ranging from 19.6 to 47 kb among the four genomes sequenced to date (https://www.ncbi. nlm.nih.gov/). Arcidae mitogenomes possess an average length of 34.5 kb, whereas T. kiyoni and P. pilula showed the length of 19,614 bp and 27,895 bp, respectively, which was the smallest characterized mitochondrial genomes in Arcidae. The smallest genome-size is weakly associated with gene number and primarily reflect shrinkage of the non-coding regions. Genomic coverage by mitochondrial non-coding regions are only 27.37% for T. kiyoni and 40.84% for P. pilula, which were much lower than that of other Arcidae.
The early diverging phylogenetic positions of T. kiyoni within the Arcidae is such that this species provides an important insight into the historical information of Arcidae mitochondrial genomes (Fig. 3). Although it is difficult to reconstruct with the limited genomes, however, the diversity of mitogenome size among the species appears to reflect a dynamic history of expansion. The common ancestor of Aridae, like T. kiyoni, might have possessed a relatively compact mitochondrial genome, with a series of independent expansions leading to the large genomes in other species. However, the sources underlying major expansions in mitochondrial genome size are unknown.
Already in 1991 it was reported that the length of mitochondrial rRNAs is correlated with the size of their corresponding organellar genomes in seven species 53,54 . Highly significant positive correlations were detected between mitochondrial genome size and the combined length of cox1-3, the length of Cytb, and the combined length of rRNAs (rrnS and rrnL) in 278 eukaryotes and 11 a-proteobacteria. The six mitochondrial genes are essential for oxidative phosphorylation, which in most species are refractory to nuclear transfer 54 . We presented here an analysis of this observation for 256 molluscs using six mitochondrial genes (cox1-3, Cytb, rrnS and rrnL). They have rarely been transferred to nucleus and are therefore well suited to test the hypothesis on the evolution of gene length in mitochondria 54 . A significant positive correlations are observed between the size of their mitochondrial genome and the combined length of cox1-3, the lengths of Cytb, and the combined length of rRNAs (rrnS and rrnL), which is consistent with former reports (Fig. 4, Supplementary Table 2). In many mitochondrial genomes, redox reactions produce oxygen free radicals during respiration, making a higher mutation rate than their corresponding nuclear DNA 55 . Müller's ratchet states that these deleterious mutations can accumulate and lead to a mutational meltdown if recombination (either within or between organelles) never occurs 56 . Müller's ratchet explains that the shorter genes may accumulate slightly deleterious mutations slower 54 . Further, the replication advantage hypothesis states that a smaller mitochondrial genome would be selected in intracellular  competition due to its faster duplication rate. And this selection for smaller genomes generally contributes to the elimination of redundant cytoplasmic genes, by selecting for deletions in organelles 56 . In turn, a shorter gene might give its carrier a replication advantage during intracellular competition 54 . Thus, both of the two hypotheses predict a positive covariance between genome size and gene length. It is also supported by the shape of the gene length-genome size relationship investigated here, which is strongly asymptotic for all gene.
Gene rearrangement as a novel structure. The mitogenomes of Bivalvia show substantially gene rearrangements, having no obvious common pattern in the arrangement of PCGs and rRNA 9 . Species sequenced in Bivalvia belong to three five subclasses: Palaeoheterodonta, Heterodonta, Pteriomorphia, Anomalodesmata and Protobranchia. Gene arrangement in Unionoida (Palaeoheterodonta) is relatively conserved, except for the translocation of several tRNAs, and protein-coding genes nad2 and nad3 20 . In addition, only one mitochondrial genome is available in both Anomalodesmata (Laternula elliptica, KF534717) and Protobranchia (Solemya velum, JQ728447), and more sequences are need for further analyze. The mitochondrial gene order of the remaining bivalves is frequently rearranged, but see oysters 22 , especially for the family Pectinidae 20,57 .
Although variability in gene arrangement is high, there are some conserved gene blocks within these groups. Arcidae seems to represent another example, as T. kiyoni mitogenome shared no gene block with any other five Arcidae species, despite being number of the same family, suggesting that gene rearrangements occurred dramatically among lineages in this family (Fig. 5). There were both PCGs and tRNA rearrangement in P. pilula mitogenome. In terms of gene arrangement, it is clear that P. pilula is more similar to A. vellicata and T. granosa than to S. broughtonii and S. kagoshimensis. They share three identical gene blocks: two large blocks cox1-nad5-trnM-nad1-nad 4-Cytb-trnF-cox-trnC-nad6-trnK and atp6-trnP-trnI-trnG-trnE-trnV -rrnL-trnA-trnT-trnH-trnQ-nad3-nad4l, and one small block trnY-trnN-rrnS-nad2. If the tRNA genes are not considered, the gene order in P. pilula is nearly identical to that of A. vellicata, T. granosa, S. broughtonii and S. kagoshimensis, except for the translocation of gene cox3. For generation of the gene arrangement of mt DNA during evolution, a model involving slipped-strand mispairing of two homologous regions and random gene loss was proposed [58][59] . We suspect that the asymmetric gene replication and transcription accelerate this phenomenon in the evolutionary process. This hypothesis is supported by the fact that all of the mt genes of marine bivalves are encoded on same strand and show tremendous rearrangements 60 . On the other hand, the family Unionidae with dual-strand coding have relatively fewer rearrangements of gene order 20 .
Gene arrangement comparisons may be a useful tool for phylogenetic studies. This is based on the hypothesis that gene arrangements are likely to be shared only as a result of common ancestry since it is highly unlikely that the same gene order would arise independently in separate lineages 20,58 . We present a schematic representation of mitochondrial gene arrangements in Arcidae species on the phylogenetic trees inferred from the nucleotide dataset of 12 PCGs (Fig. 5). The comparative analysis of mt gene rearrangements in Arcidae reinforces the validity of our ML-tree and contributes new information on Arcidae phylogenetic relationships. As shown in Fig. 5, T. kiyoni was in a separate, more ancestral branch in the phylogenetic tree. Its gene order may represent the pleisomorphic gene arrangement in Arcidae. Hence, our analysis imply that the complicated gene rearrangement in mitochondrial genome could be considered as one of key characters in inferring higher-level phylogenetic relationship of Arcidae.

Materials and Methods
Sample collection and DNA extraction. Specimens of T. kiyoni and P. pilula were collected from the coastal water of Fujian Province, China. These samples were stored at − 80 °C and deposited as voucher specimens (specimen number: TK01 and PP01) in Fisheries College, Ocean University of China. Each of the two Arcidae complete mitogenome sequenced was obtained from a single specimen. Total genomic DNA was extracted from adductor muscle by a modification of standard phenol-chloroform procedure as described by Li et al. 61 and then stored at − 20 °C. Determination of partial sequences. In order to design gene-specific primers, we first obtained partial cox1 sequences for both T. kiyoni and P. pilula, with the universal primers of LCO1490/HCO2198 62 . Another short fragment, rrnS genes, were obtained from NCBI data base (GenBank accession no. JN974675 for T. kiyoni and JN974660 for P. pilula).

Construction of BD GenomeWalker DNA libraries, PCR amplification and sequencing. Four BD
GenomeWalker DNA libraries were constructed with the BD GenomeWalker Universal Kit (BD Biosciences, San Jose, CA, USA) following the manufacturer's protocols.
The complete mitogenome of T. kiyoni and P. pilula were amplified using genome-walking based method, which involves two nested PCR reactions with a touch-down program modified from the BD GenomeWalker Universal Kit User Manual. The partial sequences of cox1 and rrnS were used to design the initial sets of gene-specific primers, one (GSP1) for original PCR and the other (GSP2) for nested PCR, which were used for genome-walking to amplify both of the Arcidae mitogenome. The primer sequences used for genome-walking are presented in Supplementary Tables 3 and 4. PCR was performed in a total volume of 50 μ l including 2 U Taq DNA polymerase (TaKaRa, Dalian, China), about 100 ng template DNA, 1 μ M forward and reverse primers, 200 μ M of each dNTP, 1× PCR buffer and 2 mM MgCl 2 . The original PCR were carried out as follows using the outer adaptor primer1 (AP1) and outer gene-specific primer1 (GSP1): 10 s initial denaturation at 94 °C, 7 cycles of 30 s at 94 °C, 3 min at 72 °C, 32 cycles 30 s at 94 °C, 3 min at 67 °C, and 67 °C for an additional 7 min after the final cycle. A 1-μ l sample of the original PCR was diluted in 59 μ l of distilled water as the template for nested PCR amplification. The nested PCR were carried out as follows using the outer adaptor primer2 (AP2) and the outer, gene-specific primer2 (GSP2): 10 s initial denaturation at 94 °C, 5 cycles of 30 s at 94 °C, 3 min at 72 °C, 25 cycles 30 s at 94 °C, 3 min at 67 °C, and 67 °C for an additional 7 min after the final cycle. This procedure generally produces a single, major PCR product (100 bp-5000 bp) in at least one of the four libraries, which begins in a known sequence at the 5′ end of GSP2 and extend in the unkonwn adjacent genomic DNA.
PCR products were purified with EZ-10 spin column DNA gel extraction kit (Sangon Biotech), and then directly sequenced with the primer walking method. The sequencing was conducted on an ABI PRISM 3730 (Applied Biosystems) automatic sequencer in Beijing Genomics Institute (BGI) using standard Sanger sequencing chemistry.
Sequencing assembling and annotation. All sequence data were analysed and arranged to create the full genomes using the Seqman program from DNASTAR (http://www.DNASTAR.com). The protein coding genes were analyzed with ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and BLASTx using the invertebrate mitochondrial genetic code. The tRNA genes were identified by ARWEN 63 and DOGMA 64 using the mito/ chloroplast or invertebrate genetic code and the default search mode. The rRNA genes were identified by their similarity to published gene sequences and by using BLAST searches (http://www.ncbi.nlm.nih.gov/BLAST/).
The base composition and skewness analyses were performed and compared between T. kiyoni and P. pilula genomes, as well as the other four Arcidae genomes (S. broughtonii (46,985 bp) 3 , S. kagoshimensis (46,713 bp) 4 , T. granosa (31,589 bp) 14 and A. vellicata (34,147 bp) 15 ). The A + T content values were computed using Editseq program from DNASTAR. The GC and AT skews described strand bias were calculated according to the formulae by Perna and Kocher 65 , AT skew = (A − T)/(A + T); GC skew = (G − C)/(G + C), where A, T, G and C are the occurrences of the four nucleotides. The codon usage of each PCG were calculated using MEGA 5 66 . The ratios of nonsynonymous and synonymous substitutions rates (Ka/Ks) were estimated based on the Maximum-Likelihood (ML) method 67 using KaKs_Calculator 2.0 68 with the YN model.
The whole mitogenome sequence was tested for potentially tandem repeats by Tandem Repeats Finder 4.0 69 . Prediction of potential secondary structure was performed by the online version of the mfold software, version 3.2 70 , applying default settings. When multiple secondary structures were possible, the most stable (lowest free energy (Δ G)) was used.
The gene map of the T. kiyoni and P. pilula mitogenomes were generated with the program CGView 71 . The two mitochondrial genomes have been deposited in the GenBank database under the accession numbers KU975161 for T. kiyoni and KU975162 for P. pilula.
Predicted lengths of gene products and mitogenome sizes for up to 278 molluscs (see Supplementary Table 2). The statistical analysis was performed by using IBM SPSS Statistics 19 with Spearman rank correlations, as this test makes no assumption about the distribution of the data.
The phylogenetic relationships were built based on the nucleotide sequences of 12 PCGs. Crassostrea gigas (AF177226) and Crassostrea hongkongensis (EU266073) from the family Ostreidae was used as outgroup. The twelve-partitioned nucleotide sequences of protein coding genes were aligned with MAFFT based on their nucleotide sequences using default settings 72 . The final nucleotide sequences of each gene were then concatenated into single contigs (6719 bp) for phylogenetic analyses. The best-fit nucleotide substitution models for each data partitions were selected by jModelTest 73 . We employed ML in RAxML Black-Box webserver (http://phylobench. vital-it.ch/raxml-bb/index.php) 74 with GTR + G substitution model to each partition. For the ML analysis, 1000 bootstraps were used to estimate the node reliability.