Complete mitochondrial genome of Black Soft-shell Turtle (Nilssonia nigricans) and comparative analysis with other Trionychidae

The characterization of mitochondrial genome has been evidenced as an efficient field of study for phylogenetic and evolutionary analysis in vertebrates including turtles. The aim of this study was to distinguish the structure and variability of the Trionychidae species mitogenomes through comparative analysis. The complete mitogenome (16796 bp) of an endangered freshwater turtle, Nilssonia nigricans was sequenced and annotated. The mitogenome encoded for 37 genes and a major non-coding control region (CR). The mitogenome was A + T biased (62.16%) and included six overlapping and 19 intergenic spacer regions. The Relative synonymous codon usage (RSCU) value was consistent among all the Trionychidae species; with the exception of significant reduction of Serine (TCG) frequency in N. nigricans, N. formosa, and R. swinhoei. In N. nigricans, most of the transfer RNAs (tRNAs) were folded into classic clover-leaf secondary structures with Watson-Crick base pairing except for trnS1 (GCT). The comparative analysis revealed that most of the tRNAs were structurally different, except for trnE (TTC), trnQ (TTG), and trnM (CAT). The structural features of tRNAs resulted ≥ 10 mismatched or wobble base pairings in 12 tRNAs, which reflects the nucleotide composition in both H- and L-strands. The mitogenome of N. nigricans also revealed two unique tandem repeats (ATTAT)8, and (TATTA)20 in the CR. Further, the conserved motif 5′-GACATA-3′ and stable stem-loop structure was detected in the CRs of all Trionychidae species, which play an significant role in regulating transcription and replication in the mitochondrial genome. Further, the comparative analysis of Ka/Ks indicated negative selection in most of the protein coding genes (PCGs). The constructed Maximum Likelihood (ML) phylogeny using all PCGs showed clustering of N. nigricans with N. formosa. The resulting phylogeny illustrated the similar topology as described previously and consistent with the taxonomic classification. However, more sampling from different taxonomic groups of Testudines and studies on their mitogenomics are desirable for better understanding of the phylogenetic and evolutionary relationships.

Genome sequencing, assembly and annotation. The complete mitochondrial genome sequencing, de novo assembly and annotation were carried out at the Genotypic Technology Pvt. Ltd. Bangalore, India (http:// www.genotypic.co.in/). The whole genome library was sequenced using the Illumina NextSeq 500 (150 × 2 chemistry) (Illumina, Inc, USA) which yielded total 4677068 raw reads. The raw reads were processed using cutadapt tool (http://code.google.com/p/cutadapt/) for adapters and low quality bases trimming with cutoff of Phread quality score (Q score) of 20. These high quality reads were down sampled to 2 million reads using Seqtk program (https://github.com/lh3/seqtk) and the reads were de novo assembled using SPAdes-3.7.1. using default parameters 26 , considering Pelochelys cantorii mitochondrial genome (JN016746) as reference contig. The sequence annotation was also checked in MITOS online server (http://mitos.bioinf.uni-leipzig.de). The nucleotide sequences of the protein coding genes (PCGs) were initially translated into the putative amino acid sequences on the basis of the vertebrate mitochondrial DNA genetic code. The exact initiation and termination codons were identified in ClustalX using other reference sequences of Testudines 27 . Genome sequence and Phylogenetic analysis. The direction of locus, size, start and stop codon of PCGs, overlapping regions and intergenic spacers between genes, anticodon of transfer RNA (tRNA) genes were checked in MITOS online server (http://mitos.bioinf.uni-leipzig.de) and Open Reading Frame Finder (https:// www.ncbi.nlm.nih.gov/orffinder/). The PCGs located in light strand (L strand) were made reverse complementary before incorporating them in the analysis. However, the nucleotide composition and ATGC-skew of tRNA genes, situated in both heavy strand (H strand) and light strand (L strand) were analyzed as its original orientation. The CGView Server (http://stothard.afns.ualberta.ca/cgview_server/) with default parameters was used to map the circular representation of the generated completed mitochondrial genome 28 . Based on the homology search against the Refseq mitochondrial database (https://www.ncbi.nlm.nih.gov/refseq/), 12 mitogenomes of Trionychidae species were acquired from GenBank and incorporated in the dataset for comparative analysis. The base composition of nucleotide sequences, and composition skewness were calculated manually in Microsoft Excel as described previously: 29 . The Relative Synonymous Codon Usage (RSCU) values of each PCGs were calculated by assuring their codon frame using MEGA6.0 30 . The tRNA genes were verified in MITOS online server (http://mitos.bioinf.uni-leipzig.de), tRNAscan-SE Search Server 2.0 (http://lowelab.ucsc.edu/tRNAscan-SE/) and ARWEN 1.2 with the default settings with the appropriate anticodon capable of folding into the typical cloverleaf secondary structure 31 Watson-Crick, wobble, and mismatched base pairing. The large and small subunit of ribosomal RNA (rrnL and rrnS) were annotated by the MITOS online server (http://mitos.bioinf.uni-leipzig.de). In order to determine the unique base compositions in the control regions (CRs) of 13 Trionychidae species including N. nigricans, tandem repeats were predicted by the online Tandem Repeats Finder web tool (https://tandem.bu.edu/trf/trf.html) 33 . Subsequently, the CRs of 13 Trionychidae species were aligned using ClustalX software and checked manually in order to define the conserved sequence blocks and conserved sequence motif 27 . To recognize the location for replication of the L-strand, putative secondary structure of the CRs were anticipated by the online Mfold web server (http://unafold.rna.albany.edu) 34 . The software package DnaSPv5.0 was used to calculate the synonymous substitutions per synonymous sites (Ks) and non-synonymous substitutions per non-synonymous sites (Ka) 35 . Further, the complete mitogenomes of five turtle species under five different families (suborder: Cryptodira) were also incorporated to the dataset to illustrate the phylogenetic relationships. The complete mitogenome of Chelus fimbriata, family Chelidae (suborder: Pleurodira) was incorporated as an out-group in the phylogenetic analysis. The PCGs were individually aligned with the TranslatorX online platform using the MAFFT algorithm with the GBlocks parameters and default settings 36

Results and Discussion
Genome size and organization. In this study, the closed-circular complete mitogenome (16796 bp) of the black soft-shell turtle, N. nigricans was determined under the 'National Faunal Genome Resources (NFGR)' program at Zoological Survey of India, Kolkata (Fig. 1). The generated mitogenome was submitted to the GenBank database through Sequin submission tool (https://www.ncbi.nlm.nih.gov/projects/Sequin/) and acquired the accession number: MG383833. The comparative analysis showed the length of mitogenomes ranging from 16489 bp in L. punctata to 17499 bp in P. cantorii. All the 37 genes were annotated in the de novo assembled genome; 13 PCGs, 22 tRNAs, two ribosomal RNAs (rRNAs), and a major non-coding CR (Table 1). Among them, 28 genes (12 PCGs, two rRNAs, and 14 tRNAs) were located on the heavy strand (H strand) and others (nad6 and eight tRNAs) were located on the light strand (L strand). The organization of genes in N. nigricans mitogenome resulted in a similar arrangement as is described for typical vertebrate species 38 . The comparative analysis revealed that similar gene arrangements in both L and H strand were present in other Trionychidae species (Table S2). The total length of PCGs, tRNAs, and rRNAs was 11251 bp, 1551 bp, and 2593 bp respectively. The nucleotide composition was biased toward A + T (62.16%) within the complete mitogenome of N. nigricans. The A + T composition was 61.26%, 63.82%, 61.47%, and 68.52% in PCGs, tRNAs, rRNAs and CR respectively (Table S1). In other Trionychidae species, the nucleotide composition within the complete mitogenomes is similar to N. nigricans and biased towards A + T with a variable frequency ranging from 58.48% (T. triunguis) to 62.97% (N. formosa). In the complete mitogenome of N. nigricans, the AT and GC skew was 0.197 and −0.400 respectively. The AT skew of other Trionychidae species varied from 0.127 (P. sinensis) to 0.208 (T. triunguis) and GC skew were varied from −0.412 (T. triunguis) to −0.363 (P. steindachneri). The positive AT skew in most of the genes indicated more Adenine (A)s than Thymine (T)s in the complete mitogenomes of Trionychidae species (Table S1).
Overlapping and intergenic spacer regions. Six overlapping sequences with a total length of 13 bp were identified in the complete mitogenome of N. nigricans (Table 1). These sequences varied in length (1 bp to 4 bp) with the longest overlapping region present between NADH dehydrogenase subunit 4 L (nad4l) and NADH dehydrogenase subunit 4 (nad4) as well as in between ATP synthase F0 subunit 8 (atp8) and ATP synthase F0 subunit 6 (atp6). Further, the intergenic spacers within this mitogenome, spread over 19 regions and ranged in size from 1 bp to 33 bp with a total length of 124 bp. The longest spacer (33 bp) occurred between trnN (GTT) and trnC (GCA). The comparative analysis showed the number of overlapping sequences varied from five (N. formosa) to 14 (P. sinensis) with a length variation of 7 bp to 143 bp in other Trionychidae species. The longest intergenic spacer (33 bp) is present between trnN (GTT) and trnC (GCA) of L. scutata and L. punctata (Table S3).  (Table S4). The codons of each amino acids were conserved in all PCGs of the compared Trionychidae species. Moreover, codons with A or T in the third position were overused in comparison to other synonymous codons. For example, the codon for Glutamine (Gln) as CAG was rare as compared to CAA; and codon for Glutamic acid (Glu) as GAG was rare as compared to GAA. The comparative RSCU analysis indicated a significant fall in frequency of the TCG codon in Serine (Ser) in N. nigricans, N. formosa, and R. swinhoei (Table S5). The RSCU analysis of N. nigricans revealed that, the codons encoding for Asparagine (Asn), Histidine  Table S1). The comparative results showed that the anticodons for all tRNAs were similar in Trionychidae species (Table S6). It has been suggested that the unique arrangements in the 'WANCY' region in vertebrates, have a significant role to fold a secondary structure and replace the function of light strand in mitogenome 39  The rrnS gene is located in between the trnF (GAA) and trnV (TAC); however, the rrnL gene is located in between trnV (TAC) and trnL2 (TAA) in N. nigricans as also occurs in other Trionychidae species (Table S1). Comparision of tRNAs secondary structure. The wobble base pairing is a key characteristic of RNA structure and often susbtitutes the GC or AT base pairs due to thermodynamic stability. These features play essential functional roles in a wide range of biological processes 40 . RNA-binding proteins adhere to unique G-U sites by recognizing chemical features and differ from Watson-Crick and other mismatched pairs 41 . Thus, the extensive comparisons of tRNAs are crucial for understanding the structural and functional features of the mitogenomes 42 .

Locus
In N. nigricans, most of the tRNAs were folded into classic clover-leaf secondary structures, except for trnS1 (GCT). Similar characteristics were also present in all other Trionychidae species (Fig. 2).  (Fig. 2). Thus, the resultant alteration in base composition of tRNAs reflects the nucleotide composition in both H-and L-strands.

Control region (CR). The mitochondrial control region (CR) is generally divided into three functional
domains: the termination associated sequence (TAS) domain, the central conserved domain (CD), and the conserved sequence block (CSB) domain 43 . The CD domain comprises more highly conserved sequences than TAS and CSB domain with variable numbers of tandem repeats (VNTRs) and origin of the H-strand transcription 44 . The pattern of conserved sequences vary among different vertebrate groups: humans 45 , birds 46 , and reptiles including turtles [47][48][49][50] . The length of CR was 1290 bp and the A + T content was 68.52% in N. nigricans (Table S1). Three conserved blocks (CSB-1, CSB-2, and CSB-3) were depicted in the CSB domains of N. nigricans CR, as similar to other turtle species of suborder Cryptodira 51,52 . It is reported that, the CSB-2 and CSB-3 were absent in the CRs of Pleurodira species 44 . These unique features suggest that the regulatative mechanisms of transcription might vary between Cryptodira and Pleurodira suborders, and could provide significant diagnostic characters. Further four VNTRs; (ATTAT) 8 , (49 bp)2, (AT) 7 , and (TATTA) 20 were observed in N. nigricans CR with spacer of 151 bp was observed between 1 st and 2 nd tandem repeat; 781 bp between 2 nd and 3 rd tandem repeat, and 87 bp between 3 rd and 4 th tandem repeat (Fig. 3).  (Table S1). The frequency of tandem repeats is higher at the 3ʹ end of the CR in all Trionychidae species. The numbers VNTRs were varied from two (A. spinifera, C. indica, R. swinhoei, and T. triunguis) to eight (P. sinensis). A single short tandem repeat (STRs) of AT was found in two Trionychidae species (L. scutata, and L. punctata). The length of the tandem repeats was variable in other Trionychidae species, ranging from 2 bp (A. ferox, C. indica, D. subplana, L. scuata, L. punctata) to 57 bp (P. cantorii). Exceptionally, overlapping of tandem repeats was found in A. ferox, P. cantorii, and P. sinensis. Comparative analysis revealed a single tandem repeat of (AT) 43 , (ATATTTAT) 13 , (AT) 27 and (TATATATTA) 13 in A. ferox, A. spinifera, C. indica, and R. swinhoei respectively. Exceptionally, in T. triunguis, L. scutata, and L. punctata tandem repeats were lacking at the 5′ end of CR in comparison with other species. Further, two tandem repeats (TAAACACAC) 2 and (ATCTATAT) 20 with intergenic spacer of 71 bp was observed in T. triunguis and one tandem repeat of (AT) 26 and (AT) 28 was observed in L. scutata and L. punctata respectively (Fig. 3). Further, the origin of L-strand replication was identified in 13 Trionychidae species including N. nigricans. The conserved motif 5′-GACATA-3′ in CSB-1 block and stable stem-loop structure was identified in N. nigricans and other Trionychidae species CRs as identical among other vertebrates 46,53 . The stem and loop lengths ranged from 2 to 7 bp and from 4 to 23 nucleotides, respectively (Fig. 3). The structural features, and duplication of CR play an important role in regulating transcription and replication in the mitochondrial genome 54,55 . The present study evaluated the genetic features of CR among the studied Trionychidae species mitogenomes including N. nigricans that will be helpful to hypothesize the evolutionary pattern of this group.

Synonymous and non-synonymous substitutions. Darwinian selection is an important hypothesis
for recognizing the evolutionary pattern of positively selected genes and plays an important role behind species divergence 56,57 . The analysis of mitogenome for detecting positive selection of PCGs serve as supportive technique for understanding the influences of natural selection in evolution and amending protein function [58][59][60][61] . The comparison of synonymous (Ks) and nonsynonymous (Ka) substitution rates in PCGs, evidenced for darwinian selection and adaptive molecular evolution in vertebrates [62][63][64] . The ratio of Ka/Ks was generally known as an indicator of selective pressure and evolutionary relations at molecular level among the homogenous or heterogeneous species 65    Phylogenetic analysis. Phylogenetic relationships of the new N. nigricans mitogenome are shown relative to 19 species of Testudines from two suborders and seven families (Fig. 4). The phylogenetic tree revealed 13 species of Trionychidae family were clustered together with 100 percent bootstrap support and congruent with the previous phylogeny based hypothesis in Testudines systematics 2,3 . The current phylogeny places N. nigricans as sister group to N. formosa and supports the previous classification of Trionychidae species. The 11 species form three clades corresponding to tribes Amydona, Apalonina and Gigantaesuarochelys in the subfamily Trionychinae. The species of Geoemydidae and Testudinidae families clustered together and shows sister clade of Platysternidae and Emididae species (Fig. 4). The Cheloniidae species (sea turtle) joined the base of the ML tree. This phylogenetic analysis combined with all mitochondrial PCGs, is consistent with the previously described phylogeny using partial sequences of mitochondrial and nuclear genes 2,6,15,16,25 . However, further taxon sampling from different taxonomic ranks and their mitogenomics data would be useful for better understanding of the phylogenetic and evolutionary relationships among Testudines. In addition, the present study further confirmed the presence of N. nigricans in the tributaries of river Bramhaputra in Arunachal Pradesh state in northeast India. The previous and present locality information in wild habitats updated range distribution of N. nigricans in Bangaladesh and India (West Bengal, Assam, Tripura, Arunachal Pradesh).