Complete mitogenome of the endangered and endemic Nicobar treeshrew (Tupaia nicobarica) and comparison with other Scandentians

The Nicobar treeshrew (Tupaia nicobarica) is an endangered small mammal endemic to the Nicobar Island of the Andaman Sea, India regarded as an alternative experimental animal model in biomedical research. The present study aimed to assemble the first mitochondrial genome of T. nicobarica to elucidate its phylogenetic position with respect to other Scandentians. The structure and variation of the novel mitochondrial genome were analyzed and compared with other Scandentians. The complete mitogenome (17,164 bp) encodes 37 genes, including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNAs), two ribosomal RNA (rRNAs), and one control region (CR). Most of the genes were encoded on majority strand, except nad6 and eight tRNAs. The nonsynonymous/synonymous ratio in all PCGs indicates strong negative selection among all Tupaiidae species. The comparative study of CRs revealed the occurrence of tandem repeats (CGTACA) found in T. nicobarica. The phylogenetic analyses (Maximum Likelihood and Bayesian Inference) showed distinct clustering of T. nicobarica with high branch supports and depict a substantial divergence time (12–19 MYA) from the ancestor lineage of Tupaiidae. The 16S rRNA dataset corroborates the taxonomic rank of two subspecies of T. nicobarica from the Great and Little Nicobar Islands. In the future, whole nuclear genome sequencing is necessary to further improve our understanding of evolutionary relationships among treeshrews, and will have implications for biomedical research.


Materials and methods
Sample collection, mitochondria separation, and DNA extraction. The museum sample of T. nicobarica is vouchered in 70% ethanol at the National Zoological Collections of Andaman and Nicobar Regional Centre, Zoological Survey of India, which was collected from the Galathia, Campbell bay (06.83 N 93.87 E) on 30th November 2018 (contact person: Govindarasu Gokulakrishnan, Email: gokul7701@gmail.com). No treeshrew was killed as the collected specimen was a natural kill; hence no prior permission was required in the present study. Before preserving, the muscle tissue sample was aseptically collected from the hind leg of the specimen with ample attention and stored in 70% ethanol at Mammal and Osteology section, Zoological Survey of India, Kolkata under voucher ID (Reg. No. 28532) for downstream molecular investigation. The collection locality map was prepared by the online platform (https:// noaa. maps. arcgis. com). We used whole mitochondria for the extraction of mitochondrial DNA as per standard protocol 29 . The tissue sample was homogenized with 1 ml buffer containing 0.32 M Sucrose, 1 mM EDTA, and 10 mM TrisHCl by the WiseTis HG-15 homogenizer. To remove the nuclei and cell debris, the working mixture was centrifuged at 700 g for 5 min at 4 °C. The supernatant was collected in 1.5 ml centrifuge tube and centrifuged at 12,000 g for 10 min at 4 °C to precipitate the mitochondrial pellet. The pellet was re-suspended in 500 µl of buffer (50 mM TrisHCl, 25 mM of EDTA, 150 mM NaCl) and incubated at 56 °C for 1-2 h along with 20 µl of proteinase K (20 mg/ml). The mitochondrial DNA was extracted by using QIAamp DNA Investigator Kit (QIAGEN Inc.) and the eluted volume was reduced to 100 µl to increase the mtDNA concentration. The quality of the extracted mtDNA was checked through 1% agarose gel electrophoresis, and the concentration was quantified with a NANODROP 2000 spectrophotometer (Thermo Scientific).
Sequencing, assembly and annotation. The complete mitogenome sequence and assembly were carried out at PHIXGEN Pvt. Ltd. Gurugram, India (http:// www. phixg en. com). The mitochondrial DNA (> 100 ng) was used in Illumina TruSeq Nano DNA HT library preparation kit for library assembly (Illumina, Inc, USA). The mtDNA was fragmented by ultra-sonication (Covaris M220, Covaris Inc., Woburn, MA, USA) and the A-tailed fragments were joined with the sequencing indexed adapters done by the Illumina kit. The mtDNA fragments (450 bp) were selected through sample purification. The amplified PCR library was examined using Bioanalyzer 2100 (Agilent Technologies, Inc., Waldbronn, Germany) with high sensitivity DNA chips. Total > 4 million raw reads were generated through Illumina NextSeq500 (150 × 2 chemistry) (Illumina, Inc, USA). The high-quality reads were downsampled to 2 million using Seqtk (https:// github. com/ lh3/ seqtk) and iterative assembly was performed by using NOVOPlasty v2.6.7 using default parameters 30 . The mitogenome of T. belangeri (accession no. NC_002521) was used as a reference seed to start the assembly. The typical circular representation of the generated mitogenome of T. nicobarica was plotted by CGView Server (http:// stoth ard. afns. ualbe rta. ca/ cgview_ server/) with default parameters 31 . Further, the contig was subjected to confirmation by the MITOS v806 online webserver (http:// mitos. bioinf. uni-leipz ig. de). The direction and arrangements of PCGs, tRNAs, and rRNAs were confirmed through MITOS online server (http:// mitos. bioinf. uni-leipz ig. de) 32

Dataset construction and comparative analyses.
On the basis of taxonomic classification, the mitogenomes of five Tupaiidae species were downloaded from GenBank and merged in the dataset for comparative analysis (Supplementary Table S1). The genome sizes and nucleotide compositions of all the studied species were calculated using MEGA X 33 . To calculate the base composition skew, we utilized previously known formula: AT skew = (A − T)/(A + T), GC skew = (G − C)/(G + C) 34 . The overlapping regions and intergenic spacers of T. nicobarica and other Tupaiidae species mitogenomes were calculated manually. The pairwise test of the synonymous (Ks) and non-synonymous (Ka) substitutions were calculated between T. nicobarica and other Tupaiidae species using DnaSPv6.0 35 . The comparative analysis of Relative Synonymous Codon Usage (RSCU) and relative abundance of amino acids were also calculated using MEGA X. The secondary structures of tRNA genes were affirmed by tRNAscan-SE Search Server 2.0 (http:// lowel ab. ucsc. edu/ tRNAs can-SE/) and ARWEN 1.2 36,37 . To speculate the putative domains and motif, the CR of T. nicobarica and other Tupaiidae species was screened from the database. The tandem repeats within the CR were predicted by the online Tandem Repeats Finder web tool (https:// tandem. bu. edu/ trf/ trf. html) 38 .

Phylogenetic analysis and divergence time estimation.
To assess the phylogenetic relationship, the dataset was constructed with the representatives of Scandentia, Dermoptera, Primates, Lagomorphs, and Rodents based on the previous literatures 15,25,39,40 . The 13 PCGs of 14 mitogenomes were aligned and concatenated using TranslatorX (with MAFFT algorithm with L-INS-i strategy and GBlocks parameters) and Sequence-Matrix v1.7.84537 41,42 . The best fit model (GTR + I + G) was estimated by PartitionFinder 2 using lowest Bayesian information criterion (BIC) criterion 43 and the maximum-likelihood (ML) tree was constructed using the IQ-Tree web server with 1000 bootstrap support 44 . The estimation of divergence times among Tupaiidae species were calculated by Bayesian relaxed clock method in BEAST v2.4.7 45 . The GTR + I + G substitution model, empirical base frequencies, and relaxed uncorrelated log-normal clock with the Yule speciation model was applied as Tree prior. A total of four fossil calibration points were applied in the phylogeny to constraint the analysis as described in the previous study 15,[46][47][48] Table S2). The 16S rRNA sequence of the Sunda flying lemur, Galeopterus variegatus (Accession No. NC_004031) was used as an out-group in the second dataset. The genetic distances were calculated using MEGA X and the best fit model for this dataset was estimated using Mr. MODELTEST v2with lowest BIC (Bayesian Information Criterion) score 51 . The Bayesian tree was constructed in Mr. Bayes 3.1.2 by selecting nst = 6 for GTR + G + I model and four (one cold and three hot) metropolis-coupled Markov Chain Monte Carlo (MCMC), was run for 10,000,000 generations with 25% burn in with trees saving at every 100 generations 52 . The MCMC analysis was used to generate the convergence metrics, till the standard deviation (SD) of split frequencies reached under 0.01 and the potential scale reduction factor (PSRF) for all parameters approached 1.0. To represent the generated BA tree, the web based iTOL tool (https:// itol. embl. de/) was used 53 .

Results and discussion
Mitogenome structure and organization. The mitogenome (17,164 bp) of the endangered Nicobar treeshrew, T. nicobarica was determined in the present study (GenBank accession no. MW751815). The mitogenome contained 37 genes, comprising 13 PCGs, 22 tRNAs, 2 rRNAs, and a major non-coding CR. Among them, nine genes (nad6 and eight tRNAs) were placed on the negative strand, while the remaining 28 genes were placed on the positive strand (Table 1, Fig. 1). In the order Scandentia, the length of the Tupaiidae mitogenome varied from 16,183 bp (T. montana) to 17,164 bp (T. nicobarica). All Tupaiidae species showed the same gene arrangement as observed in typical vertebrate's mitogenome 54 . The nucleotide composition of the T. nicobarica mitogenome was A + T biased (58.3%), as in all Tupaiidae species ranging from 58.3% (T. nicobarica) to 59.72% (T. tana) ( Table 2). The AT skew and GC skew were 0.11 and − 0.30 in the mitogenome of T. nicobarica. The comparative analysis showed that the AT skew ranged from 0.08 to 0.11 and the GC skew from − 0.28 to − 0.30 (Table 2). A total of 14 overlapping regions with a total length of 87 bp were identified in T. nicobarica mitogenome. The longest overlapping region (43 bp) was observed between the ATP synthase F0 subunit 8 (atp8) and ATP synthase F0 subunit 6 (atp6). Further, a total of 14 intergenic spacer regions with a total length of 68 bp Protein-coding genes. The total length of PCGs was 11,410 bp in T. nicobarica, which represents 66.47% of the complete mitogenome. The nucleotide composition of the T. nicobarica PCGs was A + T biased (57.95%), as in all Tupaiidae species ranging from 57.95% (T. nicobarica) to 59.61% (T. tana) ( Table 2). The AT skew and GC skew were 0.09 and − 0.36 in the PCGs of T. nicobarica ( Table 2). Most of the PCGs of T. nicobarica initiated with an ATG start codon; however, the ATC initiation codon was found in the NADH dehydrogenase subunit 2 (nad2), ATT in NADH dehydrogenase subunit 3 (nad3), ATA in NADH dehydrogenase subunit 5 (nad5). The TAG termination codon was used by six PCGs, TAA by four PCGs, AGA by Cytochrome oxidase subunit 1 (cox1), AGG by NADH dehydrogenase subunit 6 (nad6), and incomplete T(AA) by NADH dehydrogenase subunit 4 (nad4), respectively. The comparative study revealed that, most of the PCGs in other Tupaiidae species were initiated by ATG start codon and terminated by TAA stop codon (Supplementary Table S4). The analysis of mitogenome for detecting positive selection of PCGs assists to understand the influences of natural selection in evolution and protein function 55,56 . The comparison of synonymous (Ks) and nonsynonymous (Ka) substitution rates in PCGs, witnessed for Darwinian selection and adaptive molecular evolution 57,58 . It is reported that, for positive selection Ka/Ks > 1, for neutrality Ka/Ks = 1, and for negative selection Ka/Ks < 1 59 .  Table S5, Supplementary Fig. S1). Most of the PCGs show Ka/Ks values of < 1, which indicated a strong negative selection among the studied Tupaiidae species, that reflects natural selection works against deleterious mutations with negative selective coefficients as highlighted general patterns in other vertebrates 60 . The comparative RSCU analysis indicated a significant fall in the frequency of GCG codon in Alanine (Ala) was observed in T. nicobarica, T. montana, T. minor, T. splendidula, and T. tana, except in T. belangeri with CCG in Proline (Pro) (Supplementary Fig. S2).
Ribosomal RNA and transfer RNA genes. The total length of two rRNA genes of T. nicobarica was 2,519 bp, compared to a range from 2,508 bp (T. montana) to 2,520 bp (T. belangeri) among other Tupaiidae species in the present dataset. The AT content within rRNA genes was 58.56%, while the AT and GC skew were 0.22 and − 0.09 respectively observed in T. nicobarica rRNAs ( Table 2). A total of 22 tRNAs were found in the T. nicobarica mitogenome with a total length of 1,497 bp. In other Tupaiidae species, the length of tRNAs varied from 1,493 bp (T. minor) to 1,564 bp (T. belangeri). The AT content within tRNA genes was 60.86%, while the AT and GC skew were 0.11 and − 0.12, respectively observed in T. nicobarica tRNAs ( Table 2). Most of the tRNA genes www.nature.com/scientificreports/ were predicted to be folded into classical cloverleaf structures, except trnS1 (without DHU stem and loop) and trnK (without DHU loop) ( Supplementary Fig. S3). The conventional pairings (A=T and G≡C) were observed in most of the tRNAs bases 61 ; however, wobble base pairing was observed in the stem of 14 tRNAs (trnA, trnN, trnQ, trnE, trnC, trnG, trnL1, trnK, trnL2, trnP, trnS2, trnT, trnY, and trnW) (Supplementary Fig. S3). The wobble base pairing is a key feature of RNA structure and often substitutes the conventional base pairs due to thermodynamic stability. These characteristics play crucial functional roles in a wide range of phenomena 62  In the T. nicobarica CR, the AT and GC skew was 0.12 and − 0.30 ( Table 2). The CR is also involved in the initiation of replication and is positioned between trnP and trnF for most of the Tupaiidae including T. nicobarica. The ETAS domain was divided into two regions: ETAS1 (60 bp) and ETAS2 (67 bp), while the CSB domain was Table 2. Nucleotide composition of the mitochondrial genomes of different treeshrew mtDNA. The A + T biases of the complete mitogenome, PCGs, tRNAs, rRNA, and CRs were calculated by AT-skew = (A − T)/ (A + T) and GC-skew = (G − C)/(G + C), respectively. www.nature.com/scientificreports/ further divided into three regions: CSB1 (25 bp), CSB2 (17 bp), and CSB3 (18 bp). After CSB3, a six base pair (CGT ACA ) tandem repeats were found 60.3 times in T. nicobarica, while eight base pair (CAC ACA TA) were found 23.8 times in T. belangeri (Fig. 2). Due to the short nucleotide length, no tandem repeats were found in other Tupaiidae species CRs. The structural features of CR play an important function in influencing transcription and replication in the mitochondrial genome 65,66 . The present study evaluated the genetic features of CR among the studied Tupaiidae species mitogenomes including T. nicobarica that will be helpful to speculate the evolutionary pattern of this group.
Phylogenetic inference. The phylogenetic position of Scandentia is repetitively argued and examined within the eutherian tree 15,25,39,40 . The treeshrews are widely considered as living fossils due to their approximating ancestral lineages with primates 67 . Based on the anatomical evidence, Primates, Chiroptera, Dermoptera, and Scandentia were hypothetically within the superordinal clade Archonta without considering the paleontological or molecular evidence 68,69 . Later on, the phylogenetic position of Scandentia has been studied based on the complete mitochondrial DNA sequences of wider group of taxa and corroborated a closer relationship with Lagomorpha 24,25,39 . Further, multiple loci of mitochondrial genes has been assessed to check the phylogeny of treeshrews and diversification and the timescale of diversification in Southeast Asia 15,27,40,70 . The present ML and BA phylogenies clearly discriminate T. nicobarica from other congeners and are congruent with earlier evolutionary hypotheses of Scandentia (Fig. 3, Supplementary Fig. S4). Further, using four calibration points from earlier studies, the present mitogenome-based dating analysis indicates that, the Tupaiidae species (Scandentia) were diverged from Primates and Dermoptera during the Cretaceous period (81-101 MYA). However, the basal node of Scandentia, Primates and Dermoptera was diverged from the Lagomorphs and Rodents during the same Cretaceous period (82-125 MYA). As a whole the divergence time estimations are little deviated due to the exclusion of Lagomorphs and Rodents in earlier analysis 15 . However, the representative of Scandentia family members (Ptilocercidae and Tupaiidae) in earlier analysis revealed that, they were diverged during Neogene to Paleogene period. Due to the lack of mitogenomic information of all extant Tupaiidae species, we restricted our analysis with few representative species. Diversification of the studied Tupaiidae species occurred during the Pliocene to Miocene epoch . The endemic Nicobar treeshrew, T. nicobarica was diverged from the common ancestor lineages of other Tupaiidae species during the Miocene epoch (12)(13)(14)(15)(16)(17)(18)(19) (Fig. 3). Further, based on 16S rRNA genes (1667 bp), we evaluated the status of two known subspecies of T. nicobarica from the Great and Little Nicobar Islands. The T. nicobarica nicobarica and T. nicobarica surda showed cohesive clustering in the BA tree as compared with other species (Fig. 4). Both the subspecies depicted 11 variable sites and maintained less genetic distance (0.7%) with each other. The 16S rRNA based topology showed a sister relationship of T. nicobarica with T. javanica, distributed in Sumatra and Java.
Biogeographic connection and conservation implication. The tectonic drifts allowed multiple possibilities for dispersal and colonization events of many animals into the same or distant geographical distribution. Due to the adjacent biogeographic realms, the biological affinities between the Indian mainland and Southeast Asia have been well documented 71 . However, the faunal diversity of the AN archipelago and their biotic www.nature.com/scientificreports/ networks is still anonymous in spectacular aspects. The bathymetric study evidenced that the well-developed seamounts have been detected on the Andaman seafloor, which extended up to Sumatra and Java Islands 72,73 .
Considering the skeletal variation, the treeshrew species showed intraspecific variation depending upon their distribution in mainland and island ecosystems 74 . A recent molecular study also elucidates the biogeographic connection of smaller mammals in the AN archipelago with the Indo-Malayan and Sundaic realms 7 . The present mitogenome based phylogeny also manifested the close relationship of T. nicobarica with T. minor, T. tana, T. splendidula, and T. montana (distributed in Thailand, Peninsular and East Malaysia, Brunei Darussalam, Sumatra, and Indonesia) as compared with the widespread species T. belangeri known from South and Southeast Asia. Further, the single gene (16S rRNA) based phylogeny showed sister relationship of T. nicobarica with T. javanica (distributed in Sumatra and Java) as compared with other congeners. The two subspecies of T. nicobarica were discriminated only by their different distributional pattern in two distinct islands. Prior to this study, they were neither examined morphologically nor tested genetically to assure their taxonomic status. The present molecular based assessment clearly distinguished two subspecies with 0.7% genetic distance by 16S rRNA gene. This preliminary molecular information will help for rapid and reliable identification of this highly threatened and endemic species from the Great and Little Nicobar Islands. However, further research can be done with the extensive sampling and generation of microsattelite data to substantiate their population genetic structure to formulate precise conservation action plans.
Considering the conservation implication, the previous studies reported that, this arboreal mammal species confronted several threats due to the forest loss and fragmentation, and ongoing road construction from Galathia to Indira Point at the Great Nicobar Island 75 . Although the species is listed on the IUCN with decreasing population trend, it has not yet listed in the Indian wildlife (Protection) Act, 1972. Other than a single ecology and behavior study and a nest record, no ample assessment has been approached so far 3,76 .
Besides, the treeshrews species were considered as a significant model for studying hepatitis and influenza H1N1 viral infections 19,20 . A recent study characterized the genome sequence to demonstrate the genetic basis of signaling pathways in nervous and immune systems of the Chinese treeshrew (Tupaia belangeri chinensis) and evidenced as a potential model for biomedical research 21 . The T. belangeri chinensis also maintained sufficient intraspecific variation (5.4-9.5%) with the Northern Treeshrew, T. belangeri in all 13 PCGs. Hence, the generation of molecular information from different geographical region is crucial for elucidating the actual evolutionary history of this small mammal species.
We propose the whole genome sequencing of T. nicobarica is essential as a genetic resource for conservation purposes. The genome sequence will also assist to predict the signaling pathways linked with many pathogenic microorganisms as well as able to develop potential mitigations programs in advance. As the population of this treeshrew is confined to the insular habitats in the Nicobar Islands, we propose an integrated approach with taxonomy, ecology, and further molecular studies to save this endemic species before it reaches to the brink of extinction.

Data availability
The following information was supplied regarding the accessibility of DNA sequences: The generated complete mitochondrial genome sequences of Tupaia nicobarica are deposited in GenBank of NCBI under accession number MW751815.