Assembling mitogenome of Himalayan Black Bear (U. t. laniger) from low depth reads and its application in drawing phylogenetic inferences

The complete mitogenome of Himalayan black bear (Ursus thibetanus laniger) from Indian Himalayan region was assembled following the modified approach of mitochondrial baiting and mapping using the next-generation sequencing reads. The complete mitogenome was of 16,556 bp long, consisted of 37 genes that contained 13 protein-coding genes, 22 tRNAs, 2 rRNAs and 1 control region. The complete base composition was 31.33% A, 15.24% G, 25.45%C, and 27.98%T and gene arrangement was similar to the other sub-species of Asiatic black bear. The relative synonymous codon usage analysis revealed the maximum abundance of Isoleucine, Tyrosine, Leucine and Threonine. The assembled mitogenome of U. t. laniger exhibited 99% similarity with the mitogenomes of Himalayan black bear available from Nepal and Tibetan Plateau-Himalaya region. The findings of the present study has proven low depth sequencing data, adequate and highly efficient in rapid recovering the mitochondrial genome by overcoming the conventional strategies of obtaining long-range PCR and subsequently drawing phylogenetic inferences.

The Asiatic black bear (Ursus thibetanus) with wide range distribution, consists of seven well recognized subspecies, i.e. Japanese black bear (U. t. japonicus) in Japan, Ussuri black bear (U. t. ussuricus) in far-east Russia, northeast China, and Korea, Formosan black bear (U. t. formosanus) in Taiwan, Indochinese/Sichuan black bear (U. t. mupinensis) in Southwest China, Baluchistan black bear (U. t. gedrosianus) in South Pakistan and Iran, Tibetan black bear (U. t. thibetanus) in the eastern Himalayas and southeast Asia, and Himalayan black bear (U. t. laniger) in the western Himalayas 1 . Among the seven sub-species of Asiatic black bear, the Himalayan black bear (henceforth, HBB) is distributed in between 1200 and 3300 m asl all along the forested habitats of the Himalayas and hills of northeastern states of India covering an area of about 270,000 km 2 with an estimated population of 5400 to 6700 divides 2,3 . A small population of HBB is patchily distributed across Pakistan, northwest India, and likely northeast India and Nepal 4 . In India, HBB has experienced several challenges including habitat loss, population decline due to hunting/poaching for pelts, paws and gall bladders and retaliatory killing in the response to Human-Bear Conflicts 3,5,6 . Considering the increased threats and species vulnerability in wild, HBB is listed as Vulnerable in the Red list of IUCN 1 and categorized under the Schedule-I of Indian Wildlife (Protection) Act 1972. Complete mitochondrial genomes of six sub-species of Asiatic black bear except the U. t. gedrosianus have been sequenced using long range PCR strategy [7][8][9][10] . However, no study has provided the detailed genome organization and comparative assessment for gene arrangements and structural consistency in the t-RNA model, important in variety of cellular processes controlling species life history traits.
Further, Next Generation Sequencing (NGS), which rapidly captures the broad spectrum of mutations and has dramatically revolutionized DNA sequencing 11 , and has been popularized to address questions in the field of molecular ecology 12 , phylogeographic 13 , population genetics 14 and phylogenetic studies 15 . Most studies in bears have made use of the conventional strategy of combining long-range PCR with subsequent primer walking for sequencing the complete mitogenomes [7][8][9][10]16 . However, conventional sequencing is tedious and challenging in particular for optimizing long range PCRs. In contrast, revolution in NGS technology has made considerable Scientific Reports | (2021) 11:730 | https://doi.org/10.1038/s41598-020-76872-y www.nature.com/scientificreports/ decrease in cost and increase in throughput (millions of short sequencing reads) and accuracy 17,18 . Several studies have demonstrated the application of NGS in drawing the phylogenetic inferences, genome organization and comparative assessment among the sympatric species by mapping and assembly the complete mitogenomes from low depth sequencing reads [19][20][21] . Therefore, to overcome the unwieldy process of conventional sanger sequencing, we assembled the complete mitogenome directly from low depth NGS reads following a modified approach of mitochondrial baiting and mapping reported earlier by Hahn et al. 22 . We demonstrated organization of complete mitogenome of HBB for the first time and presented its structure consistency of tRNA model with the other sub-species of Asiatic black bear. We also testified the assembled mitogenome of U. t. laniger in re-construction of bear phylogeny among the other black bear subspecies.

Results and discussion
Genome organization. A total of 3.73 GB data of ~ 1.6 × coverage was obtained from Illumina HiSeq 2500 platform which yielded 12,418,314 reads. With reference-based assembly, we obtained the longest contig of 16, 556 bp length that represented the complete mitochondrial genome of U. t. laniger and submitted to Gen-Bank (accession no. MN935768). The observed total AT and GC contenst were 59.3% and 40.7% (Fig. 1a), and mitogenome showed positive AT skewness (+ 0.057), indicating that adenine bases occurred more frequently than the thymine, whereas GC-skew was negative, − 0.25. The assembled mitogenome encoded 37 genes, of which, 13 were PCGs, 22 tRNAs, 2rRNAs and one control region. The arrangement and distribution of genes were similar to the other mammalian species 23 . The overall nucleotide composition was: 31.33% -A, 25.45% -C, 15.24% -G, 27.98%-T. In exception to ND6 and eight tRNA genes (trnQ, trnA, trnN, trnC, trnY, trnS2, trnE and trnP), most genes were encoded on the heavy strand (Fig. 2). The five pairs of overlapping regions in mitogenome were observed among trnV/rrnL, trnI/trnQ, atp8/atp6, nd4l/nd4, and trnT/trnP. The overlapping regions ranged from − 1 to − 34 bp. The smallest overlapping region was located in between trnV/trnL and trnT/ trnP (1 bp) whereas the longest overlapping was in between ATP8/ATP6 (40 bp). Besides, 20 intergenic spacers were observed between the mitochondrial regions ranging from 1 to 33 bp length; the longest space was found between trnN/trnC ( Table 1).
PCGs and rRNAs. All 13 PCGs had ATG start codon except nd2 and nd6 which encoded by ATA and TCC, respectively. The total length of PCGs was 11,316 bp which shared 68.3% of complete mitogenome ( Table 2). The average base composition in PCGs were 30.1%-A, 28.6%-T, 14.2%-G and 27.2%-C. The abundance of AT (%) was higher than GC (%).Comparative analysis of U. t. laniger with the other subspecies of Asiatic black bear and Ursus americanus exhibited relatively high adenine and cytosine contents than thymine and guanine. All the PCGs showed positive AT skew except for the genes cox1, cox3, nd3 and nd4l whereas GC skew showed negative skewness for all the genes (Fig. 1b). The PCGs region consisted of twelve heavy strands and one light strand as commonly found in other vertebrate species [24][25][26] . The PCGs region consisted of seven NADH dehydrogenases, three cytochrome c oxidases, two ATPases and one cytochrome b genes.
The mtDNA ribosomal region is known to be highly conserved and widely used for phylogenies of higher and middle category level, such as phyla, family and genera 24,26 . The length of 12S rRNA and 16S rRNA genes was 966 bp and 12,582 bp, respectively. The 12S rRNA gene was positioned between the tRNA-Phe and tRNA-Val and 16S rRNA gene was positioned between tRNA-Val and tRNA-Leu2. Similar to PCGs, the AT skewness was positive (0.208) and the GC skewness was negative − 0.086) ( Table 2) and the total AT content of rRNA was 59.4% which was in correspondence with other sub-species of Asiatic black bear (Table 2).
Transfer RNAs and control region. The length of the tRNA was 1508 bp, overall AT and GC content was 64.1% and 35.9% respectively. The average AT and GC skewness values for tRNAs were 0.026 and 0.083, respectively ( Table 3). The results exhibited 21 tRNAs can fold into cloverleaf structure except for tRNA ser which lacks the dihydrouridine arm (Fig. 3). The tRNA genes length varied from 59 to 75 bp and out of 22 tRNAs, fourteen were located on heavy strand and eight were on the light strand ( Table 1).
The control region was located between trnP and trnF and the length was 1109 bp in size and contributed to 6.7% of the whole mitogenome with containing a microsatellite repeat, (AT) 4 and seven 10 bp tandem repeats (Table S1). The A + T composition was 58.7%, higher than that of G + C content. The AT and GC skewness values were negative, − 0.069 and − 0.142, respectively ( Table 2). RSCU and reconstruction of bear phylogeny. The relative synonymous codon usage showed the highest utilization of codons of UAC, UUG, AUC and ACC among all the PCGs (Fig. 4). The RSCU analysis revealed the most occurred amino acids in protein-coding genes of U. t. laniger mitochondrial genome were Ile, Tyr, Leu, and Thr with 449, 482, 439 and 419 codon frequencies, respectively (Table S2, Fig. 5), whereas Met, Cys and Asp were less abundant. We did not find any difference in the RSCU of the U. t. laniger when compared with the other subspecies of Asiatic black bear. The phylogenetic analysis showed that the two mitogenomes i.e. MG066704.2 and MH281753.1 shared 99% similarity with the assembled mitogenome of U.t. laniger. These two mitogenomes were sequenced from Nepal 10 and Tibetan Plateau-Himalaya region 27 which are the known distribution ranges of the U.t. laniger, exhibiting an obvious trend of clustering in phylogeny with strong bootstrap support (Fig. 6 www.nature.com/scientificreports/ Further, tRNAs secondary structure of U .t. laniger were compared with the other taxon of Ursids whose complete mitogenomes were available. The comparison showed more than 90% structure similarity with MH281753.1 (99%), MG066704.2 (99%), EF076773.1 (95%), EF19666.1 (93%) with z-score value of more than 10.0 and lesser similarity with EF196665.1 (39%) and EF212882.1 (46%) ( Table S3) having low structure stability which was also evident from the phylogenetic analysis. We found no functional change in wobble position of anticodon (UAA) except in Ailuropoda melanoleuca (AAG). The pairwise genetic distances matrix, calculated based on Kimura 2-parameter model indicated that assembled mitogenome of U. t. laniger showed highest genetic differentiation with U. t. mupinensis (0.019) and lowest with the subspecies of Asiatic black bear sequenced from Nepal and Tibetan Plateau-Himalaya region (0.001), expectedly the HBB, U. t. laniger within the species of U. thibetanus (Table S4).  Direction of gene signal is indicated by arrows. Protein-coding genes are shown in silver color arrows, rRNA genes in purple color arrows, tRNA genes in light pink color arrows and non-coding region in white color. The GC content is plotted using a black color; GC-skew is plotted using green and dark pink color. The figure was drawn using CGView online server (https :// stoth ard.afns.ualbe rta.ca/cgvie w_serve r/) with default parameters.  Quality check and reference-based assembly. Quality screening of raw reads was done using FastQC (https ://www.bioin forma tics.babra ham.ac.uk/proje cts/fastq c/) and reads with low quality (Q < 20) and shorters (< 50 bp) were filtered out using NGS QC toolkit (https ://www.nipgr .res.in/ngsqc toolk it.html). Usable reads were mapped against reference genome (MH281753) using bwa-aln (0.7.17) and then the fishing reads were grouped into extended reads (blocks) and the resultant contig was re-mapped with the filtered reads in order to increase the correctness of assembly using CLC genomics workbench version 12.0.3 (https ://www.qiage nbioi nform atics .com/produ cts/clc-genom ics-workb ench/) with default parameters One of the longest contigs that represented the assembled complete mitochondrial genome of HBB, was thus generated (Fig. 7).