The Complete Mitochondrial Genome of endemic giant tarantula, Lyrognathus crotalus (Araneae: Theraphosidae) and comparative analysis

The complete mitochondrial genome of Lyrognathus crotalus is sequenced, annotated and compared with other spider mitogenomes. It is 13,865 bp long and featured by 22 transfer RNA genes (tRNAs), and two ribosomal RNA genes (rRNAs), 13 protein-coding genes (PCGs), and a control region (CR). Most of the PCGs used ATN start codon except cox3, and nad4 with TTG. Comparative studies indicated the use of TTG, TTA, TTT, GTG, CTG, CTA as start codons by few PCGs. Most of the tRNAs were truncated and do not fold into the typical cloverleaf structure. Further, the motif (CATATA) was detected in CR of nine species including L. crotalus. The gene arrangement of L. crotalus compared with ancestral arthropod showed the transposition of five tRNAs and one tandem duplication random loss (TDRL) event. Five plesiomophic gene blocks (A-E) were identified, of which, four (A, B, D, E) retained in all taxa except family Salticidae. However, block C was retained in Mygalomorphae and two families of Araneomorphae (Hypochilidae and Pholcidae). Out of 146 derived gene boundaries in all taxa, 15 synapomorphic gene boundaries were identified. TreeREx analysis also revealed the transposition of trnI, which makes three derived boundaries and congruent with the result of the gene boundary mapping. Maximum likelihood and Bayesian inference showed similar topologies and congruent with morphology, and previously reported multi-gene phylogeny. However, the Gene-Order based phylogeny showed sister relationship of L. crotalus with two Araneomorphae family members (Hypochilidae and Pholcidae) and other Mygalomorphae species.

Mitochondrial genome sequencing and assembly. The Genotypic Technology Pvt. Ltd. Bangalore, India (http://www.genotypic.co.in/) had carried out the sequencing. The sequencing and assembly protocol was followed by our previous study 20,21 . The whole genome library of genomic DNA was sequenced using Illumina Hiseq. 2500 (2 × 150 base paired-end reads) (Illumina, USA) platform which yielded ~14 million reads. The TruSeq DNA Library Preparation kit (https://support.illumina.com/downloads/truseq) was used for the construction of the paired-end library with standard protocols. The trimming and filtering of the raw sequencing reads were done by using the NGS-Toolkit 22 to removing adapter contamination and low-quality reads with a cutoff of Phred quality scores of Q20. Burrows-Wheeler Alignment (BWA) tool 23 screened the high quality reads (1.4 million) using Seqtk (https://github.com/lh3/seqtk) and down sampled high-quality reads and assembled with SPAdes 3.9.0 24 , using Ornithoctonus huwena mitochondrial genome (NC_005925.1) as a reference.
Annotation, secondary structure prediction, and comparative studies. The annotation of PCGs and rRNAs were done by using MITOS web-server 25 . The annotation of the tRNA genes was the most difficult steps to define their location and gene boundaries. To search the tRNAs in the mitochondrial genome, the tRNAscan-SE 1.21 26 was used, but incapable to detect tRNAs. Further, we have used other spider tRNA sequences from GenBank to confirm the locations and boundaries of each tRNAs. Further, the anticodon arm motifs (or sometimes only the 3-bp anticodon sequence) were examined manually which were conserved among all spiders.
The start and stop codons of PCGs were observed by using the ClustalX program 27 . The PCGs of L. crotalus with other spider species were aligned by MEGAX 28 . For acquiring the accession number of the complete annotated L. crotalus mitochondrial genome from GenBank, the Sequin submission tool (http://www.ncbi.nlm.nih. gov/Sequin/) was used. The circular image of L. crotalus mitochondrial genome was drawn by using online server CGView 29 (http://stothard.afns.ualberta.ca/cgview_server/). The length and locations of spacer regions (overlapping and intergenic) of L. crotalus mitochondrial genome were detected manually.
The nucleotide composition, codon usages, relative synonymous codon usage (RSCU) was done by MEGAX. To calculate the skewness, we used the formula: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) 30 . Codon usage bias was evaluated by calculating of effective number of codon (ENc) with the DnaSP6.0 31 . The relative synonymous codon usage (RSCU) graph is plotted in Microsoft Office Excel. The ratios of non-synonymous substitutions (Ka) and synonymous (Ks) substitutions were estimated in DnaSP6.0. The transition and transversion ratio versus genetic distance calculated by using DAMBE5 32 . We predicted the secondary structure of tRNAs and CR for L. crotalus mitochondrial genome. The tRNAs secondary structures were predicted by VARNA 3.93 33 . The prediction of secondary structure of CR was done by The Mfold web server 34 . preparation of data sets, model selection, phylogenetic analyses. Out of 32 available spider mitogenomes in the global database, 16 species mitochondrial genomes were retrieved and used in the present dataset based on representative families [8][9][10][11][12][13][14][15][16][17][18] . The mitogenome of Limulus polyphemus (order: Xiphosura) were also retrieved from GenBank and used as an out-group 35 (Table S1). The four data sets were prepared for phylogenetic analysis: (1) PCGs without GBlock 36 (Table S2). We used the online web portal The CIPRES Science Gateway v.3.1 (www.phylo.org/sub_sections/portal/) to perform Mr. Bayes 3.2 for BI analysis 38 and ML analysis using IQ-tree web server using four data sets 39 . The phylogenetic tree was visualized and edited using FigTree v1. 4.2 40 (http://tree.bio.ed.ac.uk/software/ figtree/). We used the MLGO web server 41 for gene order based phylogeny (http://www.geneorder.org/server. php) (Table S3).

Results and Discussion
Genome structure, organization and composition. L. crotalus complete mitochondrial genome (accession number MN072398) is 13,866 base pairs (bp) in length. This is the smallest mitochondrial genome of spider among all published mitochondrial genomes so far due to the presence of extremely truncated tRNAs. It included 37 genes: 13 PCGs, large and small rRNAs, 22 tRNAs and one non-coding region (D-Loop) with the origin of light-strand replication (OL) (Fig. 1, Table 1). The majority strand contains 22 genes and minority with 15 genes ( Table 1). The AT and GC content of nucleotide was 68.74% and 31.26%, respectively (Table S6), like other spider mitochondrial genomes. The highest AT content observed in tRNAs (69.68%,), followed by PCGs PCGs are shown as purple arrows, rRNA genes as pink arrows, tRNA genes as peach color arrows and CR regions as gray rectangles. The GC content is plotted using a black sliding window, as the deviation from the average GC content of the entire sequence. GC-skew is plotted using a colored sliding window (green and orchid color), as the deviation from the average GC skew of the entire sequence. The figure was drawn using CGView online server (http://stothard.afns.ualberta.ca/cgview_ server/) with default parameters. The species photograph was taken by the fourth author using Leica Microscope EZ4 HD with Leica software application suite (LAS EZ) and edited manually in Adobe Photoshop CS 8.0. (2020) 10:74 | https://doi.org/10.1038/s41598-019-57065-8 www.nature.com/scientificreports www.nature.com/scientificreports/ (68.72%), rRNAs (68.66%), and CR (67.67%). The mitochondrial genome showed positive AT (0.07) and negative GC (-0.36) skewness (Table S6) in contrast to other spider mitochondrial genome.
Protein-coding genes. The total length of PCGs was 10,587 bp in L. crotalus. The ATN start codons were used by most of the PCGs except cox3, and nad4 with TTG. Comparative study revealed that ATN and TTG start codon used by most of the PCGs of spider mitochondrial genomes. TTT start codon used by only cox1 (Tetragnatha maxillosa 17 ); TTA used by cox1 (Phyxioschema suthepium, Ornithoctonus huwena 8 , Agelena silvatica 11 , Carrhotus xanthogramma 13 , Selenops bursarius 16 , Oxytate striatipes 15 ); GTG used by cox2 (Selenops bursarius, Pholcus phalangioides), cytb (Ornithoctonus huwena), nad2 (Pholcus phalangioides), nad6 (Oxyopes sertatus 14 ), atp6 (Pholcus phalangioides); CTG used by cox1 (Neoscona theisi 10 , Liphistius erawan); CTA used by cox1 (Calisoga longitarsis 8 ). The stop codons TAA, TAG, and T(AA) were used commonly by most of the PCGs as observed in other spiders (Table S7). www.nature.com/scientificreports www.nature.com/scientificreports/ codon usage bias and mutations. The use of codons or codon usage bias is a fundamental phenomenon in nature 44,45 . The main influencing forces for codon usages are the mutation pressure and natural selection. Codon usage bias can be triggered by a number of other factors such as the content of nucleotides, gene length and their function, and the external environment 45 . We investigated the GC content to study the nucleotide distribution of all three codon positions of PCGs for 17 spider mitochondrial genomes (Fig. S1). The average GC content was 29.02%, while the value of GC1s, GC2s and GC3s were 35.98%, 34.90% and 20.02% respectively. The codon frequency ending with A/T is higher than G/C due to the AT rich segments which leads to the high codon bias 46,47 . The comparative study of spider mitochondrial genomes revealed 21 codons (9 with A-ending, 12 with U-, and none with G-or C-ending) with high frequency (Table S8). This result suggested that compositional constrain may play an important role in the codon usage patterns in spider species.
The average of the Effective Codon number (ENc) values for all the PCGs was 42.97, which indicate a strong codon bias, ranging from 39.31 to 46.83. We plotted ENc-GC3s value to explain the relationship between nucleotide composition and codon bias ( Figs. 2A and S2). The result specified that not only mutation, but other factors like natural selection might be involved in shaping the codon bias in spider mitochondrial genomes. To confirm the relation between GC12 and GC3 and explain the mutation-selection equilibrium in shaping the codon usage bias, we used neutrality plot analysis (Fig. 2B). This plot indicates that the genes have a wide range of GC3 value distributions, ranging from 8.1% to 34.7%, and also showed the positive correlation between GC12 and GC3 (r = 8455, p < 0.01). Moreover, the slope of the regression line of the entire coding sequences was 0.2516. So, the natural selection may probably dominate the codon bias rather than mutation bias 48 .

Non-synonymous and synonymous substitutions.
To investigate the selective pressure and evolutionary relation of the homogenous or heterogeneous species, non-synonymous and synonymous substitutions (Ka/ Ks) ratio was used 49 . Our result showed the average Ka/Ks ratio ranging from 0.122 ± 0.03 in cox1 to 0.443 ± 0.14 in nad6 gene and the resulted following order: cox1 < nad1 < cox2 < nad5 < cytb < cox3 < nad4 < atp6 < nad4 l < nad3 < nad2 < nad6 < atp8. This result indicated that the 13 PCGs excluding atp8 of all spider mitochondrial genomes including L. crotalus were evolving under purifying selection (Fig. 2C). The value of Ka/Ks for all the PCGs was below one, indicating the mutations swapped by synonymous substitutions. The cox1 gene with low Ka/Ks ratio represent fewer changes in amino acids and hence widely used as a potential molecular marker for species identification and phylogenetic analysis 50,51 . The transversion and transition plot against the genetic distance showed a linear relationship for PCGs (Fig. 2D). The value of the substitution saturation index for the combined dataset of all PCGs of 17 spider mitochondrial genomes (Iss = 0.4261) was significantly lower than the  (Table 1). L. crotalus has 22 tRNAs (total length 1,329 bp), ranging from 41 to 99 bp in length. (Table S5).
The prediction of the secondary structure and gene boundaries for all the tRNA genes was extremely difficult, as truncated at their 3′ end and lack of proper base pairing at the 5′ end (Fig. S3). The extreme truncation or atypical structure of tRNA is very peculiar in Arachnids mitochondrial genomes, including Araneae 9,52 . Seventeen of the 22 tRNAs has atypical secondary structures, missing either a D-arm or T-arm. The majority of tRNAs also have mismatches on T-arm, D-arm, acceptor arm or anticodon arm. The lack of acceptor arm or very poorly paired arm was observed in the following genes (trnL1, trnS1, trnE, trnL2, trnI). The lack of T-arm and loop were observed in 13 genes (trnC, trnD, trnF, trnG, trnH, trnK, trnL2, trnN, trnP, trnI, trnY, trnQ and trnV), which was inferred by TV-replacement loops. The lack of DHU loop observed in trnN whereas, lack of DHU arm and loop observed in trnS1, trnI (Fig. S3). Out of 22 tRNAs, boundaries of nine genes were overlapped with the adjacent gene on both the ends. The gene overlaps or quantity of truncation in each tRNA gene were differ from species to species. trnE showed overlaps of 39 nucleotides (nt) with its neighboring gene (trnF) on the same stand at 5′ end and 26 nt overlap with trnR gene on the opposite strand at the 3′ end and. After overlapping on both the ends, only four nt was remaining which exclusively denoted for trnE. trnC showed overlaps of 28 nt at the 5′ end with its neighboring gene (cox1) on the opposite strand and 35 nt overlap at the 3′ end with trnY gene on the same strand. After overlapping on both the ends, no nucleotide was remaining for trnC. the A + T-rich region. The A + T-rich region in mitochondrial genome is important for the initiation of replication in Metazoans 53,54 , and also called the control region (CR). It is located between trnM and trnQ in L.crotalus, spans 356 bp with 67.67% AT content and showed positive AT/GC skew (0.04/0.15), indicating an obvious bias towards the use of A and G. The non-coding region named replication origin of L-strand (OL) region was also found. The "OL" region (TCCTCCTCCGCGGAAAAGAGAGGAGGA) is 27 bp in length and has the potential to fold into a stem-loop secondary structure (Fig. 3). Apart from the conserved elements in A + T-rich region, tandem repeats were also found to be a characteristic of A + T-rich regions. In L.crotalus, two tandem repeats of 10 bp (ATTTTTATTC) and six tandem repeats of 6 bp (CATATA) were present at the 3′ end upstream of the trnQ. Further, this motif was also detected in other eight species (A. angulatus, C. longitarsis, C. xanthogramma, H. thorelli, N. theisi, O. huwena, P. phalangioides, S. bursarius). All elements which are related to the regulation of transcription and control of DNA replication in the A + T-rich region were arranged in the conserved order as compared to other Opisthothele spiders. phylogenetic analyses. Eight phylogenetic trees based on four datasets and two inference methods, Maximum likelihood (ML) and Bayesian Inference (BI) were generated (Figs. 4 and S4-S6). All the analyses supported the monophyly of suborders Opisthothelae with high bootstrap support and posterior probability. The infraorder Mygalomorphae was recovered as monoplyletic in all the analyses. The Araneomorphae was recovered paraphyletic in all the analyses except BI-2, BI-4, ML-4 (Fig. 4). The paraphyly in the Araneomorphae is due to the varied position of family Hypochilidae and Pholcidae. The family Hypochilidae is first to branch from the other taxa of spiders and leaving two clade with strong posterior probabilities and bootstrap support in BI-1, 3, ML-2 (Figs. S4 and S6). The first clade of Mygalomorphae + family Pholcidae of Araneomorphae and; second clade of Araneomorphae. In two analyses (ML-1, 3), the families Pholcidae and Hypochilidae were grouped with Mygalomorphae taxa with low bootstrap support (Fig. S5). In all the analyses L. crotalus is cladded with O. huwena and showed sister relationship with C. longitarsis (Nemesiidae) and P. suthepium (Dipluridae). The estimated tree from the mitochondrial genome is well supported by previously generated multilocus phylogeny 5 . However, the genital structure is the distinguishable feature of the Mygalomorphae and Araneomorphae spiders with three types of genitalia 5,55 : Haplogyne (simple) in Mygalomorphae, Entelegyne (two separate   (Fig. 6B). The transposition of trnI at node A14 towards A4 node makes two new gene boundaries, 22 and 23 and become a synapomorphy character for Mygalomorphae and two families of Araneomorphae (Hypochilidae, Pholcidae). The inversion of trnI on A3 node towards A2 separated the Mygalomorphae from Pholcidae and Hypochilidae and once more inversion of trnI towards L. crotalus separated the L. crotalus to O. huwena. Three gene boundaries (5, 6, 12) are synapomorphy for Araneomorphae