Organization and phylogenetic relationships of the mitochondrial genomes of Speiredonia retorta and other lepidopteran insects

In this study, we analyzed the complete mitochondrial genome (mitogenome) of Speiredonia retorta, a pest species in the order Lepidoptera. The S. retorta mitogenome is 15,652 bp long. It encodes 13 protein-coding genes (PCGs), 22 tRNAs, and 2 rRNAs, and also contains an adenine (A) + thymine (T)-rich region. This organization is consistent with the mitogenomes of other lepidopterans. All 13 PCGs begin with ATN codons. Eleven PCGs terminate with canonical stop codons, whereas cox2 and nad4 have incomplete stop codons. Phylogenetic analyses of the mitogenome using Bayesian inference (BI) and maximum likelihood (ML) methods further confirmed that this species belongs to the family Erebidae.

Given the clear value of such analyses, in the present study we sequenced the complete S. retorta mitogenome to more fully explore the evolutionary relationships between this agriculturally important insect and other Noctuoidea species.
The composition and skew of the S. retorta mitogenome were compared with those of other Noctuoidea species (Table 2). The major strand was composed of A, T, G, and C nucleotides at relative frequencies of 39.60%, 41.23%, 7.37%, and 11.80%, respectively (80.83% A + T). It therefore showed negative AT and GC skews (−0.020 and −0.231, respectively). In other Noctuoidea mitogenomes, AT skew ranges from −0.027 (A. formosae) to 0.016 (L. dispar), whereas GC skew ranges from −0.266 (A. formosae) to −0.178 (A. ipsilon). The negative AT skew of S. retorta indicates that T residues outnumber A residues. The same pattern has been reported for many Lepidopteran species, including A. formosae (−0.027) and A. ipsilon (−0.006) (Table 2).
Table 1 (excerpt).
Gene       Strand  Location    Size (bp)  Anticodon/start codon  Stop codon  Intergenic nucleotides
           F       1-68        68         CAT                    -           0
tRNA Ile   F       75-141      69         GAT                    -           6
tRNA Gln   R       139-207     69         TTG                    -           −3
nad2       F       267-1280    1014       ATT                    TAA         59
tRNA Trp   F       1279-1346   68         TCA                    -           −2
tRNA Cys   R       1339-1406   68         GCA                    -           −8
tRNA Tyr   R       1420-1486   67         GTA                    -           13
cox1       F       1494-3032   1539       ATG                    TAA         7
The 13 PCGs together spanned 11,134 bp of the mitogenome sequence. They ranged in length from 159 bp (atp8) to 1713 bp (nad5), consistent with other Lepidopteran species in which these two genes are often the shortest and longest, respectively 11,13 . All PCGs began with an ATN codon (2 ATA, 3 ATT, and 8 ATG), and cox1 started with ATG (Table 1). By contrast, atypical cox1 start codons (including TTAG, ACG, and TTG) have been reported in the mitogenomes of a range of insect species [14][15][16] . TAA was the most common stop codon (nad2, cox1, atp8, atp6, cox3, nad3, nad5, nad4L, nad6, and cytb). nad1 used a TAG stop codon, and nad4 and cox2 had incomplete stop codons consisting of a single T (Table 1).
Incomplete stop codons are a common, conserved feature of invertebrate mitogenomes [17][18][19] . The single T residue can still be recognized during endonucleolytic processing of the polycistronic pre-mRNA, and subsequent polyadenylation completes a functional stop codon 5,20,21 . Next, we assessed codon usage among a range of Lepidopteran species, including three Noctuoidea members as well as Bombycoidea, Tortricoidea, and Geometroidea members (Fig. 2). Asn, Ile, Leu2, Lys, Met, Phe, and Tyr were the most frequently used amino acids. Across these six species, Ile and Phe had codon families with at least 80 codons per thousand codons, and Leu2, Met, and Asn had codon families with at least 60 codons per thousand codons. The Arg and Cys codon families were the least represented. Codon distributions were consistent among the three Noctuoidea species analyzed herein, except for Lys, which was rarely encoded in S. retorta (Fig. 3).
www.nature.com/scientificreports/
The RSCU was next assessed for the PCGs of the mitogenomes of the four Lepidopteran superfamilies (Fig. 4). The S. retorta PCGs used every possible codon except GCG, CGC, GGC, AGG, CCG, ACG, and TGG. Other Lepidopterans likewise lack some GC-rich codons, including H. cunea (GCG and GTG), P. atrilineata (CGG), and C. pomonella (GCG). Codon usage was biased towards A + T. This bias contributes to the overall mitogenomic AT bias observed both in S. retorta and in other insect mitogenomes 19,22 .
Ribosomal and transfer RNA genes. Two rRNA genes were identified in the S. retorta mitogenome, consistent with findings in most other animal species. The 1413 bp large ribosomal RNA gene (rrnL) was located between tRNA Leu (CUN) and tRNA Val. The 781 bp small ribosomal RNA gene (rrnS) lay between tRNA Val and the A + T-rich region (Table 1).
The rRNA genes were A + T rich (84.40%), within the range observed for other Noctuoidea species, including A. formosae (83.77%) and A. ipsilon (85.15%). rRNA AT and GC skews are negative in most analyzed Lepidopteran mitogenomes 17 . In S. retorta, however, both values were positive (0.040 and 0.356, respectively), as has been reported in a subset of prior studies 23,24 .
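The skew values quoted in this section, for the whole mitogenome and for its rRNA, tRNA and A + T-rich regions, match the standard definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). For example, the major-strand G (7.37%) and C (11.80%) frequencies give (7.37 − 11.80)/(7.37 + 11.80) ≈ −0.231. The following Python sketch shows the calculation, assuming those standard definitions. The function names and the toy sequence are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def skew(x: float, y: float) -> float:
    """Generic strand skew (x - y) / (x + y); returns 0.0 when x + y == 0."""
    return (x - y) / (x + y) if (x + y) else 0.0


def composition_and_skew(seq: str) -> dict:
    """Base composition (%) plus AT skew (A-T)/(A+T) and GC skew (G-C)/(G+C).

    Only A, T, G and C are counted; ambiguity codes such as N are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {
        "A%": 100 * a / total,
        "T%": 100 * t / total,
        "G%": 100 * g / total,
        "C%": 100 * c / total,
        "AT%": 100 * (a + t) / total,
        "AT_skew": skew(a, t),
        "GC_skew": skew(g, c),
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not part of the S. retorta mitogenome.
    print(composition_and_skew("AATTTGCCC"))
    # Skew also works directly on percentages, e.g. major-strand G and C:
    print(round(skew(7.37, 11.80), 3))  # -0.231
```

Because skew is a ratio, counts and percentages give the same value. That is why the published percentages can be plugged in directly.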
We identified a full set of 22 tRNA genes (65-71 nucleotides long) in the S. retorta mitogenome, consistent with findings from other Lepidopterans. These tRNA regions were strongly A + T biased (81.39%), with positive AT and GC skews (0.031 and 0.171, respectively; Table 2). All of these tRNAs folded into the expected cloverleaf secondary structure, although trnS1 lacked the DHU stem (Fig. 5), as previously observed in other Lepidopterans 23 .
Overlapping and intergenic spacer regions. The S. retorta mitogenome contains 6 overlapping regions. They range from 1 to 8 bp in size (26 bp in total), and the longest overlap is located between trnW and trnC (Table 1). Alignment of the overlap between atp8 and atp6 revealed the 'ATGATAA' motif common to other Lepidopterans (Fig. 6). In addition, we identified the 'ATACTAA' motif in the 17 bp spacer between trnS2 (UCN) and nad1 (Fig. 7A). This motif is highly conserved in insect mitogenomes and is a potential binding site for the mitochondrial transcription termination peptide (mtTERM protein) 25 .
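In Table 1, overlaps and spacers follow from adjacent gene coordinates. The value is next_start − previous_end − 1: a negative number is an overlap of that many base pairs, and a positive number is an intergenic spacer. The Python sketch below applies this rule to the trnI to cox1 rows of the Table 1 excerpt given earlier in this section. The gene abbreviations and function name are illustrative assumptions. In this excerpt, the largest overlap (8 bp) falls between trnW and trnC, while trnC and trnY are separated by a 13 bp spacer.

```python
# Coordinates from the Table 1 excerpt (1-based, inclusive).
GENES = [
    ("trnI", 75, 141),
    ("trnQ", 139, 207),
    ("nad2", 267, 1280),
    ("trnW", 1279, 1346),
    ("trnC", 1339, 1406),
    ("trnY", 1420, 1486),
    ("cox1", 1494, 3032),
]


def spacers(genes):
    """Yield (upstream, downstream, n) for each pair of adjacent genes.

    n = next_start - prev_end - 1: positive values are intergenic spacers,
    negative values are overlaps of |n| bp, and 0 means the genes abut.
    """
    for (name1, _, end1), (name2, start2, _) in zip(genes, genes[1:]):
        yield name1, name2, start2 - end1 - 1


if __name__ == "__main__":
    for up, down, n in spacers(GENES):
        kind = "overlap" if n < 0 else "spacer"
        print(f"{up}-{down}: {n:+d} ({kind} of {abs(n)} bp)")
    # The most negative value is the longest overlap in this excerpt.
    print("longest overlap:", min(spacers(GENES), key=lambda r: r[2]))
```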

A + T-rich region analysis. A 417 bp A + T-rich region was detected between rrnS and trnM in the S. retorta mitogenome. This region was 93.76% A + T and exhibited negative AT and GC skews (−0.038 and −0.462, respectively) (Table 2). It contained several short repeated sequences, consistent with mitogenomic findings from other Lepidopteran species (Fig. 7B):
- a 19 bp poly-T stretch flanking an 'ATAGA' motif near rrnS;
- an 'ATTTA' motif roughly in the middle of the region;
- a poly-A element upstream of trnM.
Although the exact length of the poly-T stretch varies among Lepidopterans 4,17,26 , the ATAGA motif is highly conserved 27 .
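The motifs described above can be located with a simple pattern scan. The Python sketch below is a minimal illustration, not the method used for Fig. 7B. It assumes a fixed motif list ('ATAGA', 'ATTTA') and treats any run of at least 10 identical T or A bases as a poly-T or poly-A element; that threshold, the function name, and the toy sequence are arbitrary, hypothetical choices.

```python
import re


def scan_at_rich_region(seq: str, motifs=("ATAGA", "ATTTA"), min_run: int = 10):
    """Locate fixed motifs and long poly-T / poly-A runs in an A + T-rich region.

    Returns (motif_hits, homopolymer_runs): 1-based start positions for each
    motif (overlapping hits allowed via a lookahead) and (start, length) for
    every run of at least min_run identical T or A bases.
    """
    seq = seq.upper()
    motif_hits = {
        m: [hit.start() + 1 for hit in re.finditer(f"(?={re.escape(m)})", seq)]
        for m in motifs
    }
    homopolymer_runs = {
        # f"{base}{{{min_run},}}" builds a pattern such as "T{10,}"
        base: [(run.start() + 1, len(run.group()))
               for run in re.finditer(f"{base}{{{min_run},}}", seq)]
        for base in "TA"
    }
    return motif_hits, homopolymer_runs


if __name__ == "__main__":
    # Hypothetical toy region: ATAGA + 19 bp poly-T, an ATTTA motif, then poly-A.
    toy = "ATAGA" + "T" * 19 + "ACATTTACG" + "A" * 12
    print(scan_at_rich_region(toy))
```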

Phylogenetic analyses.
A number of recent studies have explored phylogenetic relationships among Noctuoidea families. In one recent analysis, Zahiri et al. proposed the following relationship among these families: (Notodontidae + (Euteliidae + (Noctuidae + (Erebidae + Nolidae)))) 28 . Another study proposed a different grouping: (Notodontidae + (Erebidae + (Nolidae + (Euteliidae + Noctuidae)))) 29 . In this study, we used BI and ML methods, together with MAFFT alignment, to explore the relationship between S. retorta and other Noctuoidea insects on the basis of its mitogenome sequence. We used the NT dataset to conduct phylogenetic analyses of 65 complete mitogenomes representing six Noctuoidea families (Erebidae, Lymantriidae, Euteliidae, Noctuidae, Notodontidae, and Nolidae). The mitogenomes of Ahamus yunnanensis (NC_018095) and Thitarodes pui (NC_023530) served as outgroups (Fig. 8). In line with Homziak's study, our analyses recovered the following topology within Erebinae: ((Catocala sp. + Speiredonia retorta) + (Grammodes geometrica + Parallelia stuposa) + Eudocima phalonia). This result, supported by both BI (Fig. 8A) and ML (Fig. 8B) analyses, indicates that Catocalini belongs to the subfamily Erebinae 30 . The analyses further confirmed that S. retorta is a member of Erebidae and recovered the family-level topology (Notodontidae + (Erebidae + Lymantriidae + (Nolidae + (Euteliidae + Noctuidae)))). This topology differs from that proposed by Zahiri et al.: (Notodontidae + (Euteliidae + (Noctuidae + (Erebidae + Nolidae)))) 28 . S. retorta has previously been placed in the superfamily Noctuoidea, family Erebidae, and subfamily Erebinae 30 , and our data are consistent with this classification.
Nevertheless, some of the relationships recovered here differed from those of the prior study. Mitogenomes from additional Noctuoidea species will therefore be needed to resolve these relationships more accurately.
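The family-level groupings quoted above can be written in Newick notation, where each "+" grouping becomes a parenthesized, comma-separated clade. The Python sketch below is a minimal, hypothetical illustration, not part of the authors' analysis. It parses two such trees, treats them as rooted, and prunes taxa not shared by both (here Lymantriidae, which is absent from the Zahiri et al. grouping). It then lists the clades the trees share and the clades unique to each. This is a simple clade-set comparison in the spirit of a Robinson–Foulds distance. The parser handles leaf names only, without branch lengths or support values.

```python
def parse_newick(text: str):
    """Parse leaf-names-only Newick into nested tuples (leaves are strings)."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name = text[start:pos].strip()
        if not name:
            raise ValueError(f"missing leaf name at position {start}")
        return name

    tree = node()
    if pos != len(text):
        raise ValueError("unexpected characters after the tree")
    return tree


def clades(tree) -> set:
    """All clades with two or more leaves (root included), as frozensets."""
    found = set()

    def leaves(t):
        if isinstance(t, str):
            return frozenset([t])
        members = frozenset().union(*(leaves(child) for child in t))
        found.add(members)
        return members

    leaves(tree)
    return found


def compare(newick_a: str, newick_b: str):
    """Return (shared, only_a, only_b) clades after pruning to common taxa."""
    ca, cb = clades(parse_newick(newick_a)), clades(parse_newick(newick_b))
    common = max(ca, key=len) & max(cb, key=len)  # the root clade holds all leaves
    ra = {c & common for c in ca if len(c & common) >= 2}
    rb = {c & common for c in cb if len(c & common) >= 2}
    return ra & rb, ra - rb, rb - ra


THIS_STUDY = "(Notodontidae,(Erebidae,Lymantriidae,(Nolidae,(Euteliidae,Noctuidae))));"
ZAHIRI_ET_AL = "(Notodontidae,(Euteliidae,(Noctuidae,(Erebidae,Nolidae))));"

if __name__ == "__main__":
    shared, only_here, only_zahiri = compare(THIS_STUDY, ZAHIRI_ET_AL)
    for label, group in (("shared", shared), ("this study only", only_here),
                         ("Zahiri et al. only", only_zahiri)):
        for clade in sorted(group, key=len):
            print(f"{label}: {sorted(clade)}")
```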

Materials and methods
Insects and DNA collection. Samples of S. retorta were obtained from Bengbu Medical College, Anhui Province, China. A taxonomist in the Department of Entomology, Anhui Agricultural University, Hefei, China (AHAU) identified the specimens as Speiredonia retorta on the basis of their morphological characteristics. The identification followed careful examination of morphological characters and comparison of voucher specimens with the referenced publications on healthy plantations from the Forest Science Institute of Vietnam 1 . DNA was extracted from these samples using a Genomic DNA Extraction Kit (Aidlab Co., Beijing, China). DNA quality was then evaluated by 1% agarose gel electrophoresis (AGE). The samples were then used for mitogenome isolation.
Mitogenome sequencing. The S. retorta mitogenome was amplified using 12 primer pairs designed from conserved mitogenome sequences of other Lepidopteran species (Table 3). … 34 , with additional predictions being made based upon sequences capable of adopting a tRNA secondary structure and containing an anticodon. Tandem repeats within the A + T-rich region were identified with Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html) 35 .
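Tandem Repeats Finder detects approximate tandem repeats using statistical criteria. The Python sketch below is a much simpler, swapped-in technique and does not reproduce that tool's algorithm. It only illustrates what a tandem repeat is: it reports exact, back-to-back copies of a short unit. The function name and the defaults `max_period=6` and `min_copies=3` are arbitrary, hypothetical choices.

```python
def exact_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 3):
    """Report exact, back-to-back copies of a unit of length 1..max_period.

    Returns (start_1based, period, copies, unit) tuples. After a run is found,
    scanning resumes past it, so rotated versions of the same run are not
    re-reported for that period. A run may still appear under a longer period
    that is a multiple of its unit (e.g. a poly-T as period 1 and period 2).
    """
    seq = seq.upper()
    found = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= len(seq):
            unit = seq[i:i + period]
            copies = 1
            # Extend while the next window is an identical copy of the unit.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                found.append((i + 1, period, copies, unit))
                i += copies * period
            else:
                i += 1
    return found


if __name__ == "__main__":
    # Hypothetical toy sequences, not from the S. retorta A + T-rich region.
    print(exact_tandem_repeats("GACGACGACGT", max_period=3))
    print(exact_tandem_repeats("ATTTTTTA", max_period=2))
```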
Codon usage and RSCU. We assessed codon usage across a range of Lepidopteran species (Fig. 2). These comprised three Noctuoidea members; Bombycoidea and Geometroidea members, which are closely related to Noctuoidea; and Tortricoidea members, which are far more distantly related 24,36 . Relative synonymous codon usage (RSCU) values were calculated with MEGA7.0 37 .
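RSCU is conventionally defined as a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. In symbols, RSCU = n_i · x_ij / Σ_j x_ij, where n_i is the number of codons for amino acid i. Codons per thousand codons is simply count / total × 1000. The Python sketch below illustrates both measures and is not MEGA's implementation. It assumes the following:
- the invertebrate mitochondrial code (NCBI translation table 5) applies;
- each amino acid forms one synonymous family, so Leu and Ser are not split into Leu1/Leu2 and Ser1/Ser2;
- stop codons are excluded from the per-thousand tally;
- incomplete terminal codons (such as the lone T in cox2 and nad4) are dropped.

All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Invertebrate mitochondrial code (NCBI translation table 5); codons listed in
# the conventional TTT, TTC, TTA, TTG, TCT, ... GGG order (bases T, C, A, G).
BASES = "TCAG"
AA_TABLE5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE5)}


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons; a trailing partial codon (e.g. a lone T) is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in CODON_TO_AA
    )


def rscu(counts: Counter) -> dict:
    """RSCU of each sense codon: observed count / mean count of its family."""
    families = {}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":  # stop codons belong to no synonymous family
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # n_i * x_ij / sum_j x_ij; 0.0 when the amino acid is absent
            values[c] = len(codons) * counts[c] / total if total else 0.0
    return values


def per_thousand(counts: Counter) -> dict:
    """Codons per thousand codons, computed over sense codons only."""
    sense = {c: n for c, n in counts.items() if CODON_TO_AA[c] != "*"}
    total = sum(sense.values())
    return {c: 1000 * n / total for c, n in sense.items()} if total else {}


if __name__ == "__main__":
    counts = codon_counts("TTTTTTTTCTAA")  # hypothetical toy CDS
    print(rscu(counts)["TTT"], rscu(counts)["TTC"], per_thousand(counts))
```

An RSCU above 1 marks a codon used more often than expected under uniform synonymous usage. In an A + T-rich mitogenome, A/T-ending codons typically score highest.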
Phylogenetic analysis. Lepidopteran phylogenetic relationships were evaluated using all 64 Noctuoidea mitogenomes available from GenBank (Table 3). The Ahamus yunnanensis (NC_018095) and Thitarodes pui (NC_023530) mitogenomes served as the outgroup. Concatenated nucleotide sequences of the 13 PCGs were aligned with MAFFT using default settings 38 . The resulting alignment was analyzed by the Maximum Likelihood (ML) method in MEGA7.0 37 and by Bayesian Inference (BI) in MrBayes v3.2 39 . Node support for the ML tree was assessed with 1000 bootstrap replicates. For the BI analysis, MrModeltest 2.3 selected the GTR + I + G model for the nucleotide data according to the Akaike information criterion (AIC). Four simultaneous MCMC chains were run for 10,000,000 generations, sampling every 1000 generations, with a 2500-generation burn-in. The resulting phylogenetic trees were visualized with FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
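Bootstrap support for an ML tree comes from nonparametric bootstrapping. Alignment columns are resampled with replacement to build pseudo-replicate alignments of the original length, a tree is inferred from each, and the frequency of each clade across replicates gives its support. MEGA7.0 performs this internally. The Python sketch below only illustrates the resampling step. The function name and the dictionary-of-strings alignment format are hypothetical assumptions.

```python
import random


def bootstrap_alignment(alignment: dict, seed=None) -> dict:
    """Return one nonparametric bootstrap pseudo-replicate of an alignment.

    Columns are drawn with replacement until the replicate has the original
    length; every taxon receives the same sampled column order, so each
    replicate column is an intact column of the original alignment.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n = lengths.pop()
    rng = random.Random(seed)
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGAAC", "taxon_c": "TCGAAT"}  # hypothetical
    # e.g. 1000 replicates, each of which would then be analyzed by ML
    replicates = [bootstrap_alignment(toy, seed=k) for k in range(1000)]
    print(replicates[0])
```

Seeding each replicate makes the resampling reproducible. A full analysis would infer one tree per replicate and tally how often each clade appears.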