Introduction

Speiredonia retorta (Lepidoptera: Erebidae) is a pest species widely distributed throughout Southeast Asia. S. retorta larvae attach to Acacia leaves and branches, where they pupate, while adults feed on a range of fruits, including apples, pears, and grapes, accelerating fruit decay. The species produces three generations per year, with a life cycle consisting of an egg stage (6–18 days), a larval stage with six instars (23–47 days), and a pupal stage (8–13 days). It overwinters as pupae, and the overwintering pupal stage can last 194–228 days1. Its principal natural enemies include Ophion luteus (Hymenoptera: Ichneumonidae), Brachymeria obscurata (Hymenoptera: Chalcididae), and Exorista sorbillans (Diptera: Tachinidae). Because these moths represent a significant economic threat, their management is a key agricultural concern that requires an in-depth understanding of their biology. Further study of the phylogeny and genetic characteristics of S. retorta may offer new insights into how to control the spread of these moths. In particular, characterizing the mitochondrial genome (mitogenome) of this species may facilitate comparative phylogenetic and evolutionary studies that support such control efforts2,3.

Metazoan mitogenomes generally range from 14,000 to 19,000 bp in size, contain few or no intergenic spacer regions4, and encode 13 protein-coding genes (PCGs), 22 tRNAs, and 2 rRNAs5,6. In addition to these broadly conserved features, Lepidopteran mitogenomes generally contain a conserved adenine and thymine (A + T)-rich region. The mitogenome is an ideal tool for analyzing phylogenetic relationships because it has a simple structure, is maternally inherited, rarely undergoes recombination, and is relatively conserved over the course of evolution7,8,9. Advances in sequencing techniques have led to the publication of mitogenomes from many insects and other species, supporting a wide range of evolutionary analyses6.

A comprehensive analysis of the mitogenome of an insect species makes it possible to perform detailed phylogenetic or population genetic studies and to identify genes that may serve as valuable targets in future research. Lepidoptera is the second-largest insect order, comprising > 155,000 species of moths and butterflies10. Noctuoidea is the largest Lepidopteran superfamily, with > 42,400 species11. Several characteristic mitogenomic markers have been identified for these groups, allowing the phylogenetic relationship between S. retorta and other species to be explored through mitogenomic analyses. In the phylogenetic framework constructed by Zahiri et al., six Noctuoidea families were proposed: Euteliidae, Erebidae, Nolidae, Notodontidae, Oenosandridae, and Noctuidae12. In the present study, we therefore sequenced the complete S. retorta mitogenome to explore the evolutionary relationship between this agriculturally important insect and other Noctuoidea species.

Results and discussion

Mitogenome structure, organization, and composition

Sequencing revealed that the S. retorta mitogenome is 15,652 bp long (Fig. 1), within the range reported for other Lepidopteran species such as Thitarodes pui (Hepialidae; 15,064 bp) and Plutella xylostella (Plutellidae; 16,179 bp). Alignment of the S. retorta mitogenome with those of other Lepidopteran species identified 13 PCGs (atp6, atp8, cox1, cox2, cox3, cytb, nad1, nad2, nad3, nad4, nad5, nad6, and nad4L), two rRNAs (large and small subunit rRNAs), 22 tRNAs, and a non-coding A + T-rich region, a gene content conserved in the mitogenomes of most known animal species5 (Table 1). The S. retorta mitogenome also harbored a trnM-trnI-trnQ gene arrangement, which differs from the ancestral trnI-trnQ-trnM gene order6.

Figure 1

A map of the S. retorta mitogenome. tRNA genes are labeled with IUPAC-IUB single-letter amino acid codes. Cytochrome c oxidase subunits (cox1-3), cytochrome b (cob), NADH dehydrogenase subunits (nad1-6), and rRNAs (rrnL and rrnS) are indicated.

Table 1 List of annotated mitochondrial genes of S. retorta.

We compared the composition and skewness of the S. retorta mitogenome with those of other Noctuoidea species (Table 2). The major strand was composed of A, T, G, and C at relative frequencies of 39.60%, 41.23%, 7.37%, and 11.80%, respectively (80.83% A + T), giving negative AT and GC skews (−0.020 and −0.231, respectively). AT skew values in other Noctuoidea mitogenomes range from −0.027 (A. formosae) to 0.016 (L. dispar), whereas GC skew values range from −0.266 (A. formosae) to −0.178 (A. ipsilon). The negative AT skew in S. retorta indicates that T residues outnumber A residues, as previously reported for many Lepidopteran species, including A. formosae (−0.027) and A. ipsilon (−0.006) (Table 2).

Table 2 The composition and skewness of mitogenomes of different Noctuoidea species.

PCGs and codon usage

We identified 13 PCGs in the S. retorta mitogenome, spanning 11,134 bp (71.1% of the mitogenome sequence). They ranged in length from 159 bp (atp8) to 1713 bp (nad5), consistent with other Lepidopteran species in which these two genes are often the shortest and longest, respectively11,13. All PCGs began with an ATN codon (2 ATA, 3 ATT, and 8 ATG), whereas atypical cox1 start codons (including TTAG, ACG, and TTG) have been reported in the mitogenomes of a range of other insect species14,15,16. TAA was the most common stop codon (nad2, cox1, atp8, atp6, cox3, nad3, nad5, nad4L, nad6, and cytb), nad1 used a TAG stop codon, and nad4 and cox2 ended with an incomplete stop codon (a single T) (Table 1). Incomplete stop codons are a common, conserved feature of invertebrate mitogenomes17,18,19: after endonucleolytic cleavage of the polycistronic pre-mRNA, the single T residue is thought to be completed to a functional TAA stop codon by polyadenylation5,20,21.
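
The start- and stop-codon rules described above can be illustrated with a short Python sketch. It is not the annotation procedure used in this study: the helper names (is_atn_start, terminal_stop, complete_by_polyadenylation) and the toy sequence are hypothetical, and the sketch assumes the invertebrate mitochondrial code (NCBI translation table 5), in which TAA and TAG are the complete stop codons.

```python
# Minimal sketch: start/stop codon checks under NCBI translation table 5
# (invertebrate mitochondrial code). Helper names and sequences are hypothetical.

STOP_CODONS = {"TAA", "TAG"}  # complete stop codons in table 5 (TGA encodes Trp)


def is_atn_start(cds: str) -> bool:
    """Return True if the coding sequence begins with an ATN codon."""
    codon = cds[:3].upper()
    return len(codon) == 3 and codon.startswith("AT") and codon[2] in "ACGT"


def terminal_stop(cds: str) -> str:
    """Return the final complete stop codon, the 1-2 nt left after the last
    full codon (an incomplete stop), or 'none'."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in STOP_CODONS else "none"
    # One or two nucleotides remain after the last full codon.
    return cds[-remainder:]


def complete_by_polyadenylation(cds: str, tail_length: int = 10) -> str:
    """Append a poly(A) tail and truncate at the first in-frame stop codon."""
    mrna = cds.upper() + "A" * tail_length
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:i + 3]
    return mrna  # no stop formed (e.g. the partial end was not T or TA)


# Toy CDS ending in a lone T, as reported for cox2 and nad4.
toy = "ATGAAACCCT"
print(is_atn_start(toy), terminal_stop(toy), complete_by_polyadenylation(toy))
# prints: True T ATGAAACCCTAA
```

On the toy sequence, the sketch appends a poly(A) tail and truncates at the TAA that forms, mirroring the proposed post-transcriptional completion of incomplete stop codons.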

Next, we assessed codon usage among six Lepidopteran species, including three Noctuoidea members as well as Bombycoidea, Tortricoidea, and Geometroidea members (Fig. 2). The most frequently used amino acids were Asn, Ile, Leu2, Lys, Met, Phe, and Tyr. Across these six species, two codon families (Ile and Phe) were used at a frequency of at least 80 codons per thousand codons, and three (Leu2, Met, and Asn) at a frequency of at least 60 codons per thousand codons. The Arg and Cys codon families were the least represented. Codon distributions were consistent among the three Noctuoidea species analyzed, with the exception of Lys, which was rarely used in S. retorta (Fig. 3).
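
Codons per thousand codons (CDspT), as plotted in Fig. 3, can be computed by counting each codon family across all PCGs and scaling by the total codon count. The Python sketch below is a minimal illustration under stated assumptions: it uses NCBI translation table 5, excludes stop codons and incomplete terminal codons, and splits Leu and Ser using the common mitogenome convention (Leu1 = CUN, Leu2 = UUR; Ser1 = AGN, Ser2 = UCN), which is assumed rather than taken from this paper's figure. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
TABLE5_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), TABLE5_AA)}
THREE = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
         "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
         "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
         "Y": "Tyr", "V": "Val"}


def codon_family(codon: str) -> str:
    """Map a codon to a family label, splitting Leu and Ser (assumed convention)."""
    aa = CODE[codon]
    if aa == "L":
        return "Leu2" if codon.startswith("TT") else "Leu1"  # UUR vs CUN
    if aa == "S":
        return "Ser1" if codon.startswith("AG") else "Ser2"  # AGN vs UCN
    return THREE[aa]


def cdspt(cds_list):
    """Codons per thousand codons for each family, pooled over all CDSs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):  # skip incomplete end codon
            codon = cds[i:i + 3]
            if codon in CODE and CODE[codon] != "*":  # ignore stops and ambiguous bases
                counts[codon_family(codon)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 1000 * n / total for family, n in counts.items()}


print(cdspt(["ATTATTTTTTAA", "TTAATA"]))
# prints: {'Ile': 400.0, 'Phe': 200.0, 'Leu2': 200.0, 'Met': 200.0}
```

In the toy input, ATA is counted as Met because table 5 reassigns it from Ile.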

Figure 2

A comparative assessment of mitogenomic codon usage among Lepidopteran insects. Superfamily membership is indicated above species names using lowercase letters (a: Noctuoidea, b: Bombycoidea, c: Tortricoidea, d: Geometroidea).

Figure 3

The distribution of codons among Lepidopteran species. CDspT, codons per thousand codons.

Relative synonymous codon usage (RSCU) was then assessed for the PCGs of the four Lepidopteran superfamilies (Fig. 4). The PCGs of S. retorta used every codon except GCG, CGC, GGC, AGG, CCG, ACG, and TGG. Such an absence of GC-rich codons is common in other Lepidopterans, including H. cunea (GCG and GTG), P. atrilineata (CGG), and C. pomonella (GCG). Codon usage was biased toward A + T-rich codons, contributing to the overall AT bias observed in the S. retorta mitogenome and across other insect mitogenomes19,22.
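
RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so RSCU = 1 indicates no bias and RSCU = 0 marks an unused codon. The Python sketch below applies that textbook definition under NCBI translation table 5. It groups synonymous codons by amino acid, which may differ in detail from how MEGA7 or Fig. 4 groups codon families, and its function names are hypothetical.

```python
from collections import Counter
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
TABLE5_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
CODE = dict(zip(CODONS, TABLE5_AA))


def count_codons(cds_list):
    """Count complete codons over all CDSs; every codon starts at zero."""
    counts = Counter({codon: 0 for codon in CODONS})
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count of the synonymous codons for that amino acid."""
    synonyms = {}
    for codon, aa in CODE.items():
        synonyms.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in synonyms.items():
        if aa == "*":
            continue  # stop codons have no synonymous-usage value
        expected = sum(counts[c] for c in codons) / len(codons)
        for c in codons:
            values[c] = counts[c] / expected if expected > 0 else 0.0
    return values


def absent_sense_codons(counts):
    """Sense codons never observed, i.e. those with RSCU = 0."""
    return sorted(c for c in CODONS if CODE[c] != "*" and counts[c] == 0)


counts = count_codons(["TTTTTTTTCAAA"])
values = rscu(counts)
print(round(values["TTT"], 3), values["AAA"], "AAG" in absent_sense_codons(counts))
# prints: 1.333 2.0 True
```

Because AGA and AGG encode Ser under table 5, this sketch treats Ser as a single eight-codon group.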

Figure 4

Relative synonymous codon usage (RSCU) in the mitogenomes of the four Lepidopteran superfamilies, with codon families on the x-axis. Codons shown above the bars are absent from the mitogenome.

Ribosomal and transfer RNA genes

Two rRNA genes were identified in the S. retorta mitogenome, consistent with most other animal species. The 1413 bp large ribosomal RNA gene (rrnL) was located between tRNA Leu (CUN) and tRNA Val, and the 781 bp small ribosomal RNA gene (rrnS) was located between tRNA Val and the A + T-rich region (Table 1). The rRNA genes were A + T rich (84.40%), within the range observed for other Noctuoidea species such as A. formosae (83.77%) and A. ipsilon (85.15%). Although rRNA AT and GC skews are negative in most analyzed Lepidopteran mitogenomes17, both were positive in S. retorta (0.040 and 0.356, respectively), as reported in a subset of prior studies23,24.

We identified a full set of 22 tRNA genes (65–71 nucleotides long) in the S. retorta mitogenome, consistent with other Lepidopterans. These tRNA genes were strongly A + T biased (81.39%), with positive AT and GC skews (0.031 and 0.171, respectively; Table 2). All tRNAs could fold into the typical cloverleaf secondary structure except trnS1, which lacked the DHU stem (Fig. 5), as previously observed in other Lepidopterans23. Eight of the 22 tRNAs were encoded on the L-strand and the remaining 14 on the H-strand.

Figure 5

Predicted secondary structures for the 22 S. retorta mitogenome-encoded tRNAs.

Overlapping and intergenic spacer regions

We identified six overlapping regions in the S. retorta mitogenome, ranging from 1 to 8 bp in size (26 bp in total), with the longest overlap located between trnC and trnY (Table 1). Alignment of the atp8/atp6 overlap showed that the S. retorta mitogenome contains the ATGATAA sequence common to other Lepidopterans (Fig. 6). In addition, we identified the ‘ATACTAA’ motif within the 17 bp spacer between trnS2 (UCN) and nad1 (Fig. 7A); this motif is highly conserved in insect mitogenomes and is a potential binding site for the mitochondrial transcription termination factor (mtTERM)25.
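
Overlap and spacer lengths such as those in Table 1 follow directly from gene coordinates: for adjacent genes, the gap is the downstream start minus the upstream end minus one, so negative values are overlaps. The Python sketch below illustrates this together with a simple motif search. The Gene type, the function names, and all coordinates are hypothetical, and the sketch ignores the junction that wraps around the circular genome.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def junctions(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """For adjacent genes (sorted by start), return the gap between them:
    positive = intergenic spacer length, negative = overlap length."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [(up.name, down.name, down.start - up.end - 1)
            for up, down in zip(ordered, ordered[1:])]


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


# Hypothetical coordinates chosen to mimic a 7 bp atp8/atp6 overlap and a
# 17 bp trnS2-nad1 spacer; they are not the S. retorta annotation.
print(junctions([Gene("atp8", 1, 159), Gene("atp6", 153, 830)]))
print(junctions([Gene("trnS2", 1, 67), Gene("nad1", 85, 1023)]))
print(find_motif("CCATGATAAGG", "ATGATAA"))
# prints: [('atp8', 'atp6', -7)], then [('trnS2', 'nad1', 17)], then [2]
```

With the toy coordinates, the atp8/atp6 junction gives -7, matching the seven nucleotides of ATGATAA, in which the atp6 start codon ATG overlaps the atp8 stop codon TAA.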

Figure 6

Alignment of atp8 and atp6 overlap among Lepidopteran species and other insects, with the number of intergenic nucleotides shown on the right.

Figure 7

(A) Alignment of the intergenic spacer between trnS2 (UCN) and nad1 in multiple Lepidopteran species, with the conserved ‘ATACTAA’ motif highlighted. (B) Features of the S. retorta A + T-rich region, shown on the reverse strand, with the ATATGA motif highlighted. The poly-A region is double-underlined, the poly-T region is single-underlined, and single microsatellite T/A repeats are marked with a dotted underline.

A + T-rich region analysis

A 417 bp A + T-rich region was detected between rrnS and trnM in the S. retorta mitogenome. This region was 93.76% A + T and had negative AT and GC skews (−0.038 and −0.462, respectively) (Table 2). It contained several short repeated sequences and motifs, including a 19 bp poly-T stretch flanking an ‘ATAGA’ motif near rrnS, an ‘ATTTA’ motif roughly in the center of the region, and a poly-A element upstream of trnM, consistent with mitogenomic findings from other Lepidopteran species (Fig. 7B). Although the exact length of the poly-T stretch varies among Lepidopterans4,17,26, the ATAGA motif is highly conserved27.
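
The features of the A + T-rich region described above (A + T content, homopolymer stretches, and short motifs) can be located with regular expressions. The Python sketch below is illustrative only: the function names and the toy sequence are hypothetical and do not reproduce the S. retorta control region.

```python
import re


def at_content(seq: str) -> float:
    """Percentage of A + T among all positions in seq."""
    seq = seq.upper()
    return 100 * (seq.count("A") + seq.count("T")) / len(seq)


def homopolymer_runs(seq: str, base: str, min_len: int):
    """(0-based start, length) of every run of `base` at least min_len long."""
    pattern = re.compile(f"{re.escape(base.upper())}{{{min_len},}}")  # e.g. T{19,}
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def motif_positions(seq: str, motif: str):
    """0-based starts of a motif, allowing overlapping matches via a lookahead."""
    pattern = f"(?={re.escape(motif.upper())})"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


# Hypothetical toy sequence, not the S. retorta A + T-rich region.
toy = "ATAGA" + "T" * 19 + "CATATTTAC" + "A" * 12
print(motif_positions(toy, "ATAGA"), motif_positions(toy, "ATTTA"))
print(homopolymer_runs(toy, "T", 19), homopolymer_runs(toy, "A", 10))
print(round(at_content(toy), 2))
```

A real analysis would also scan the reverse complement, since the features in Fig. 7B are shown on the reverse strand.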

Phylogenetic analyses

Several recent studies have explored phylogenetic relationships among Noctuoidea families. Zahiri et al. proposed the relationship (Notodontidae + (Euteliidae + (Noctuidae + (Erebidae + Nolidae))))28, whereas Yang et al. proposed (Notodontidae + (Erebidae + (Nolidae + (Euteliidae + Noctuidae))))29. In this study, we aligned sequences with MAFFT and used BI and ML methods to explore the relationship between S. retorta and other Noctuoidea insects based on its mitogenome sequence. We used the NT dataset to analyze 65 complete mitogenomes representing six Noctuoidea families (Erebidae, Lymantriidae, Euteliidae, Noctuidae, Notodontidae, and Nolidae), with the mitogenomes of Ahamus yunnanensis (NC_018095) and Thitarodes pui (NC_023530) as outgroups (Fig. 8). Consistent with Homziak's study, our analyses recovered the following topology within Erebinae: ((Catocala sp. + Speiredonia retorta) + (Grammodes geometrica + Parallelia stuposa) + Eudocima phalonia). This result, supported by both BI (Fig. 8A) and ML (Fig. 8B) analyses, indicates that Catocalini belongs to the subfamily Erebinae30. Our analyses further confirmed that S. retorta is a member of Erebidae and recovered the family-level topology (Notodontidae + (Erebidae + Lymantriidae + (Nolidae + (Euteliidae + Noctuidae)))), which differs from that reported by Zahiri et al.: (Notodontidae + (Euteliidae + (Noctuidae + (Erebidae + Nolidae))))28.

Figure 8

Phylogenetic relationships among Noctuoidea insects inferred using (A) Bayesian inference (BI) and (B) the maximum likelihood (ML) method. Ahamus yunnanensis (NC_018095) and Thitarodes pui (NC_023530) were used as outgroups.

S. retorta has previously been placed in the subfamily Erebinae of the family Erebidae within the superfamily Noctuoidea30, and our data are consistent with this classification. Nevertheless, some of the relationships recovered here differ from those of prior studies, suggesting that mitogenomes from additional Noctuoidea species will be needed to resolve these phylogenetic relationships more accurately.

Materials and methods

Insects and DNA collection

Samples of S. retorta were obtained from Bengbu Medical College, Anhui Province, China. A taxonomist of the Department of Entomology, Anhui Agricultural University, Hefei, China (AHAU) identified the specimens as Speiredonia retorta after careful examination of their morphological characteristics and comparison of voucher specimens with the referenced publication on healthy plantations from the Forest Science Institute of Vietnam1. DNA was extracted from these samples using a Genomic DNA Extraction Kit (Aidlab Co., Beijing, China), and DNA quality was evaluated by 1% agarose gel electrophoresis (AGE). The extracted DNA was then used for mitogenome amplification.

Mitogenome sequencing

The S. retorta mitogenome was amplified using 12 primer pairs designed from conserved mitogenome sequences of other Lepidopteran species (Table 3; BGI Group Co., Guangdong, China)2,31,32. PCR was performed on an Eppendorf Mastercycler and Mastercycler gradient in a 50 μL reaction volume containing 35 μL dH2O, 5 μL 10 × Taq buffer (Mg2+ plus), 4 μL dNTP (25 mM), 1.5 μL DNA, 2 μL of each primer (forward and reverse; 10 μM), and 0.5 μL Taq DNA polymerase (Takara Co., Dalian, China). Thermocycler settings were: 94 °C for 4 min; 38 cycles of 94 °C for 30 s and 48–59 °C for 1–3 min (depending on the predicted fragment length); and 72 °C for 10 min. PCR products were separated by 1% AGE, purified with a DNA gel extraction kit (Transgen Co., Beijing, China), and sequenced directly using the corresponding PCR primers.
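
As a quick arithmetic check, the component volumes listed above sum to the stated 50 μL (35 + 5 + 4 + 1.5 + 2 + 2 + 0.5). The Python sketch below scales the shared components into a master mix. The 10% pipetting overage and the choice to add template and primers per tube are assumptions, not part of the reported protocol, and the function name is hypothetical.

```python
# Per-reaction volumes (uL) taken from the 50 uL PCR described above.
REACTION_UL = {
    "dH2O": 35.0,
    "10x Taq buffer (Mg2+ plus)": 5.0,
    "dNTP (25 mM)": 4.0,
    "template DNA": 1.5,
    "forward primer (10 uM)": 2.0,
    "reverse primer (10 uM)": 2.0,
    "Taq DNA polymerase": 0.5,
}
# Assumption: template and primers are pipetted into each tube separately,
# because each of the 12 reactions uses a different primer pair.
PER_TUBE = {"template DNA", "forward primer (10 uM)", "reverse primer (10 uM)"}


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale the shared components for n reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in REACTION_UL.items() if name not in PER_TUBE}


assert sum(REACTION_UL.values()) == 50.0  # components add up to the stated volume
shared_per_tube = sum(v for k, v in REACTION_UL.items() if k not in PER_TUBE)
print(shared_per_tube, master_mix(12))
```

Under these assumptions, each tube would receive 44.5 μL of master mix plus 1.5 μL of template and 2 μL of each primer.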

Table 3 Details of the Lepidopteran mitogenomes used in this study.

Sequence assembly and annotation

NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast) and SeqMan II (DNASTAR Inc.; WI, USA) were used to assemble and annotate the sequenced mitogenome. PCGs were translated using the Invertebrate Mitochondrial Genetic Code. Base composition skew was calculated following the approach of Junqueira et al.33: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C).
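
The skew formulas above can be expressed as a small Python function. This is a minimal sketch with hypothetical function names that computes base percentages and skews from a sequence string. Because each denominator is the sum of the same two bases, percentages can be used in place of counts.

```python
def skew(x: float, y: float) -> float:
    """(x - y) / (x + y): AT skew with (A, T), GC skew with (G, C)."""
    if x + y == 0:
        raise ValueError("skew is undefined when x + y == 0")
    return (x - y) / (x + y)


def composition(seq: str) -> dict:
    """Base percentages and AT/GC skews; non-ACGT characters are ignored."""
    seq = seq.upper()
    n = {b: seq.count(b) for b in "ATGC"}
    total = sum(n.values())
    result = {f"{b}%": 100 * n[b] / total for b in "ATGC"}
    result["AT skew"] = skew(n["A"], n["T"])
    result["GC skew"] = skew(n["G"], n["C"])
    return result


# G and C percentages reported for the S. retorta major strand.
print(round(skew(7.37, 11.80), 3))  # prints: -0.231
print(composition("AATTTGCCC"))    # hypothetical toy sequence
```

The first call reproduces the GC skew reported for the S. retorta major strand in Table 2.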

tRNA genes were identified with tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/)34, and additional tRNAs were predicted from sequences capable of adopting a tRNA secondary structure and containing an anticodon. Tandem repeats within the A + T-rich region were identified with Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html)35.
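
Tandem Repeats Finder detects approximate repeats that may contain mismatches and indels. The Python sketch below is a much simpler, hypothetical stand-in that reports only perfect tandem repeats with short periods, such as poly-T stretches or (TA)n microsatellites. It illustrates what a tandem repeat is and does not reproduce TRF output.

```python
def exact_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 3):
    """Greedy scan for exact tandem repeats of period 1..max_period.

    Returns (start, period, unit, copies) with a 0-based start. Unlike Tandem
    Repeats Finder, this simplified scan tolerates no mismatches or indels.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for p in range(1, max_period + 1):
            unit = seq[i:i + p]
            if len(unit) < p:
                break  # not enough sequence left for this period
            copies = 1
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            # Keep the period covering the most sequence; ties favor the shorter unit.
            if copies >= min_copies and (best is None or copies * p > best[1] * best[3]):
                best = (i, p, unit, copies)
        if best:
            hits.append(best)
            i += best[1] * best[3]  # jump past the repeat array
        else:
            i += 1
    return hits


print(exact_tandem_repeats("GCTATATATAGC"))      # prints: [(2, 2, 'TA', 4)]
print(exact_tandem_repeats("A" + "T" * 19 + "G"))  # prints: [(1, 1, 'T', 19)]
```

The greedy jump past each repeat keeps the scan linear in practice but can miss repeats that start inside an already reported array.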

Codon usage and RSCU

We assessed codon usage across a range of Lepidopteran species: three Noctuoidea members, Bombycoidea and Geometroidea members that are closely related to Noctuoidea, and Tortricoidea members that are more distantly related24,36 (Fig. 2). Relative synonymous codon usage (RSCU) values were calculated with MEGA7.037.

Phylogenetic analysis

Lepidopteran phylogenetic relationships were evaluated using all 64 Noctuoidea mitogenomes available in GenBank (Table 3), with the Ahamus yunnanensis (NC_018095) and Thitarodes pui (NC_023530) mitogenomes as outgroups. Concatenated nucleotide sequences of the 13 PCGs were aligned with MAFFT using default settings38, and the resulting alignment was used for phylogenetic analyses by the Maximum Likelihood (ML) method in MEGA7.037 and by Bayesian Inference (BI) in MrBayes v3.239. Node support in the ML analysis was assessed with 1000 bootstrap replicates. For the BI analysis, the GTR + I + G model was selected for the nucleotide data with MrModeltest 2.3 under the Akaike information criterion (AIC). Four simultaneous MCMC chains were run for 10,000,000 generations, sampling every 1000 generations, with a 2500 generation burn-in. The resulting phylogenetic trees were visualized with FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
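
The 1000 bootstrap replicates mentioned above follow the standard nonparametric bootstrap, in which alignment columns are resampled with replacement and a tree is inferred from each pseudo-replicate. The Python sketch below shows only the data handling, under stated assumptions. It concatenates hypothetical per-gene alignments into a supermatrix, a common alternative to aligning pre-concatenated sequences, and then draws pseudo-replicate matrices. Tree inference for each replicate (done in MEGA7.0 in this study) is not reproduced, and all function names and the toy alignment are hypothetical.

```python
import random


def concatenate(alignments: dict, gene_order: list) -> dict:
    """Join per-gene alignments {gene: {taxon: aligned_seq}} into one supermatrix."""
    taxa = sorted(alignments[gene_order[0]])
    for gene in gene_order:
        if sorted(alignments[gene]) != taxa:
            raise ValueError(f"taxon set differs in {gene}")
        if len({len(s) for s in alignments[gene].values()}) != 1:
            raise ValueError(f"{gene} sequences have unequal lengths (not aligned)")
    return {t: "".join(alignments[g][t] for g in gene_order) for t in taxa}


def bootstrap_replicate(matrix: dict, rng: random.Random) -> dict:
    """Nonparametric bootstrap: resample alignment columns with replacement."""
    taxa = list(matrix)
    n_sites = len(matrix[taxa[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(matrix[t][c] for c in columns) for t in taxa}


# Hypothetical two-gene, two-taxon toy alignment.
aln = {"cox1": {"taxonA": "ATG-", "taxonB": "ATGA"},
       "nad5": {"taxonA": "TTA", "taxonB": "TTG"}}
matrix = concatenate(aln, ["cox1", "nad5"])
rng = random.Random(42)  # fixed seed for reproducibility
replicates = [bootstrap_replicate(matrix, rng) for _ in range(1000)]
print(matrix, len(replicates))
```

Each replicate keeps the original number of sites and taxa; only the column composition changes, which is what makes the support values a measure of how consistently the data recover each clade.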