Introduction

Insect mitochondrial DNA (mtDNA) is a double-stranded, circular molecule 14–19 kb in length that contains 13 protein-coding genes (PCGs): ATPase subunits 6 and 8 (atp6 and atp8), cytochrome c oxidase subunits 1–3 (cox1–cox3), cytochrome b (cob), and NADH dehydrogenase subunits 1–6 and 4L (nad1–nad6 and nad4L). It also contains two rRNA genes encoding the large and small subunit rRNAs (rrnL and rrnS), 22 tRNA genes and a non-coding element termed the A + T-rich region1. The A + T-rich region shows greater sequence and length variability than other regions of the genome2,3,4,5 and regulates the transcription and replication of mt genomes6. As an informative molecular marker, mtDNA can provide important information on rearrangement patterns and for phylogenetic analysis because of its rapid evolutionary rate and lack of genetic recombination7. Therefore, mtDNA has been widely used in diverse evolutionary studies among species8.

Recent advances in sequencing technologies have led to a rapid increase in mt genome data in GenBank, including Bombycoidea mt genomes. Bombycoidea is a superfamily of moths that contains the silk moths, emperor moths, sphinx moths, and relatives9. Some complete mt genomes of Bombycoidea insects are currently available in GenBank (Table 1). Several representative families were studied in this paper. Two families in Bombycoidea, Bombycidae and Saturniidae, comprise silk-producing insects of economic value10. The Sphingidae are a family of Bombycoidea, commonly known as hawk moths, sphinx moths, and hornworms; this family includes approximately 1,450 species11, 12. Brahmaeidae are a family of Bombycoidea11, 12. The Lasiocampidae are also a family of Bombycoidea, known as eggars, snout moths, or lappet moths. Over 2,000 species occur worldwide, and it is likely that not all have been named or studied13.

Table 1 List of Bombycoidea species analysed in this paper with their respective GenBank accession numbers.

Here, we sequenced the complete mt genomes of two species, A. rubiginosa and R. menciana. We aimed to analyse the mt genomes of these two species and to investigate the phylogeny of Bombycoidea insects. We were particularly interested in the phylogenetic position of Sphingidae and Bombycidae based on the 32 Bombycoidea complete mt genomes available to date.

Materials and Methods

Specimen collection

Moths of A. rubiginosa and R. menciana were collected in Xuancheng, Anhui Province. Total DNA was isolated using the Genomic DNA Extraction Kit (Sangon Biotech, China) according to the manufacturer's instructions. The extracted DNA was used to amplify the complete mt genomes by PCR.

PCR amplification and sequencing

To amplify the entire mt genomes of A. rubiginosa and R. menciana, specific primers were designed based on mt genome sequences obtained from other lepidopteran insects14, 15 (Table 2). The complete mt genomes were obtained using a combination of conventional PCR and long PCR to amplify overlapping fragments spanning each genome. All amplifications were performed on an Eppendorf Mastercycler and Mastercycler gradient in 50 µl reaction volumes containing 5 µl of 10 × Taq Buffer (Mg2+) (Aidlab), 4 µl of dNTPs (2.5 mM, Aidlab), 2 µl of each primer (10 µM), 2 µl of DNA (~100 ng), 34.5 µl of ddH2O, and 0.5 µl of Red Taq DNA polymerase (5U, Aidlab). PCR was performed under the following conditions: 3 min at 94 °C; 35 cycles of 30 s at 94 °C, 1–3 min at 54–60 °C (depending on the primer combination) and elongation at 72 °C for 30 s to 4 min (depending on the fragment length); and a final extension at 72 °C for 10 min. The PCR products were separated by agarose gel electrophoresis (1% w/v) and purified using a DNA gel extraction kit (Transgene, China). The purified PCR products were ligated into the T-vector (Sangon Biotech, China) and sequenced.
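As a quick arithmetic check on the reaction mix described above, the listed component volumes can be summed to confirm that they fill the stated 50 µl reaction. The sketch below assumes that "each primer" refers to one forward and one reverse primer per reaction; the `scale_mix` helper and its 10% pipetting excess are illustrative assumptions and are not part of the published protocol.

```python
# Component volumes (µl) for one 50 µl reaction, as listed in the text.
# Assumption: "2 µl of each primer" means one forward and one reverse primer.
reaction_mix_ul = {
    "10x Taq buffer (Mg2+)": 5.0,
    "dNTPs (2.5 mM)": 4.0,
    "forward primer (10 uM)": 2.0,
    "reverse primer (10 uM)": 2.0,
    "template DNA (~100 ng)": 2.0,
    "ddH2O": 34.5,
    "Red Taq DNA polymerase": 0.5,
}

total_ul = sum(reaction_mix_ul.values())


def scale_mix(mix, n_reactions, excess=0.1):
    """Hypothetical helper: volumes for n reactions plus a fractional excess."""
    factor = n_reactions * (1 + excess)
    return {name: round(volume * factor, 2) for name, volume in mix.items()}


if __name__ == "__main__":
    print(f"Total volume per reaction: {total_ul} µl")
    for name, volume in scale_mix(reaction_mix_ul, n_reactions=8).items():
        print(f"{name}: {volume} µl")
```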

Table 2 Primers used in this study.

Sequence analysis

Sequence annotation was performed using the BLAST tools on the NCBI website (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The sequences were edited and assembled using EditSeq and SeqMan (DNAStar package, DNAStar Inc., Madison, WI, USA). The graphical maps of the A. rubiginosa and R. menciana complete mt genomes were drawn using the online mitochondrial visualization tool mtviz (http://pacosy.informatik.uni-leipzig.de/mtviz). The nucleotide sequences of the PCGs were translated with the invertebrate mt genetic code. Alignments of the A. rubiginosa and R. menciana PCGs with those of various Bombycoidea mt genomes were performed using MAFFT16. Composition skewness was calculated according to the following formulas:

$${\rm{AT}}\,{\rm{skew}}=[{\rm{A}}-{\rm{T}}]/[{\rm{A}}+{\rm{T}}];\,{\rm{GC}}\,{\rm{skew}}\,=\,[{\rm{G}}-{\rm{C}}]/[{\rm{G}}+{\rm{C}}].$$
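The two skew formulas translate directly into code. The following is a minimal sketch that computes both values from a nucleotide string; the function name `composition_skew` and the toy sequence are illustrative assumptions, not part of the software used in this study.

```python
from collections import Counter


def composition_skew(sequence):
    """Return (AT skew, GC skew) for a DNA sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


if __name__ == "__main__":
    # Toy sequence with A=3, T=2, G=1, C=2 (not real data).
    at_skew, gc_skew = composition_skew("AAATTGCC")
    print(f"AT skew = {at_skew:.3f}, GC skew = {gc_skew:.3f}")
```

A positive AT skew means A is more frequent than T on the analysed strand, and a negative GC skew means C is more frequent than G.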

Nucleotide composition statistics and codon usage were computed using MEGA 5.0 (ref. 17).
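To illustrate the translation step mentioned in this section, the sketch below translates a coding sequence with the invertebrate mitochondrial genetic code (NCBI translation table 5), in which ATA encodes Met, TGA encodes Trp, and AGA/AGG encode Ser. It is a self-contained illustration rather than the tool used in the study; the function name and the example sequences are assumptions.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), with codons in the
# conventional order TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_invertebrate_mt(cds):
    """Translate a coding sequence; trailing partial codons are ignored."""
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X")  # "X" marks ambiguous codons
        for i in range(0, usable_length, 3)
    )


if __name__ == "__main__":
    print(translate_invertebrate_mt("ATATGAAGATAA"))
    print(translate_invertebrate_mt("CGATTT"))
```

Ignoring a trailing partial codon mirrors the incomplete stop codons (T or TA) that are common in insect mt genomes and are completed by polyadenylation of the transcript.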

Phylogenetic analysis

Thirty complete Bombycoidea mt genomes were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/genbank/). In addition, mt genomes of Biston panterinaria and Phthonandria atrilineata were downloaded from GenBank and used as outgroup taxa. GenBank sequence information is shown in Table 1.

We assessed the taxonomic status of A. rubiginosa and R. menciana within Bombycoidea by constructing phylogenetic trees. Sequences of the 13 PCGs from 34 mt genomes were combined. Two inference methods were used: Bayesian inference (BI) and maximum likelihood (ML). BI was performed with MrBayes v3.2.1 (ref. 18), and ML was performed with raxmlGUI19. The nucleotide substitution model was selected using the Akaike information criterion implemented in MrModeltest v2.3 (ref. 20), and ProtTest version 1.4 (ref. 21) was used to select the amino acid substitution model. The GTR + I + G model was the best fit for the nucleotide data, and the MtREV + I + G + F model was the best fit for the amino acid data. ML analysis was performed with 1000 bootstrap replicates. The Bayesian analysis was run with 4 simultaneous MCMC chains for 10,000,000 generations, sampled every 100 generations, with a burn-in of 5000 generations. Convergence of the Bayesian analysis was assessed by confirming that the average standard deviation of split frequencies was below 0.01. Additionally, we checked for sufficient parameter sampling by confirming an effective sample size (ESS) greater than 200 in Tracer v1.6 (ref. 22). The resulting phylogenetic trees were visualized in FigTree v1.4.2 (ref. 23).
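Model selection by the Akaike information criterion ranks candidate models by AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below shows the calculation only. The log-likelihood values are invented for illustration and are not results from this study, and the parameter counts exclude branch lengths, which are shared by all candidate models.

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


# Hypothetical (log-likelihood, free parameters) pairs, for illustration only.
candidates = {
    "JC": (-105000.0, 0),
    "HKY+G": (-101500.0, 5),
    "GTR+I+G": (-101200.0, 10),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

if __name__ == "__main__":
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: AIC = {score:.1f}")
    print(f"Preferred model: {best_model}")
```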

Results and Discussion

Genome structure, organization and composition

The complete mt genome sequences of A. rubiginosa and R. menciana, 15,282 bp and 15,636 bp in length, respectively, were determined and submitted to GenBank (accession nos. KT153024 and KT258908). Both mt genomes contain 13 PCGs, two rRNA genes, 22 tRNA genes, and an A + T-rich region. Four of the 13 PCGs (nad5, nad4, nad4L, and nad1), eight tRNAs (trnQ, trnC, trnY, trnF, trnH, trnP, trnL(CUN), and trnV) and the two rRNAs (rrnL and rrnS) are encoded on the minority strand, while the remaining 23 genes are encoded on the majority strand in both species (Fig. 1, Table 3). The R. menciana mt genome (15,636 bp) is longer than that of A. rubiginosa (15,282 bp) and shorter than those of Bombyx mandarina (15,928 bp), B. mori (15,643 bp) and B. huttoni (15,638 bp), but it falls within the range (15,236–15,928 bp) of the other known Bombycoidea mt genomes in our study (Table 1). The nucleotide composition of the A. rubiginosa mt genome is as follows (Table 4): A = 6,334 (41.4%), T = 6,126 (40.1%), G = 1,144 (7.5%), and C = 1,678 (11.0%). The A. rubiginosa mt genome is thus A + T rich (81.5%), although its A + T content is lower than that of R. menciana (82.2%). The AT skew24 is slightly positive and the GC skew is negative in both mt genomes (Table 4), indicating a bias towards A over T and towards C over G. The order and orientation of genes in the A. rubiginosa and R. menciana mt genomes are identical to those of other bombycoid insects sequenced to date25, but differ from those of ancestral insects26. The trnM gene cluster in the A. rubiginosa and R. menciana mt genomes is arranged as trnM-trnI-trnQ, whereas in ancestral insects it is trnI-trnQ-trnM (Fig. 2). Ghost moths exhibit the ancestral insect arrangement of the trnM gene cluster27. Our results for A. rubiginosa and R. menciana are therefore consistent with the hypothesis that the trnM gene cluster was rearranged after Hepialoidea diverged from other lepidopteran lineages. tRNA rearrangements are generally presumed to result from tandem duplication of part of the mt genome28,29,30,31, followed by random or non-random loss of the duplicated copies28, 32, 33.
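The composition figures reported above for A. rubiginosa can be reproduced from the four base counts. The short sketch below recomputes the genome length, the per-base percentages, the A + T content and the sign of each skew from the counts given in the text; the variable names are illustrative.

```python
# Base counts for the A. rubiginosa mt genome, as reported in the text.
counts = {"A": 6334, "T": 6126, "G": 1144, "C": 1678}

genome_length = sum(counts.values())
percent = {base: round(100 * n / genome_length, 1) for base, n in counts.items()}
at_content = round(100 * (counts["A"] + counts["T"]) / genome_length, 1)

# Skews follow the formulas given in the Methods.
at_skew = (counts["A"] - counts["T"]) / (counts["A"] + counts["T"])
gc_skew = (counts["G"] - counts["C"]) / (counts["G"] + counts["C"])

if __name__ == "__main__":
    print(f"Length: {genome_length} bp")
    print(f"Composition (%): {percent}")
    print(f"A + T content: {at_content}%")
    print(f"AT skew: {at_skew:.3f}, GC skew: {gc_skew:.3f}")
```

The counts sum to the reported genome length of 15,282 bp, and the computed skews have the signs described in the text (AT skew slightly positive, GC skew negative).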

Figure 1

Circular maps of the mt genomes of A. rubiginosa (A) and R. menciana (B). The labels tRNA-Ser1, tRNA-Ser2, tRNA-Leu1 and tRNA-Leu2 denote tRNA-Ser (AGN), tRNA-Ser (UCN), tRNA-Leu (CUN), and tRNA-Leu (UUR), respectively.

Table 3 Summary of the mt genomes of A. rubiginosa and R. menciana.
Table 4 Composition and skewness in the A. rubiginosa and R. menciana mt genomes.
Figure 2

The mitochondrial gene order of ancestral insects and A. rubiginosa and R. menciana.

Protein-coding genes

The genes that make up the mt genomes of A. rubiginosa and R. menciana are summarized in Table 3. In both species, 12 of the 13 PCGs use standard ATN start codons; the exception is cox1, which is initiated by CGA (arginine). The CGA codon is highly conserved across most insect groups14, 34. In A. rubiginosa, eight PCGs (atp8, atp6, cox3, nad4, nad4L, nad6, cob, and nad1) have the complete stop codon TAA, while the remaining five terminate with either T (nad2, cox1, cox2, and nad3) or A (nad5). In R. menciana, ten PCGs (nad2, atp8, atp6, cox3, nad3, nad4, nad4L, nad6, cob, and nad1) have the complete stop codon TAA, while the remaining three terminate with either T (cox1 and cox2) or TA (nad5). For A. rubiginosa, the average A + T content of the 13 PCGs is 80.3%, and the overall AT and GC skews are −0.133 and 0.038, showing that T is more abundant than A and G is more abundant than C. Similarly, the A + T content of the 13 PCGs in the mt genome of R. menciana is 80.7%, with AT and GC skews of −0.130 and 0.030, again showing that T is more abundant than A and G is more abundant than C (Table 4). Relative synonymous codon usage (RSCU) values for the A. rubiginosa and R. menciana mt genomes are summarized in Table 5 and Fig. 3, which show that NNT and NNA codons are more frequent than NNG and NNC codons, indicating a strong A or T bias at the third codon position. The most common amino acids in the A. rubiginosa and R. menciana mitochondrial proteins are Leu (UUR), Ile, and Phe (Fig. 4).
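RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate preferred codons. The following minimal sketch implements that definition; the codon counts are toy numbers chosen for illustration and are not taken from Table 5, and only two codon families are shown.

```python
def rscu(codon_counts, synonymous_families):
    """Relative synonymous codon usage for each codon in each family.

    RSCU = observed count / (family total / number of synonymous codons).
    Families with no observed codons are skipped.
    """
    values = {}
    for codons in synonymous_families.values():
        family_total = sum(codon_counts.get(codon, 0) for codon in codons)
        if family_total == 0:
            continue
        expected = family_total / len(codons)
        for codon in codons:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Two-fold families under the invertebrate mt code (AUA encodes Met, not Ile).
families = {"Phe": ["UUU", "UUC"], "Ile": ["AUU", "AUC"]}
# Toy counts for illustration only.
toy_counts = {"UUU": 90, "UUC": 10, "AUU": 75, "AUC": 25}

if __name__ == "__main__":
    for codon, value in rscu(toy_counts, families).items():
        print(f"{codon}: RSCU = {value:.2f}")
```

In this toy example the T-ending (U-ending) codons have RSCU values well above 1, the pattern expected under a strong A or T bias at the third codon position.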

Table 5 Codon number and RSCU in the A. rubiginosa and R. menciana mitochondrial PCGs.
Figure 3

The relative synonymous codon usage (RSCU) in the mt genomes of A. rubiginosa (A) and R. menciana (B).

Figure 4

Amino acid composition in the mt genomes of A. rubiginosa (A) and R. menciana (B).

Transfer RNA and ribosomal RNA genes

The A. rubiginosa and R. menciana mt genomes both contain 22 tRNA genes. Eight of these (trnQ, trnC, trnY, trnF, trnH, trnP, trnL(CUN), and trnV) are encoded on the minority strand, while the remaining 14 tRNA genes are encoded on the majority strand in both species (Table 3). The total length of the 22 tRNAs in the mt genome of A. rubiginosa is 1,461 bp, and their A + T content is 81.5%. Similarly, the total length of the 22 tRNAs in the mt genome of R. menciana is 1,460 bp, and their A + T content is 81.8%. The AT skew is slightly positive and the GC skew is negative in the 22 tRNAs of both species (Table 4). The rrnL and rrnS genes of A. rubiginosa and R. menciana are located between trnL(CUN) and trnV and between trnV and the A + T-rich region, respectively. The A + T content of the two rRNA genes is 84.7% in A. rubiginosa, which is lower than that in R. menciana (85.9%) (Table 4).

A + T-rich region

The A + T-rich regions of A. rubiginosa and R. menciana are located between rrnS and trnM and are 399 bp and 604 bp long, respectively. Their A + T contents are 92.2% in A. rubiginosa and 94.0% in R. menciana, the highest values across the studied mt genomes (Table 4). The AT skew and GC skew of the A. rubiginosa A + T-rich region are −0.054 and −0.097, indicating a bias towards T and C. In the R. menciana A + T-rich region, by contrast, the AT skew is −0.011 and the numbers of G and C are equal (a GC skew of 0), meaning that T is slightly more abundant than A and that G and C are used equally. Several conserved structures found in the mt genomes of other bombycoid species are also observed in the A + T-rich regions of A. rubiginosa and R. menciana. The conserved “ATAGA + poly T” motif is located downstream of the rrnS gene in the A + T-rich region of both species; this motif may represent the origin of minority or light strand replication31 and is conserved in lepidopteran mt genomes. Multiple tandem repeat elements are typically present in the A + T-rich region of most insects. Only one tandem repeat was found in the A. rubiginosa mt genome (Fig. S1), whereas we identified two tandem repeat elements in the A + T-rich region of R. menciana (Fig. S2).

The mt genome of R. menciana has been sequenced previously, and two complete mt genomes of the species are available35, 36. However, the R. menciana mt genome obtained in the present study differs in length by approximately 300 nt from the two published sequences35, 36. The additional 300 nt in the present sequence arise mainly from the upper area of the A + T-rich region (Fig. S2). The A + T-rich regions of the R. menciana (Ankang, Shaanxi) and R. menciana (Korea) mt genomes were identical. The tandem repeats in the A + T-rich region of R. menciana were longer in this study than in the two published sequences.

Phylogenetic analysis

Phylogenetic analyses were based on the sequences of the 13 PCGs from 34 mt genomes, aligned with MAFFT and analysed with two methods (BI and ML). B. panterinaria and P. atrilineata were used as outgroups. The 30 bombycoid mt genomes downloaded from GenBank, together with those of A. rubiginosa and R. menciana, represent five families of Bombycoidea: Bombycidae, Lasiocampidae, Saturniidae, Brahmaeidae and Sphingidae. A. rubiginosa and Daphnis nerii37 cluster on one branch of the phylogenetic tree with high nodal support values, showing that A. rubiginosa belongs in the family Sphingidae. The three phylogenetic trees consistently separated R. menciana from Ankang from the Korean and Xuancheng samples, which grouped together. The bombycid species were resolved as Andraca theae + ((R. menciana (Ankang)35 + (R. menciana (Xuancheng) + R. menciana (Korea)36)) + (B. huttoni + (B. mandarina38 + B. mori))), indicating that R. menciana belongs in the family Bombycidae (Figs 5, 6 and 7).

Figure 5

Phylogenetic tree derived for Bombycoidea using BI and ML analyses based on amino acid sequences and using MAFFT for alignment. Bayesian posterior probability (BPP) and bootstrap values (BP) of each node are shown as BPP/BP, with maxima of 1.00/100.

Figure 6

Phylogenetic tree derived for Bombycoidea using BI analysis based on nucleotide sequences using MAFFT for alignment.

Figure 7

Phylogenetic tree derived for Bombycoidea using BI and ML analyses based on the 16S and 12S ribosomal RNA sequences of 33 species (no 16S or 12S ribosomal RNA sequences are available for Apatelopteryx phenax (KJ508055)). Bayesian posterior probability (BPP) and bootstrap value (BP) of each node are shown as BPP/BP, with maxima of 1.00/100.

The phylogenetic relationships among the families of Bombycoidea remain problematic in our study. The phylogenetic trees based on ML and BI analyses of amino acid sequences recovered the relationships (Lasiocampidae + Brahmaeidae) + (Bombycidae + (Sphingidae + Saturniidae)) (Fig. 5), which is similar to the results of some past studies10, 39. However, the phylogenetic tree based on BI analysis of nucleotide sequences recovered the relationships (Lasiocampidae + Brahmaeidae) + (Sphingidae + (Bombycidae + Saturniidae)) (Fig. 6). The family-level relationships in our study (Figs 5, 6 and 7) also differ from the findings of other previous studies, in which the families group as Lasiocampidae + (Saturniidae + (Bombycidae + Sphingidae))40. These differences may be due to the incorporation of complete mt genomes39. The relationships within the Bombycoidea therefore remain unsettled, and more mt genomes from Bombycoidea insects are required to resolve the family-level relationships within the superfamily.
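The disagreement between the amino acid and nucleotide topologies can be made explicit by listing the clades each tree contains and comparing the two sets. The sketch below encodes the two family-level topologies reported above as nested tuples and extracts their clades; the `clades` helper and the tuple encoding are illustrative assumptions, not the software used in this study.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a tree given as nested tuples.

    A clade is the frozenset of leaf names below an internal node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves = leaves | child_leaves
        found |= child_found
    found.add(leaves)
    return leaves, found


# Family-level topologies as reported in the text (Figs 5 and 6).
amino_acid_tree = (
    ("Lasiocampidae", "Brahmaeidae"),
    ("Bombycidae", ("Sphingidae", "Saturniidae")),
)
nucleotide_tree = (
    ("Lasiocampidae", "Brahmaeidae"),
    ("Sphingidae", ("Bombycidae", "Saturniidae")),
)

_, aa_clades = clades(amino_acid_tree)
_, nt_clades = clades(nucleotide_tree)

if __name__ == "__main__":
    print("Shared clades:")
    for clade in aa_clades & nt_clades:
        print("  ", sorted(clade))
    print("Only in amino acid tree:", [sorted(c) for c in aa_clades - nt_clades])
    print("Only in nucleotide tree:", [sorted(c) for c in nt_clades - aa_clades])
```

Both trees share the Lasiocampidae + Brahmaeidae clade and the clade uniting Bombycidae, Sphingidae and Saturniidae; they differ only in which family is sister to Saturniidae.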