Introduction

The insect mitogenome is a circular molecule 14–19 kilobases in length. It contains 22 transfer RNA genes (tRNAs); 13 protein-coding genes (PCGs), namely ATPase subunits 6 and 8 (atp6 and atp8), cytochrome c oxidase subunits 1–3 (cox1-cox3), cytochrome B (cob), and NADH dehydrogenase subunits 1–6 and 4L (nad1-6 and nad4L); the large and small subunit rRNAs (rrnL and rrnS); and a non-coding element termed the A + T-rich region or control region (CR), which contains initiation sites for transcription and replication1,2. Because of their unique features, including coding content conservation, maternal inheritance, and rapid evolution, mitogenomes have been informative in diverse studies of molecular evolution, such as phylogenetics, population genetics, and comparative and evolutionary genomics3,4.

Recent advances in sequencing technologies have led to a rapid increase in mitogenomic data in GenBank, including Lepidopteran mitogenomes. Lepidoptera is the second largest order of insects, accounting for more than 160,000 species5. Zygaenidae is a species-rich family of predominantly diurnal moths with a worldwide distribution. This family is particularly diverse in tropical and subtropical Asia and the Palaearctic region6. Because of the broad geographical distribution of species, extensive variation in coloration patterns, and an intriguing chemical defence system, Zygaenidae is of great interest to lepidopterists and evolutionary biologists7. To date, more than 200 complete or near-complete Lepidopteran mitogenomes are available. However, only one mitogenome of Zygaenoidea has been sequenced8. Monema flavescens Walker, 1855 is a moth of the family Limacodidae found in Korea, Japan, China, and the Russian Far East. The mitogenome of M. flavescens has not been sequenced9.

A better understanding of the Lepidopteran mitogenome requires an expansion of taxon and genome sampling. In this study, we sequence and describe the complete mitogenome of M. flavescens. We reconstruct a phylogenetic tree based on PCG sequences in order to analyse the evolutionary relationships among Lepidopteran insects. The assembly and annotation of the M. flavescens mitogenome will further the study of Zygaenoidea mitochondrial genome architecture and phylogenetics. Furthermore, characterization of the M. flavescens mitogenome may provide novel insights into the mechanisms underlying mitogenome evolution.

Methods

DNA Extraction

The moths of M. flavescens were collected in Yancheng, Jiangsu Province. Total DNA was isolated using the Genomic DNA Extraction Kit (SangonBiotech, China) according to the manufacturer’s instructions. Extracted DNA was used to amplify the complete mitogenome by PCR.

PCR Amplification and Sequencing

For amplification of the M. flavescens mitogenome, primer sets were designed based upon mitogenomic sequences obtained from other Lepidopteran insects10,11. PCR was performed under the following conditions: 3 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 1–3 min at 48–60 °C, and 10 min at 72 °C. All amplifications were performed on an Eppendorf Mastercycler and Mastercycler gradient in 50 μL reaction volumes. The PCR products were separated by agarose gel electrophoresis (1% w/v) and purified using a DNA gel extraction kit (Transgene, China). The purified PCR products were ligated into the T-vector (SangonBiotech, China) and sequenced at least three times.

Sequence Assembly and Gene Annotation

Sequence annotation was performed using NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast) and the DNAStar package (DNAStar Inc. Madison, WI, USA). The identity of tRNA genes was verified using the tRNAscan-SE program (http://lowelab.ucsc.edu/tRNAscan-SE/)12. The nucleotide sequences of PCGs were translated with the invertebrate mitogenome genetic code. Alignments of M. flavescens PCGs with various Lepidopteran mitogenomes were performed using Clustal X13. Composition skewness was calculated according to the following formulas: AT skew = [A − T]/[A + T]; GC skew = [G − C]/[G + C]. Codon usage was calculated using MEGA version 6.06. Tandem repeats in the A + T-rich region were predicted using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html)14.
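The skew formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study; the function name composition_skew and the toy sequence are hypothetical, and ambiguous bases are simply ignored as an assumption.

```python
from collections import Counter


def composition_skew(sequence: str) -> dict:
    """Return A+T content, AT skew and GC skew for a nucleotide sequence."""
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # assumption: ambiguous bases such as N are ignored
    if total == 0 or a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T and one G/C base")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),  # AT skew = [A - T]/[A + T]
        "gc_skew": (g - c) / (g + c),  # GC skew = [G - C]/[G + C]
    }


# Toy sequence for illustration only (not taken from the mitogenome).
toy = "AAATTGCC"
print(composition_skew(toy))
```

A positive AT skew means A is more frequent than T on the analysed strand, and a negative GC skew means C is more frequent than G.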

Phylogenetic Analysis

To reconstruct the phylogenetic relationships among Lepidopteran insects, the complete mitogenomes of Lepidopteran species were obtained from GenBank (Table 1). The amino acid sequences for each of the 13 mitochondrial PCGs were aligned using default settings and concatenated. The concatenated amino acid and nucleotide datasets were used for phylogenetic analysis, which was performed with the Bayesian inference (BI) and Maximum Likelihood (ML) methods using MrBayes v 3.2.215 and raxmlGUI, respectively. Alignments of individual genes were performed using MAFFT16. Gblocks was used to identify conserved regions and remove unreliably aligned sequences within the datasets17. For the BI and ML analyses, GTR + I + G was selected as the appropriate model for the nucleotide sequences using MrModeltest 2.3 based on Akaike’s information criterion (AIC)18. MtArt + I + G + F was selected as the appropriate model for the amino acid sequence dataset according to ProtTest 3.4 based on AIC19. Four independent runs were conducted for 10,000,000 generations, and each was sampled every 1,000 generations. All analyses converged within 10,000,000 generations. We assessed the credibility of the results in two ways. First, the average standard deviation of split frequencies was less than 0.05 in the Bayesian analysis. Second, we confirmed sufficient parameter sampling using Tracer v1.6: the effective sample size (ESS) was more than 200. Together, these results suggest that the runs had converged. Posterior probabilities over 0.95 were interpreted as strongly supported. The mitogenomes of Hepialoidea insects were used as outgroups. The resulting phylogenetic trees were visualized in FigTree v1.4.2.
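The concatenation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name concatenate_alignments, the input layout (gene name mapped to a dictionary of taxon and aligned sequence), the gap-padding of missing genes, and the toy data are all hypothetical.

```python
def concatenate_alignments(alignments: dict) -> dict:
    """Join per-gene alignments (gene -> {taxon: aligned sequence}) into one supermatrix."""
    taxa = sorted({taxon for gene in alignments.values() for taxon in gene})
    supermatrix = {taxon: [] for taxon in taxa}
    for gene_name, gene in alignments.items():
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene_name}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing a gene is padded with gaps
            # so that alignment columns stay in register.
            supermatrix[taxon].append(gene.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Toy alignments for illustration only.
toy = {
    "cox1": {"sp1": "MK-L", "sp2": "MKAL"},
    "atp8": {"sp1": "IP"},
}
print(concatenate_alignments(toy))
```

Genes are joined in the order in which they appear in the input dictionary, so the same gene order should be used for the amino acid and nucleotide datasets if partitions are to be compared.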

Table 1 List of the complete mitogenomes of Lepidopteran insects.

Results and Discussion

Genome Organization and Base Composition

The mitogenome of M. flavescens is a closed circular molecule 15,396 bp in size. The gene content is typical of other Lepidopteran insect mitogenomes, including 22 tRNA genes (one for each amino acid and two each for leucine and serine), 13 PCGs (cox1-3, nad1-6, nad4L, cob, atp6, and atp8), two mitochondrial rRNA genes (rrnS and rrnL), and a major non-coding region known as the CR. The majority strand (J strand) encodes 23 genes, while the opposite (N) strand encodes 14 genes (Fig. 1, Table 2). The arrangement of the genes within Lepidopteran mitogenomes is usually highly conserved. While the order and orientation of genes in the M. flavescens mitogenome are identical to those of the only other Zygaenoidea insect sequenced to date, this gene order differs from that of ancestral insects. Specifically, the placement of the trnM gene between the CR and trnI in the M. flavescens mitogenome (CR, trnM, trnI, trnQ, nad2) differs from that in ancestral insects, in which trnM is located between trnQ and nad2 (CR, trnI, trnQ, trnM, nad2)20. However, the ancestral arrangement of the trnM gene cluster was also found in ghost moths21. This result in M. flavescens supports the hypothesis that the ancestral arrangement of the trnM gene cluster underwent rearrangement after Hepialoidea diverged from other Lepidopteran lineages. The tRNA gene rearrangements are commonly considered to be a consequence of tandem duplication in a portion of the mitogenome, followed by random or non-random loss of the duplicated copies22.

Table 2 Summary of the mitogenome of M. flavescens.
Figure 1

Circular map of the mitogenome of M. flavescens.

The tRNA genes are labelled according to the IUPAC-IUB single-letter amino acid codes. Single-letter amino acids above the bar indicate coding sequence on the major strand, whereas amino acids listed below the bar indicate coding sequence on the minor strand. The symbols S1, S2, L1, and L2 denote the tRNA genes trnS1(AGN), trnS2(UCN), trnL1(CUN), and trnL2(UUR), respectively.

Skewness, Overlapping, and Intergenic Spacer Regions

The mitogenome of M. flavescens has a total of 29 bp of overlap between genes at six locations, with the longest overlap (9 bp) located between trnW and trnC. The mitogenome of M. flavescens contains 167 bp of intergenic spacer sequence spread over 17 regions, ranging in size from 1 to 50 bp (Table 2). The longest spacer sequence (50 bp) is located between the trnQ and nad2 genes, and it is extremely A + T rich. The nucleotide composition of the M. flavescens mitogenome is as follows: A = 6,275 (40.8%), T = 6,115 (39.7%), G = 1,164 (7.5%), and C = 1,842 (12.0%). As observed in other Lepidopterans, the nucleotide composition of the M. flavescens mitogenome is A + T rich (80.5%). This enrichment is lower than in other species, such as D. punctiferalis (80.6%), M. vitrata (80.7%), M. testulalis (80.8%), L. haraldusalis (81.5%), and T. hypsalis and N. noctuella (both 81.4%). In contrast, this enrichment is slightly higher than in S. incertulas (77.1%), C. suppressalis (79.7%), and D. saccharalis (80.0%). Additionally, the AT skew for the M. flavescens mitogenome is slightly positive, indicating a higher occurrence of A compared to T nucleotides. The GC skew for the M. flavescens mitogenome is negative, indicating a higher content of C compared to G nucleotides. This is similar to GC skew values observed in all sequenced Lepidopteran mitogenomes to date.
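As a worked check, the whole-genome A + T content and skew values can be recomputed from the base counts reported above, using the skew definitions given in the Methods. The rounded values below are derived here from those counts and are not copied from Table 4.

```latex
\begin{align*}
\text{A+T content} &= \frac{A + T}{A + T + G + C} = \frac{6275 + 6115}{15396} = \frac{12390}{15396} \approx 0.805 \\
\text{AT skew} &= \frac{A - T}{A + T} = \frac{6275 - 6115}{6275 + 6115} = \frac{160}{12390} \approx 0.013 \\
\text{GC skew} &= \frac{G - C}{G + C} = \frac{1164 - 1842}{1164 + 1842} = \frac{-678}{3006} \approx -0.226
\end{align*}
```

The four counts sum to 15,396 bp, matching the reported genome size, and the signs agree with the slightly positive AT skew and negative GC skew described in the text.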

Protein-Coding Genes

The start and stop codons of the 13 PCGs in the M. flavescens mitogenome are shown in Table 2. As in other invertebrate mitogenomes, 12 of the 13 PCGs begin with a standard ATN start codon; the exception is cox1. Sequence alignment revealed that the open reading frame of cox1 starts with a CGA codon, which encodes arginine. The putative start codon CGA is common in insects10,23,24. An unusual start codon for the cox1 gene has also been described in various arthropods25,26,27. In the M. flavescens mitogenome, the canonical termination codon, TAA, occurs in seven PCGs. However, the nad4L gene utilizes A, and the cox1, cox2, nad2, nad4, and cob genes utilize T, as a truncated stop codon instead. Similar results have also been found in other animal mitochondrial genes28,29,30,31. Relative synonymous codon usage values for the M. flavescens mitogenome are summarized in Table 3 and Fig. 2. The total number of codons in PCGs is 3,716, and the codons CUC, GUC, CCG, UGG, CGG, and AGG are not represented. The most common amino acids in the mitochondrial proteins are leucine 2 (Leu 2, 484), isoleucine (Ile, 455), and phenylalanine (Phe, 393), which are likewise highly abundant in mitochondrial proteins in other animals32,33,34. The average AT content of the 13 PCGs is 78.7%. Furthermore, the AT skew of these PCGs is slightly positive, while the GC skew is slightly negative (Table 4).
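Relative synonymous codon usage (RSCU) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates the calculation with the invertebrate mitochondrial code (NCBI translation table 5). It is not the MEGA implementation used in the study: the function name rscu and the toy sequence are hypothetical, and pooling all synonymous codons of an amino acid into one family (for example, all six leucine codons) is an assumption of this sketch, whereas the text above reports the two leucine families separately.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_sequence: str) -> dict:
    """Relative synonymous codon usage for every sense codon in the table."""
    seq = coding_sequence.upper().replace("U", "T")
    # Split into complete codons; a trailing partial codon (e.g. a truncated stop) is dropped.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    family = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            family[aa].append(codon)

    values = {}
    for aa, synonyms in family.items():
        total = sum(counts[c] for c in synonyms)
        for codon in synonyms:
            # Observed count divided by the count expected under equal usage.
            values[codon] = counts[codon] * len(synonyms) / total if total else 0.0
    return values


# Toy coding sequence for illustration only: four leucine codons.
toy_values = rscu("TTATTATTGCTT")
print(toy_values["TTA"], toy_values["TTG"], toy_values["CTC"])
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, and a value of 0 corresponds to a codon that is not represented, as reported above for CUC, GUC, CCG, UGG, CGG, and AGG.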

Table 3 Codon number and RSCU in M. flavescens mitochondrial PCGs.
Table 4 Composition and skewness in the M. flavescens mitogenome.
Figure 2

The relative synonymous codon usage (RSCU) in the mitogenome of M. flavescens.

Transfer RNA Genes and Ribosomal RNA Genes

The tRNAscan-SE Search Server was used to predict the structure of the 22 tRNAs present in the M. flavescens mitogenome. Eight tRNAs are encoded by the L-strand and the remaining 14 are encoded by the H-strand. This tRNA genomic architecture is identical to that found in all Lepidopteran species examined to date. Furthermore, all M. flavescens tRNAs display the typical clover-leaf secondary structure observed in most mitochondrial tRNAs, with the exception of the trnS1 (AGN) gene. A trnS1 (AGN) lacking a stable dihydrouridine arm has been observed in several insects, including Lepidopteran species, and in other metazoan mitogenomes35,36,37,38. A 7 bp amino acid acceptor stem and the anticodon stem and loop (7 bp) are conserved in all tRNAs. A total of 25 unmatched base pairs were detected in these tRNAs (Fig. 3); 18 of them are G-U pairs, which form a weak bond and are well-known non-canonical pairs in tRNA secondary structures. The remaining seven mismatches include one C-U and six U-U pairs. The 22 tRNAs in the M. flavescens mitogenome total 1,513 bp in length and range in size from 63 to 73 bp. Their A + T content is 82.4%. The AT skew for both tRNAs and rRNAs is slightly positive, indicating a higher occurrence of A compared to T nucleotides. The GC skew for both tRNAs and rRNAs is slightly negative, indicating a higher occurrence of C compared to G nucleotides. The two rRNA genes (rrnL and rrnS) present in the M. flavescens mitogenome are located between trnL1 (CUN) and trnV and between trnV and the A + T-rich region, respectively. The sizes of rrnL and rrnS are 1,359 bp and 792 bp, respectively. The A + T content of the two rRNAs is 84.5% (Table 4).

Figure 3

Predicted secondary structures for the tRNA genes in the M. flavescens mitogenome.

Control Region

The CR possesses essential elements involved in the initiation of replication and transcription of the mitogenome39. The CR of the M. flavescens mitogenome extends over 401 bp and is located between rrnS and trnM. The CR has the highest A + T content (93.3%) in the entire mitogenome. Both the AT skew and GC skew for the CR are slightly negative, indicating that T and C are more abundant than A and G, respectively. Several conserved structures found in other Lepidopteran mitogenomes are also observed in the A + T-rich region of M. flavescens. These include the motif ‘ATAGA’ and a poly-T stretch downstream of rrnS, which are widely conserved in Lepidopteran mitogenomes and may represent the origin of minority or light strand replication40. A poly-A stretch commonly observed in other Lepidopteran mitogenomes is also found immediately upstream of the trnM gene. We identified microsatellite (AT)10 elements in the A + T-rich region. Multiple tandem repeat elements are typically present in the A + T-rich region of most insects. However, only three tandem repeats are found in the CR of the M. flavescens mitogenome (Fig. 4).

Figure 4

Features present in the A + T-rich region of the M. flavescens mitogenome.

The reverse strand sequence is shown. Coloured nucleotides indicate the ATATG motif (red), the poly-T stretch (blue), two microsatellite T/A repeat sequences (green), and the poly-A stretch (pink). Two tandem repeats 51 bp in length are indicated by red and black single underlines.

Phylogenetic Analyses

Phylogenetic relationships within the Zygaenoidea superfamily are highly debated. In the present study, concatenated amino acid and nucleotide sequences of the 13 PCGs from mitogenomes obtained from nine Lepidopteran superfamilies are used to reconstruct phylogenetic relationships by the BI and ML methods (Figs 5 and 6). The monophyly of each superfamily is generally well supported. The best-supported phylogenetic relationship found in this study is as follows: Yponomeutoidea + (Tortricoidea + Zygaenoidea + (Papilionoidea + (Pyraloidea + (Noctuoidea + (Geometroidea + Bombycoidea))))). The analyses show that M. flavescens belongs in the Zygaenoidea superfamily. The Papilionoidea and Tortricoidea superfamilies are the most closely related to Zygaenoidea. More mitogenomes from Zygaenoidea insects are required to resolve the position of Zygaenoidea and the relationships among these superfamilies. Our phylogeny clearly separates these superfamilies and shows a topology similar to that derived from traditional classifications and other molecular data41,42.

Figure 5

Phylogenetic trees inferred from amino acid (red) and nucleotide (black) sequences of 13 PCGs of the mitogenome using BI analysis.

Figure 6

Phylogenetic trees inferred from amino acid (blue) and nucleotide (black) sequences of 13 PCGs of the mitogenome using ML analysis.

Additional Information

Accession codes: The M. flavescens mitogenome was submitted under the accession number KU946971 to NCBI.

How to cite this article: Liu, Q.-N. et al. The first complete mitochondrial genome for the subfamily Limacodidae and implications for the higher phylogeny of Lepidoptera. Sci. Rep. 6, 35878; doi: 10.1038/srep35878 (2016).