The first complete mitochondrial genome for the subfamily Limacodidae and implications for the higher phylogeny of Lepidoptera

The mitochondrial genome (mitogenome) provides important information for understanding molecular evolution and phylogeny. To determine the systematic status of the family Limacodidae within Lepidoptera, we infer a phylogenetic hypothesis based on the complete mitogenome of Monema flavescens (Lepidoptera: Limacodidae). The mitogenome of M. flavescens is 15,396 base pairs (bp) long and includes 13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes, and a control region (CR). The AT skew of this mitogenome is slightly negative, and its nucleotide composition is biased towards A + T (80.5%). All PCGs are initiated by ATN codons, except for the cytochrome c oxidase subunit 1 (cox1) gene, which is initiated by CGA. All tRNAs display the typical clover-leaf structure of mitochondrial tRNAs, with the exception of trnS1 (AGN), which lacks a stable dihydrouridine arm. The 401-bp CR contains several features common to Lepidopteran mitogenomes. Phylogenetic analyses using Bayesian inference (BI) and maximum likelihood (ML) on nucleotide and amino acid sequences of the 13 mitochondrial PCGs indicate that M. flavescens belongs to Zygaenoidea. We obtain a well-supported phylogenetic tree consisting of Yponomeutoidea + (Tortricoidea + Zygaenoidea + (Papilionoidea + (Pyraloidea + (Noctuoidea + (Geometroidea + Bombycoidea))))).

The insect mitogenome is a circular molecule 14-19 kilobases in length. It contains 37 genes (13 PCGs, 22 tRNA genes, and two rRNA genes) and a non-coding control region (CR). The PCGs encode ATPase subunits 6 and 8 (atp6 and atp8), cytochrome c oxidase subunits 1-3 (cox1-cox3), cytochrome b (cob), and NADH dehydrogenase subunits 1-6 and 4L (nad1-6 and nad4L); the rRNA genes encode the large and small subunit rRNAs (rrnL and rrnS). The CR, also termed the A + T-rich region, contains initiation sites for transcription and replication 1,2 . Because of features such as conserved gene content, maternal inheritance, and rapid evolution, mitogenomes have been informative in diverse areas of molecular evolution, including phylogenetics, population genetics, and comparative and evolutionary genomics 3,4 .
Recent advances in sequencing technologies have led to a rapid increase in mitogenomic data in GenBank, including Lepidopteran mitogenomes. Lepidoptera is the second largest order of insects, comprising more than 160,000 species 5 . Zygaenidae is a species-rich family of predominantly diurnal moths with a worldwide distribution, and it is particularly diverse in tropical and subtropical Asia and the Palaearctic region 6 . Because of the broad geographical distribution of its species, extensive variation in coloration patterns, and an intriguing chemical defence system, Zygaenidae is of great interest to lepidopterists and evolutionary biologists 7 . To date, more than 200 complete or near-complete Lepidopteran mitogenomes are available; however, only one mitogenome from Zygaenoidea has been sequenced 8 . Monema flavescens Walker, 1855 is a moth of the family Limacodidae found in Korea, Japan, China, and the Russian Far East. The mitogenome of M. flavescens had not previously been sequenced 9 . A better understanding of the Lepidopteran mitogenome requires broader taxon and genome sampling. In this study, we sequence and describe the complete mitogenome of M. flavescens and reconstruct a phylogenetic tree from PCG sequences to analyse the evolutionary relationships among Lepidopteran insects. The assembly and annotation of the M. flavescens mitogenome will further the study of Zygaenoidea mitochondrial genome architecture and phylogenetics, and its characterization may provide novel insights into the mechanisms underlying mitogenome evolution.

Methods
The PCR products were separated by agarose gel electrophoresis (1% w/v) and purified using a DNA gel extraction kit (Transgene, China). The purified PCR products were ligated into a T-vector (Sangon Biotech, China) and sequenced at least three times.
Sequence Assembly and Gene Annotation. Sequence annotation was performed using NCBI BLAST […] sequences for each of the 13 mitochondrial PCGs were aligned using default settings and concatenated. The concatenated amino acid and nucleotide datasets were used for phylogenetic analysis by Bayesian inference (BI) and maximum likelihood (ML), using MrBayes v3.2.2 15 and raxmlGUI, respectively. Alignments of individual genes were performed using MAFFT 16 . Gblocks was used to identify conserved regions and to remove unreliably aligned positions from the datasets 17 . […] information criterion (AIC) 18 . ProtTest 3.4 selected MtArt + I + G + F as the best-fit model for the amino acid dataset under the AIC 19 . Four independent runs of 10,000,000 generations were conducted, each sampled every 1,000 generations, and all analyses converged within 10,000,000 generations. Convergence was assessed in two ways. First, the average standard deviation of split frequencies in the Bayesian analyses was below 0.05. Second, Tracer v1.6 showed adequate parameter sampling, with effective sample size (ESS) values above 200. Together, these diagnostics indicate that the runs had converged. Nodes with posterior probabilities above 0.95 were considered strongly supported. Mitogenomes of Hepialoidea were used as outgroups, and the resulting phylogenetic trees were visualized in FigTree v1.4.2.

[…] cob, atp6, and atp8), two mitochondrial rRNA genes (rrnS and rrnL), and a major non-coding region known as the CR. The majority (J) strand encodes 23 genes, while the minority (N) strand encodes 14 genes (Fig. 1, Table 2). Gene arrangement is usually highly conserved in Lepidopteran mitogenomes. The order and orientation of genes in the M. flavescens mitogenome are identical to those of the only other Zygaenoidea mitogenome sequenced to date, but this gene order differs from that of ancestral insects.
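The strand distribution described above can be checked with a simple tally. This is a minimal Python sketch that assumes the M. flavescens genes follow the typical ditrysian Lepidopteran gene order and strand assignment; that list is hypothetical input written out by hand, not read from Table 2 or Fig. 1. It counts genes per strand and per gene class.

```python
from collections import Counter

# Hypothetical input: typical ditrysian Lepidopteran gene order (CR omitted),
# assumed for illustration rather than taken from Table 2.
# "J" = majority strand, "N" = minority strand.
GENE_ORDER = [
    ("trnM", "J"), ("trnI", "J"), ("trnQ", "N"), ("nad2", "J"),
    ("trnW", "J"), ("trnC", "N"), ("trnY", "N"), ("cox1", "J"),
    ("trnL2", "J"), ("cox2", "J"), ("trnK", "J"), ("trnD", "J"),
    ("atp8", "J"), ("atp6", "J"), ("cox3", "J"), ("trnG", "J"),
    ("nad3", "J"), ("trnA", "J"), ("trnR", "J"), ("trnN", "J"),
    ("trnS1", "J"), ("trnE", "J"), ("trnF", "N"), ("nad5", "N"),
    ("trnH", "N"), ("nad4", "N"), ("nad4L", "N"), ("trnT", "J"),
    ("trnP", "N"), ("nad6", "J"), ("cob", "J"), ("trnS2", "J"),
    ("nad1", "N"), ("trnL1", "N"), ("rrnL", "N"), ("trnV", "N"),
    ("rrnS", "N"),
]


def gene_class(name):
    """Classify a gene name as tRNA, rRNA, or protein-coding gene (PCG)."""
    if name.startswith("trn"):
        return "tRNA"
    if name.startswith("rrn"):
        return "rRNA"
    return "PCG"


def tally(order):
    """Count genes per strand and per (strand, gene class) pair."""
    per_strand = Counter(strand for _, strand in order)
    per_class = Counter((strand, gene_class(gene)) for gene, strand in order)
    return per_strand, per_class


per_strand, per_class = tally(GENE_ORDER)
print(per_strand["J"], per_strand["N"])  # 23 14
for key in sorted(per_class):
    print(key, per_class[key])
```

Under this assumed arrangement the J strand carries 9 PCGs and 14 tRNAs, and the N strand carries 4 PCGs, 8 tRNAs, and both rRNAs, which is consistent with the 23/14 split reported above.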
Specifically, in the M. flavescens mitogenome trnM lies between the CR and trnI (CR, trnM, trnI, trnQ, nad2), whereas in ancestral insects trnM is located between trnQ and nad2 (CR, trnI, trnQ, trnM, nad2) 20 . The ancestral arrangement of this trnM gene cluster has, however, also been found in ghost moths 21 . The derived arrangement in M. flavescens therefore supports the hypothesis that the trnM cluster was rearranged after Hepialoidea diverged from the other Lepidopteran lineages. tRNA gene rearrangements are commonly attributed to tandem duplication of part of the mitogenome, followed by random or non-random loss of the duplicated copies 22 .

[…] Table 2. As in other invertebrate mitogenomes, 12 of the 13 PCGs begin with a standard ATN start codon; the exception is cox1. Sequence alignment revealed that the open reading frame of cox1 starts with a CGA codon, which encodes arginine. The use of CGA as a putative cox1 start codon is common in insects 10,23,24 , and non-canonical cox1 start codons have also been described in various other arthropods [25][26][27] . In the M. flavescens mitogenome, seven PCGs end with the canonical termination codon TAA, whereas nad4L ends with a truncated A and cox1, cox2, nad2, nad4, and cob end with a truncated T stop codon. Similar incomplete stop codons have been found in other animal mitochondrial genes [28][29][30][31] . Relative synonymous codon usage (RSCU) values for the M. flavescens mitogenome are summarized in Table 3 and Fig. 2. The PCGs comprise 3,716 codons in total, and the codons CUC, GUC, CCG, UGG, CGG, and AGG are not used. The most frequent amino acids in the M. flavescens mitochondrial proteins are leucine 2 (Leu 2, 484), isoleucine (Ile, 455), and phenylalanine (Phe, 393), which are likewise abundant in the mitochondrial proteins of other animals [32][33][34] . The average AT content of the 13 PCGs is 78.7%; the AT skew of these PCGs is slightly positive, while the GC skew is slightly negative (Table 4).
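RSCU compares how often each codon is used with the count expected if all synonymous codons for an amino acid were used equally: RSCU = observed count / (total count for the amino acid / number of synonymous codons). The Python sketch below assumes NCBI genetic code table 5 (invertebrate mitochondrial), under which AGA and AGG encode Ser, ATA encodes Met, and TGA encodes Trp; the function names are hypothetical. It groups codons by amino acid only, so the eight Ser codons form a single family; tables that report Ser1 (AGN) and Ser2 (UCN), or Leu1 (CUN) and Leu2 (UUR), as separate families would split them. A trailing incomplete codon, such as a truncated T stop codon, is dropped by the counter.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI genetic code table 5 (invertebrate mitochondrial), codons in TCAG order.
# Differences from the standard code: AGA/AGG -> Ser, ATA -> Met, TGA -> Trp.
BASES = "TCAG"
TABLE5_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), TABLE5_AA)
}


def count_codons(cds):
    """Count in-frame codons; a trailing partial codon (e.g. a truncated T stop) is dropped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(codon_counts):
    """RSCU = observed count / mean count over all synonymous codons of that amino acid."""
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


# Toy example: TTT TTT TTC ATG TAA, followed by a partial codon "T".
usage = rscu(count_codons("TTTTTTTTCATGTAAT"))
print(round(usage["TTT"], 3), round(usage["TTC"], 3), usage["ATG"])  # 1.333 0.667 2.0
```

An RSCU above 1 marks a codon used more often than uniform synonymous usage would predict; a codon with zero occurrences, like the unused codons listed above, gets an RSCU of 0.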

Skewness, Overlapping, and Intergenic Spacer
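The skew values reported throughout this study (e.g. Table 4) follow the usual definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), so a positive AT skew means A outnumbers T and a negative GC skew means C outnumbers G. The following is a minimal Python sketch; it assumes the sequence is supplied in the orientation of the majority (J) strand and simply ignores ambiguous bases, and the function name is hypothetical.

```python
def base_composition(seq):
    """Return (A+T %, AT skew, GC skew) for a DNA sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    Ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    at_percent = 100.0 * (a + t) / total
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_percent, at_skew, gc_skew


# Toy sequence: A=3, T=1, G=1, C=2 (the N is ignored).
at_percent, at_skew, gc_skew = base_composition("AAATGCCN")
print(round(at_percent, 1), at_skew, round(gc_skew, 3))  # 57.1 0.5 -0.333
```

Applying the same function to concatenated PCGs, tRNAs, rRNAs, or the CR gives the per-region values of the kind summarized in Table 4.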
Transfer RNA Genes and Ribosomal RNA Genes. The tRNAscan-SE Search Server was used to predict the secondary structures of the 22 tRNAs in the M. flavescens mitogenome. Eight tRNAs are encoded by the N strand and the remaining 14 by the J strand, a tRNA arrangement identical to that found in all Lepidopteran species examined to date. All M. flavescens tRNAs display the typical clover-leaf secondary structure of mitochondrial tRNAs, with the exception of trnS1 (AGN), which lacks a stable dihydrouridine arm; this feature has been observed in several insects, including Lepidopteran species, and in other metazoan mitogenomes [35][36][37][38] . Both the amino acid acceptor stem (7 bp) and the anticodon stem and loop (7 bp) are conserved in all tRNAs. A total of 25 mismatched base pairs were detected in these tRNAs (Fig. 3). Of these, 18 are G-U pairs, which form weak bonds and are well-known non-canonical pairs in tRNA secondary structures; the remaining seven comprise one C-U and six U-U mismatches. The 22 tRNAs together span 1,513 bp, with individual genes ranging from 63 to 73 bp, and have an A + T content of 82.4%. For both tRNAs and rRNAs, the AT skew is slightly positive, indicating that A is more frequent than T, and the GC skew is slightly negative, indicating that C is more frequent than G. Of the two rRNA genes, rrnL is located between trnL1 (CUN) and trnV, and rrnS between trnV and the A + T-rich region. rrnL and rrnS are 1,359 bp and 792 bp long, respectively, and the A + T content of the two rRNAs is 84.5% (Table 4).
Control Region. The CR contains essential elements for the initiation of replication and transcription of the mitogenome 39 . The CR of the M. flavescens mitogenome spans 401 bp and is located between rrnS and trnM.
The CR has the highest A + T content (93.3%) of any region in the mitogenome. Both the AT skew and the GC skew of the CR are slightly negative, indicating that T and C are more abundant than A and G, respectively. Several conserved structures found in other Lepidopteran mitogenomes are also present in the A + T-rich region of M. flavescens. These include the motif 'ATAGA' and a poly-T stretch downstream of rrnS, a structure widely conserved in Lepidopteran mitogenomes that may represent the origin of minority (light) strand replication 40 . A poly-A stretch, commonly observed in other Lepidopteran mitogenomes, is also found immediately upstream of trnM. We also identified microsatellite (AT)10 elements in the A + T-rich region. Although the A + T-rich region of most insects typically contains multiple tandem repeat elements, only three tandem repeats are found in the CR of the M. flavescens mitogenome (Fig. 4).
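Simple CR features such as the 'ATAGA' motif, poly-T and poly-A stretches, and (AT)n microsatellites can be located with regular expressions. The Python sketch below is illustrative only: the function name and the minimum run lengths are hypothetical choices, not the criteria used in this study, and it does not detect longer tandem repeats, which need a dedicated repeat-finding tool.

```python
import re


def scan_control_region(cr, min_poly_t=10, min_poly_a=8, min_at_units=5):
    """Find ATAGA motifs, poly-T/poly-A runs and (AT)n microsatellites.

    The minimum lengths are hypothetical, illustrative thresholds.
    Returns {feature: [(0-based start, matched sequence), ...]}.
    """
    cr = cr.upper()
    patterns = {
        "ATAGA": "ATAGA",
        "poly-T": f"T{{{min_poly_t},}}",           # e.g. T{10,}
        "poly-A": f"A{{{min_poly_a},}}",           # e.g. A{8,}
        "(AT)n": f"(?:AT){{{min_at_units},}}",     # e.g. (?:AT){5,}
    }
    return {
        name: [(m.start(), m.group()) for m in re.finditer(pattern, cr)]
        for name, pattern in patterns.items()
    }


# Toy CR fragment (not the M. flavescens sequence).
toy = "GG" + "ATAGA" + "T" * 12 + "CC" + "AT" * 10 + "GG" + "A" * 9 + "C"
hits = scan_control_region(toy)
for name, found in hits.items():
    for start, seq in found:
        print(name, start, len(seq))
```

The number of repeat units in an (AT)n hit is half its length, so an (AT)10 element appears as a 20-nt match.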
Phylogenetic Analyses. Phylogenetic relationships within the Zygaenoidea superfamily are highly debated. In the present study, concatenated amino acid and nucleotide sequences of the 13 PCGs from mitogenomes representing nine Lepidopteran superfamilies are used to reconstruct phylogenetic relationships by the BI and ML methods (Figs 5 and 6). The monophyly of each superfamily is generally well supported. The best-supported phylogenetic relationship found in this study is as follows: Yponomeutoidea + (Tortricoidea + Zygaenoidea + (Papilionoidea + (Pyraloidea + (Noctuoidea + (Geometroidea + Bombycoidea))))). The analyses place M. flavescens within Zygaenoidea, with Papilionoidea and Tortricoidea as the superfamilies most closely related to Zygaenoidea. More mitogenomes from Zygaenoidea are required to resolve the position of Zygaenoidea and the relationships among these superfamilies. Our phylogeny clearly separates the superfamilies and recovers a topology similar to those derived from traditional classifications and other molecular data 41,42 .
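The parenthetical notation for the best-supported relationship maps directly onto a Newick string, the format read by tree viewers such as FigTree. This Python sketch (all function names hypothetical) converts the notation and lists the clades it implies. Read literally, the notation leaves Tortricoidea, Zygaenoidea, and the Papilionoidea + (...) clade as an unresolved three-way node; Figs 5 and 6 remain the authority on how the BI and ML trees actually resolve it.

```python
import re

TOPOLOGY = ("Yponomeutoidea + (Tortricoidea + Zygaenoidea + (Papilionoidea + "
            "(Pyraloidea + (Noctuoidea + (Geometroidea + Bombycoidea)))))")


def parse_topology(text):
    """Parse 'A + (B + C)' notation into nested tuples; '+' only separates members."""
    tokens = re.findall(r"[()]|[A-Za-z]+", text)
    pos = 0

    def group():
        nonlocal pos
        members = []
        while pos < len(tokens):
            token = tokens[pos]
            pos += 1
            if token == "(":
                members.append(group())
            elif token == ")":
                break
            else:
                members.append(token)
        return tuple(members)

    return group()


def to_newick(node):
    """Nested tuples -> Newick string (without the trailing semicolon)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node):
    """Set of taxon names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def clades(node):
    """Every internal node, as a frozenset of the taxa below it."""
    if isinstance(node, str):
        return []
    found = [frozenset(leaves(node))]
    for child in node:
        found.extend(clades(child))
    return found


tree = parse_topology(TOPOLOGY)
print(to_newick(tree) + ";")
```

Because Tortricoidea and Zygaenoidea are siblings at a three-way node rather than a two-taxon clade, {Tortricoidea, Zygaenoidea} does not appear among the clades, whereas {Geometroidea, Bombycoidea} does.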