Introduction

The Ailanthus defoliator Eligma narcissus is a member of the family Noctuidae (superfamily Noctuoidea) and is widely distributed across China. This insect pest completes three or four generations per year. It damages many economically important plants, including Ailanthus altissima, Amygdalus persica L., Toona sinensis, and others1. Many studies have addressed the biology and chemical control of E. narcissus1,2. However, its mitochondrial genome has not been investigated.

Insect mitochondrial DNA (mtDNA) has been widely used for species identification and for population genetic and molecular evolutionary studies owing to its low rate or absence of recombination and its maternal inheritance3. In most insects, mtDNA is a small, double-stranded molecule consisting of an L-strand, which is biased in favor of As and Cs, and an H-strand, which is rich in Ts and Gs4. Moreover, the mtDNA is a relatively conserved circular molecule of 14–19 kb in length5 and includes 13 protein-coding genes (PCGs: atp6, atp8, cox1, cox2, cox3, cob, nad1, nad2, nad3, nad4, nad5, nad6, and nad4L), large and small ribosomal RNA (rrnL and rrnS) genes, 22 transfer RNA (tRNA) genes and a large non-coding element called the A+T-rich region, which contains sequences essential for the initiation of transcription and genome replication6.

Lepidoptera (moths and butterflies) is the second largest order in Insecta, accounting for more than 155,000 species7. Although Noctuoidea is the largest superfamily in Lepidoptera, only a few mitogenomes of Noctuoidea are available in GenBank (Table 1). In the present study, we determined the complete mitogenome sequence of E. narcissus and compared its nucleotide composition, codon usage, tRNA secondary structure and A+T-rich region with those of other lepidopteran species. Furthermore, the concatenated nucleotide sequences of the 13 PCGs of E. narcissus were used to provide insight into the phylogenetic relationships among lepidopteran superfamilies.

Table 1 Lepidopteran mitogenomes used in this study.

Results and Discussion

Genome organization and base composition

The complete mitogenome of E. narcissus is a circular molecule of 15,376 bp in size (Fig. 1), which is larger than the genomes of A. selene (15,236 bp) and H. armigera (15,347 bp) but smaller than the genomes of P. flavescens (15,659 bp) and M. sexta (15,516 bp). The mitogenome of E. narcissus contains a remarkably conserved set of 37 genes, including 13 protein-coding genes (PCGs: atp6, atp8, cox1, cox2, cox3, cob, nad1, nad2, nad3, nad4, nad5, nad6, and nad4L), large and small ribosomal RNA (rrnL and rrnS) genes and 22 transfer RNA (tRNA) genes, together with a large non-coding element called the A+T-rich region. The gene order is the same as in other sequenced Noctuidae mitogenomes8,9,10, with a trnM-trnI-trnQ order, which differs from the ancestral trnI-trnQ-trnM order11. The nucleotide composition (A: 40.78%, T: 40.21%, G: 7.68%, C: 11.33%) of the E. narcissus mitogenome is biased toward A+T nucleotides (80.99%), which is a higher percentage than that of six other lepidopterans but a lower percentage than that of M. sexta (81.79%) (Table 2). The positive AT skew (0.007) indicates slightly more As than Ts, similar to some lepidopterans such as H. armigera (0.001)12, O. lunifer (0.030)13, and the Chinese B. mandarina (0.057)14.

Figure 1: Circular map of the mitogenome of E. narcissus. cox1, cox2 and cox3 refer to the cytochrome c oxidase subunits. cob refers to cytochrome b. nad1-nad6 refer to NADH dehydrogenase components. rrnL and rrnS refer to ribosomal RNAs. tRNAs are denoted by a one-letter symbol according to the IUPAC-IUB single-letter amino acid codes. Gene names with lines indicate genes located on the L strand, whereas the others are located on the H strand.

Table 2 Composition and skewness in the lepidopteran mitogenomes.

In the mitogenome of E. narcissus, genes overlap by a total of 34 bp at 11 positions; the overlaps range in size from 1 to 8 bp, and the longest is located between trnW and trnC. A total of 192 bp of intergenic spacers are dispersed across 16 regions and range in size from 1 to 52 bp, with the longest spacer located between trnQ and nad2. The total spacer length is considerably longer than that of A. selene (137 bp over 13 regions) but shorter than that of O. lunifer (371 bp over 20 regions)9,13.
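
The overlap and spacer totals above follow directly from the annotated gene coordinates. The following is a minimal sketch of that bookkeeping, not the authors' procedure (the Methods state that these regions were counted manually). The gene names and coordinates in `toy` are hypothetical, and the wrap-around junction of the circular genome is deliberately ignored.

```python
from typing import Dict, List, Tuple

# (name, start, end) with 1-based inclusive coordinates; an assumed layout
Gene = Tuple[str, int, int]


def junctions(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (left gene, right gene, gap) for each adjacent gene pair.

    gap > 0  : intergenic spacer of that many bp
    gap < 0  : overlap of abs(gap) bp
    gap == 0 : the two genes abut
    """
    ordered = sorted(genes, key=lambda g: g[1])
    return [(a[0], b[0], b[1] - a[2] - 1) for a, b in zip(ordered, ordered[1:])]


def summarize(genes: List[Gene]) -> Dict[str, int]:
    """Total spacer and overlap lengths and the number of regions of each."""
    gaps = [gap for _, _, gap in junctions(genes)]
    spacers = [g for g in gaps if g > 0]
    overlaps = [-g for g in gaps if g < 0]
    return {
        "spacer_bp": sum(spacers),
        "spacer_regions": len(spacers),
        "overlap_bp": sum(overlaps),
        "overlap_regions": len(overlaps),
    }


# Hypothetical coordinates, for illustration only (not taken from Table 3)
toy = [("geneA", 1, 68), ("geneB", 66, 130), ("geneC", 183, 250), ("geneD", 251, 300)]
print(summarize(toy))
```

Applied to the full annotation in Table 3, a summary of this kind would be expected to reproduce the reported totals, provided the coordinates are 1-based and inclusive.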

Protein-coding genes and codon usage

The 13 PCGs of E. narcissus are 11,181 bp in length, accounting for 72.72% of the entire mitochondrial genome. As in other lepidopterans, 12 of the 13 PCGs in the E. narcissus mitogenome use ATN as their start codon; the exception is cox1, which uses CGA. The cox1 start codon is unusual in insect mtDNA; non-canonical codons such as TTG, ACG, TTA and TTAG are often reported as cox1 start codons15,16,17,18. Ten of the 13 PCGs have the complete stop codon TAA, whereas the other three have incomplete termination codons: T for cox1 and cox2, and TA for nad4 (Table 3). Incomplete stop codons are commonly observed in lepidopteran species9,19.
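
The start and stop codon categories described above can be read off a coding sequence mechanically. The sketch below is illustrative only and assumes the sequence is given in its coding orientation. The function names and the toy sequences are hypothetical, and an incomplete stop is inferred simply from a trailing T or TA left over after the last full codon.

```python
ATN_STARTS = {"ATA", "ATT", "ATC", "ATG"}
COMPLETE_STOPS = {"TAA", "TAG"}


def classify_start(cds: str) -> str:
    """Return 'ATN' for a canonical start, otherwise the actual first codon."""
    first = cds[:3].upper()
    return "ATN" if first in ATN_STARTS else first


def classify_stop(cds: str) -> str:
    """Return the complete stop codon, 'T (incomplete)', 'TA (incomplete)' or 'none'."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "none"
    tail = cds[-remainder:]
    # An incomplete stop is a prefix of TAA: a single T or the dinucleotide TA
    return f"{tail} (incomplete)" if "TAA".startswith(tail) else "none"


# Toy sequences for illustration, not real E. narcissus genes
print(classify_start("CGAAAATTTT"), classify_stop("CGAAAATTTT"))
print(classify_start("ATGAAATAA"), classify_stop("ATGAAATAA"))
print(classify_start("ATTAAATA"), classify_stop("ATTAAATA"))
```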

Table 3 Annotation and gene organization of the Eligma narcissus mitogenome.

The 13 PCGs of the E. narcissus mitogenome contain 3727 codons in total, which is within the range of 3687 in H. armigera and 3742 in Agrotis ipsilon. The complete mitogenome sequences of seven lepidopteran insects were downloaded from GenBank to investigate codon usage among lepidopterans. Together with E. narcissus, these mitogenomes represent five superfamilies: four species belong to Noctuoidea, and the others belong to Bombycoidea, Pyraloidea, Gelechioidea, and Papilionoidea (Fig. 2). We examined the codon families in the PCGs and found that Asn, Ile, Leu, Phe, and Tyr are the most abundant amino acids in the E. narcissus mitogenome (Fig. 3). The relative synonymous codon usage (RSCU) for Lepidoptera is shown in Fig. 4. All possible codons are present in the PCGs of the E. narcissus mitogenome, whereas some codons, such as GCG, GGC, GTG and CGC, are not found in four other species. Previous research indicates that codons with high G and C content are likely to be disfavored, a phenomenon that is found in some lepidopteran insects7,20.
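
To make the RSCU measure concrete, the sketch below computes it for a single coding sequence under the invertebrate mitochondrial genetic code (NCBI translation table 5). It uses the common definition of RSCU as a codon's observed count divided by the mean count of all synonymous codons for the same amino acid. This is an illustrative reimplementation, not the MEGA routine used in the study, and the toy sequence is invented.

```python
from collections import Counter, defaultdict
from typing import Dict

# NCBI translation table 5 (invertebrate mitochondrial), codons ordered T, C, A, G
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSSSS"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(cds: str) -> Dict[str, float]:
    """RSCU per sense codon; amino acids absent from the sequence are skipped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            # observed count relative to the mean count within the family
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence: TTT TTT TTC ATT ATA (illustration only)
example = rscu("TTTTTTTTCATTATA")
print(round(example["TTT"], 3), round(example["TTC"], 3))
```

An RSCU value above 1 indicates a codon used more often than expected under equal usage of synonymous codons, and a value of 0 corresponds to a codon that is absent, as noted for GCG, GGC, GTG and CGC in some species.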

Figure 2: Comparison of codon usage in mitochondrial genomes in Lepidoptera. Lowercase letters (a–e) above the species names represent the superfamily to which each species belongs ((a) Noctuoidea, (b) Bombycoidea, (c) Pyraloidea, (d) Gelechioidea, (e) Papilionoidea).

Figure 3: Codon distribution in Lepidoptera. CDspT, codons per thousand codons.

Figure 4: The mitochondrial genome relative synonymous codon usage (RSCU) across five superfamilies in Lepidoptera. Codon families are provided on the X axis. Codons indicated above the bar are not present in the mitogenome.

Ribosomal RNA genes and transfer RNA genes

As in all other insect mitogenome sequences, there were two rRNA genes in E. narcissus, with a total length of 2121 bp. The large ribosomal gene (rrnL), located between trnL1(CUN) and trnV, had a length of 1335 bp, whereas the small one (rrnS), located between trnV and the A+T-rich region, had a length of 786 bp. The negative AT skew (−0.335) indicated the occurrence of more Ts than As. The A+T content of the two rRNA genes was 84.50%, which was within the range of 82.15% in O. lunifer and 85.42% in M. sexta (Table 2).

The E. narcissus mitogenome contained 22 tRNA genes interspersed throughout the genome and ranging in size from 63 to 71 bp, which together comprised 1450 bp of the mitogenome. Of these genes, 14 were encoded by the H-strand, and eight were encoded by the L-strand. The predicted structures of the tRNAs are shown in Fig. 5. The A+T content of the 22 tRNAs was 82.00%, with a positive AT skew (0.013). A total of 11 non-canonical base pairs were identified in the tRNAs of E. narcissus, comprising three mismatches (one A–A and two U–U) and eight G–U wobble pairs. In many insect mitogenomes, the trnS1 (AGN) gene has an unusual secondary structure lacking a stable stem-loop structure in the DHU arm9,19; however, we found that all the tRNA genes in E. narcissus could be folded into the typical cloverleaf secondary structure observed in mitochondrial tRNA genes. All of the secondary structures were drawn using the RNAstructure program.

Figure 5: Predicted secondary cloverleaf structures of E. narcissus tRNA genes.

The A+T-rich region

The A+T-rich region, known for the initiation of replication in vertebrates and invertebrates, is located between the rrnS and trnM genes in E. narcissus. This region is 434 bp in length, which is longer than the regions in A. selene (339 bp) and M. sexta (324 bp) but shorter than the regions in Chinese B. mandarina (484 bp) and P. flavescens (541 bp) (Table 2). The region has the highest A+T content (96.54%) in the mitogenome. Several conserved structures are observed in the E. narcissus A+T-rich region, including the motif ‘ATAGA’ followed by a 19 bp poly-T stretch, the motif ‘ATTTA’ followed by a microsatellite-like (AT) motif6 and a poly-A element upstream of the trnM gene (Fig. 6). The poly-T stretch is located upstream of the rrnS 5′-end and is preceded by the ‘ATAGA’ motif, a structural feature that is found in the majority of lepidopteran insects (Fig. 7). Previous research indicates that the poly-T element may be involved in controlling transcription and replication initiation21. Two repeat elements are found in the A+T-rich region of the E. narcissus mitogenome. Some other lepidopteran insects also have repeat elements in the A+T-rich region. The A+T-rich region of S. longistyla harbors two repeat elements22, whereas the C. medinalis and C. suppressalis mitogenomes contain a duplicated 25 bp repeat element and a duplicated 36 bp repeat element, respectively20.
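
The conserved elements described above can be located with simple pattern matching. The sketch below is a hedged illustration rather than the method used in the study (tandem repeats were identified with Tandem Repeats Finder, as described in the Methods). The minimum run lengths in the regular expressions and the toy sequence are assumptions, and the input is assumed to be on the strand on which the motifs read as written.

```python
import re
from typing import Dict

# Assumed thresholds: at least 10 Ts after ATAGA, at least 5 AT units after ATTTA
ATAGA_POLY_T = re.compile(r"ATAGA(T{10,})")
ATTTA_AT_REPEAT = re.compile(r"ATTTA((?:AT){5,})")


def find_features(seq: str) -> Dict[str, int]:
    """Report the poly-T length after ATAGA and the number of AT units after ATTTA."""
    seq = seq.upper()
    poly_t = ATAGA_POLY_T.search(seq)
    at_repeat = ATTTA_AT_REPEAT.search(seq)
    return {
        "poly_t_len": len(poly_t.group(1)) if poly_t else 0,
        "at_repeat_units": len(at_repeat.group(1)) // 2 if at_repeat else 0,
    }


# Invented toy sequence (not the E. narcissus A+T-rich region)
toy_region = "GGC" + "ATAGA" + "T" * 19 + "CCG" + "ATTTA" + "AT" * 8 + "GC"
print(find_features(toy_region))
```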

Figure 6: Features present in the A+T-rich region of E. narcissus. The motif, poly-T stretch, microsatellite T/A repeat sequences and poly-A stretch are colored in red, green, blue and purple, respectively.

Figure 7: Sequence alignment of the partial D-loop region of 10 moth species. The boxed nucleotides indicate the ‘ATAGA’ conserved motif.

Phylogenetic analyses

In this study, the mitogenomes of 18 lepidopteran species representing six lepidopteran superfamilies (Noctuoidea, Bombycoidea, Pyraloidea, Tortricoidea, Geometroidea, Papilionoidea) were downloaded from GenBank. The phylogenetic relationships among the superfamilies of Lepidoptera were reconstructed from the concatenated nucleotide sequences of the 13 PCGs using the maximum likelihood (ML) method (Fig. 8). The phylogenetic analyses place E. narcissus within Noctuidae and indicate that Noctuoidea is closely related to Bombycoidea. Although these results do not conflict with other published trees7,23, studies covering a wider variety of species are needed to provide further insights into the relationships among Noctuidae species.

Figure 8: Phylogenetic analysis of lepidopteran insects. The phylogenetic tree was constructed using the maximum likelihood (ML) method, and bootstrap values (1000 repetitions) of the branches are indicated. D. incompta (NC_025936) and A. gambiae (NC_002084) were used as outgroups.

Materials and Methods

Sample collection and DNA extraction

No specific permits were required for the insect collection necessary for this study. E. narcissus larvae were collected in Hefei city, China. Specimens identified as E. narcissus were preserved in 100% ethanol and stored at −80 °C. Total genomic DNA was extracted with the Aidlab Genomic DNA Extraction Kit (Aidlab Co., Beijing, China) according to the manufacturer’s instructions. The DNA was examined using 1% agarose gels and was then used for PCR amplification of the complete mitogenome.

PCR amplification, cloning and sequencing

To amplify the entire mitogenome of E. narcissus, nine pairs of primers were designed based on the known mitogenomes of lepidopteran species and synthesized by Beijing Sunbiotech Co., Ltd. (Beijing, China) (Table 4). PCR reactions were carried out in a 50 μl reaction volume, including 5 μl of 10× long Taq buffer (Mg2+ plus), 5 μl of dNTP (20 mM), 1.5 μl of DNA template from a single specimen, 2 μl of each primer (10 μM), 35 μl of sterilized distilled water and 0.5 μl (1 unit) of long Taq (Aidlab Co., Beijing, China). The conditions for PCR amplification were as follows: an initial denaturation for 4 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 40 s at 46–57 °C (depending on the primer combination) and 1–3 min (depending on the putative length of the fragments) at 72 °C, and a final extension at 72 °C for 10 min. All PCR reactions were performed in a Bio-Rad thermal cycler.

Table 4 List of primers used to amplify the mitogenome of Eligma narcissus.

The above PCR products were resolved by agarose gel electrophoresis (1% w/v) and purified using a DNA gel extraction kit (TaKaRa Co., Dalian, China). The purified PCR fragments were ligated into the T-vector (TaKaRa Co., Dalian, China) and then transformed into competent Escherichia coli DH5α. The positive recombinant clone with an insert was sequenced at least three times (Invitrogen Co., Ltd., Shanghai, China).

Genome assembly and gene annotation

The final consensus mtDNA sequence of E. narcissus was assembled using the Lasergene software package (DNASTAR Inc., Madison, USA). Sequence annotation was performed using the online BLAST tool on the NCBI web site (http://blast.ncbi.nlm.nih.gov/Blast). The overlapping regions and intergenic spacers between genes were counted manually. The base composition of nucleotide sequences was described by skewness, measured according to the following formulas: AT skew = [A − T]/[A + T], GC skew = [G − C]/[G + C]. The A+T content and relative synonymous codon usage (RSCU) values were calculated using MEGA 5.0 (ref. 24). Transfer RNA genes were identified using the tRNAscan-SE program25, available online at http://lowelab.ucsc.edu/tRNAscan-SE/. The secondary structures of tRNA genes were analyzed by comparison with the nucleotide sequences of other insect tRNAs. The nucleotide sequences of the PCGs were translated on the basis of the invertebrate mtDNA genetic code. Alignments of PCGs from other lepidopteran mitogenome sequences were performed using ClustalX software26. The entire A+T-rich region was searched for tandem repeats using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html)27.
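
The skew formulas above are straightforward to evaluate. The following minimal sketch applies them both to a raw sequence and to the whole-genome percentages reported in the Results. The function names are illustrative, the GC skew printed at the end is simply derived here from those reported percentages and is not a value quoted from the paper's tables, and sequences lacking A/T or G/C would need extra handling for division by zero.

```python
from collections import Counter
from typing import Dict


def skew(x: float, y: float) -> float:
    """Generic skew (x - y) / (x + y), e.g. AT skew with x = A and y = T."""
    return (x - y) / (x + y)


def composition(seq: str) -> Dict[str, float]:
    """A+T content (%) and AT/GC skew of a nucleotide sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    return {
        "at_content": 100.0 * (a + t) / (a + t + g + c),
        "at_skew": skew(a, t),
        "gc_skew": skew(g, c),
    }


# Whole-genome base percentages reported in the Results
print(round(skew(40.78, 40.21), 3))  # AT skew
print(round(skew(7.68, 11.33), 3))   # GC skew, derived here from the same percentages
```

A positive AT skew means more As than Ts on the analyzed strand, and a negative GC skew means more Cs than Gs.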

Phylogenetic analysis

Along with the E. narcissus mitochondrial genome, 21 available insect mitogenomes were downloaded from GenBank to illustrate the phylogenetic relationships among lepidopteran insects. The mitogenomes of Drosophila incompta (NC_025936)28 and Anopheles gambiae (NC_002084)29 were downloaded and used as outgroups. The nucleotide and putative amino acid sequences of each of the 13 PCGs were aligned using ClustalW, as implemented in the program MEGA. To select the conserved regions of the putative amino acid sequences, all alignments were analyzed with the program Gblocks 0.91b using default settings30. Phylogenetic analysis was conducted using the maximum likelihood (ML) method, as implemented in the MEGA 5.0 program31, and trees were inferred with 1000 bootstrap replicates. Substitution model selection was also conducted in MEGA 5.0 based on the lowest Bayesian Information Criterion (BIC) scores. The mtREV24 + G + F model was the most appropriate model for the amino acid sequence dataset.
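
The concatenation of per-gene alignments into a single dataset can be sketched as follows. This is an illustration under stated assumptions and not the authors' pipeline: alignment, block selection and tree inference were carried out in ClustalW, Gblocks and MEGA. The function name, the taxon labels and the toy alignments are hypothetical, and a taxon missing from a gene is padded with gaps.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene


def concatenate(alignments: List[Alignment], taxa: List[str]) -> Alignment:
    """Join per-gene alignments, in the order given, into one supermatrix."""
    supermatrix = {taxon: "" for taxon in taxa}
    for index, aln in enumerate(alignments):
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        width = widths.pop()
        for taxon in taxa:
            # Pad with gaps when a taxon lacks this gene
            supermatrix[taxon] += aln.get(taxon, "-" * width)
    return supermatrix


# Toy alignments with invented sequences, for illustration only
gene1 = {"taxonA": "ATGAAA", "taxonB": "ATGAAG", "outgroup": "ATG---"}
gene2 = {"taxonA": "TTTTAA", "taxonB": "TTCTAA"}
matrix = concatenate([gene1, gene2], ["taxonA", "taxonB", "outgroup"])
for name, seq in matrix.items():
    print(name, seq)
```

Keeping the gene order fixed across taxa matters because every column of the concatenated alignment must correspond to the same site in every species.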

Additional Information

How to cite this article: Dai, L.-S. et al. Comparative Mitochondrial Genome Analysis of Eligma narcissus and other Lepidopteran Insects Reveals Conserved Mitochondrial Genome Organization and Phylogenetic Relationships. Sci. Rep. 6, 26387; doi: 10.1038/srep26387 (2016).