Comparative Mitochondrial Genome Analysis of Eligma narcissus and other Lepidopteran Insects Reveals Conserved Mitochondrial Genome Organization and Phylogenetic Relationships

In this study, we sequenced the complete mitochondrial genome (mitogenome) of Eligma narcissus and compared it with those of 18 other lepidopteran species. The mitogenome was a circular molecule of 15,376 bp containing 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and an adenine (A) + thymine (T)-rich region. The positive AT skew (0.007) indicated slightly more As than Ts. The arrangement of the 13 PCGs was similar to that of other sequenced lepidopterans. All PCGs were initiated by ATN codons, except for the cytochrome c oxidase subunit 1 (cox1) gene, which was initiated by CGA, as observed in other lepidopterans. Codon usage analysis indicated that Asn, Ile, Leu, Tyr and Phe were the five most frequent amino acids. All tRNA genes could be folded into the typical cloverleaf structure observed for mitochondrial tRNA genes. Phylogenetic relationships were analyzed based on the nucleotide sequences of 13 PCGs from other insect mitogenomes, which confirmed that E. narcissus is a member of the superfamily Noctuoidea.

Scientific Reports | 6:26387 | DOI: 10.1038/srep26387

[...] nucleotide sequence of 13 PCGs of E. narcissus was used to provide insight into the phylogenetic relationships among lepidopteran superfamilies.
In the mitogenome of E. narcissus, genes overlap by a total of 34 bp at 11 positions. The overlaps range in size from 1 to 8 bp, and the longest overlapping region is located between trnW and trnC. A total of 192 bp of intergenic spacers are dispersed across 16 regions and range in size from 1 to 52 bp, with the longest spacer located between trnQ and nad2. The total length of the intergenic spacers is considerably greater than that of A. selene (137 bp over 13 regions) but less than that of O. lunifer (371 bp over 20 regions) [9,13].
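The overlap and spacer totals above were counted manually by the authors. As a rough illustration of the same bookkeeping, the sketch below derives overlaps and intergenic spacers from a list of annotated gene coordinates. The gene names and coordinates are invented placeholders, not values from the E. narcissus annotation, and the function ignores the wrap-around junction of a circular genome.

```python
def overlaps_and_spacers(genes):
    """Classify the junction between each pair of adjacent genes.

    genes: list of (name, start, end) tuples with 1-based inclusive
    coordinates, sorted by start position (hypothetical input format).
    Returns two lists of (left_gene, right_gene, length_in_bp).
    """
    overlaps, spacers = [], []
    for (left, _, left_end), (right, right_start, _) in zip(genes, genes[1:]):
        gap = right_start - left_end - 1  # negative means the genes overlap
        if gap < 0:
            overlaps.append((left, right, -gap))
        elif gap > 0:
            spacers.append((left, right, gap))
    return overlaps, spacers


# Invented toy annotation, for illustration only.
toy_genes = [
    ("geneA", 1, 68),
    ("geneB", 61, 130),   # starts 8 bp before geneA ends
    ("geneC", 183, 250),  # 52 bp after geneB ends
    ("geneD", 251, 300),  # directly adjacent to geneC
]
toy_overlaps, toy_spacers = overlaps_and_spacers(toy_genes)
total_overlap_bp = sum(length for _, _, length in toy_overlaps)
total_spacer_bp = sum(length for _, _, length in toy_spacers)
```

Summing the third field of each list gives totals comparable to the "34 bp at 11 positions" and "192 bp in 16 regions" figures, provided the annotation uses the same coordinate convention.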
Protein-coding genes and codon usage. The 13 PCGs of E. narcissus are 11,181 bp in length, accounting for 72.72% of the entire mitochondrial genome. As in other lepidopterans, the PCGs in the E. narcissus mitogenome use ATN as their start codon, with the exception of the cox1 gene, which uses CGA instead. The cox1 start codon is unusual in insect mtDNA; TTG, ACG, TTA and TTAG have often been reported as cox1 start codons [15-18]. [...] incomplete termination codon T for cox1 and cox2, and TA for nad4 (Table 3). Incomplete stop codons are commonly observed in lepidopteran species [9,19]. The 13 PCGs of the E. narcissus mitogenome contain 3727 codons in total, which falls between the 3687 codons of H. armigera and the 3742 codons of Agrotis ipsilon. The complete nucleotide sequences of seven lepidopteran insects were downloaded from GenBank to investigate codon usage among lepidopterans. These mitogenomes represent five superfamilies: four species belong to Noctuoidea, and the others belong to Bombycoidea, Pyraloidea, Gelechioidea and Papilionoidea (Fig. 2). We examined the usage of codon families in the PCGs and found that Asn, Ile, Leu, Phe and Tyr are the most abundant amino acids in the E. narcissus mitogenome (Fig. 3). Relative synonymous codon usage (RSCU) for the lepidopteran species is shown in Fig. 4. All possible codons are present in the PCGs of the E. narcissus mitogenome, whereas some codons, such as GCG, GGC, GTG and CGC, are not found in four other species. Previous research indicates that codons with high G and C content tend to be disfavored, a phenomenon that is found in some lepidopteran insects [7,20].

[...] T-rich region, had a length of 786 bp. The negative AT skew (−0.335) indicated the occurrence of more Ts than As. The A+T content of the two rRNA genes was 84.50%, which was within the range of 82.15% in O. lunifer and 85.42% in M. sexta (Table 2). The E. narcissus mitogenome contained 22 tRNAs interspersed throughout the genome and ranging in size from 63 to 71 bp, which together comprised 1450 bp of the mitogenome. Of these genes, 14 were encoded by the H-strand, and eight were encoded by the L-strand. The predicted structures of the tRNAs are shown in Fig. 5. The A+T content of the 22 tRNAs was 82.00%, with a positive AT skew (0.013). A total of 11 non-canonical base pairs were identified in the tRNAs of E. narcissus: three mismatches (one A-A and two U-U) and eight G-U wobble pairs. In many insect mitogenomes, the trnS1 (AGN) gene has an unusual secondary structure lacking a stable stem-loop structure in the DHU arm [9,19]; however, we found that all the tRNA genes in E. narcissus could be folded into the typical cloverleaf secondary structure observed in mitochondrial tRNA genes. All of the secondary structures were drawn using the RNA structure program.
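The RSCU values mentioned in the codon usage analysis were obtained with MEGA. For readers who want to see what the statistic measures, the following is a minimal sketch under stated assumptions, not the authors' pipeline. It builds the invertebrate mitochondrial genetic code (NCBI translation table 5) and computes RSCU as the observed codon count divided by the mean count of all synonymous codons for the same amino acid. The function name `rscu` and the toy sequence are hypothetical, and all codons of one amino acid are treated as a single family (for example, eight Ser codons), which may differ from how a given program splits families.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    RSCU = count(codon) * n_synonymous / total count of that amino acid.
    Amino acids absent from the sequence are skipped; codons containing
    ambiguous bases and any trailing partial codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in GENETIC_CODE
    )
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Invented toy sequence: codons TTT, TTT, TTC, ATT, ATA.
toy_rscu = rscu("TTTTTTTTCATTATA")
```

A value above 1 means the codon is used more often than expected under equal usage of its synonyms; a value of 0 means the codon is absent while a synonym is present.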
The A+T-rich region. The A+T-rich region, known as the site of replication initiation in vertebrates and invertebrates, is located between the rrnS and trnM genes in E. narcissus. This region is 434 bp in length, which is longer than the regions in A. selene (339 bp) and M. sexta (324 bp) but shorter than the regions in Chinese B. mandarina (484 bp) and P. flavescens (541 bp) (Table 2). The region has the highest A+T content (96.54%) in the mitogenome. Several conserved structures are observed in the E. narcissus A+T-rich region, including the motif 'ATAGA' followed by a 19 bp poly-T stretch, the motif 'ATTTA' followed by a microsatellite-like (AT)6 motif, and a poly-A element upstream of the trnM gene (Fig. 6). The poly-T stretch is located upstream of the rrnS 5′-end and is preceded by the 'ATAGA' motif, a structural feature that is found in the majority of lepidopteran insects (Fig. 7). Previous research indicates that the poly-T element may be involved in controlling transcription and replication initiation [21]. Two repeat elements are found in the A+T-rich region of the E. narcissus mitogenome. Some other lepidopteran insects also have repeat elements in the A+T-rich region: that of S. longistyla harbors two repeat elements [22], whereas the C. medinalis and C. suppressalis mitogenomes contain a duplicated 25 bp repeat element and a duplicated 36 bp repeat element, respectively [20].
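The conserved elements described above ('ATAGA' followed by a poly-T stretch, and 'ATTTA' followed by an (AT)n microsatellite) lend themselves to a simple pattern search. The sketch below is one illustrative way to scan a sequence for them with regular expressions. It assumes each motif is immediately followed by its repeat on the strand being scanned, and the thresholds `min_poly_t` and `min_at_repeats` are arbitrary choices, not values from the paper. The test sequence is synthetic.

```python
import re


def find_at_region_motifs(seq, min_poly_t=10, min_at_repeats=4):
    """Locate 'ATAGA'+poly-T and 'ATTTA'+(AT)n elements in a sequence.

    Returns a list of (label, 0-based start of the motif, repeat size),
    where repeat size is the poly-T length in bp or the number of AT units.
    """
    seq = seq.upper()
    hits = []
    poly_t = re.compile(r"ATAGA(T{%d,})" % min_poly_t)
    at_repeat = re.compile(r"ATTTA((?:AT){%d,})" % min_at_repeats)
    for match in poly_t.finditer(seq):
        hits.append(("ATAGA+polyT", match.start(), len(match.group(1))))
    for match in at_repeat.finditer(seq):
        hits.append(("ATTTA+(AT)n", match.start(), len(match.group(1)) // 2))
    return hits


# Synthetic sequence built to contain one copy of each element.
synthetic = "GG" + "ATAGA" + "T" * 19 + "CC" + "ATTTA" + "AT" * 6 + "GG"
motif_hits = find_at_region_motifs(synthetic)
```

For a real annotation, the reverse complement would also need to be scanned, since the elements are described relative to the orientation of rrnS and trnM.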
Phylogenetic analyses. In this study, the mitogenomes of 18 lepidopteran species representing six lepidopteran superfamilies (Noctuoidea, Bombycoidea, Pyraloidea, Tortricoidea, Geometroidea, Papilionoidea) were downloaded from GenBank. The phylogenetic relationships among the superfamilies of Lepidoptera were reconstructed from the concatenated nucleotide sequences of the 13 PCGs using the maximum likelihood (ML) method (Fig. 8). The phylogenetic analyses show that E. narcissus clusters within Noctuoidea, which is closely related to Bombycoidea. Although these results do not conflict with other published trees [7,23], studies covering a wider range of species are needed to provide further insights into the relationships among Noctuoidea species.

Materials and Methods
Sample collection and DNA extraction. No specific permits were required for the insect collection necessary for this study. E. narcissus larvae were collected in Hefei city, China. Specimens identified as E. narcissus were preserved in 100% ethanol and stored at −80 °C. Total genomic DNA was extracted with the Aidlab Genomic DNA Extraction Kit (Aidlab Co., Beijing, China) according to the manufacturer's instructions. The DNA was examined using 1% agarose gels and was then used for PCR amplification of the complete mitogenome.
PCR amplification, cloning and sequencing. To amplify the entire mitogenome of E. narcissus, nine pairs of primers were designed based on the known mitogenomes of lepidopteran species (Beijing Sunbiotech Co., Ltd., Beijing, China) (Table 4). PCR reactions were carried out in a 50 μl reaction volume, including 5 μl of 10× long Taq buffer (Mg2+ plus), 5 μl of dNTP (20 mM), 1.5 μl of DNA template from a single specimen, 2 μl of each primer (10 μM), 35 μl of sterilized distilled water and 0.5 μl (1 unit) of long Taq (Aidlab Co., Beijing, China). The conditions for PCR amplification were as follows: an initial denaturation for 4 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 40 s at 46-57 °C (depending on the primer combination) and 1-3 min (depending on the putative length of the fragments) at 72 °C, and a final extension step at 72 °C for 10 min. All PCR reactions were performed in a BIO-RAD thermal cycler. The PCR products were resolved by agarose gel electrophoresis (1% w/v) and purified using a DNA gel extraction kit (TaKaRa Co., Dalian, China). The purified PCR fragments were ligated into the T-vector (TaKaRa Co., Dalian, China) and then transformed into competent Escherichia coli DH5α. The positive recombinant clone with an insert was sequenced at least three times (Invitrogen Co., Ltd., Shanghai, China).
Genome assembly and gene annotation. The final consensus mtDNA sequence of E. narcissus was assembled using the Lasergene software package (DNASTAR Inc., Madison, USA). Sequence annotation was performed using the online BLAST tool on the NCBI web site (http://blast.ncbi.nlm.nih.gov/Blast). The overlapping regions and intergenic spacers between genes were counted manually. The base composition of nucleotide sequences was described by skewness, measured according to the following formulas:

$$\text{AT skew} = \frac{A - T}{A + T}, \qquad \text{GC skew} = \frac{G - C}{G + C}$$

where A, T, G and C denote the counts of the respective bases. The A+T content and relative synonymous codon usage (RSCU) values were calculated using MEGA 5.0 [24]. Transfer RNA genes were identified using the tRNAscan-SE program, available online at http://lowelab.ucsc.edu/tRNAscan-SE/ [25]. The secondary structures of tRNA genes were analyzed by comparison with the nucleotide sequences of other insect tRNAs. The nucleotide sequences of the PCGs were translated on the basis of the invertebrate mtDNA genetic code. Alignments of PCGs from other lepidopteran mitogenome sequences were performed using ClustalX software [26]. The entire A+T-rich region was searched for tandem repeats using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html) [27].
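As a small worked illustration of the composition statistics used in this section, the sketch below computes A+T content together with AT skew and GC skew, using the conventional definitions (A − T)/(A + T) and (G − C)/(G + C). The function name `composition_stats` and the toy sequence are hypothetical; the authors report using MEGA 5.0 for the A+T content.

```python
from collections import Counter


def composition_stats(seq):
    """Return A+T content, AT skew and GC skew for a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted. A skew is reported
    as None when its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else None,
        "gc_skew": (g - c) / (g + c) if (g + c) else None,
    }


# Toy sequence with A=3, T=2, G=2, C=1.
toy_stats = composition_stats("AAATTGGC")
```

A positive AT skew, such as the 0.007 reported for the whole mitogenome, therefore means that As outnumber Ts on the strand analyzed, and a negative value means the reverse.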
Phylogenetic analysis. Along with the E. narcissus mitochondrial genome, 21 available insect mitogenomes were downloaded from GenBank to illustrate the phylogenetic relationships among lepidopteran insects. The mitogenomes of Drosophila incompta (NC_025936) [28] and Anopheles gambiae (NC_002084) [29] were downloaded and used as outgroups. The nucleotide and putative amino acid sequences for each of the 13 PCGs were aligned using ClustalW, as implemented in the program MEGA. To select the conserved regions of the putative amino acid sequences, all alignments were analyzed with the program Gblocks 0.91b using default settings [30]. Phylogenetic analysis was conducted using the maximum likelihood (ML) method, as implemented in MEGA 5.0 [31], and phylogenetic trees were inferred with 1000 bootstrap replicates. Substitution model selection was based on the lowest Bayesian Information Criterion (BIC) score, also calculated in MEGA 5.0. The mtREV24 + G + F model was selected as the most appropriate model for the amino acid sequence dataset.
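Model selection by lowest BIC, as described above, can be illustrated with a few lines of arithmetic. The sketch below uses the standard definition BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the sample size (assumed here to be the number of alignment sites) and ln(L) the maximized log-likelihood. The model names, log-likelihoods and parameter counts are invented placeholders, not results from this study or output from MEGA.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def select_by_bic(candidates, n_sites):
    """Pick the candidate with the lowest BIC.

    candidates: {model_name: (log_likelihood, n_params)}.
    Returns (best_model_name, {model_name: bic_score}).
    """
    scores = {
        name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Invented placeholder fits for three hypothetical models.
toy_candidates = {
    "model_A": (-5000.0, 10),
    "model_B": (-4990.0, 12),
    "model_C": (-4989.0, 20),
}
best_model, bic_scores = select_by_bic(toy_candidates, n_sites=1000)
```

In this toy case model_C has the highest likelihood, but its extra parameters are penalized enough that model_B scores lowest, which is the trade-off the criterion is designed to capture.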