Introduction

Leucoma salicis is a moth that is mainly distributed in China, Korea and Japan. It is a notorious plant pest that causes considerable economic losses. It typically consumes willow and tea leaves, reducing the quality and quantity of tea products1, and damages roadside and garden trees in urban areas. Traditionally, identification of this species has been based on morphological characteristics of adult moths2. However, adults appear mainly from June to August; the rest of the life cycle is spent in the egg and larval stages, which lack easily identifiable morphological features. Eggs and larvae must therefore be reared to the adult stage for identification, which is time consuming and labor intensive. Molecular methods for identification are under development, including polymerase chain reaction-restriction fragment length polymorphism (PCR–RFLP)3. Most previous work on L. salicis has focused on sex pheromone synthesis4 or the nuclear polyhedrosis virus that infects larvae5. Previous studies have not focused on the mitochondrial genome, which can provide informative data for identification, phylogenetic analysis and evolutionary studies of L. salicis.

Insect mitochondrial DNA (mtDNA) is a double-stranded, circular molecule, ranging in size from 14 to 20 kb. It usually contains a conserved set of 37 genes: seven NADH dehydrogenase (nad1-nad6 and nad4L), three cytochrome c oxidase (cox1-cox3), two ATPase (atp6 and atp8), one cytochrome b (cob), two ribosomal RNA (rrnL and rrnS) and 22 transfer RNA (tRNA) genes. It also contains an adenine (A) + thymine (T)-rich region that includes initiation sites for transcription and replication of the genome6,7. Due to its simple genomic organization, high rate of evolution, and almost unambiguous orthology, mtDNA is typically considered to be an informative molecular marker for species identification and for studies of phylogenetic relationships and population structure8,9.

A better understanding of the lepidopteran mitochondrial genome requires expanded taxon sampling. Lepidoptera contains more than 160,000 described species, classified into 45–48 superfamilies10. Lymantriidae includes about 360 genera and over 2500 species, many of which are agriculturally important. Despite this diversity, only eight species in the family have completely sequenced mitogenomes that are publicly available in GenBank. In this study, we sequenced and annotated the complete mitogenome of L. salicis, and compared it with those of other members of Lymantriidae. Our results provide sequence data that may support species identification of an important pest, as well as phylogenetically informative data that address the position of L. salicis within Noctuoidea.

Results

Genome organization and composition

The mitogenome of L. salicis was a circular DNA molecule, 15,334 bp in length (Fig. 1). It contained the typical insect mitogenome set of 22 tRNAs, 13 PCGs (nad1-6, nad4L, cox1-3, cob, atp6 and atp8), two rRNAs (rrnS and rrnL), and the non-coding A + T-rich region (Table 1). Nucleotide composition was highly A + T biased (A: 42.07%, T: 38.57%, G: 7.22%, C: 12.14%; Table 2). Nucleotide BLAST (blastn) of the entire L. salicis mitogenome against GenBank returned sequence identities with closely related species of 79% (Lachana alpherakii), 78% (Euproctis pseudoconspersa), 78% (Gynaephora menyuanensis), and 77% (Lymantria dispar) (Table S1).
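As a hedged illustration of how composition figures of this kind can be derived, the following Python sketch counts bases in a sequence string and reports percentages and A + T content. The function name `base_composition` and the toy sequence are assumptions made for illustration; they are not part of the original analysis.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, G and C in a DNA sequence.

    Characters other than A, T, G and C (e.g. N) are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {base: 100.0 * counts[base] / total for base in "ATGC"}


# Toy sequence for illustration only (not L. salicis data).
toy = "ATTAATGCATTA"
comp = base_composition(toy)
at_content = comp["A"] + comp["T"]
```

Applied to the percentages reported above, A + T content is 42.07 + 38.57 = 80.64%.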

Table 1 Summary of characteristics of the mitogenome of L. salicis.
Table 2 Composition and skew in different lepidopteran mitogenomes.
Figure 1

Map of the mitogenome of L. salicis.

tRNA genes are labeled according to the IUPAC-IUB three-letter amino acids; cox1, cox2 and cox3 refer to the cytochrome c oxidase subunits; cob refers to cytochrome b; nad1-nad6 refer to NADH dehydrogenase components; rrnL and rrnS refer to ribosomal RNAs.

Protein-coding genes and codon usage

The PCG region formed 72.9% of the L. salicis mitogenome, and was 11,172 bp long. Nine of 13 PCGs (nad2, cox1, cox2, atp8, atp6, cox3, nad3, nad6 and cob) were encoded on the H-strand, while the remaining four (nad5, nad4, nad4L and nad1) were encoded on the L-strand. Each PCG was initiated by a canonical ATN codon, except for cox1 (Table 1), which was initiated by a CGA codon. Ten of 13 PCGs used a typical TAA termination codon; but cox1 and cox2 terminated with a single T and nad4 terminated with TA (Table 1).

Relative synonymous codon usage (RSCU) analysis of PCGs in L. salicis revealed that the codons encoding Asn, Ile, Leu (UUA, UUG), Lys, Tyr and Phe were the most frequent, while those encoding Cys and Arg were rare (Fig. 2). In the PCGs of the eight moth species examined, codon distributions and amino acid content were largely consistent among species (Fig. 3). Codons with A or T in the third position were overused in comparison to other synonymous codons: for example, the valine codons GTC and GTG were rare, while the synonymous codons GTT and GTA were prevalent (Fig. 4). All codons were used in the PCGs of the L. salicis mitogenome except CGC and GGC. This is similar to codon usage in Hyphantria cunea, Spilonota lechriaspis, and Gabala argentata, which respectively lack CGG and CGC, GCG and CGG, and CGG and CGC.
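The RSCU of a codon is its observed count divided by the mean count of all codons in its synonymous family. The following is a minimal Python sketch of that calculation, assuming the invertebrate mitochondrial genetic code (NCBI translation table 5) and grouping codons by amino acid. The function name `rscu` and the toy input are illustrative assumptions; the study itself used MEGA 5, whose grouping of Leu and Ser codon families may differ from this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Invertebrate mitochondrial code (NCBI translation table 5), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return RSCU per codon for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for this family
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy valine-only example: GTT is used twice, GTA and GTC once, GTG never.
toy_rscu = rscu("GTTGTTGTAGTC")
```

In this toy case GTT has an RSCU of 2.0 (used twice as often as expected under equal usage) and GTG has an RSCU of 0.0.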

Figure 2

Comparison of codon usage within the mitochondrial genome of members of the Lepidoptera.

Lowercase letters (a,b,c,d and e) above species names represent the superfamily to which the species belongs (a: Noctuoidea, b: Geometroidea, c: Bombycoidea, d: Pyraloidea, e: Tortricoidea).

Figure 3

Codon distribution in members of the Lepidoptera.

CDspT = codons per thousand codons.

Figure 4

Relative Synonymous Codon Usage (RSCU) of the mitochondrial genome of five superfamilies in the Lepidoptera.

Codon families are plotted on the x-axis. Codons indicated above the bar are not present in the mitogenome.

Ribosomal RNA and transfer RNA genes

The large (rrnL) and small (rrnS) ribosomal RNA subunit genes of L. salicis were located between tRNALeu1(CUN) and tRNAVal, and between tRNAVal and the A + T-rich region, respectively (Fig. 1, Table 1). The rrnL gene was 1,344 bp long, while rrnS was 840 bp long. A + T content of the rRNA genes was 83.91%. The AT skew was positive (0.029) and the GC skew was negative (−0.144).

The L. salicis mitogenome included 22 tRNA genes, ranging from 64 bp (tRNAHis) to 73 bp (tRNATrp) long. Of these, 14 genes were encoded on the H-strand and eight on the L-strand (Table 1). The tRNA genes were highly A + T biased (82.19%) with a positive AT-skew (0.007) (Table 2). All the tRNAs possessed a typical clover-leaf secondary structure, except tRNASer(AGN), which lacks the dihydrouridine (DHU) arm and forms a simple loop (Fig. 5). A total of 11 G-U mismatches, which form weak bonds, were found in the secondary structures of ten tRNA genes. Ten U-U mismatches were present in the amino acid acceptor stems of tRNAGln, tRNATrp, tRNALeu(UUR), tRNAAla, tRNAThr, tRNALeu(CUN), and tRNAVal (Fig. 5). Secondary structures of all tRNA genes were predicted using the tRNAscan-SE program.
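To make the distinction between Watson-Crick pairs, G-U wobble pairs and true mismatches (such as U-U) concrete, the following Python sketch classifies each position of a stem. The function name `classify_stem` and the toy stem sequences are hypothetical; this is an illustration of the pairing rules, not the tRNAscan-SE procedure used in the study.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_stem(five_prime, three_prime):
    """Classify each base pair of an RNA stem.

    Both strands are written 5'->3'; the 3' strand is reversed so that
    position i of the 5' strand pairs with its antiparallel partner.
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("stem strands must have equal length")
    labels = []
    for a, b in zip(five_prime.upper(), reversed(three_prime.upper())):
        if (a, b) in WATSON_CRICK:
            labels.append("WC")
        elif (a, b) in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


# Toy stem (not taken from Fig. 5): pairs are G-C, G-U, U-A, U-U, A-U.
toy_labels = classify_stem("GGUUA", "UUAUC")
```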

Figure 5

Predicted secondary structures of the 22 tRNA genes of the L. salicis mitogenome.

Overlapping and intergenic spacer regions

We identified four overlapping gene sequences, varying from 1 bp to 8 bp, making up 19 bp in total. The longest overlapping region was 8 bp between tRNATrp and tRNACys; there was a 7 bp overlap between atp8 and atp6; 3 bp overlap between tRNAIle and tRNAGln, and 1 bp between tRNAAla and tRNAArg (Table 1).

Intergenic spacers were spread over 18 regions, and ranged in length from 1 bp to 47 bp. The longest (47 bp) contained an A + T-rich region and occurred between tRNAGln and nad2. The 10 bp spacer region between tRNASer (UCN) and nad1 included an ‘ATACTAA’ motif (Fig. 6A).
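Overlaps and intergenic spacers follow directly from annotated gene coordinates. The sketch below, in Python, assumes 1-based inclusive coordinates and reports the signed distance between each pair of adjacent genes. The function name `junctions`, the gene labels and the coordinates are hypothetical and do not reproduce Table 1; the junction that wraps around the circular genome is not handled.

```python
def junctions(genes):
    """Return (left_gene, right_gene, gap) for each adjacent gene pair.

    `genes` is a list of (name, start, end) with 1-based inclusive
    coordinates. A positive gap is an intergenic spacer length in bp,
    a negative gap is an overlap, and zero means the genes abut.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    result = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        result.append((left, right, right_start - left_end - 1))
    return result


# Hypothetical coordinates chosen to give a 3 bp overlap and a 47 bp spacer.
toy_genes = [("geneA", 1, 68), ("geneB", 66, 134), ("geneC", 182, 1195)]
toy_junctions = junctions(toy_genes)
total_overlap = -sum(gap for _, _, gap in toy_junctions if gap < 0)
```

The same bookkeeping applied to the four overlaps reported above gives 8 + 7 + 3 + 1 = 19 bp.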

Figure 6

(A) Alignment of the intergenic spacer region between tRNASer (UCN) and nad1 of several lepidopteran insects. (B) Features present in the A + T-rich region of L. salicis. The ‘ATAGA’ motif is shaded. The poly-A stretch is double underlined, and the poly-T stretch is underlined. The single microsatellite T/A repeat sequence is indicated by dotted underlining.

The A + T-rich region

The 325 bp long A + T-rich region of L. salicis was located between the rrnS and tRNAMet genes (Table 1). A + T content in the A + T-rich region was 91.69%, and both AT (−0.248) and GC (−0.408) skews were negative (Table 2). The A + T-rich region did not contain long repeats, though some short repeating sequences scattered over the entire region were present: an ‘ATAGA’ motif followed by an 18 bp poly-T stretch, a microsatellite-like (AT)7 and a poly-A element upstream of the tRNAMet gene (Fig. 6B).
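Short features of this kind can be located with simple pattern matching. The following Python sketch uses regular expressions to find an ‘ATAGA’ motif, poly-T and poly-A runs, and (AT)n microsatellite-like repeats. The function name `find_at_rich_features`, the length thresholds and the toy sequence are assumptions for illustration; the study itself used Tandem Repeats Finder for repeat detection.

```python
import re


def find_at_rich_features(seq, min_poly=8, min_repeats=5):
    """Locate simple sequence features; returns (label, start, end) tuples.

    Coordinates are 0-based with an exclusive end, as in Python slices.
    """
    seq = seq.upper()
    patterns = {
        "ATAGA": r"ATAGA",
        "poly-T": r"T{%d,}" % min_poly,
        "poly-A": r"A{%d,}" % min_poly,
        "(AT)n": r"(?:AT){%d,}" % min_repeats,
    }
    features = []
    for label, pattern in patterns.items():
        for match in re.finditer(pattern, seq):
            features.append((label, match.start(), match.end()))
    return sorted(features, key=lambda feature: feature[1])


# Toy region (not the L. salicis sequence): ATAGA, an 18 bp poly-T
# stretch, an (AT)7 repeat and a poly-A run, separated by filler bases.
toy_region = "CC" + "ATAGA" + "T" * 18 + "GG" + "AT" * 7 + "C" + "A" * 9
features = find_at_rich_features(toy_region)
```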

Phylogenetic relationships

We established phylogenetic relationships among 32 insects (Table 3), based on nucleotide sequences of 13 PCGs, using Maximum Likelihood (ML), Neighbor Joining (NJ) and Bayesian Inference (BI) methods. Species clustered by family (Fig. 7A, B and C). Within Lymantriidae, L. salicis was most closely related to G. menyuanensis. Lymantriidae clustered with Erebidae, while Noctuidae clustered with Nolidae. Noctuoidea was most closely related to Bombycoidea in ML and NJ trees, while in the BI tree Bombycoidea was most closely related to Geometroidea. Papilionoidea and Tortricoidea branched together in ML and NJ methods, but were separated from each other in the BI tree.

Table 3 Details of the lepidopteran mitogenomes used in this study.
Figure 7

(A) Tree showing the phylogenetic relationships among 32 species, constructed using Maximum Likelihood with 1000 bootstrap replicates. (B) Neighbor Joining (NJ) tree, with 1000 bootstrap replicates. (C) Tree constructed using Bayesian Inference (BI) MCMC consensus tree, with posterior probabilities shown at nodes. Drosophila melanogaster (NC_025936) and Locusta migratoria (NC_002084) were used as outgroups.

Discussion

At the family level, the L. salicis mitogenome (15,334 bp) is slightly shorter than that of Euproctis pseudoconspersa (15,461 bp), but falls within the range (15,140–16,173 bp) of other known lepidopteran mitogenomes. Gene order and orientation are the same as in previously sequenced Lymantriidae. Nucleotide BLAST (blastn) comparison of the entire mitogenome with closely related species showed that L. salicis shares 77% (L. dispar) to 79% (L. alpherakii) sequence identity with other Lymantriidae species. The conserved regions lie in the 22 tRNAs and 13 PCGs, whereas the A + T-rich region varies among these species. These characteristics have been reported in other lepidopteran species7 and could serve as potential markers for identification at genus and species level using recent molecular techniques. The highly A + T biased nucleotide composition is within the range of previously sequenced lepidopterans (79.64% in L. dispar–81.48% in G. menyuanensis). The positive AT skew (0.043) observed here, indicating the presence of more As than Ts, is similar to that seen in many lepidopterans, including L. dispar (0.014), Rondotia menciana (0.050), and Biston thibetaria (0.064) (Table 2). It is slightly higher than that of other sequenced mitogenomes in Noctuoidea, including Ctenoplusia agnata (−0.023), G. menyuanensis (0.003) and E. pseudoconspersa (0.011). A similar trend has been observed in other lepidopteran superfamilies such as Bombycoidea, where AT skew varies from 0.001 (Sphinx morio) to 0.059 (Bombyx mori)11. Among the lepidopteran mitogenomes compared, GC skew ranges from −0.268 in G. menyuanensis to −0.155 in Paracymoriza distinctalis (Table 2). The L. salicis mitogenome is moderately skewed (−0.254), indicating the presence of more Cs than Gs.

The AT skew (0.063) of the protein-coding gene region in the L. salicis mitogenome is higher than that of several previously sequenced mitogenomes, and its negative GC skew (−0.234) is similar to that seen in other animals. Cox1 is thought to initiate with CGA, as found in other lepidopteran insects12,13. Cox1 and cox2 terminate with a single T, while nad4 terminates with TA. Similar incomplete termination codons have been documented in several sequenced lepidopteran mitogenomes, including Artogeia melete14, Phthonandria atrilineata15, Ochrogaster lunifer16, H. cunea17 and Amata emma18. The complete termination codon TAA is usually created from such incomplete codons via post-transcriptional polyadenylation19. The relative synonymous codon usage of the 13 protein-coding genes (PCGs) in L. salicis is consistent with that of published lepidopteran sequences. The overrepresentation of codons with A or T in the third position relative to other synonymous codons is consistent with previous observations in lepidopterans9, as is the absence or underrepresentation of high-GC codons18,20.
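The way an incomplete T or TA terminus becomes a TAA stop codon once A residues are added to the transcript can be sketched in a few lines of Python. The function names `stop_codon_status` and `complete_with_poly_a` and the toy sequences are hypothetical, and the sketch assumes the coding sequence is given in frame from its first base; it illustrates the idea only and is not the annotation procedure used in this study.

```python
def stop_codon_status(cds):
    """Classify the 3' end of an in-frame coding sequence.

    Returns "complete" for a full TAA/TAG codon, "incomplete" for a
    trailing T or TA, and "none" otherwise.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return "complete" if cds[-3:] in ("TAA", "TAG") else "none"
    return "incomplete" if cds[-remainder:] in ("T", "TA") else "none"


def complete_with_poly_a(cds):
    """Append the As that polyadenylation would supply to fill the last codon."""
    pad = (-len(cds)) % 3
    return cds.upper() + "A" * pad


# Toy sequences (not L. salicis genes) ending in T and TA.
ends_with_t = "ATGAAAT"
ends_with_ta = "ATGAAATA"
completed = [complete_with_poly_a(s) for s in (ends_with_t, ends_with_ta)]
```

Both toy sequences end in TAA after completion, so `stop_codon_status` reports them as complete.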

The A + T content (83.91%) of the rRNA genes is similar to that seen in other Lymantriidae (83.05% in G. menyuanensis). The positive AT skew (0.029) and negative GC skew (−0.144) of the L. salicis rRNA genes have also been reported in several sequenced lepidopterans (Table 2). For example, H. cunea has a positive AT skew (0.024) and a negative GC skew (−0.137)17, and L. dispar has a positive AT skew (0.023) and a negative GC skew (−0.155).

The secondary structure of L. salicis tRNASer(AGN) lacks the dihydrouridine (DHU) arm and forms a simple loop. This has also been observed in several other animal mitogenomes21, including those of insects15,22,23. Ten tRNA genes have 11 mismatches in their secondary structures; most of these are located in the acceptor, DHU and anticodon stems. In addition, tRNACys and tRNASer (UCN) contain an A-A mismatch in the anticodon stem. Unmatched base pairs observed in tRNA sequences can be corrected by RNA-editing mechanisms that are well known for arthropod mtDNA24.

Four overlapping sequences occur in the mitogenome of L. salicis. The 7 bp overlap between atp8 and atp6 has been documented in several other lepidopteran mitogenomes25,26. The 10 bp intergenic spacer region containing an ‘ATACTAA’ motif, between tRNASer (UCN) and nad1, has also been documented in at least nine other species, suggesting that this region is highly conserved among most of the lepidopteran mtDNAs sequenced to date27.

The A + T-rich region of L. salicis (325 bp) is shorter than those of G. menyuanensis (449 bp), L. dispar (371 bp), H. cunea (357 bp) and B. thibetaria (350 bp), and longer than those of Lista haraldusalis (310 bp) and Choaspes benjaminii (293 bp). Extra tRNA-like structures are often found in the A + T-rich region of lepidopteran mitogenomes. For example, Antheraea yamamai has tRNASer(UCN)-like and tRNAPhe-like sequences, each with a correct anticodon structure and forming a clover-leaf structure, which suggests that they may be functional, though each has several mismatches in both aminoacyl and anticodon stem regions28. Extra tRNA-like structures were not seen in L. salicis. The presence of multiple tandem-repeat elements is described as being characteristic of insect A + T-rich regions29. Antheraea pernyi has a repeat element of 38 bp tandemly repeated six times25, and Cnaphalocrocis medinalis has a duplicated 25 bp repeat element25,30. Long conspicuous repeats were not observed in the A + T-rich region of L. salicis, though shorter repeating sequences, an ‘ATAGA’ motif and other features were. These characteristic features have each been found in previously sequenced lepidopteran species27,31,32.

In general, the L. salicis mitogenome shares several features with other lepidopteran mitogenomes in nucleotide composition, in the structure of tRNAs and PCGs, and in the A + T-rich region, while differing from them in others. These similarities and, especially, the differences between L. salicis and other insects could serve as potential markers for species identification, particularly with techniques such as PCR–RFLP3 and DNA barcoding33.

Phylogenetic relationships were established using Maximum Likelihood (ML), Neighbor Joining (NJ) and Bayesian Inference (BI) methods. Species clustered in families, and results were broadly consistent with previous work, e.g. Dong et al.26 and Dai et al.34. Results obtained from our analyses also supported the classification proposed by Fibiger and Lafontaine35, including within Lymantriidae a clade comprising E. pseudoconspersa, L. salicis, L. dispar and G. menyuanensis. The present analysis showed that within Lymantriidae, L. salicis was most closely related to G. menyuanensis, which is consistent with a recent study on E. pseudoconspersa26. Interestingly, L. dispar is more closely related to G. menyuanensis than to E. pseudoconspersa in the ML and NJ trees (Fig. 7A and B), whereas in the BI consensus tree L. dispar and E. pseudoconspersa branch together with a posterior probability of 0.6406 (Fig. 7C). These discrepancies indicate that the inferred relationships among Noctuoidea species depend on the tree-building method used (BI, ML or NJ).

Because most previous classifications of Lymantriidae species have been based on morphological features, the precise position of Lymantriidae within the Noctuoidea is still unclear. Kitching has suggested that the Lymantriidae are the sister group to a paraphyletic Pantheidae, sharing apomorphies such as the presence of secondary setae in first instar larvae36. Zahiri et al. reclassified the Noctuoidea on the basis of molecular analyses, making the group currently named Lymantriinae a subfamily of Erebidae37. Our results suggest that Lymantriidae can be regarded as a sister group to other families (Erebidae, Nolidae and Noctuidae) in the Noctuoidea, being most closely related to Erebidae. This is consistent with the higher classification of Noctuoidea proposed by Fibiger and Lafontaine (2005), who moved the Lymantriidae from a position in front of the Nolidae to a position after the Arctiidae to reflect the close association of the arctiids and lymantriids, and moved the Nolidae, Arctiidae and Lymantriidae in front of the upgraded family Erebidae so that their close relationship with the “quadrifids” is better reflected35. Further sequencing and characterization of mitogenomes from the family Lymantriidae are needed to provide insight into the classification of Noctuoidea.

At the level of superfamilies, Noctuoidea was closely related to Bombycoidea in our ML and NJ analyses, while in the BI tree, Bombycoidea was closely related to Geometroidea. Papilionoidea and Tortricoidea branched together in ML and NJ trees, but in the BI tree they formed separate branches, more in line with previous studies. Hepialoidea was the sister group to all other superfamilies, as found previously by Salvato et al.16 and Chai et al.38. While several previous studies have been undertaken on mitogenomes of Noctuoidea, relatively little is known about Lymantriidae specifically. Further taxon sampling within Lymantriidae and related families is required to resolve the placement of Lymantriidae in Noctuoidea.

Materials and Methods

Sample collection and mitochondrial DNA extraction

L. salicis larvae were collected from willow trees within the campus of Anhui Agricultural University, Hefei, China. Total genomic DNA was extracted using the Aidlab Genomic DNA Extraction Kit (Aidlab Co., Beijing, China) according to the manufacturer’s instructions. Quality of extracted DNA was assessed by electrophoresis on a 1% agarose gel stained with ethidium bromide.

Primer design, PCR amplification and sequencing

The full mitochondrial genome of L. salicis was PCR amplified in thirteen overlapping fragments, using primers that were designed from known mitogenomes of Lymantriidae and synthesized by Invitrogen Co. Ltd. (Shanghai, China) (Table 4). All PCRs were performed in a 50 μL reaction volume, including 35 μL sterilized distilled water, 5 μL 10 × Taq buffer (Mg2+), 4 μL dNTP (25 mM), 1.5 μL DNA, 2 μL of each primer (10 μM) and 0.5 μL (1 unit) Taq (TaKaRa Co., Dalian, China). PCR conditions were as follows: 4 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 40 s at 46–58 °C (Table 4), and 1–3 min (depending on putative length of the fragments) at 72 °C; and then a final extension step of 72 °C for 10 min.
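As a quick arithmetic check on the reaction mix, the component volumes listed above can be summed and the final primer concentration derived by simple dilution. The Python sketch below assumes two primers per reaction (forward and reverse); the dictionary keys and the helper `final_concentration` are illustrative names, not part of the original protocol.

```python
# Component volumes in microlitres, as listed in the text.
# Assumption: "each primer" means one forward and one reverse primer.
reaction_ul = {
    "sterilized distilled water": 35.0,
    "10x Taq buffer": 5.0,
    "dNTP": 4.0,
    "template DNA": 1.5,
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "Taq polymerase": 0.5,
}
total_ul = sum(reaction_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution: final concentration = stock x added volume / total volume."""
    return stock * volume_ul / total_volume_ul


# 2 uL of a 10 uM primer stock in the full reaction volume.
primer_final_um = final_concentration(10.0, reaction_ul["forward primer"], total_ul)
```

The volumes sum to the stated 50 μL, and each primer is present at a final concentration of 0.4 μM.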

Table 4 Details of the primers used to amplify the mitogenome of L. salicis.

All PCR products were visualized by electrophoresis on a 1.0% TAE agarose gel, and purified using a DNA gel extraction kit (Transgen Co., Beijing, China). The purified PCR fragments were ligated into the T-vector (TaKaRa Co., Dalian, China) and transformed into Escherichia coli DH5α, using the manufacturer’s protocol. Recombinants were cultured overnight at 37 °C on Luria-Bertani (LB) solid medium containing ampicillin (AMP), isopropylthiogalactoside (IPTG) and 5-bromo-4-chloro-3-indolyl-D-galactopyranoside (X-Gal). White colonies carrying insert DNA were selected and cultured overnight in liquid medium, and vector inserts were directly sequenced by Sangon Biotech Co. (Shanghai, China).

Sequence assembly and gene annotation

The complete mtDNA sequence was assembled using the SeqManII program from the Lasergene software package (DNAStar Inc., Madison, USA). Sequence annotation was performed using the NCBI’s web interface for BLAST (http://blast.ncbi.nlm.nih.gov/Blast).

Nucleotide sequences of the PCGs were translated into putative proteins based on insect sequences available in GenBank. Initiation and termination codons were identified using an alignment created in ClustalX version 2.0, with other lepidopteran sequences as references. To describe base composition, we analyzed skew as described by Junqueira39: AT skew = [A − T]/[A + T], GC skew = [G − C]/[G + C]. The relative synonymous codon usage (RSCU) was obtained using MEGA 540.
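The skew formulas above translate directly into code. The following minimal Python sketch applies them to the whole-genome base percentages reported in the Results (A: 42.07%, T: 38.57%, G: 7.22%, C: 12.14%); the function names `skew` and `at_gc_skew` are illustrative assumptions.

```python
def skew(x, y):
    """Generic skew (x - y) / (x + y) for two base counts or percentages."""
    if x + y == 0:
        raise ValueError("skew is undefined when both values are zero")
    return (x - y) / (x + y)


def at_gc_skew(a, t, g, c):
    """Return (AT skew, GC skew) = ((A - T)/(A + T), (G - C)/(G + C))."""
    return skew(a, t), skew(g, c)


# Whole-genome base percentages reported for L. salicis in the Results.
at, gc = at_gc_skew(a=42.07, t=38.57, g=7.22, c=12.14)
```

Rounded to three decimals, this gives an AT skew of 0.043 and a GC skew of −0.254, matching the whole-genome values discussed in the text.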

The tRNA genes were verified using the program tRNAscan-SE with default settings41, in addition to using the alignment to visually identify sequences with the appropriate anticodons capable of folding into the typical clover-leaf secondary structure. In the A + T-rich region, tandem repeats were found with the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html)42.

Phylogenetic analysis

A total of 29 sets of 13 PCG sequences were used to perform phylogenetic analysis, including those of L. salicis. Those from other taxa were downloaded from GenBank, with Drosophila melanogaster (U37541.1)43 and Locusta migratoria (JN858212)44 sequences used as outgroups. Alignments of the 13 concatenated PCGs were conducted using ClustalX version 2.0. Maximum likelihood (ML) phylogenetic analysis was performed using MEGA 5.0 with the Tamura-Nei model40. Neighbor Joining (NJ) distance analysis was performed using PAUP4b1045, and Bayesian Inference (BI) MCMC phylogenetic analysis was performed using MrBayes 3.246. Support for the ML and NJ trees was assessed with 1000 bootstrap replicates each. The BI analysis used four MCMC chains run for 1,000,000 generations, with trees sampled every 1000 generations. The consensus tree was visualized using FigTree v1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/).
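Building a concatenated dataset from several genes can be sketched as follows in Python. This is one common approach, in which each gene is aligned separately and the alignments are then joined per taxon; the text does not specify whether concatenation preceded or followed alignment, so the order here is an assumption. The function name `concatenate_alignments`, the taxon labels and the toy sequences are hypothetical.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments into one sequence per taxon.

    `gene_alignments` is a list of dicts mapping taxon name to aligned
    sequence, one dict per gene. A taxon lacking a gene is padded with
    gap characters so that all concatenated rows have equal length.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments for two genes; "sp2" lacks the second gene.
toy_cox1 = {"sp1": "ATGA", "sp2": "ATGC"}
toy_nad1 = {"sp1": "TTA"}
supermatrix = concatenate_alignments([toy_cox1, toy_nad1])
```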

Additional Information

How to cite this article: Sun, Y.-X. et al. Characterization of the Complete Mitochondrial Genome of Leucoma salicis (Lepidoptera: Lymantriidae) and Comparison with Other Lepidopteran Insects. Sci. Rep. 6, 39153; doi: 10.1038/srep39153 (2016).
