Background & Summary

The spongy moth, Lymantria dispar, is one of the most important forestry pests. It is widely distributed across the temperate forests of the northern hemisphere, including Europe, China and North America1,2. The larvae are destructive polyphagous folivores that consume more than 600 plant species ranging from oaks to conifers3. They can completely defoliate entire trees, resulting in significant ecological and economic losses4,5. The spongy moth is divided into Asian (L. dispar asiatica) and European (L. dispar dispar) subspecies based on origin6. Introduced to North America in 1869, the European subspecies has spread widely over the past 150 years7. The Asian spongy moth poses a greater threat due to its robust reproductive capacity and flight ability8. Females are particularly drawn to lights in ports, often laying their egg masses on cargo and the superstructure of ships9. At present, effective control of the invasion and spread of the spongy moth has become a global research hotspot10.

Chemical control is the primary method used to combat the spongy moth11. Since the last century, a variety of insecticides have been used for spongy moth control12. Unfortunately, the frequent and extensive use of pesticides not only adversely affects biodiversity but also hastens the development of insecticide resistance. Xenobiotic detoxification is a crucial mechanism enabling insects to resist toxic phytochemicals or pesticides. It depends on constitutive quantitative changes in the expression and activity of multiple detoxification enzymes and transporters, including cytochrome P450s (CYP450s), UDP-glucuronosyltransferases (UGTs), glutathione S-transferases (GSTs) and ATP-binding cassette (ABC) transporters13. Currently, the development of RNAi insecticides targeting these detoxification genes is a focus of the pest control field. However, the lack of genomic information significantly constrains the identification of effective targets in the spongy moth. This deficit also impedes the understanding of insecticide resistance mechanisms in the spongy moth from the perspective of genomic diversity and evolution.

Here, we constructed the first high-quality chromosome-level reference genome of L. dispar using PacBio long-read sequencing and Hi-C sequencing technologies. The final genome size was 997.59 Mb with an N50 of 35.42 Mb, and 991.35 Mb of the genome sequence was further clustered and ordered into 32 chromosomes. A total of 19,532 protein-coding genes were predicted in the genome of L. dispar asiatica. This chromosome-level genome assembly of L. dispar asiatica provides a valuable genomic resource for investigating its evolutionary dynamics and for aiding its control.

Methods

Insect rearing and sample collection

The egg masses of L. dispar asiatica were obtained from a poplar field in Harbin, Heilongjiang Province and maintained at 4 °C before hatching. Hatched larvae were fed an artificial diet at 25 ± 1 °C, with a 14:10 (L:D) photoperiod and 65 ± 5% relative humidity, following our previous studies14,15. The 2nd, 3rd, 4th and 5th instar larvae, pupae, and adult moths were collected separately. The samples were frozen in liquid nitrogen and then stored at −80 °C.

Genome sequencing and assembly

Genomic DNA was isolated from a fresh female pupa of Lymantria dispar asiatica using the sodium dodecyl sulfate (SDS) extraction method16. For PacBio long-read sequencing, 8 µg of DNA was sheared into fragments of 15–20 kb in length with a g-TUBE (Covaris, USA) and then purified with AMPure PB Beads. High-fidelity (HiFi) libraries were constructed with the SMRTbell Express Template Prep Kit 2.0 and sequenced on the PacBio Sequel IIe platform (Pacific Biosciences, Menlo Park, USA). A total of 33.02 Gb of HiFi reads with an N50 of 20,583 bp were obtained using Circular Consensus Sequencing (CCS) mode (Table 1). The PacBio HiFi reads of L. dispar asiatica were de novo assembled using Hifiasm v0.19.517,18 with default parameters. The draft genome had a total size of 997.53 Mb and comprised 102 contigs with an N50 of 32.048 Mb (Table 2).
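The N50 values and sequencing depth implied by the figures above can be illustrated with a minimal sketch. The function name `n50` and the toy contig lengths are hypothetical and are not the authors' code; only the 33.02 Gb and 997.53 Mb values come from the text, and the depth estimate assumes that the draft assembly size approximates the genome size.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length

# Toy example (hypothetical lengths): total = 150, half = 75,
# and 50 + 40 = 90 is the first running sum to reach 75.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 40

# Approximate HiFi depth from the values reported in the text.
hifi_gb = 33.02        # total HiFi yield (Gb)
assembly_mb = 997.53   # draft assembly size (Mb)
depth = hifi_gb * 1000 / assembly_mb
print(round(depth, 1))  # 33.1
```

Under these assumptions the HiFi data correspond to roughly 33-fold coverage of the draft assembly.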

Table 1 Statistics of the HiFi sequence data used for genome assembly.
Table 2 Summary statistics of the Lymantria dispar asiatica genome assembly.

Hi-C scaffolding

To construct Hi-C libraries, a 5th instar female larva of L. dispar asiatica was used as input following previously described standard protocols19. In detail, the larva was cut into small pieces and pulverized in liquid nitrogen. The tissue was cross-linked with 4% formaldehyde solution for 30 min. After quenching the cross-linking reaction with 2.5 M glycine, the tissue sample was centrifuged at 2500 rpm at 4 °C for 10 min. The pellet was washed with 500 μl PBS and then centrifuged for 5 min (2500 rpm). Subsequently, the pellet was resuspended in 20 μl of lysis buffer, followed by two washes with 100 μl of ice-cold 1× NEB buffer. The nuclei were collected by centrifuging at 5000 rpm for 5 min, resuspended in 100 μl NEB buffer, and solubilized with dilute SDS. After quenching the SDS with Triton X-100, the samples were digested overnight at 37 °C with the 4-cutter restriction enzyme MboI (400 units). The linked DNA was labelled with biotin-14-dCTP and then ligated by T4 DNA polymerase. The ligated DNA was sheared by sonication into fragments of 200–600 bp and sequenced on the Illumina HiSeq 2500 platform in paired-end mode (PE 125 bp). About 110.96 Gb of raw data were obtained from L. dispar asiatica (Table 3).

Table 3 Statistics of the Hi-C sequence data used for genome assembly.

The raw Hi-C reads were filtered with fastp v0.23.420 to retain high-quality reads. The cleaned Hi-C reads were then mapped to the draft genome using Juicer v1.621. The unique high-quality paired-end reads were taken as input for the 3D-DNA v190716 pipeline22 with the parameter “-r 0”. The chromosome interaction matrix was manually adjusted using Juicebox v1.11.0821. The Hi-C heatmap was drawn with HiCExplorer v3.7.223. Finally, a total of 32 chromosomes were obtained, which contained 99.38% of the assembled contigs (Fig. 1).
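The anchoring rate quoted above can be reproduced with a short calculation. This is a sketch under two assumptions that the text does not state explicitly: that the percentage is computed by sequence length rather than by contig count, and that the denominator is the 997.53 Mb draft assembly. The helper name `anchored_percent` is hypothetical; the 991.35 Mb value is the chromosome-anchored sequence reported in the Background & Summary.

```python
def anchored_percent(anchored_mb, assembly_mb):
    """Percentage of the assembly (by length) placed on chromosomes."""
    if assembly_mb <= 0:
        raise ValueError("assembly size must be positive")
    return 100.0 * anchored_mb / assembly_mb

# Values reported in the text: 991.35 Mb anchored to 32 chromosomes,
# 997.53 Mb draft contig assembly (assumed denominator).
rate = anchored_percent(991.35, 997.53)
print(f"{rate:.2f}%")  # 99.38%
```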

Fig. 1

The genome features of Lymantria dispar asiatica. (a) Genome-wide Hi-C heatmap of chromatin interaction counts. (b) Circos plot of the 32 chromosomes of Lymantria dispar asiatica. From the outermost to the innermost layer, chromosome length, gene density, repeat density, and GC density are displayed sequentially.

After Hi-C scaffolding, the completeness of the genome was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO v5.4.3)24. This analysis revealed that the L. dispar asiatica chromosome-level assembly contained C:99.4% [S:97.9%, D:1.5%], F:0.2%, M:0.4%, n:1367 (Table 4). The results indicated that the genome assembly of L. dispar asiatica was of considerable contiguity, completeness and accuracy.
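The BUSCO summary notation can be read as follows: C (complete) is the sum of S (single-copy) and D (duplicated), and C, F (fragmented) and M (missing) together account for all n orthologs searched. A minimal sketch that parses the string reported above and checks this internal consistency is given here; the function name `parse_busco` is hypothetical and is not part of BUSCO itself.

```python
import re

def parse_busco(summary):
    """Parse a BUSCO one-line summary into a dict of percentages plus n."""
    fields = {
        key: float(value)
        for key, value in re.findall(r"([CSDFM]):(\d+(?:\.\d+)?)%", summary)
    }
    n_match = re.search(r"n:(\d+)", summary)
    if n_match is None or set(fields) != set("CSDFM"):
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    fields["n"] = int(n_match.group(1))
    return fields

busco = parse_busco("C:99.4% [S:97.9%, D:1.5%], F:0.2%, M:0.4%, n:1367")

# Consistency checks, with a tolerance for one-decimal rounding.
complete_ok = abs(busco["S"] + busco["D"] - busco["C"]) < 0.11
total_ok = abs(busco["C"] + busco["F"] + busco["M"] - 100.0) < 0.11
print(complete_ok, total_ok)  # True True
```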

Table 4 Statistics of repetitive elements in the Lymantria dispar asiatica genome.

RNA sequencing

Larvae in the 2nd, 3rd, 4th, and 5th instars, along with adult females, were collected for RNA extraction. Total RNA was extracted from each sample separately using TRIzol reagent. Sequencing libraries were generated with the NEBNext Ultra RNA Library Prep Kit (NEB, USA). The transcriptomes were sequenced on the Illumina HiSeq 4000 platform with a PE150 strategy, and a total of 33.99 Gb of short-read RNA-seq raw data were obtained (Table 5).

Table 5 Statistics of RNA sequencing data of Lymantria dispar asiatica.

Genome annotation

For genome annotation, we mainly followed a genome annotation pipeline developed by the Institute of Insect Sciences, Zhejiang University25. In detail, an insect-specific de novo repeat library was first constructed using RepeatModeler v2.0.426 and RepeatMasker v4.1.527 for repeat sequence annotation. The genome was then soft-masked by RepeatMasker v4.1.5 with the “-xsmall” parameter. In the genomic sequences, a total of 527.67 Mb (52.89%) of repetitive elements were identified, mainly including 28.98% retroelements, 5.94% DNA transposons, 7.82% rolling-circles, and 8.89% unclassified repeat sequences (Table 4). To predict protein-coding genes, homologous proteins were obtained from other insect species (downloaded from InsectBase 2.025). Transcriptome data were aligned to the genome using HISAT2 v2.2.128, and open reading frames (ORFs) were predicted using StringTie v2.2.129 combined with TransDecoder v5.7.030. Both the homologous proteins and the transcriptome-based evidence were used as inputs to BRAKER v3.0.331, which includes ab initio gene prediction by Augustus v3.4.032 and GeneMark-ETP v1.033. A total of 19,532 protein-coding genes were predicted in the genome of L. dispar asiatica. Functional annotation of the protein-coding genes was performed with eggNOG-mapper v2 (http://eggnog-mapper.embl.de/)34.
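Because the “-xsmall” option of RepeatMasker returns repeats in lowercase rather than replacing them with N, the repeat fraction of a soft-masked assembly can be estimated by counting lowercase bases. The sketch below is illustrative only: the function name `softmasked_fraction` and the toy FASTA records are hypothetical, and a count of this kind may differ slightly from the totals in the RepeatMasker summary table. The final lines simply re-derive the reported 52.89% from the 527.67 Mb of repeats and the 997.59 Mb final genome size.

```python
def softmasked_fraction(fasta_lines):
    """Return the fraction of bases that are lowercase (soft-masked)."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0

# Toy soft-masked record (hypothetical): 8 of 20 bases are lowercase.
toy_fasta = [">chr_demo", "ACGTacgtACGT", "nnnnACGT"]
print(softmasked_fraction(toy_fasta))  # 0.4

# Reported totals from the text: 527.67 Mb repeats in a 997.59 Mb genome.
reported = 100.0 * 527.67 / 997.59
print(f"{reported:.2f}%")  # 52.89%
```

For a real assembly, the lines of the masked FASTA file could be passed in directly, for example by iterating over an open file handle.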

Data Records

Illumina, PacBio and Hi-C raw data for L. dispar asiatica genome sequencing have been deposited in the NCBI Sequence Read Archive under accession numbers SRR2605746935, SRR2603651136 and SRR260463037. Illumina transcriptome data for larvae and adults have been deposited in the NCBI Sequence Read Archive under accession number SRP45959738. The final assembled L. dispar asiatica genome has been submitted to the GenBank database of NCBI under accession number GCA_032191425.139. The annotation file is available in figshare40.

Technical Validation

The completeness of the L. dispar asiatica genome assembly was evaluated using BUSCO (insecta_odb10 database): 99.4% of genes were complete (97.9% single-copy and 1.5% duplicated), 0.2% were fragmented, and 0.4% were missing. The Hi-C heatmap revealed a well-structured interaction pattern in and around the chromosome inversion regions, with the notable exception of chromosome 17. This chromosome showed a lower contact probability than the others, leading us to speculate that it may be associated with the W sex chromosome, which is specific to females. In addition, the mapping rate of the short-read sequencing data exceeded 90%. Together, this evidence supports the completeness and accuracy of the L. dispar asiatica genome assembly.