Nuclear and mitochondrial genomes of Polylopha cassiicola: the first assembly in Chlidanotinae (Tortricidae)

Tortricidae is one of the largest families in Lepidoptera, including the subfamilies Tortricinae, Olethreutinae, and Chlidanotinae. Here, we assembled a gap-free genome, the first for the subfamily Chlidanotinae, using Illumina, Nanopore, and Hi-C sequencing of Polylopha cassiicola, a pest of camphor trees in southern China. The nuclear genome is 302.03 Mb in size, with 36.82% repeats and 98.4% BUSCO completeness. The karyotype is 2n = 44 for males. We identified 15,412 protein-coding genes, 1,052 tRNAs, and 67 rRNAs. We also determined the mitochondrial genome of this species and annotated 13 protein-coding genes, 22 tRNAs, and one rRNA. These high-quality genomes provide valuable information for studying the phylogeny, karyotypic evolution, and adaptive evolution of tortricid moths.


Background & Summary
Tortricidae, the leafroller moths, is one of the largest families of Lepidoptera (butterflies and moths) 1 , including numerous notorious economic pests such as the spruce budworm, Choristoneura fumiferana 2 , the oriental fruit moth, Grapholita molesta 3 , and the codling moth, Cydia pomonella 4 . The two main subfamilies, Tortricinae and Olethreutinae, are relatively young 5 and comprise over 95% of tortricid species. Genomes of many species in these two subfamilies have been determined 6 , revealing an ancestral sex chromosome-autosome fusion and two subsequent autosome fusions relative to the ancestral karyotype of Lepidoptera 7 . Compared with these two successful subfamilies, the relict subfamily Chlidanotinae is much more limited in distribution range, host range, species richness, and population size. Species of this subfamily are mainly distributed in tropical regions, indicating a climatic adaptability different from that of species in the other subfamilies. Thus, this group can provide valuable insights into the phylogeny, pest adaptation, and evolution of Tortricidae. However, no genome has been assembled for any species of Chlidanotinae.
Here, we present the first chromosome-level genome assembly and annotation in the Chlidanotinae, using high-coverage long-read and Hi-C sequencing of Polylopha cassiicola 8 . This species is mainly distributed in the southern coastal regions of China and in Southeast Asia. It is a pest of the trees Cinnamomum cassia and C. camphora. We also assembled the mitochondrial genome of this species from the Illumina short reads. These genomes are expected to provide information for understanding the phylogeny, karyotypic evolution, and adaptive evolution of Tortricidae.

Methods
Sample collection and sequencing. P. cassiicola larvae were collected from the tops of C. camphora in Guangxi, China. The larvae were reared in the laboratory to pupae and adults for genomic and transcriptome sequencing. Three individuals were used for three types of genome sequencing: one male pupa for Nanopore long-read sequencing, one male pupa for Illumina short-read sequencing, and one female adult for Hi-C sequencing. In addition, four larvae were used for RNA sequencing. Nucleic acid extraction and sequencing library construction were contracted to BerryGenomic (Beijing, China). Methods for nucleic acid extraction, sequencing platforms, and sequencing outputs are provided in Table 1.

Genome assembly.
The Nanopore long reads were assembled into 76 contigs using NextDenovo 2.5.2 (https://github.com/Nextomics/NextDenovo) with parameters: "read_cutoff = 4k, genome_size = 400m, nextgraph_options = -a 1". Redundant sequences in the contigs were removed using Purge_dups 9 . The cleaned set of 65 contigs was then scaffolded to chromosome level using Hi-C information. In this analysis, we mapped the Hi-C reads to the cleaned contigs using BWA 10 with options "mem -SP5", anchored the contigs using YaHS 1.2a.1 11 with option "-e GATC", and manually adjusted the scaffolds using Juicebox 1.22.01 12 . We removed the contigs that had no contact information with the chromosomes, as these could derive from contamination such as symbiotic microbes. Finally, the chromosome-level sequences were subjected to two rounds of long-read polishing and two rounds of short-read polishing using NextPolish 1.4.1 13 . The resulting P. cassiicola genome is 302.03 Mb in size and contains 21 autosomes and one Z sex chromosome (Fig. 1a).
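The Hi-C mapping and anchoring steps described above can be sketched as shell commands assembled in Python. This is a minimal illustration, not the authors' script: only the "mem -SP5" and "-e GATC" options come from the text. The file names, the thread count, the indexing steps, and the pipe into samtools to produce a BAM file are assumptions added to make the example complete. Nothing is executed; the function only returns the command strings.

```python
import shlex


def hic_scaffolding_commands(contigs, hic_r1, hic_r2, prefix, threads=8):
    """Render Hi-C mapping and scaffolding steps as shell command strings.

    All arguments are hypothetical file names or settings chosen for
    illustration; the commands are returned, not run.
    """
    bam = f"{prefix}.hic.bam"
    q = shlex.quote
    return [
        # Assumed preparatory steps: index the purged contigs for BWA and YaHS.
        shlex.join(["bwa", "index", contigs]),
        shlex.join(["samtools", "faidx", contigs]),
        # Map Hi-C read pairs with the options quoted in the text ("mem -SP5").
        # Converting the SAM stream to BAM with samtools is an assumption.
        f"bwa mem -SP5 -t {int(threads)} {q(contigs)} {q(hic_r1)} {q(hic_r2)}"
        f" | samtools view -b -o {q(bam)} -",
        # Anchor contigs with YaHS using the restriction site quoted in the text.
        shlex.join(["yahs", "-e", "GATC", "-o", prefix, contigs, bam]),
    ]


if __name__ == "__main__":
    for cmd in hic_scaffolding_commands(
        "purged.fa", "hic_R1.fq.gz", "hic_R2.fq.gz", "pcas"
    ):
        print(cmd)
```

Manual curation in Juicebox and polishing with NextPolish follow these steps in the text and are not covered by the sketch.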
We also assembled the mitochondrial genome from the short reads using MitoZ 3.6 14 . In the mitochondrial genome, we identified 13 protein-coding genes, 22 tRNAs, and 1 rRNA (Fig. 2).
Gene prediction and functional annotation. Gene structure was predicted using the ab initio tool Helixer 21 with options "--subsequence-length 320760 --batch-size 6" and the pre-trained invertebrate model "invertebrate_v0.3_m_0200". Gene functions, Gene Ontology (GO) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms for the predicted genes were annotated using the eggNOG-Mapper 22 web tool against the eggNOG database 5. A total of 15,412 protein-coding genes were predicted, of which 12,671 (82.2%) were functionally annotated.

Data Records
The Nanopore reads, Illumina reads, Hi-C reads, and RNA-seq reads for the P. cassiicola genome assembly were deposited in the NCBI Sequence Read Archive under accession number SRP479759 23 . The nuclear and mitochondrial genome assemblies were deposited in GenBank under accession number GCA_038024825.1 24 . The genome annotation files are available in Figshare 25 at https://doi.org/10.6084/m9.figshare.24902046.

Technical Validation
To validate the accuracy of the final genome assembly, we mapped the Illumina short reads and Nanopore long reads to the P. cassiicola genome using Minimap2 26 with option "-ax sr" for short reads and option "-ax map-ont" for long reads. The mapping rates for the short and long reads were calculated using Samtools 27 . The analysis revealed mapping rates of 96.38% and 98.73% for the short and long reads, respectively. We also examined the coverage of short reads along the mitochondrial genome and found 100% coverage (Fig. 1b).
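As an illustration of how a mapping rate can be derived from Samtools output, the sketch below parses the text produced by `samtools flagstat` and divides mapped records by total records. The text does not say which Samtools subcommand was used, so the choice of flagstat is an assumption, and the example counts are invented for demonstration rather than taken from this study.

```python
import re


def mapping_rate(flagstat_text: str) -> float:
    """Return the percentage of mapped records from `samtools flagstat` text."""
    counts = {}
    for line in flagstat_text.splitlines():
        # Lines look like "1930 + 0 mapped (96.50% : N/A)"; the first number
        # is the QC-passed count. "primary mapped" lines are ignored because
        # the pattern must match directly after the two counts.
        m = re.match(r"(\d+) \+ (\d+) (in total|mapped) \(", line)
        if m:
            counts[m.group(3)] = int(m.group(1))
    if "in total" not in counts or "mapped" not in counts:
        raise ValueError("flagstat text lacks 'in total' or 'mapped' lines")
    if counts["in total"] == 0:
        raise ValueError("no reads in flagstat text")
    return 100.0 * counts["mapped"] / counts["in total"]


# Hypothetical flagstat output, used only to exercise the parser.
EXAMPLE_FLAGSTAT = """\
2000 + 0 in total (QC-passed reads + QC-failed reads)
2000 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
1930 + 0 mapped (96.50% : N/A)
1930 + 0 primary mapped (96.50% : N/A)
"""

if __name__ == "__main__":
    print(f"{mapping_rate(EXAMPLE_FLAGSTAT):.2f}%")
```

Note that the "in total" and "mapped" counts in flagstat include secondary and supplementary alignments, so for long reads the "primary mapped" line may be the more appropriate figure.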
Completeness of the assembly and gene prediction was evaluated using BUSCO 5.4.7 28 with the lepidoptera_odb10 database. In this analysis, BUSCO examined the states and proportions of 5,286 single-copy orthologs of Lepidoptera in our genome assembly: single-copy (S), duplicated (D), fragmented (F), and missing (M). The analyses showed completeness ranging from 95.1% to 98.4% across assembly stages (Table 3), and 97.8% for the predicted gene set: "C: 97.8% [S: 97.2%, D: 0.6%], F: 0.9%, M: 1.3%". The quality of gene prediction was manually evaluated using RNA-seq data. Specifically, RNA-seq reads were mapped to the genome using Hisat 29 and Samtools 27 . We imported the resulting BAM file and the annotation file into the IGV browser 30 . Based on manual examination, we found that the machine learning-based annotation method had predicted near-complete gene structures. These results indicate that we have obtained a high-quality assembly and annotation of the P. cassiicola genome.

Fig. 1 Genomic features of the nuclear genome of Polylopha cassiicola. (a) Hi-C contact matrix of 22 putative chromosomes. (b) Synteny among four tortricid species from four subfamilies and an outgroup. The labels at the bottom mark the ancestral linkage groups of Lepidoptera 6 .

Table 1. Methods and outputs for sequencing experiments. NA, not available.

Table 2. Statistics of repeat elements and non-coding RNAs in the Polylopha cassiicola genome. SINEs, short interspersed nuclear elements; LINEs, long interspersed nuclear elements; LTR, long terminal repeat.
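The BUSCO summary reported for the predicted gene set in the Technical Validation can be sanity-checked with a few lines of Python. The sketch below is an illustrative helper, not part of BUSCO: it parses a summary string and confirms that complete equals single-copy plus duplicated, and that complete, fragmented, and missing sum to 100%, within a rounding tolerance. The function names and the tolerance value are assumptions.

```python
import re


def parse_busco(summary: str) -> dict:
    """Extract the C, S, D, F and M percentages from a BUSCO summary string."""
    values = {
        key: float(val)
        for key, val in re.findall(r"([CSDFM]):\s*([\d.]+)%", summary)
    }
    missing = set("CSDFM") - values.keys()
    if missing:
        raise ValueError(f"summary lacks categories: {sorted(missing)}")
    return values


def is_consistent(v: dict, tol: float = 0.11) -> bool:
    """Check C = S + D and C + F + M = 100, allowing for one-decimal rounding."""
    return (
        abs(v["C"] - (v["S"] + v["D"])) <= tol
        and abs(v["C"] + v["F"] + v["M"] - 100.0) <= tol
    )


if __name__ == "__main__":
    # Summary string as quoted in the text for the predicted gene set.
    gene_set = parse_busco("C: 97.8% [S: 97.2%, D: 0.6%], F: 0.9%, M: 1.3%")
    print(gene_set, is_consistent(gene_set))
```

The same check applies to the compact form that BUSCO writes to its short summary file, since the pattern tolerates missing spaces after the colons.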