Nuclear and mitochondrial genomes of the plum fruit moth Grapholita funebrana

The plum fruit moth Grapholita funebrana (Tortricidae, Lepidoptera) is an important pest of many wild and cultivated stone fruits and other plants in the family Rosaceae. Here, we assembled its nuclear and mitochondrial genomes using Illumina, Nanopore, and Hi-C sequencing technologies. The nuclear genome size is 570.9 Mb, with a repeat content of 51.28% and a BUSCO completeness of 97.7%. The karyotype for males is 2n = 56. We identified 17,979 protein-coding genes, 5,643 tRNAs, and 94 rRNAs. We also determined the mitochondrial genome of this species and annotated 13 protein-coding genes, 22 tRNAs, and 2 rRNAs. These genomes provide resources for understanding the genetics, ecology, and genome evolution of tortricid moths.


Background & Summary
The plum fruit moth Grapholita funebrana is an important fruit borer from the family Tortricidae of Lepidoptera 1,2. Larvae of G. funebrana cause damage by boring into the fruits of many wild and cultivated stone fruits and other plants in the family Rosaceae, such as apricot, cherry, peach, and plum 3. This species is native to Europe and is currently found in fruit-growing regions of Europe, northern Africa, and Asia 4. In orchards, G. funebrana often co-occurs with other fruit borers, such as the oriental fruit moth Grapholita molesta (Busck), the codling moth Cydia pomonella, and the peach fruit moth Carposina sasakii Matsumura 5. While many studies have focused on the biology and management of fruit borers, research on G. funebrana is lagging behind [6][7][8][9][10]. In addition, moths from the family Tortricidae are ideal for studying the evolution of chromosome fusion 11,12. While species of the order Lepidoptera often have a conserved chromosome number of n = 31, many species in the family Tortricidae have a reduced number of chromosomes due to the fusion of chromosome pairs 13,14. Recent research has found that a common ancestor of the subfamilies Tortricinae and Olethreutinae diverged from the ancestral lepidopteran chromosome pattern due to a fusion of sex chromosomes with autosomes 15. The karyotypes of tortricid moths were traditionally studied by cytogenetic methods and fluorescence in situ hybridization 15. Determining genome sequences will improve understanding of the molecular evolution of chromosomes in tortricid moths 16. Currently, chromosome-level genomes have been published for C. pomonella 16 and G. molesta 17, and many other assemblies for Tortricidae are publicly available in GenBank (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=7139).
In this study, we assembled a chromosome-level genome for G. funebrana as well as its mitochondrial genome using Oxford Nanopore Technologies (ONT) long-read sequencing, Illumina short-read sequencing, high-throughput chromatin conformation capture (Hi-C) sequencing, and RNA sequencing (RNA-seq). We obtained a nuclear genome assembly of 570.9 Mb, with an N50 of 21 Mb. These high-quality genomes will provide valuable resources for the study of G. funebrana and for in-depth investigation of chromosome evolution at macroevolutionary and microevolutionary levels.

Methods
Material and sequencing. Apricot (Prunus armeniaca) fruits containing G. funebrana larvae were collected from Yanqing, Beijing, China, and reared in the laboratory for about 30 days to obtain specimens of different developmental stages. To decrease the effect of heterozygosity, a single larva was used for long-read, short-read, and Hi-C library construction. A single larva, pupa, and adult (sex unknown) were each collected for the construction of RNA-seq libraries. All samples were immediately flash-frozen in liquid nitrogen and stored at −80 °C for subsequent experiments.
Genomic DNA was extracted using the magnetic bead method (Invitrogen, Thermo Fisher Scientific, USA), while RNA was extracted using the RNAprep Pure Plus Kit (Tiangen, China). The quantity of DNA was measured using Qubit 3.0. To generate short-read data for the genome survey, an Illumina library with an insert size of 350 bp was constructed and sequenced on the Illumina NovaSeq 6000 platform. To perform de novo genome assembly, a 15-20 kb ONT library was prepared and sequenced on the ONT platform to generate long-read data. To generate the Hi-C data, tissue from a larva was fixed with paraformaldehyde and digested with the restriction enzyme DpnII, generating fragments with sticky ends. These sticky ends were repaired using DNA polymerase and ligated together to form chimeric circles using DNA ligase. The ligated DNA was then decrosslinked, purified, and sheared to an insert size of 350 bp. The Hi-C library was sequenced on the Illumina NovaSeq 6000 platform to generate 150-bp paired-end reads. For RNA-seq, paired-end libraries were constructed using the VAHTS™ mRNA-seq V2 Prep Kit (Vazyme, Nanjing, China) and then sequenced on the Illumina NovaSeq 6000 platform with 150-bp paired-end reads for genome annotation. A total of 33.7 Gb of Illumina short reads, 69.7 Gb of ONT long reads, 58.3 Gb of Hi-C reads, and 21.9 Gb of RNA-seq reads were generated. The raw Illumina reads were filtered using Fastp v0.21.0 18 with default parameters.
Genome survey. The genome survey was performed using a k-mer-based method. K-mer frequencies were counted from the Illumina short reads using Jellyfish version 2.2.10 19 with parameters: 'count -m 21 -C -s 5G'. Genome size, heterozygosity, and duplication rate were estimated using GenomeScope version 2.0 20. The results showed a genome size of about 515 Mb, a heterozygosity rate of 1.91%, and a duplication rate of 1.21%.
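The idea behind the k-mer survey can be illustrated with a minimal Python sketch. It is not the Jellyfish or GenomeScope implementation: the function names, the error cutoff `min_depth`, and the peak-based size estimate are assumptions made for illustration, and GenomeScope instead fits a mixture model to the full histogram. The sketch counts canonical k-mers (as the `-C` flag requests) and divides the total number of k-mers by the depth of the main histogram peak.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_canonical_kmers(reads, k=21):
    """Count canonical k-mers over an iterable of read sequences."""
    counts = Counter()
    valid = set("ACGT")
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def naive_genome_size(histogram, min_depth=2):
    """Hypothetical simplification: total k-mers above an error cutoff
    divided by the depth of the tallest histogram peak."""
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

In a highly heterozygous sample the tallest peak may be the heterozygous one, in which case this naive estimate would be roughly doubled; this is one reason a model-based tool is preferable in practice.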

Genome assembly. The Nanopore long reads were assembled into a primary set of nuclear genome contigs using NextDenovo v2.5.1 21 with parameters: 'read_cutoff = 1k, genome_size = 400m, pa_correction = 20, nextgraph_options = -a 1'. The primary assembly contained 215 contigs, with a total size of 594 Mb and an N50 of 6.6 Mb. Because assemblies based on ONT reads have a high error rate, the primary contigs were polished using NextPolish 1.4.1 22 with one round based on long reads and one round based on short reads. To achieve a chromosome-level assembly, the polished contigs were anchored into pseudomolecules based on Hi-C read information. Specifically, the Hi-C reads were mapped to the contigs using Chromap 0.2.4 23 with options: "--preset hic --remove-pcr-duplicates --trim-adapters --SAM". The SAM output was sorted by read name and written in BAM format using Samtools v1.17 24 with options: "sort -n -O BAM". YaHS v1.2a.1 25 and Juicebox 1.22.01 26 were then used for unsupervised and supervised scaffolding, respectively. After scaffolding, most contigs (95.3% of contigs and 99.86% of base pairs) were anchored into 28 pseudo-chromosomes (Fig. 1a), consistent with the karyotype of most species in the subfamily Olethreutinae. To fill the gaps between contigs, we performed two rounds of polishing based on long and short reads using NextPolish. The final assembly has a genome size of 570.9 Mb, with an N50 of 21 Mb. The assembled genome is 56.9 Mb larger than the estimated genome size. The MitoZ v3.6 pipeline 27 was used to assemble the mitochondrial genome with Megahit v1.2.9 28 ("--kmers_megahit 39 59 79 99 119 141 --requiring_taxa Lepidoptera") and to annotate it. The mitochondrial genome of G. funebrana is 15,488 bp in length and contains 13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes (Fig. 1b).
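The N50 values quoted for the contig and final assemblies follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation, assuming the sequence lengths are already available as a list of integers (the function name `n50` is ours, not part of any assembler):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two longest sequences already account for 16 of the 32 total bases.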
Chromosome features. Gene number, repeat sequence density, and guanine-cytosine (GC) content were calculated in 500 kb non-overlapping windows using Bedtools v2.30.0 35. Chromosomes were named after the lepidopteran ancestral linkage groups 14, based on homology to Sesia bembeciformis 36. Homology was detected using LAST 37 alignment. A Circos plot of the chromosome features was generated with TBtools v2.021 38 (Fig. 2a).
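The windowed GC calculation can be sketched in plain Python as follows. This is an illustrative stand-in, not the Bedtools commands used for the analysis; the function name `gc_windows` and the choice to ignore non-ACGT bases in the denominator are assumptions.

```python
def gc_windows(seq, window=500_000):
    """Compute GC content in non-overlapping windows along one sequence.

    Returns a list of (start, end, gc_fraction) tuples with 0-based,
    half-open coordinates. Bases other than A, C, G, T (e.g. N) are
    excluded from the denominator; a window with no ACGT bases gets NaN.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        fraction = gc / acgt if acgt else float("nan")
        rows.append((start, start + len(chunk), fraction))
    return rows
```

The last window of each chromosome is usually shorter than the window size, which is why the end coordinate is taken from the actual chunk length.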

Data Records
Illumina, Nanopore, Hi-C, and transcriptome data for G. funebrana genome sequencing have been deposited in the NCBI Sequence Read Archive with accession number SRP482231 39. The final assembled nuclear genome of G. funebrana has been deposited in NCBI GenBank with accession number GCA_038095595.1 40. The mitochondrial genome has been deposited in NCBI GenBank with accession number PP776023 41. The genome assembly and annotation files are available in Figshare 42.

Technical Validation
The Hi-C heatmap revealed a well-structured interaction pattern. Short-read sequencing data were mapped to the final assembly with BWA v0.7.17 43, giving a mapping rate of 97.7%. The completeness of the G. funebrana genome assembly was evaluated using BUSCO 44 based on the lepidoptera_odb10 database (n = 5286). The completeness of the initial assembly (contig level) was 90.9%, and it increased to 97.7% (97.2% single-copy, 0.5% duplicated, 0.6% fragmented, and 1.7% missing genes) after polishing with NextPolish 22 (Table 1). We identified 14,547 protein-coding genes, 11,673 of which were functionally annotated. The completeness of the annotated gene set was 95.8% (94.8% single-copy, 1.0% duplicated, 1.1% fragmented, and 3.1% missing genes). A synteny analysis between G. funebrana and G. molesta 17 was performed using MCscan in the JCVI package 45. Strong syntenic blocks were found between the two closely related species (Fig. 2b). Together, these results support the completeness and accuracy of the G. funebrana genome assembly.
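The BUSCO percentages reported above can be cross-checked with simple arithmetic: the complete fraction is the sum of the single-copy and duplicated fractions, and the four categories should add up to 100% within rounding. A small sketch of that check, with a hypothetical helper name and a rounding tolerance chosen for illustration:

```python
def busco_summary(single, duplicated, fragmented, missing, n):
    """Check that BUSCO category percentages sum to ~100 and derive totals.

    All category arguments are percentages; n is the number of BUSCO
    groups in the lineage dataset. Gene counts are approximate because
    they are reconstructed from rounded percentages.
    """
    total = single + duplicated + fragmented + missing
    if abs(total - 100.0) > 0.11:  # allow for one-decimal rounding
        raise ValueError(f"categories sum to {total:.1f}%, expected 100%")
    complete = round(single + duplicated, 1)
    return {
        "complete_pct": complete,
        "approx_complete_genes": round(complete / 100 * n),
    }


# Values taken from the text, with n = 5286 for lepidoptera_odb10.
assembly = busco_summary(97.2, 0.5, 0.6, 1.7, n=5286)
gene_set = busco_summary(94.8, 1.0, 1.1, 3.1, n=5286)
```

With the figures from the text, the assembly corresponds to roughly 5,164 complete BUSCO genes and the annotated gene set to roughly 5,064.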

Fig. 1 The interaction heat map of the nuclear genome (a), and the distribution of genes and read coverage on the mitochondrial genome (b).

Fig. 2 Chromosome features of the Grapholita funebrana genome. (a) Circos plot of GC content, gene count, and repeat content. Chromosomes were labeled using Merian elements according to their homology with the lepidopteran ancestral linkage groups 14. (b) Synteny blocks between G. funebrana and G. molesta reveal the same number of chromosomes and highly conserved gene order in the two moths. The chromosomes of the two genomes were numbered according to their length. The grey lines show the synteny blocks between the two genomes.