Background & Summary

Spiders are considered one of the most diverse terrestrial predators with more than 50,000 species described1. They conquer most area of the planet and are well known for their sophisticated use of silk and venom2. Moreover, spiders are important model for evolution and adaptive radiation given the varied environmental conditions of their habitats3. Taking advantage of the ever expanding genomic information, researchers are able to untangle different aspects of spider biology, with particular interest on venom, silk, and the phylogeny4,5,6,7,8,9,10,11,12,13,14,15,16,17. The increasing numbers of genomes will facilitate the phylogenomic study to unravel the roots of spiders and comparative studies to decipher the morphological and behavioural traits of spiders. However, the high-quality genomes are far from adequacy for systemic studies when the species diversity is concerned. Different from various clades of web-weaver spiders, a number of wandering spiders constitute the retrolateral tibial apophysis (RTA) clade barely building foraging webs. This spider lineage is remarkably diverse with over 25,000 species, including the popular families Salticidae (jumping spiders) and Lyosidae (wolf spiders), and thus plays an important role in spider evolution. However, no high-quality genome has been reported in the RTA clade. We here report the pond wolf spider Pardosa pseudoannulata (Fig. 2b) genome assembly at the chromosome-level to fill the gap. P. pseudoannulata predates on a broad range of insects including planthoppers and leafhoppers18 and represent an important natural enemy in agricultural ecosystem, where they encounter various environmental stresses, such as coldness, drought, and insecticides. Our previous studies have revealed that the cuticle proteins are involved in spiders’ adaptation to stresses19. Taking advantage of the genome assembly, we conducted a systemic analysis of the expanded gene family encoding the cuticle proteins with R&R Consensus (CPR) and evaluated their involvement in spiders’ adaptation to coldness during overwintering and to insecticide nitenpyram exposure.

Through k-mer analysis, the estimated genome was 1.93 Gb and highly heterozygous (Fig. 1, Table 1). Genome sequencing yielded 1032.45 Gb of clean data from PacBio, Illumina, Hi-C, and Iso-seq data (Table 2). The final genome assembly was 2.42 Gb with 94.83% of the sequences assembled into 15 chromosomes (Fig. 2a,b, Table 4). The genome completeness assessed via BUSCO was 96.2%. The size of repeats was 1.28 Gb, accounting for 52.79% of the genome (Table 5). We predicted 24,530 protein-coding genes with the BUSCO completeness 95.8% (Tables 37). The number of GO items and KEGG items was 13,925 and 19,197, respectively. Comparative genomic analysis revealed 265 gene families of significant expansion and 33 of significant contraction (Fig. 3b). Fifteen most significantly expanded gene families with the highest number of genes involved the gene regulation and protein processing including the cuticle protein gene family (Fig. 4, Table 10). Two clusters of Hox genes, one complete and one incomplete, were found in the genome. Cluster A was complete and occurred in a colinear order on chromosome 7 while Cluster B occurred on chromosome 15 with ftz and Hox3 absent (Fig. 2c). We identified totally 243 genes encoding cuticle proteins containing the Rebers and Riddiford (RR) consensus, 24 categorized as RR-1 type and 219 as RR-2 type (Fig. 5). Except for four genes in scaffolds, CPR genes were distributed in all 15 chromosomes where many formed tandem arrays or gene clusters, the majority of which exhibited high sequence similarity. Using the transcriptomic data from overwintering spiders and nitenpyram-treated spiders, we investigated the transcriptional responses of CPR genes to the two stresses. The majority of CPR genes were expressed at higher level in Sept. and Oct., and along with the drop in temperature, a large number of CPR genes were down regulated and significantly to ground level in Nov. Dec. and Jan. (Fig. 6a). However, one gene Papse09523 was upregulated in the course of overwintering though at a medium expression level among all CPRs. In the nitenpyram-exposure spiders, ten CPR genes were noticeable for their different transcriptional levels from those in control spiders (Fig. 6c). Three CPR genes were significantly up-regulated and they all encoded CPRs belonging to RR-1 subfamily. Seven CPR genes were noticeably down-regulated, six of which were located in a gene cluster of 12 genes in chromosome 10.

Fig. 1
figure 1

Genome size estimation of Pardosa pseudoannulata using Illumina reads.

Table 1 Genome survey data.
Table 2 Statistics of the DNA sequence data used for genome assembly.
Fig. 2
figure 2

Characterization of Pardosa pseudoannulata genome. (a) Hi-C heatmap of the P. pseudoannulata genome. (b) Circos plot of the genomic characteristics. A female spider hunting planthoppers in the center, photographed by Dr. Lixin Huang. From the outer ring to the inner ring are the distributions of chromomsome length, GC content, gene density, TEs (DNA, SINE, LINE, LTR), AND simple repeats. (c) Hox genes in P. pseudoannulata genome. Black horizontal lines represent chromesomes and rectacular blocks represent Hox genes. Arrows indicate the direction of genes. (d) Hox genes in Arachnida species.

Table 3 Genome assembly information.
Table 4 Summary statistics of the P. pseudoannulata genome assembly and annotation.
Table 5 Repeat annotation in the Pardosa pseudoannulata genome.
Table 6 Annotations of non-coding RNAs in P. pseudoannulata.
Table 7 Statistics of protein-coding gene annotation.
Fig. 3
figure 3

Phylogeny and gene family evolution of Pardosa pseudoannulata. (a) Comparison of orthologous genes between P. pseudonannulata and 9 other species. Horizontal coordinates represent the number of genes classified into 6 groups (single-copy, multi-copy, species-specific, unassigned, other, and common genes unique to Araneae). (b) Phylogenetic analysis of gene family. The estimated species divergence times (millions of years ago, MYA) are indicated under the tree. Node values indicate gene families showing expansion, contraction, and rapid evolution (red).

Fig. 4
figure 4

GO (a) and KEGG (b) annotation of expanded gene families in P. pseudoannulata.

Table 8 Statistics of gene family in P. pseudoannulata.
Table 9 Statistics of orthologous gene families.
Table 10 Gene family expanson in Pardosa pseudoannulata.
Fig. 5
figure 5

Expansion of genes encoding cuticle proteins with R&R consensus. (a) Schematic representation of the location of CPR genes on chromosomes. The light grey bars represent chromosomes; bars or lines represent the CPR gene clusters. The numbers below each bar or line represent the number of genes within the cluster, and the numbers above the bar or line represent the number of tandemly arrayed genes within the gene cluster, tandem arrays within one cluster was separated with “-”. (b) Phylogenetic analysis of CPR genes in a maximum-likelihood tree. The major genes clusters are indicated in colour.

Fig. 6
figure 6

The expression pattern of CPR genes under environmental stresses. (a) The transcription levels of CPR genes in overwintering spiders. Spiders were collected in nature in five consecutive months. The transcriptional levels of genes were represented as log2(FPKM + 1). (b) The FPKM values of the gene Papse09523 in overwintering samples. (c) CPR genes exhibiting significant difference in transcription after nitenpyram exposure. The transcriptional levels of genes were represented as FPKM.

In summary, we assembled and annotated a chromosome-level genome of P. pseudoannulata here. The expanded cuticle protein gene family is highly related to P. pseudoannulata’s adaptation to varied environmental stresses as a wandering species. The duplicated clusters of Hox genes support the whole genome duplication event during the evolution of spiders. The genome data provide a reliable resource for functional and comparative studies in Arachnida.

Methods

Sample collection and sequencing

Five adult Pardosa pseudoannulata females (NCBI: txid330961) were collected in a paddy field in Jurong City, Jiangsu Province, China in 2020. Legs and muscle attached with cuticle from prosoma were collected and stored at −80°C. Genomic DNA was extracted with QIAGEN DNeasy Blood & Tissue kit. Two short paired-end insert libraries of 350-bp sequences were constructed for survey analysis using a Truseq DNA PCR-free kit and sequenced using the Illumina NovaSeq. 6000 platform. PacBio sequencing was performed with libraries constructed with insert size of 40 kb using a SMRTbell™ Template Prep Kit 2.0 on PacBio Sequel II. In addition, Hi-C sequencing was carried out with muscle from a single female. Muscle tissues were fixed with formaldehyde and lysed. The cross-linked DNA was digested with restriction enzyme MboI, biotinylated and proximity-ligated to form chimeric junctions that were physically sheared into size of 350 bp. Chimeric fragments representing the original cross-linked long-distance physical interactions were then processed into paired-end sequencing libraries, and 150-bp paired-end reads were generated using the Illumina NovaSeq. 6000.

RNA-seq samples were prepared in several individual batches of spiders. First, the Iso-seq sample was prepared by pooling the total RNA from spiders of different developmental stages (egg, juvenile, male and female adults) at equal molar ratio. Second, overwintering samples included spiderings (cephalothoax length 0.30–0.35 cm) collected monthly from paddy fields in Jurong city, Jiangsu Province, China from September 2019 to January 2020 when the average temperature went down from 26.5 to 3.0 degrees Celsius. Three replicate samples were prepared from each month with five individuals per replicated sample. Third, nitenpyram-exposure samples included spiderlings (14–20 days post egg-sac) fed with 1% acetone (solvent control) or 27.9 mg/L nitenpyram (treatment, sublethal dose) for 14 days. Three replicate samples were prepared for each control or treatment. Total RNA was extracted with TRIzol™ Reagent. RNA libraries were constructed with TruSeq RNA v2 kit and sequenced using the PacBio Sequel II platform. PacBio sequencing and Hi-C sequencing, and RNA sequencing, were performed by Berry Genomics (Beijing, China) and BGI Genomics (Shenzhen China).

We obtained 456.40 Gb PacBio DNA reads, 299.95 Gb Illumina DNA reads, 273.23 Gb Hi-C, and 2.87 Gb Iso-Seq data for genome assembly (Table 2).

Genome survey and assembly

The Illumina reads were quality controlled using BBTools suite v38.6720. Duplicates were removed with “clumpify.sh”. Low-quality reads were discarded with “bbduk.sh” (qtrim = rl trimq = 20 minlen = 15 ecco = t maxns = 5 trimpolya = 10 trimpolyg = 10 trimpolyc = 10). The filtered reads were applied to estimating genome size. The k-mer distribution was estimated with khist.sh (BBTools). A 21-mer was selected for k-mer anlaysis with the maximum k-mer coverage cutoff set to 10,000 using Genomescope v2.021. The estimated genome was 1.93 Gb and highly heterozygous (Fig. 1). The size of repeat sequence was estimated to be 619.26 Mb, accounting for 32.10% of the estimated genome (Table 1).

PacBio sequencing reads were first corrected with NextDenovo v2.3.1 (https://github.com/Nextomics/NextDenovo) and then subjected to assembly in Raven v1.3.0 (–weaken -p 0)22. Two rounds of polishing were carried out with Illumina short reads using NextPolish v1.3.023 to improve the accuracy of the assembly. The heterozygous regions of the assembly were removed with two rounds of Purge_dups v1.2.5 (-a 70)24.

The chromosome-level assembly of the genome was generated by incorporating the Hi-C sequences. The Hi-C reads were firstly quality-controlled to remove duplicates and then aligned to the assembly using Jucier v1.6.225. Chromosome-level pseudo-chromosomes were assembled with the defaults using 3D-DNA v18092226. Resulting assembly was manually corrected using Juicebox v1.11.0825 and completed using 3D-DNA. The final genome assembly was 2.42 Gb with 94.83% of the sequences assembled into 15 chromosomes (Fig. 2a,Tables 3, 4).

Genome annotation

A de novo repeat database was initially constructed by using RepeatModeler v2.0.2a27 with LTR searching activated (-LTRStruct). The de novo repeat database was then combined with RepBase-2018102628 to obtain the reference repeat database. The repetitive elements were predicted and masked by searching against the reference repeat database using RepeatMasker v4.1.029. The size of repeats was 1.28 Gb, accounting for 52.79% of the genome (Table 5). The most abundant repeat types were unknown (25.01%), DNA elements (13.52%), simple repeats (6.95%), LTR elements (2.37%), and LINEs (2.16%).

The non-coding RNAs (ncRNAs) were identified by predicting rRNA, snRNA and miRNA against Rfam database using infernal v1.1.430 and annotating tRNA using tRNAscan-SE v2.0.931 in the genome. tRNAs of low confidence were removed using the “EukHighConfidenceFilter” script in tRNAscan-SE. In total, 6,011 ncRNAs were predicted in the genome (Table 6).

Protein-coding genes were predicted via a strategy integrating three methods (ab initio, transcript-based, and protein homology-based evidence) with the repeat-masked genome in MAKER v3.01.0332. Ab initio prediction was carried out by Augustus v3.4.033 and GeneMark-ES/ET/EP 4.68_lic34, both gene finders trained in BRAKER v2.1.535 with the mapped RNA-seq and protein orthologs via ProtHint v2.4.0. RNA-seq data were firstly mapped to the genome assembly using HISAT2 v2.2.136. Using the genome assembly as a reference, the mapped RNA-seq data were assembled into transcripts via StringTie v2.1.637. Lastly, the reference protein sequences from Stegodyphus dumicola (GCA_010614865.1), Parasteatoda tepidariorum (GCA_000365465.3), Argiope bruennichi (GCA_015342795.1), Daphnia magna (GCA_020631705.2), Ixodes scapularis (GCA_016920785.2), Trichonephila antipodiana (PRJNA627506) were downloaded from NCBI database and served as the protein homology-based evidence for MAKER. The structural annotation predicted 24,530 protein-coding genes. The average number of exon per gene is 6.1, with a mean exon length of 262.7 bp and mean intron length of 3,456.9 bp (Table 4).

Predicted proteins were firstly searched against protein databases including UniProtKB (SwissProt + TrEMBL) and NR with a very sensitive mode (–very-sensitive -e 1e-5) using Diamond v2.0.11.14938. 22,345 proteins (91.09%) matched the UniProtKB entries. The predicted proteins were then subjected to domain searching via InterProScan 5.48-83.039 against the databases Pfam40, SMART41, Superfamily42, and Conserved Domain Database (CDD)43. Structural domains of 18,021 proteins were identified through InterProScan. In addition, the predicted proteins were analysed for Gene Ontology (GO) and KEGG pathway annotation via eggNOG-mapper v2.1.544 in eggNOG v5.044. The number of GO items and KEGG items was 13,925 and 19,197, respectively (Table 7).

Phylogenetic analyses and GO/KEGG enrichment analyses

Orthologous gene families were obtained using OrthoFinder v2.3.845 from the protein-coding genes from 10 representative species including five Araneae species (A. bruennichi, P. tepidariorum, P. pseudoannulata, S. dumicola, and T. antipodiana), one Xiphosura (Tachypleus gigas), one Scorpiones (Centruroides sculpturatus), and three Acari (I. scapularis, Galendromus occidentalis, and Tetranychus urticae) (Table 9).

The phylogenetic tree of these ten species were constructed with 805 single-cope genes. The genes were firstly aligned using the strategy L-INS-I in MAFFT v7.39446 and trimmed to remove the region of low-homology using ‘-m BLOSUM90 -h 0.4’ in BMGE v1.1247. The resulting alignments were then concatenated using FASconCAT-G v1.0448. The phylogenetic tree was constructed with the concatenated alignments as supermatrix using IQ-TREE v2.0.749 after removing the genes failing the SRH (stationary, reversible, and homogeneous) model using ‘–symtest-remove-bad–symtest-pval 0.10’. Protein substitution model was set as LG, combined with the partitioning algorithm (-m MFP–mset LG–msub nuclear–rclusterf 10). Node support values were assessed using the ultrafast bootstrap and SH-aLRT (Shimodaira-Hasegawa-like approximate likelihood ratio test) algorithms.

The divergence time was estimated using MCMCTree MCMCTree (clock = 2, BDparas = 1 1 0.1, kappa_gamma = 6 2, alpha_gamma = 1 1, rgene_gamma = 2 20 1, sigma2_gamma = 1 10 1) in the package PAML v4.9j50. Fossil records were retrieved from the PaleobioDB (https://www.paleobiodb.org/navigator/) with Chelicerata (genus Paleomerus) 516.0–541.0 million years ago (mya), Arachnida (Acariformes, Pseudoprotacarus scoticus) 407.6-419.2 mya, and Parasitiformes (Deinocroton draculi) 93.5–145.5 mya.

The likelihood of gene family expansion and contraction was identified using CAFÉ v4.2.151 on the bases of the single birth-death parameter λ and a p-value threshold of 0.01. GO and KEGG enrichment of the significantly expanded gene families was analyzed using clusterProfiler v3.10.1 in R52. Among the genomes of the ten selected species, 186,060 genes were classified into 18,287 gene families, including 805 single-copy gene families and 3,428 multi-copy gene families (Table 8). In P. pseudoannulata, 22,591 out of the 24,530 genes were grouped in 10,741 families with 3,873 species-specific genes in 484 gene families (Fig. 3a,Table 9). P. pseudoannulata exceeded the other spider species in terms of the number of species-specific gene families.

Phylogenetic tree was constructed with 734 single-copy genes with 330,721 amino acid sites after IQ-TREE removed 71 single-copy genes. The UFB/SH-aLRT ratios of the branches in the maximum likelihood tree were all 100/100 with the exception for that of Centruroides sculpturatus and Stegodyphus dumicola-Pardosa pseudoannulata being 98.8/94 and 99.5/98, respectively. The number of gene families experienced expansion and contraction was 1,753 and 3,619, respectively, with 265 gene families of significant expansion and 33 of significant contraction (Fig. 3b). Fifteen most significantly expanded gene families with the highest number of genes involved the gene regulation and protein processing (Table 10). Among them, the cuticle protein and cytochrome P450 genes underwent expansion in P. pseudoannulata, which is likely to be in consistent with its adaptation to environment stresses, as also the case in T. antipodiana11. In addition, GO/ KEGG enrichment analyses of the expanded gene families further underlined the importance of cuticle protein genes as we detected the “structural constituent of cuticle” in GO and the “cutin, suberine and wax biosynthesis” in KEGG pathway enrichment (Fig. 3).

Annotation of Hox genes

According to the annotation method of Hox genes in Argiope bruennichi15, the most complete sequences of the ten arthropod Hox gene classes from spiders were chosen as the Hox gene set. A TBLASN search against the genome assembly was performed to retrieve the Hox genes in P. pseudoannulata. The genomic position of best hits (E-value < 1.00 × 10−20 and identity >60%) were compared with the AUGUSTUS gene predictions for those locations. Only the Hox gene with the longest match-length in the same genomic position was retained. In P. psuedoannulata, the Hox genes have been manually annotated, and their genomic positions were retrieved (Fig. 2c). Two clusters of Hox genes, one complete and one incomplete, were found in P. psuedoannulata, in consistence to the results from three other spiders. One of the two clusters (Cluster A) was complete and occurred in a colinear order on chromosome 7. The other cluster (Cluster B) occurred on chromosome 15 with ftz and Hox3 absent. Genes in Cluster B was divided by non-Hox genes into two subculsters as AbdB-AbdA-Ubx-Antp and lab-pd-Dfd in a reversed order on chromosome 15. In contrast to the complete cluster of Hox genes shared in the four spider genomes, the incomplete cluster occurs in a species-specific manner with ftz absent in P. tepidariorum53, Hox3, ftz, Ubx, and AbdA absent in T. antipodiana11, and AbdA and AbdB switching position in A. bruennichi15 (Fig. 2d). The Hox cluster duplication in P. pseudoannulata supports the whole-genome duplication (WGD) predicted to have occurred in the common ancestor of spiders and scorpions11 as evidenced also in three spiders (P. tepidariorum, A. bruennichi and T. antipodiana) and one scorpion (C. sculpturatus).

Annotation and characterization of cuticle protein genes

We further examined the genes involved in the significantly expanded gene family “Insect cuticle protein” and structural domain analysis suggested that most genes encoding cuticle proteins with Rebers and Riddiford (R&R) consensus (CPRs). Genes encoding CPRs were identified following the method previously described in Li et al.19 with the manual verification of the R&R consensus. The protein sequences of all CPRs were aligned using ClustalW in MEGA6 and a phylogenetic tree was constructed with maximum-likelihood method, the bootstrap value of 1000. FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) values of all CPR genes were retrieved from the transcriptomic data obtained from overwintering samples (NCBI accession: PRJNA907545) and nitenpyram-exposure samples (NCBI accession: PRJNA1015725)54. The log2(FPKM value + 1) was subjected to draw the heatmaps illuminating the transcriptional change of CPR genes in overwintering spiderlings and nitenpyram-exposed spiderlings. CPR genes with FPKM values less than 1 were considered not expressed and were excluded from analyses.

We identified totally 243 CPR genes including all of the 152 genes reported previously19. Twenty-four CPRs were categorized as RR-1 type and 219 as RR-2 type. Except for four genes in scaffolds, the majority of the CPR genes were distributed in all 15 chromosomes where many formed tandem arrays or gene clusters (Fig. 5). The CPR genes were unevenly located in chromosomes, with chromosome 4 being the most CPR-abundant (91 CPR genes), followed by chromosome 2 (45), chromosome 5 (43), chromosome 14 (25), and chromosome 10 (17), while one single CPR gene occurred in chromosomes 1, 6, 9, 11, 13, and 15 (Fig. 5a). The clustering of CPR genes in chromosomes may explain their uneven distribution, which has been reported in many arthropods55,56,57. In P. pseudoannulata chromosome 4, seven CPR gene clusters were present with the biggest cluster containing 59 genes in a tandem array. The most of CPRs clustered into two distinct branches, RR-1 subfamily and RR-2 subfamily, in the phylogenetic analysis (Fig. 5b). Four RR-1 members scattering in the RR-2 branch is probably attributed to their highly conserved flanking sequence with RR-2s. The huge number of closely located genes might be the result of gene duplication and the phylogenetic analysis supported it. The majority of the genes in tandem arrays or clusters exhibited high similarity, forming branches in the ML tree (Fig. 5b). For example, the 59 tandemly located gene in chromosome 4 closely clustered together in the ML tree. Nevertheless, some CPR genes were tandemly arrayed in chromosomes but their corresponding proteins scattered in the ML tree, such as the ones from cluster “Chr05-29-2” and “Chr05-29-4”.

As an essential component of cuticle, CPRs play important roles in the adaptation of arthropods to the habitat changes, especially to environmental stresses. P. psuedoannulata wanders in varied ecosystems including farm fields and faces complex environmental challenges, including low temperature in winter and insecticides exposure. Using the transcriptomic data from overwintering spiders and nitenpyram-treated spiders, we investigated the transcriptional responses of CPR genes to the two stresses. The majority of CPR genes were expressed at higher level in Sept. and Oct., and along with the drop in temperature, a large number of CPR genes were down regulated and significantly to ground level in Nov. Dec. and Jan. (Fig. 6a). The downregulation of CPR genes is in consistent with the observation that P. pseudoannulata did not molt during overwintering. However, one CPR gene Papse09523 was upregulated in the course of overwintering though at a medium expression level among all CPRs (Fig. 6b), and the functions of this gene in cold tolerance triggers further investigation. Interestingly, Papse16785 kept down-regulated during overwintering whereas it was significantly up-regulated after acute 0 °C exposure in our previous work (CPR12)19, suggesting that it might contribute to the different mechanisms underlying short-term and long-term coldness tolerance. In the nitenpyram-exposure spiders, ten CPR genes were noticeable for their different transcriptional levels from those in control spiders (Fig. 6c). Three CPR genes (Papse06745, Papse11013 and Papse17376) were significantly up-regulated and they all encoded CPRs belonging to RR-1 subfamily. Seven CPR genes were noticeably down-regulated and six (Papse11014-Papse11024) were located in a gene cluster of 12 genes in chromosome 10. We previously reported several CPR genes were involved in P. pseudoannulata’s responses to different stresses19. Therefore, CPR genes might respond to chemical stresses in a coordinative way. The present study provides more valuable information on the genomic distribution and transcriptional responses to stresses of CPR genes for further functional studies.

Data Records

The RNA-seq data were deposited in the SRA at NCBI for Iso-seq (SRR19534759)58, overwintering (SRR22498904-SRR22498907, SRR22498911, SRR22498913- SRR22498922)59, and nitenpyram-exposure (SRR26044769- SRR26044774)60. The genome assembly and annotation files are available in Figshare (https://doi.org/10.6084/m9.figshare.24190083)61 and GenBank under the accession JAGEOH00000000062.

Technical Validation

We detected possible contaminant sequences via blastn-like searches against the NCBI nucleotide (nt) and UniVec databases with a sequence identity of 0.8 (‘-min-seq-id 0.8’) using MMseqs. 2 v12-113e363. Sequences with over 80% hits were checked via online BLASTN analysis in the NCBI nucleotide database. Sequences with over 90% hits in the databases were considered contaminants and removed from the assembled scaffolds.

The completeness of assembly was evaluated with BUSCO V5.4.464 searching the arthropoda_odb 10 dataset (n = 1,013). The mapping rate was assessed by mapping the clean reads of Illumina and PacBio sequences to the genome assembly using Minimap2. The genome completeness assessed via BUSCO was 96.2% (Table 3).

On the basis of BUSCO analysis, the predicted protein-coding genes were identified 971 (95.8%) complete, 109 (10.8%) duplicated, 22 (2.2%) fragmented, and 20 (2.0%) missing orthologs (Table 4).