Chromosome-level genome assembly of the silver pomfret Pampus argenteus

Pampus argenteus (Euphrasen, 1788) is one of the major fishery species in coastal China. Pampus argenteus has a highly specialized morphology, and the decline of its fishery resources has prompted extensive research on its aquaculture biology. In this study, we report the first high-quality chromosome-level genome of P. argenteus, obtained by integrating Illumina, PacBio HiFi, and Hi-C sequencing. The final genome size was 518.06 Mb, with contig and scaffold N50 values of 20.47 and 22.86 Mb, respectively. Using the Hi-C data, the sequences were anchored and oriented onto 24 pseudochromosomes, corresponding to the haploid chromosome number (n = 24) of P. argenteus. A colinear relationship was observed between the P. argenteus genome and that of a closely related species, Scomber japonicus. A total of 24,696 protein-coding genes were identified, and 98.9% of the BUSCO genes were detected as complete in the genome. This report represents the first high-quality chromosome-level genome assembly for P. argenteus and provides valuable information for future evolutionary, conservation, and aquaculture research.


Background & Summary
Pampus argenteus (Euphrasen, 1788; Fishbase ID: 491), also known as the silver pomfret, is a commercially important fish in the Northwest Pacific that is widely distributed from the South China Sea to the coasts of Japan, Korea, and Russia 1,2. It belongs to the family Stromateidae of the suborder Stromateoidei 3, which a recent phylogenetic study placed within Scombriformes 4. This species is one of the major fishery species in coastal China, with harvests exceeding three million tons in 2016 5. Overfishing and environmental changes have caused a noticeable decline in P. argenteus fishery resources in recent years 6,7. Substantial progress has been made in P. argenteus aquaculture, which partly compensates for this decline 8,9. However, the industry still faces many constraints because P. argenteus is highly sensitive to injury and pathogenic infection during culture and transportation 10. Because P. argenteus is medusivorous 11, its aquaculture relies heavily on fish bait composed of jellyfish and minced fish meat. Such bait raises the cost of water-quality control and the risk of outbreaks of harmful pathogens; these problems have become one of the major bottlenecks in P. argenteus aquaculture and call for replacing the bait with better formulated feeds 12. However, the digestive and immune systems of P. argenteus are considered specialized for digesting jellyfish and tolerating medusocongestin 13,14, and including jellyfish in an artificial diet can significantly improve the growth performance and survival rate of P. argenteus larvae and juveniles 15. The impact of switching from fish bait to formulated feed on P. argenteus at different growth stages therefore still requires clarification. Clarifying the genetic basis of physiological processes in P. argenteus, particularly those related to the immune response 16, intestinal enzyme activities 14, and stress responses 17, is becoming increasingly important for the future of the aquaculture industry. However, the genome of P. argenteus, which underlies these physiological responses 18, has not yet been completely sequenced.
In addition to its fishery importance, P. argenteus is considered one of the most advanced species within Stromateoidei 19. Its dorsal and anal fin spines are reduced to small blades, and pelvic fins are absent. Stromateoidei are distinguished from other Actinopterygii by a unique pharyngeal sac located immediately behind the last gill arch, which serves to fragment food 19. In P. argenteus, which feeds primarily on small crustaceans and medusae, this sac is smaller and more elongated and is densely covered with elongated tooth-like papillae, making it well adapted to shredding the rubbery tissue of jellyfish 19. The pharyngeal sac is believed to be a key innovation of stromateoids, and its specialized shape in the genus Pampus might confer further advantages that contributed to the genus's broad success in the Indo-Pacific region 19. Clarifying the genetic basis of pharyngeal sac formation is therefore crucial to understanding the evolution of the genus Pampus.
In this study, a high-quality chromosome-level genome assembly of P. argenteus was generated by integrating multiple sequencing technologies, including Illumina sequencing, PacBio circular consensus sequencing (CCS), and Hi-C (Fig. 1). The final assembly size of the P. argenteus genome was 518.06 Mb, with 97.30% of the contigs anchored to 24 chromosomes (Table 1 & Fig. 2). The contig and scaffold N50 lengths were 20.47 and 22.86 Mb, respectively. The genome consisted of 13.45% repeat sequences and 17.18% non-coding genes. A total of 24,696 protein-coding genes were predicted, 93.38% of which were functionally annotated. Compared with the Pampus genome reported by AlMomin et al. 20, the P. argenteus genome generated herein was assembled into fewer and longer contigs and scaffolds (Table 1). More genes and repetitive regions were identified in this genome, and the average protein-coding gene length was 7.5-fold greater than that of the previous version 20. These results suggest that the genome assembled in this study has much higher integrity and quality. The chromosome-level genome assembly of P. argenteus will provide valuable information for developing effective molecular markers for future conservation and aquaculture. The genome also represents the first high-quality chromosome-level genome assembly for stromateoids; it could serve as an important reference for whole-genome sequencing of close relatives and as a foundation for exploring the genomic evolution and phylogeny of Stromateoidei.
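Contig and scaffold N50 values are the main contiguity statistics quoted above. As a minimal sketch of the conventional definition (the length L such that sequences of length L or longer together cover at least half of the assembly), N50 can be computed from a list of sequence lengths as follows. This is illustrative only: the text does not state which tool produced the reported values, and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that contigs (or scaffolds) of length >= L
    together contain at least half of the total assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence downwards.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in Mb); not the P. argenteus data.
toy_contigs = [30, 25, 20, 10, 10, 5]
print(n50(toy_contigs))  # 25
```

Applying the same function to scaffold lengths instead of contig lengths gives the scaffold N50; because scaffolding joins contigs, the scaffold N50 is typically at least as large as the contig N50, as in the 22.86 Mb versus 20.47 Mb reported here.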

Fig. 1 Workflow overview for the P. argenteus chromosome-level genome assembly.

Methods
Sample collection. In 2021, a wild female P. argenteus specimen was caught by a fishing boat near Shengsi, Zhejiang Province, China, for whole-genome sequencing. The specimen was identified based on the morphological descriptions of P. argenteus in Liu et al. 3, who designated the P. argenteus neotype. Eye, muscle, ovary, heart, and liver samples for DNA and RNA sequencing were dissected from the specimen immediately after capture. The samples were washed three times with phosphate-buffered saline (PBS), frozen in liquid nitrogen for three hours, and subsequently stored at −80 °C until DNA and RNA extraction. All experiments were conducted under the approval and regulations of the Institutional Animal Care and Use Committee of the Institute of Oceanology, Chinese Academy of Sciences.
Library construction, sequencing and data preparation. Illumina, PacBio HiFi, and Hi-C data were generated and used to build a chromosome-level genome assembly of P. argenteus. For Illumina sequencing, total genomic DNA was isolated from muscle samples using the cetyltrimethylammonium bromide (CTAB) method 21. The quality of the extracted DNA was assessed using a Qubit 2.0 (Thermo Fisher Scientific, USA) and a NanoDrop® Series (Thermo Fisher Scientific, USA). A short-fragment library with an insert size of 300-500 base pairs (bp) was prepared using the NEBNext® Ultra™ DNA Library Prep Kit (New England Biolabs, USA) following the manufacturer's instructions. The library was purified with AMPure XP Beads (Beckman Coulter, USA) and sequenced on an Illumina NovaSeq 6000 platform (Illumina, USA) to generate 150-bp paired-end (PE150) reads. After filtering with Fastp (v0.20.0) 22, a total of 75.52 Gb of clean Illumina PE150 data were obtained, with Q20 and Q30 values of 97.2% and 92.5%, respectively (Table 2). For PacBio CCS, total genomic DNA was extracted from muscle samples using the sodium dodecyl sulfate (SDS) method 23. The high-molecular-weight gDNA was sheared to 8-10 kb using g-TUBEs (Covaris, USA). The HiFi library was then prepared using the SMRTbell Prep Kit 3.0 and sequenced in CCS mode on the PacBio Sequel II system (Pacific Biosciences, USA) following the manufacturer's protocols. After low-quality reads and adapters were removed from the raw data, 63.80 Gb of clean HiFi data were retained, with Q20 and Q30 values of 99.9% and 54.78%, respectively (Table 2). Hi-C library preparation was performed on muscle tissue using a Frasergen Hi-C Kit (Frasergen, China) following the protocol instructions, which included crosslinking, lysis, fragmentation, repair, biotin labeling, ligation, extraction, purification, and library construction. All purification steps were performed using AMPure XP beads (Beckman Coulter, USA), and the biotin-labeled DNA was enriched with Pierce™ Streptavidin Magnetic Beads (Thermo Fisher Scientific, USA). The library was checked on an Agilent 2100 Bioanalyzer (Agilent, USA) to confirm a sufficient concentration and an insert size of 300-800 bp. The Hi-C library was sequenced on an Illumina HiSeq X Ten platform (Illumina, USA) to generate PE150 reads. After filtering with Fastp 22, a total of 138.39 Gb of clean Hi-C data were obtained, with Q20 and Q30 values of 96.57% and 90.54%, respectively (Table 2).
Fig. 3 17-mer frequency distribution in the P. argenteus genome; the number of k-mers at each sequencing depth is indicated.
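The Q20 and Q30 values reported for each dataset are the fractions of bases whose Phred quality score is at least 20 or at least 30. A minimal sketch of that calculation is shown below, assuming FASTQ files with Phred+33 quality encoding and standard unwrapped 4-line records; the function names are hypothetical, and this is not Fastp's own implementation.

```python
def q20_q30(quality_strings, offset=33):
    """Return (Q20 fraction, Q30 fraction) over all bases.

    Each quality character encodes a Phred score Q = ord(char) - offset
    (offset 33 for the Phred+33 / Sanger encoding).
    """
    total = q20 = q30 = 0
    for qual in quality_strings:
        for ch in qual:
            q = ord(ch) - offset
            total += 1
            if q >= 20:
                q20 += 1
            if q >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return q20 / total, q30 / total


def fastq_qualities(path):
    """Yield the quality line (every 4th line) of each FASTQ record.

    Assumes standard 4-line records without line wrapping.
    """
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:
                yield line.rstrip("\n")


# Toy quality string: 'I' = Q40, '?' = Q30, '5' = Q20, '+' = Q10
print(q20_q30(["I?5+"]))  # (0.75, 0.5)
```

For a real file, `q20_q30(fastq_qualities("reads_R1.fq"))` would return both fractions in a single pass; the file name is a placeholder.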
To assist in gene prediction, muscle, eye, ovary, heart, and liver tissues were pooled to obtain the transcriptome of P. argenteus. Total RNA was extracted from the pooled sample using TRIzol reagent (Invitrogen, USA) following the manufacturer's instructions. The quality and concentration of the extracted RNA were assessed using a NanoDrop® Series (Thermo Scientific, USA) and an Agilent 2100 Bioanalyzer. For RNA-seq, three cDNA libraries (i.e., Pa-op1, Pa-op2, and Pa-op3; Table 3) were prepared from the total RNA using the NEBNext® Ultra™ RNA Library Prep Kit (New England Biolabs, USA) and subsequently sequenced on an Illumina NovaSeq platform (Illumina, USA). After filtering with Fastp 22, a total of 20.91 Gb of clean RNA-seq data were obtained from the pooled five-tissue sample (Table 3). For isoform sequencing, a single cDNA library was reverse transcribed from the total RNA using the Clontech SMARTer PCR cDNA Synthesis Kit (Takara Bio, USA) following the manufacturer's instructions. The PCR products were purified using AMPure XP Beads (Beckman Coulter, USA) and used for SMRTbell library construction with the SMRTbell Prep Kit 3.0. The library was sequenced on the PacBio Sequel II system (Pacific Biosciences, USA). After filtering, a total of 34.96 Gb of isoform data were obtained (Table 3). The reference genomes and protein-coding gene sets of species closely related to P. argenteus [i.e., Dunckerocampus dactyliophorus (Bleeker, 1853) 24, Hippocampus zosterae Jordan & Gilbert, 1882 25, Scomber japonicus Houttuyn, 1782 26, Thunnus albacares (Bonnaterre, 1788) 27, and T. maccoyii (Castelnau, 1872) 28] were downloaded from GenBank and used for gene prediction and comparative analyses.

Genome survey.
The P. argenteus genome was surveyed using the k-mer method. K-mer analysis was conducted with Jellyfish (v2.2.6) 29 on the 75.52 Gb of clean Illumina data, using the optimal k-mer size of 17. After the removal of abnormal k-mers, a total of 60,502,700,002 k-mers were obtained, with a k-mer peak at a depth of 126.64 (Table 4 & Fig. 3). The genome size, heterozygosity rate, repeat content, and GC content estimated with GenomeScope (v1.0.0) 30 were 463.10 Mb, 1.55%, 29.89%, and 39.45%, respectively (Table 4).
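The survey rests on a simple relation: with roughly uniform sequencing, the number of error-free k-mers divided by the depth of the main (homozygous) peak approximates the genome size. The sketch below counts canonical k-mers, builds the depth histogram, and applies that ratio. It is a toy illustration under stated assumptions (small in-memory reads, tallest bin taken as the homozygous peak, low-depth bins treated as errors); it is not how Jellyfish counts k-mers at scale, and GenomeScope instead fits a model that accounts for heterozygosity, repeats, and sequencing errors. With the figures above, the naive ratio gives about 477.8 Mb, somewhat larger than the model-based GenomeScope estimate of 463.10 Mb.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rev_comp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rev_comp)


def kmer_histogram(reads, k):
    """Count canonical k-mers and return {depth: number of distinct k-mers}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _BASES:  # skip k-mers containing ambiguous bases
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive genome size: total k-mers / depth of the tallest peak.

    Bins below min_depth are discarded as likely sequencing errors.
    Assumes the tallest remaining bin is the homozygous peak, which may
    not hold for highly heterozygous genomes.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    total_kmers = sum(depth * n for depth, n in kept.items())
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


# Naive ratio with the survey figures reported above, in Mb.
print(round(60_502_700_002 / 126.64 / 1e6, 1))  # 477.8
```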
Protein-coding gene prediction and annotation. Protein-coding genes were predicted using four strategies: RNA-seq-based, isoform-based, homology-based, and de novo prediction.

Data records
The Illumina (SRR27308594), PacBio HiFi (SRR27308592-SRR27308592), Hi-C (SRR27308591), RNA-seq (SRR27308587-SRR27308589) and isoform (SRR27308590) data used for the genome assembly of P. argenteus were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under sequence read project SRP479325 78 .The chromosome-level assembly of the P. argenteus genome was deposited in the NCBI genome database under accession number GCA_036321115 79 .The chromosome assembly of P. argenteus, genomic annotation results, and software settings can be found in the figshare database 80 .

Technical Validation
Evaluation of the genome assembly and annotation. The quality of the chromosome-level genome assembly was assessed using three criteria: (i) the mapping rate of Illumina PE150 reads, (ii) the Core Eukaryotic Genes Mapping Approach (CEGMA) 81, and (iii) the Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment 82. In brief, 99.30% of the Illumina PE150 reads could be aligned to the P. argenteus genome using BWA (v0.7.12) 34, with a genome coverage rate of 99.95%, indicating high mapping efficiency and sufficient coverage. Of the 248 highly conserved core eukaryotic genes in CEGMA, 230 (92.74%) could be completely aligned with their homologs in the P. argenteus genome. In the BUSCO (v4.1.2) 82 assessment, 98.90% of the BUSCO genes were identified as complete in the P. argenteus genome, whereas fragmented and missing BUSCOs together comprised only 1.08%. These results indicate the high integrity and quality of the chromosome-level genome assembly of P. argenteus.
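The read-mapping criterion combines two numbers: the share of reads that align (99.30%) and the share of the assembly covered by those alignments (99.95%). The text does not say how coverage was computed; one common approach, assumed here, derives it from per-base depths such as the tab-separated (chromosome, position, depth) lines written by `samtools depth -a`, which also reports zero-depth positions. A small percentage helper also reproduces the CEGMA figure from the counts given above. The helper names and toy records are hypothetical.

```python
def breadth_of_coverage(depth_lines, min_depth=1):
    """Fraction of reference positions with depth >= min_depth.

    Expects tab-separated lines: chromosome, 1-based position, depth.
    Positions absent from the input are not counted, so zero-depth
    sites must be listed explicitly.
    """
    covered = total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        total += 1
        if int(fields[2]) >= min_depth:
            covered += 1
    if total == 0:
        raise ValueError("no positions supplied")
    return covered / total


def percent(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


# Hypothetical depth records for a 4-bp toy reference.
toy_depth = ["chr1\t1\t0", "chr1\t2\t3", "chr1\t3\t5", "chr1\t4\t1"]
print(breadth_of_coverage(toy_depth))  # 0.75
print(percent(230, 248))  # 92.74 (CEGMA core genes completely aligned)
```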

Fig. 2 (a) Photograph of P. argenteus; (b) Circos plot showing gene density, repetitive sequences, GC content, and colinear relationships among chromosomes of the P. argenteus genome assembly.

Fig. 6 Venn diagram showing the number of genes annotated by different gene databases.

Table 1.
Comparison of the Pampus genome assemblies from this study and from AlMomin et al. 20.

Table 2.
Sequencing data used for the P. argenteus genome assembly.

Table 3.
Transcriptomic data of P. argenteus used for gene prediction. Raw isoform data refer to the subreads generated by PacBio sequencing, and clean isoform data are the CCS reads generated from these subreads.

Table 5.
Repeat sequences of the P. argenteus genome annotated using different methods. The total length (Mb) and the percentage of the P. argenteus genome (in parentheses) are shown for each type of repeat sequence.

Table 6.
Summary of the different types of non-coding RNA genes identified in the P. argenteus genome.

Table 7.
Genes predicted in the P. argenteus genome using different methods.

Table 8.
Gene function annotation statistics of the assembled genome for P. argenteus.