Introduction

The Pacific saury (Cololabis saira) is a small pelagic fish with a wide distribution across the North Pacific Ocean, spanning from Japan’s east coast to the United States’ west coast1,2. This species holds significant economic importance in countries and regions bordering the Northwest Pacific, including Japan, Korea, Russia, Vanuatu, China, and Chinese Taiwan3. Small pelagic fishes are crucial components of marine ecosystems and key links in the food chain4,5. Previous research has focused on various aspects of the Pacific saury, including its life history, migratory patterns, population dynamics, distribution in fisheries, responses to environmental changes, and mitochondrial genes6,7,8,9,10. Despite this, genomic resources for the Pacific saury remain limited. The absence of systematic genome resources has impeded our understanding of the species’ evolutionary history, potential adaptive traits, and the genetic diversity within its populations.

Previous studies have suggested that immune regulation may play a pivotal role in the migratory behaviors of fish11,12. The Pacific saury is renowned for its extensive, seasonal, and wide-range migrations13. During these migrations, the species follows a significant route that takes it from subtropical waters through the complex and ever-changing environment of the Kuroshio-Oyashio transition zones, allowing it to reach the subarctic waters. In autumn, the Pacific saury migrates from the subarctic back to the subtropical waters1. This extended migration exposes the fish to diverse marine viruses and parasites14. Studies have reported that parasites can easily infect Pacific saury during this migration7,15. Moreover, the prolonged movement challenges the species’ antioxidant capacity16. However, the precise mechanisms by which the Pacific saury adapts to its long-distance migration remain unclear. Studying the critical genes associated with migration adaptations in Pacific saury is crucial for understanding the species’ evolutionary process17.

High variability in the biology of Pacific saury has been observed across different geographic regions18,19. For example, Suyama, et al.18 identified two spatially separated groups from the east and west using the first otolith annual ring. Li, et al.19 applied otolith shape analysis to differentiate two otolith morphologies in eastern and western Pacific saury through cluster analysis, suggesting the presence of two distinct geographic groups. By contrast, the genetic diversity of Pacific saury is generally low, and there is no significant differentiation at the population level based on mitochondrial genes9,20. However, the mitochondrial genome has a limited number of genes and is maternally inherited, which has insufficient testing power for studying population differentiation21. Genome-wide SNPs are widely distributed, genetically stable, and highly representative22. They provide more informative loci that can accurately delineate the population genetic structure of Pacific saury23. Nevertheless, population genetic studies of Pacific saury based on whole-genome variations are still in their early stages.

To explore the mechanisms underlying the adaptation of Pacific saury to its migratory lifestyle and to examine its genetic diversity, we performed genome and population analyses for the species. First, we sequenced the genome of Pacific saury with PacBio HiFi and Hi-C technologies, resulting in a phased and near-complete genome assembly, which allowed us to investigate the species’ phylogenetic placement and identify potential functional genes that contribute to its migratory adaptability. Second, we generated deep whole-genome resequencing data for 80 Pacific saury individuals from eight sites in the North Pacific Ocean. By leveraging whole-genome variations, we explored the genetic diversity and population structure of Pacific saury.

Results

Genome assembly, phasing and annotation

We initiated the genome assessment and de novo assembly using a Pacific saury specimen (Supplementary Table 1). Our K-mer analysis estimated the genome size of Pacific saury to be 1296 Mb, with 3.19% heterozygosity and 60.42% repeat content (Supplementary Table 2 and Supplementary Fig. 1). PacBio sequencing generated approximately 49.4 Gb of Circular Consensus Sequencing (CCS) HiFi reads (Supplementary Table 3), corresponding to a sequencing depth of roughly 38x. Using HiFiasm and the CCS reads, we assembled one draft genome comprising 3833 contigs, with a total size of 2152 Mb (Supplementary Tables 4 and 5). The assembly size is roughly twice the K-mer estimate, suggesting the successful assembly of two haploid genomes for Pacific saury. This observation was confirmed by the genome-wide BUSCO analysis, which revealed that 94.95% of BUSCOs were duplicated (Supplementary Tables 5 and 6). When whole-genome sequencing (WGS) short reads and HiFi CCS reads were mapped to our genome assembly, 99.81% of the WGS short reads and 100% of the HiFi CCS reads aligned, covering 99.5% and 99.98% of the total assembly, respectively (Supplementary Tables 7 and 8).
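The depth figure quoted above follows from simple arithmetic on the reported yield and the K-mer genome size estimate. The short sketch below only reproduces that arithmetic from the numbers given in the text; the variable and function names are illustrative and not part of the original analysis.

```python
# Figures reported in the text, expressed in megabases (Mb).
GENOME_SIZE_MB = 1296.0       # K-mer estimate of the genome size
HIFI_YIELD_MB = 49.4 * 1000   # ~49.4 Gb of HiFi CCS reads
ASSEMBLY_SIZE_MB = 2152.0     # draft assembly produced by HiFiasm


def sequencing_depth(yield_mb: float, genome_mb: float) -> float:
    """Average depth = total sequenced bases / genome size."""
    return yield_mb / genome_mb


depth = sequencing_depth(HIFI_YIELD_MB, GENOME_SIZE_MB)
ratio = ASSEMBLY_SIZE_MB / GENOME_SIZE_MB

print(f"HiFi depth: {depth:.1f}x")                     # 38.1x
print(f"Assembly size / K-mer estimate: {ratio:.2f}")  # 1.66
```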

To further enhance the assembly quality and assemble the contigs into chromosome-level structures, we employed 130.79 Gb of Hi-C data to assess contact frequencies among contigs. This enabled us to assign the majority of contigs to 48 chromosomes, resulting in a chromosome anchoring rate of 96.39% (Supplementary Tables 9 and 10, Supplementary Fig. 2). Consequently, we generated a chromosome-level assembly for Pacific saury and arbitrarily divided the chromosomes into two haploid genomes, with sizes of 1103 Mb and 1072 Mb, respectively (Fig. 1, Supplementary Table 11). For the purposes of subsequent analyses, we refer to the two haploid genomes of Pacific saury as CSA (1103 Mb) and CSa (1072 Mb). Of 3,565 single-copy orthologues, 97.94% and 97.64% were found as complete and single-copy BUSCOs in the CSA and CSa genomes, respectively (Supplementary Tables 5 and 12). The estimated genome consensus quality (QV) for both haploid genomes was above 40 (Supplementary Table 5).

Fig. 1: The genome features of Pacific saury using sliding windows of 1 Mb length.

A GC content distribution, measured by the GC proportion of each 1Mb-window. B gene distribution, measured by the proportion of gene sequences in each 1Mb-window. C Repeat content, measured by the proportion of repeat regions in each 1Mb-window. D LINE distribution, measured by the proportion of LINE in each 1Mb-window. E Collinearity of two haploid genomes.

The repeat element prediction revealed that approximately 63.1% of the two haploid genomes consisted of repetitive sequences, primarily DNA, LINE, and LTR repeat elements (Supplementary Fig. 3, Supplementary Tables 13 and 14). Our gene annotation, which combined homology-based, de novo, and RNA-seq-based approaches, predicted 44,823 protein-coding genes within the two haploid genomes (Supplementary Table 5). Notably, 43,635 of these protein-coding genes were functionally annotated through homology searches against publicly available databases, accounting for 97.35% of the total genes (Supplementary Table 15).

Gene family, phylogenetic and collinearity analysis, and gene evolution

For the comparative genomic analysis, we focused on CSA of Pacific saury, along with Zebrafish (Danio rerio), Chinook salmon (Oncorhynchus tshawytscha), Medaka (Oryzias latipes), Yellowtail kingfish (Seriola lalandi) and Yellowfin tuna (Thunnus albacares). Genes from these six species were categorized into 14,469 gene families, of which 11,825 were shared among all species. Additionally, we identified 110 gene families exclusive to Pacific saury (Supplementary Table 16). The phylogenetic relationship among these species was investigated using 3068 single-copy genes, revealing that Pacific saury and Medaka diverged from their common ancestor ~64.9 Ma (Fig. 2A). Furthermore, the collinearity analysis of protein-coding genes along chromosomes revealed strict karyotype conservation between Pacific saury and Medaka (Fig. 2B).

Fig. 2: Phylogenetic relationship and genome conserved syntenies for Pacific saury.

A phylogenetic trees and gene family contractions and expansions. Green numbers indicate the number of expanding gene families, and red numbers indicate the number of contracting gene families. Blue numbers indicate the divergence time between branches, and the numbers in parentheses indicate the divergence time supported by 95% of HPD (highest posterior density). B diagrams showing Pacific saury (Cololabis saira) chromosome synteny relations with Medaka (Oryzias latipes).

Gene family analysis showed that Pacific saury exhibited expansion in 223 gene families, while 2131 gene families showed contraction (Fig. 2A). The significantly expanded gene families were primarily associated with functions related to zinc ion binding and DNA repair (Supplementary Fig. 4 and Data 1). Notably, those expanded gene families were also enriched in immunity-related pathways, including phagosomes and programmed necrosis (Supplementary Fig. 4 and Data 2). Gene families for the igh, hla-a, and gpr35 genes showed a significant expansion in the Pacific saury genome (Supplementary Table 17).

Within the expanded gene families, the gpr35 gene stood out with seven copies located on Chr9 of the CSA genome. In addition, we identified another four copies in the homologous region of the CSa genome, further confirming the expansion of the gpr35 gene family in the Pacific saury genome. The expanded gpr35 genes in both CSA and CSa exhibited a high degree of homology with those of closely related fish species (Fig. 3A). Furthermore, our analyses revealed that gpr35 is a single-exon gene residing within a LINE transposon. This finding suggests that the expansion of gpr35 genes in the Pacific saury genome possibly resulted from LINE-mediated tandem duplication. Phylogenetic analysis of gpr35 genes from CSA/CSa of Pacific saury and other fish species demonstrated that all Pacific saury gpr35 genes form a single clade, indicating that the expansion of the gpr35 gene family occurred after the species diverged from other related fish species (Fig. 3B).

Fig. 3: Conserved syntenies and phylogeny for gpr35. A, gpr35 gene loci.

The blue region represents the gpr35 locus. Different colored arrows represent different genes and the direction of the arrow indicates the direction of transcription. B phylogenetic tree of the 17 vertebrate gpr35 genes. Different branches of the tree are highlighted by different colors.

From a pool of 3068 single-copy genes, we identified 610 positively selected genes with significant false discovery rate (FDR)-corrected p values (<0.05) in Pacific saury (Supplementary Data 3), which were predominantly associated with key biological pathways such as DNA repair, homologous recombination, Fanconi anemia, peroxisome, and mismatch repair (Fig. 4, Supplementary Table 18 and Data 4). In addition, these genes were found to be closely linked to processes such as RNA polymerase I formation, monosaccharide catabolic processes, and DNA replication (Supplementary Fig. 5 and Data 5).

Fig. 4: Schematic diagram of the positively selected gene in related pathways.

A Fanconi anemia pathway. B DNA replication pathway. C Homologous recombination pathway. The gene with a red background indicates positively selected genes.

Population genetic analysis

We used 80 samples collected from eight sites for whole-genome resequencing (Fig. 5 and Fig. 6A, Supplementary Table 1). The mapping rate for each sample ranged from 85.72% to 88.80%, and the mean mapping depth was approximately 8-fold (Supplementary Table 19). Whole-genome SNPs and InDels were detected and filtered as detailed in the Materials and Methods section. This process yielded 1,633,773 SNPs, which were used in subsequent analyses (Supplementary Table 20). First, we calculated the genome-wide FST among sites, which indicated very weak population structure in Pacific saury (FST < 0.005) (Fig. 6B). Nevertheless, when considered together with the Pacific saury migration routes (Fig. 6A), clustering based on site-wise FST allowed us to categorize the eight sites into two groups (Fig. 6B, C). This suggests the presence of subtle underlying genetic differentiation among populations from the various sampling sites. Based on this clustering analysis, we refer to the populations at sites S1, S2, S3 and S5 as the ‘west group’, and the populations at sites S4, S6, S7 and S8 as the ‘east group’.
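To make the pairwise FST comparison concrete, the sketch below computes FST between two sites from per-SNP allele frequencies. It uses Hudson's estimator (a ratio of sums over SNPs) purely as an illustration; the estimator actually used in this study is not stated in this section, and the allele frequencies and sample sizes below are hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST computed as a ratio of sums over SNPs.

    p1, p2: per-SNP allele frequencies at the two sites.
    n1, n2: number of sampled chromosomes (2 x diploid individuals).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have the same length")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different sites differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else 0.0


# Hypothetical frequencies at five SNPs for two sites, 10 diploid fish each.
site_a = [0.10, 0.50, 0.35, 0.80, 0.25]
site_b = [0.15, 0.45, 0.40, 0.75, 0.30]
print(f"FST = {hudson_fst(site_a, site_b, 20, 20):.4f}")
```

With weakly differentiated sites, the sample-size correction can push the estimate slightly below zero, which is usually interpreted as no detectable differentiation.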

Fig. 5: Distribution of de novo sequenced (triangles) and re-sequenced (circles) samples.

The colors of the background represent the depths of the ocean.

Fig. 6: Population genetic analysis of eight sites of Pacific saury.

A Distribution of eight sampling sites for Pacific saury. The black dashed line indicates the presumed different migratory routes of Pacific saury13. B FST values among sites based on whole-genome SNPs. The eight sites were clustered using the “ward.D2” method in the “pheatmap” function based on Euclidean distance. The eastern group (I) is shown in blue, and the western group (II) is shown in red. C PCA of the eight sites based on FST values. PC, principal component.

We also performed selective sweep analysis to identify genomic regions that were differentiated between the east (I) and west (II) groups. We found large genomic regions characterized by marked genetic differentiation between these two groups, especially on Chr2 and Chr11 (Fig. 7A). A closer examination focused on the region around 18 Mb of Chr2 and another region around 22 Mb of Chr11 (Fig. 7B). Within these regions, we identified 181 genes exhibiting a high degree of differentiation (FST > 0.05), and these genes featured non-synonymous mutation loci (Supplementary Table 21). These genes were significantly enriched in functions related to tRNA threonylcarbamoyladenosine and DNA replication/repair (Fig. 8A and Supplementary Table 22). Remarkably, we identified a non-synonymous mutation at amino acid 188 within TRMU (Fig. 8C), a gene associated with the tRNA threonylcarbamoyladenosine metabolic process (Supplementary Table 22). We found that this mutation is specific to Pacific saury and that its frequency in the trmu gene differs significantly (p < 0.05) between the east and west groups (Fig. 8B).

Fig. 7: Patterns of genomic differentiation between east and west groups.

A patterns of genomic differentiation between east and west groups. FST values correspond to the weighted mean per 50-kb window with 5-kb increments. Different colors represent different chromosomes. The red box indicates the chromosomes where differentiation is more concentrated. B patterns of genomic differentiation between east and west groups in Chr2 and Chr11. FST values correspond to the weighted mean per 10-kb window with 1-kb increments. The red box indicates the areas where differentiation is more concentrated.
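The window scheme described in this caption (50-kb windows with 5-kb increments, weighted mean FST per window) can be sketched as follows. This is a minimal illustration, assuming that the weighted mean is the sum of per-SNP FST numerators divided by the sum of per-SNP denominators within each window; the function name and input layout are hypothetical, and windows without SNPs are simply skipped.

```python
from typing import List, Sequence, Tuple


def sliding_window_fst(
    positions: Sequence[int],
    numerators: Sequence[float],
    denominators: Sequence[float],
    window: int = 50_000,
    step: int = 5_000,
) -> List[Tuple[int, int, float]]:
    """Weighted FST per window as sum(numerators) / sum(denominators).

    Windows are half-open intervals [start, start + window) along one
    chromosome, advanced by `step` bases.
    """
    results: List[Tuple[int, int, float]] = []
    if not positions:
        return results
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        num = 0.0
        den = 0.0
        for pos, n, d in zip(positions, numerators, denominators):
            if start <= pos < end:
                num += n
                den += d
        if den > 0:  # skip windows that contain no informative SNP
            results.append((start, end, num / den))
        start += step
    return results


# Toy data: three SNPs with made-up per-SNP FST components.
toy = sliding_window_fst([1_000, 2_000, 60_000], [0.1, 0.3, 0.5], [1.0, 1.0, 1.0])
for start, end, fst in toy[:2]:
    print(start, end, round(fst, 3))
```

The inner loop rescans all SNPs for every window, which keeps the sketch short; for genome-scale data an index over sorted positions would be the more practical choice.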

Fig. 8: Enrichment analysis of 181 differentiated genes.

A GO-enriched pathways of highly differentiated (FST > 0.05) genes containing non-synonymous mutations within the concentrated differentiation regions on Chr2 and Chr11, comprising 181 genes. Only significantly enriched GO terms (p < 0.05) are shown. B Types of trmu genes and distribution in the east and west groups. C Three-dimensional views of TRMU protein. The three-dimensional protein model was generated by Phyre298. Pacific saury mutated amino acids are highlighted in red.

Discussion

Besides being an economically important species, Pacific saury is a pelagic fish of the North Pacific Ocean, making it a species of substantial research interest1,4. Genomic resources could greatly promote reliable population structure and evolutionary studies of Pacific saury. Previous studies have shown that Pacific saury is a diploid species with high heterozygosity24. Our K-mer analysis revealed high heterozygosity (3.19%) in the Pacific saury genome, posing a significant challenge for genome assembly25. Meanwhile, most previous diploid genome assemblies resulted in a single mosaic reference genome consisting of portions of the parental alleles26. To overcome these challenges and to obtain a phased genome, we employed PacBio HiFi and Hi-C technologies to assemble the first phased chromosome-level genomes of Pacific saury. We assembled two haploid genomes with sizes of 1.10 Gb (CSA) and 1.07 Gb (CSa). The numbers of protein-coding genes were 22,206 (CSA) and 22,617 (CSa), similar to the published genome size and annotation results, which further supports the reliability of our assembly24,27. Unfortunately, the collection of biological information for Pacific saury in this study was incomplete, and more detailed bioinformatic analyses will be conducted in subsequent studies.

The genome analysis revealed the natural selection of numerous immunity-related functional genes. Immunoglobulin (Ig) is a glycoprotein that plays an important role in adaptive immunity and is produced by B lymphocytes upon exposure to antigens28,29. Among the essential components of immunoglobulins, IGH holds particular significance as it contributes to antigen recognition and signaling30. Furthermore, genes such as hla-a, ifn3 and ifng, which were highly expressed in the kidney of Pacific saury, are also involved in immune regulation27,31. Previous studies have shown the high prevalence of large parasitic copepods infecting Pacific saury7,15,32. Moreover, the migratory route of Pacific saury traverses complex and expansive sea areas, such as the Kuroshio-Oyashio transition zone33, which potentially harbors various marine viruses14. The expansion of immunity-related gene families observed in Pacific saury may be a response to the elevated risk of viral and parasitic infections associated with its habitat and migratory patterns.

The intestine plays an essential role in the response to pathogen infections, given that intestinal epithelial cells serve as a crucial physical barrier against bacterial infection in the gut34. Within the intestinal immune system, numerous immune cells are at work, and the gut-associated lymphoid tissue (GALT) constitutes about 70% of the entire immune system in teleost fish34. Gpr35 is abundantly expressed in the intestine35, and research has demonstrated its critical role in macrophages during intestinal inflammation35, contributing to mucosal repair via migration of colonic epithelial cells36. Within the Pacific saury genome, the expansion of the gpr35 gene, alongside other immunity-related genes, is possibly a critical factor in enhancing resistance to marine viral and parasitic infections. This expansion underscores the species’ adaptation to its diverse and challenging habitats.

Pacific saury shows robust motility as an adaptation to its migratory lifestyle, with a substantial portion of its life history dedicated to migration13. Physical activity is closely linked to the generation of free radicals, particularly oxygen radicals, and prolonged endurance exercise can escalate free radical production16. Reactive oxygen species, such as hydroxyl radicals, produced during oxidative cellular respiration, possess the potential to induce damages to DNA bases and cause DNA strand breaks37. Repair pathways associated with DNA cross-linking damage, including Fanconi anemia (FA), nucleotide excision repair, and homologous recombination repair, assume vital roles in mitigating these types of DNA damages38,39. ATM, a key regulator of DNA damage response, is involved in various processes, such as cell cycle checks, DNA damage repair, and the maintenance of telomeres40. ERCC6, on the other hand, functions in DNA repair, the preservation of chromosome stability, and as a co-factor in base excision repair41. POLD3 participates in various DNA repair processes42,43. It is notable that genes associated with DNA repair show signatures of positive selection, which could be indicative of a specialized antioxidant mechanism that has evolved in Pacific saury in response to its demanding lifestyle.

The extensive migratory range and prolonged migration periods of Pacific saury necessitate a continuous cellular energy supply. Lipids serve as a crucial means to store and generate energy, and Pacific saury is distinguished by its elevated fatty acid content compared to other fish species44. Peroxisomes are closely associated with cellular redox metabolism, fatty acid oxidation, and detoxification of free radicals45,46,47. Key proteins in this context include PEX14, which is a central component of the peroxisomal matrix protein transport system48, PEX6, which promotes fatty acid oxidation49, and HPCL2 and PHYH, which are essential for fatty acid alpha oxidation50,51. These positively selected genes are possibly linked to the high energy requirements associated with Pacific saury’s long migrations and may also play a role in mitigating the accumulation of metabolically generated free radicals.

To further understand the molecular adaptations to long-duration and long-distance migrations in fish, we analyzed the positively selected genes shared by two highly migratory fish species, Pacific saury and Yellowfin tuna. The identified genes under positive selection, including efnb2, crlf3, and gpx3, are primarily associated with hematopoietic cells, hypoxic response, and antioxidants (Fig. 9 and Supplementary Table 23). EFNB2 (ephrin B2) is a member of the ephrin (EPH) family, and plays an essential role in developing the nervous system and erythropoiesis. It can promote erythroid differentiation under hypoxic conditions52. CRLF3, a neuroprotective erythropoietin receptor, suggests potential impacts on primitive hematopoiesis and downstream hematopoietic progenitors in zebrafish studies53. GPX3 belongs to the glutathione peroxidase family, safeguarding cells from oxidative damage and serving as a crucial antioxidant enzyme54. These shared positively selected genes in Pacific saury and Yellowfin tuna possibly reflect a typical evolutionary pattern of oceanic migratory fish, particularly concerning their oxygen-carrying capacity and antioxidant capacities. These possible adaptations are common responses to the challenges posed by sustained or rapid movements in aquatic environments.

Fig. 9: Summary of genes associated with migratory lifestyles in Pacific saury.

The main results are outlined: Igh, gpr35, hla-a, ifn3 and ifng genes were expanded and associated with the immune system. Atm, pold3 and ercc6 were under positive selection, and may be associated with DNA damage caused by metabolic free radicals. Pex6, hpcl2 and phyh were under positive selection, and may be associated with energy metabolism. Efnb2 and crlf3 were under positive selection and may be associated with oxygen supply.

Population analysis based on mitochondrial genes previously suggested the presence of a single Pacific saury population9. However, recent research has proposed the existence of distinct migratory routes for Pacific saury in the North Pacific region13. These differing migratory routes could potentially lead to differentiation of Pacific saury populations. Furthermore, otolith studies have revealed morphological variations in the first otolith annual ring and the sharpness of otoliths between ‘east’ and ‘west’ Pacific saury groups. Our population genomic analysis has provided additional evidence of genetic differentiation among sampling sites, and the clustering of sites into two groups appears to be consistent with the two approximate migratory routes of Pacific saury (Fig. 6A, B). Notably, we have identified substantial differentiation in specific genomic regions on Chr2 and Chr11 (Fig. 7B) between the two groups. A similar phenomenon was previously found in studies of different ecotypes of Chinook salmon55. Also, studies on Atlantic cod have shown that ecological divergence can persist in the presence of gene flow and affect specific genomic regions56. Accordingly, we postulate that the differentiated regions on Chr11 and Chr2 may also be linked to the differentiation between migratory groups within the Pacific saury population. However, confirming this hypothesis requires further in-depth investigation. We also found that the trmu gene exhibited significantly different allele frequencies in the two groups. This gene has been associated with inner ear hair cell development in zebrafish studies, influencing otolith size and shape57. These findings provide insights into the potential genetic basis of morphological variations in Pacific saury populations.

In conclusion, our data indicate that expanded genes and genes under selection may be associated with persistent movements, corresponding to the migratory characteristic of Pacific saury. Based on the whole-genome resequencing of 80 individuals, the Pacific saury population was divided into two groups, with genomic differences between the two groups concentrated on two chromosomes. Furthermore, the trmu gene on chromosome 2, which is associated with otoliths, exhibited significantly different allele frequencies in the two groups. These findings could help identify Pacific saury populations at the genetic level.

Materials and Methods

Sample collection

A female Pacific saury collected in the North Pacific Ocean (153°08’E, 43°00’N) on 22 October 2019 was used for genome sequencing and assembly. After euthanasia, the fish was immediately dissected to collect muscle, gill, liver, intestine, and skin tissues. All samples were frozen in liquid nitrogen and stored at −80°C. Muscle tissues were used for DNA extraction, genome sequencing and assembly, and Hi-C library construction. We used a modified CTAB method58 to extract genomic DNA (gDNA) from the tissues for Illumina short-read and PacBio long-read genome sequencing. The concentration of gDNA was measured with a NanoDrop 2000 (NanoDrop Technologies, Wilmington, DE, USA) and a Qubit fluorometer (ThermoFisher, MA, USA), and the quality of gDNA was assessed by 0.8% agarose gel electrophoresis. Gill, liver, intestine, and skin were used for RNA sequencing. A total of 80 samples were collected at eight stations in the North Pacific Ocean from June 2019 to November 2019 for resequencing analysis (Fig. 5 and Supplementary Table 1). The same modified CTAB method was used for DNA extraction of resequencing samples. Throughout this experiment, the operators strictly adhered to the Code of Ethics of the Ethics Committee for Laboratory Animal Management and Use of Ocean University of China and followed the rules and regulations of the Special Committee on Scientific Ethics of the Academic Committee of Ocean University of China.

Genome size estimation

We used 1 μg of DNA to construct the whole-genome sequencing short-read library with an insert size of 300–350 bp for the Illumina NovaSeq 6000 platform. The sequencing library was constructed strictly according to the manufacturer’s recommendations. The raw sequencing data were filtered with HTQC v1.92.31059. In total, 65.92 Gb of WGS short reads (CRA015908) were generated. Clean reads were used for K-mer analysis with GCE60 software based on 17-mer frequencies.
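The idea behind K-mer based genome size estimation can be sketched in a few lines: count K-mers in the reads, build a depth histogram, and divide the total number of K-mers by the depth of the main peak. The code below is a simplified illustration only; it is not the GCE algorithm, it does not merge reverse-complement K-mers or model heterozygous peaks, and the `min_depth` cutoff for discarding likely sequencing errors is an assumed parameter.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int = 17) -> Dict[int, int]:
    """Map each K-mer depth to the number of distinct K-mers seen at that depth."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Genome size ~ total K-mers / depth of the main histogram peak.

    K-mers with depth below `min_depth` are treated as sequencing errors
    and ignored (an assumed, simplistic error filter).
    """
    kept = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no K-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Synthetic histogram: an error spike at depth 1 and a main peak at depth 50.
toy_hist = {1: 1000, 49: 10, 50: 100, 51: 10}
print(estimate_genome_size(toy_hist))  # 6000 K-mers / peak depth 50 = 120.0
```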

PacBio HiFi-CCS sequencing and de novo genome assembly

We used g-TUBE (Covaris) to randomly shear the genomic DNA into fragments of about 15 kb. The SMRTbell Express Template Prep kit 2.0 (Pacific Biosciences) was used to construct the SMRTbell HiFi library. The FEMTO Pulse and Qubit dsDNA HS kits were used to assess the library size and quality. Finally, the primer and Sequel II DNA polymerase were annealed separately and combined with SMRTbell templates. The constructed libraries were sequenced using PacBio Sequel II in CCS mode for 30 h. The CCS workflow61, with the “-minPasses 3” setting, was used to generate HiFi reads from raw subreads (https://github.com/pacificbiosciences/ccs). A total of 49 Gb of HiFi CCS reads (CRA015952) was generated. The coverage was sufficient for de novo assembly according to the recommendation61. Subsequently, HiFiasm62 was employed for genome assembly.

We evaluated the completeness and accuracy of the assembled genome using four methods. First, Minimap2 (v2.5, default parameters63) was used to align HiFi CCS reads to the assembled genome, and the proportion of mapped reads, genome coverage, and distribution of sequencing depth were calculated. Second, the WGS short reads were aligned to the assembled genome using BWA64 to calculate the proportion of mapped reads. Third, BUSCO v5.7.065 was used to evaluate genome completeness with the actinopterygii_odb10 database. Fourth, Merqury (v1.3, default parameters66) was used to evaluate genome consensus quality (QV) based on the WGS short reads.
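Consensus quality values are on the Phred scale, so a QV above 40 corresponds to fewer than one expected error per 10 kb. The sketch below shows this conversion, together with the K-mer based error estimate as we understand it from the Merqury description (the fraction of assembly K-mers absent from the reads, converted to a per-base error rate). Function names and the example inputs are illustrative, and the code is not a substitute for Merqury itself.

```python
import math


def qv_from_error_rate(error_rate: float) -> float:
    """Phred-scaled quality: QV = -10 * log10(per-base error rate)."""
    if error_rate <= 0:
        return math.inf
    return -10.0 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Inverse conversion: per-base error rate for a given QV."""
    return 10.0 ** (-qv / 10.0)


def kmer_error_rate(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Per-base error from the share of assembly K-mers not found in the reads.

    Assumes a K-mer is error-free only if all k of its bases are correct.
    """
    return 1.0 - (1.0 - asm_only_kmers / total_asm_kmers) ** (1.0 / k)


print(f"QV 40 -> error rate {error_rate_from_qv(40):.0e}")
# Hypothetical counts: 2,000 assembly-only K-mers out of 1,000,000 at k = 21.
print(f"QV = {qv_from_error_rate(kmer_error_rate(2_000, 1_000_000, 21)):.1f}")
```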

Chromosome assembly using Hi-C technology

Muscle tissue was taken for Hi-C assisted assembly. For Hi-C library construction, the DNA was fragmented into 300–500 bp and purified using magnetic beads. Subsequently, the Hi-C library was sequenced with 150-bp paired-end reads on the Illumina NovaSeq 6000 platform. Hi-C raw data were filtered using HTQC v1.92.310. A total of 138 Gb of Hi-C data (CRA015885) was generated. After filtering, the reads were aligned using BWA v0.7.16a-r1181, and ALLHiC67 was used to remove reads with single-end alignments and sequences located more than 500 bp from the restriction sites. The contigs were clustered, sorted, and oriented using ALLHiC to obtain chromosome-level genomes. Interaction maps of the assembled genomes were constructed using Juicer68 and visualized for error correction using JuiceBox69.

RNA sequencing of short reads

A total amount of 2 μg RNA was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® Ultra™ RNA Library Prep Kit for Illumina (#E7530L, NEB, USA) following the manufacturer’s recommendations and index codes were added to attribute sequences. The aimed products were retrieved and PCR was performed, then the library was completed. The clustering of the index-coded samples was performed on a cBot cluster generation system using HiSeq PE Cluster Kit v4-cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the libraries were sequenced on an Illumina NovaSeq 6000 platform and a total of 11.68 Gb RNA short reads (CRA015982) were generated.

Genome annotation

To identify repetitive elements, we used both de novo and homology-based approaches. For the homology-based approach, RepeatMasker (open-4.09)70 and RepeatProteinMask (open-4.09) were used to search for transposable elements (TEs) by aligning the Pacific saury genome against RepBase (release 21.01)71. For the de novo approach, RepeatModeler v272 and LTR-FINDER v1.0.573 were used to construct a de novo repeat library. Then RepeatMasker was used to identify the repeat sequences with RepBase. Tandem Repeat Finder (TRF)74 was used to identify tandem repeats.

We combined homology annotation, de novo annotation, and transcriptome-based annotation approaches to predict gene structure and function. For the homology annotation, we downloaded protein sequences of Sheepshead minnow (Cyprinodon variegatus), Medaka (Oryzias latipes), Marine Medaka (Oryzias melastigma), Mummichog (Fundulus heteroclitus), and Annual killifish (Austrofundulus limnaeus) from NCBI. These protein sequences were aligned to the Pacific saury genome using TblastN75. We used Exonerate76 to predict the protein-coding gene structures based on the aligned data. For the de novo annotation, Augustus v3.377 and Genscan v3.0.478 were used to predict protein-coding genes. For the transcriptome-based annotation, we extracted and sequenced RNA from the gill, liver, intestine, and skin. Tophat (default parameters)79 was used to map the RNA reads to the Pacific saury reference genome. Cufflinks (default parameters)80 was used to assemble the mapped reads into transcripts to obtain the structure of the protein-coding genes. MAKER v3.0081 was used to integrate the gene sets predicted by the various methods into a non-redundant, more complete, and reliable gene set. Finally, the proteins were functionally annotated against external protein databases (SwissProt82, TrEMBL83, KEGG84, InterPro85, GO86 and NR (https://www.ncbi.nlm.nih.gov/)).

Gene family clustering analysis

We selected Zebrafish (Danio rerio), Chinook salmon (Oncorhynchus tshawytscha), Medaka (Oryzias latipes), Yellowtail kingfish (Seriola lalandi), Yellowfin tuna (Thunnus albacares), and Pacific saury (Cololabis saira) for gene family analysis. The filtered dataset was compared all-versus-all using BLASTP v2.11.0 (-evalue 1e-5) to obtain similarity relationships of protein sequences. The genes were clustered into families using OrthoMCL v2.0.9 (-l 1.5)87.

Phylogenetic analysis using whole-genome information

Phylogenetic trees were constructed using single-copy genes from the six species identified by OrthoMCL. Multiple sequence alignment was performed for each single-copy gene family using MAFFT v7.48788 with default parameters. The alignments of all single-copy genes were concatenated into a super alignment matrix, and conserved sequences were extracted using Gblocks v0.91b (-t = c)89. All loci, phase1 loci, and 4D loci data were obtained. RAxML v8.2.12 was used to construct the phylogenetic trees of the six species (-f a -N 100 -m GTRGAMMA)90 using the maximum-likelihood method with 1,000 bootstraps. The final species phylogenetic relationships were determined based on known species relationships and the degree of agreement among the phylogenetic trees. Divergence times were estimated using MCMCtree in the PAML91 software and were calibrated using TimeTree (http://www.timetree.org/).
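The selection of single-copy families and the construction of the super alignment matrix can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the in-memory data layout (a dictionary of families mapping species to gene lists, and one dictionary of aligned sequences per family) and the toy sequences are assumptions made for illustration only.

```python
from typing import Dict, List


def single_copy_families(families: Dict[str, Dict[str, List[str]]],
                         species: List[str]) -> List[str]:
    """Return IDs of families that hold exactly one gene in every species.

    `families` maps a family ID to {species: [gene IDs]} (hypothetical layout).
    """
    return [
        fam_id
        for fam_id, members in families.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    ]


def concatenate_alignments(alignments: List[Dict[str, str]],
                           species: List[str]) -> Dict[str, str]:
    """Join per-family alignments end to end into one sequence per species."""
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        if set(aln) != set(species):
            raise ValueError("each alignment needs exactly one row per species")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("rows of one alignment must have equal length")
        for sp in species:
            parts[sp].append(aln[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


if __name__ == "__main__":
    # Toy input: two species and two aligned single-copy families.
    toy = [{"saury": "AC-", "medaka": "ACG"}, {"saury": "TT", "medaka": "T-"}]
    print(concatenate_alignments(toy, ["saury", "medaka"]))
```

In practice the concatenated matrix would then be trimmed (here, with Gblocks) before tree inference; that step is not reproduced in the sketch.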

Gene family contraction and expansion and positive selection analysis

Expansion and contraction of each gene family were identified using CAFE v5.0 (P < 0.05)92. We used the CodeML v4.9 module in PAML to detect positive selection in single-copy genes. Multiple-protein alignments of the single-copy genes were generated with MAFFT v7.487 and used to estimate the dN/dS ratio (ω). Likelihood values were calculated separately under Model A (model=2, NSsites=2, fix_omega=0) and the null model (model=2, NSsites=2, fix_omega=1, omega=1.0) based on the multiple-protein alignments. A likelihood ratio test was performed on these likelihood values using the chi2 program from PAML, and genes with a p value below 0.05 were treated as candidates that underwent positive selection. The posterior probability of each site being under positive selection was obtained using the Bayes empirical Bayes method. Finally, KEGG and GO enrichment analyses were performed on the positively selected genes to identify functional categories and pathways.
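As an illustration of the likelihood ratio test between the two CodeML models, the following Python sketch computes the test statistic 2(lnL_A − lnL_null) and its p value. The degrees of freedom are not stated in the text; the sketch assumes one degree of freedom, since the two models differ only in whether ω is fixed at 1. The function name and the example log-likelihoods are hypothetical, and the study itself used the chi2 program from PAML rather than this code.

```python
import math


def lrt_p_value(lnl_alt: float, lnl_null: float) -> float:
    """P value of a likelihood ratio test, assuming chi-square with 1 df.

    lnl_alt  : log-likelihood under the alternative model (omega free)
    lnl_null : log-likelihood under the null model (omega fixed at 1)
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        # The alternative fits no better than the null.
        return 1.0
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene.
    p = lrt_p_value(lnl_alt=-1000.0, lnl_null=-1002.0)
    print(f"p = {p:.4f}; candidate under positive selection: {p < 0.05}")
```

A gene would be retained as a candidate when the returned p value is below 0.05, matching the threshold given above.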

Conserved syntenies

JCVI93 was used to identify and visualize regions of conserved synteny between Pacific saury and Medaka based on protein-coding gene regions. MCScanX94 was used to identify the locations of orthologous and paralogous genes between Pacific saury and the other species.

Population genetics

Genomic DNA from the 80 individuals was sequenced on an Illumina NovaSeq 6000 platform. The sequencing data were filtered and aligned to the CSA genome using BWA. GATK was used for SNP calling, with the filtering parameters set as “QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0”. SnpEff95 was used to annotate the genetic variants. Whole-genome SNPs were further filtered (-maf 0.2, -geno 0.1, -hwe 0.0001) using Plink96, yielding a total of 1,633,773 SNP loci. FST values among sites were calculated in sliding windows along the genome (a window size of 50 kb with 5-kb increments) using Vcftools97. A window size of 10 kb with 1-kb increments was applied for finer-scale analysis. The pheatmap package was used to cluster the eight sites based on FST. The clusterProfiler package was used to perform enrichment analysis with the Zebrafish database.
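The logic of the hard-filter expression and of the sliding windows can be sketched in Python as follows. This is an illustrative re-statement, not the GATK or Vcftools implementation: the function names are hypothetical, annotations missing from a record are assumed not to trigger the filter, and the handling of the final window at the end of a chromosome is an assumption that may differ from Vcftools.

```python
from typing import Dict, Iterator, Optional, Tuple


def fails_hard_filter(info: Dict[str, Optional[float]]) -> bool:
    """True if a SNP matches any clause of the hard-filter expression.

    Mirrors: QD < 2.0 || FS > 60.0 || MQ < 40.0 ||
             MQRankSum < -12.5 || ReadPosRankSum < -8.0
    Missing annotations are treated as not failing (assumption).
    """
    def below(key: str, cutoff: float) -> bool:
        value = info.get(key)
        return value is not None and value < cutoff

    def above(key: str, cutoff: float) -> bool:
        value = info.get(key)
        return value is not None and value > cutoff

    return (below("QD", 2.0) or above("FS", 60.0) or below("MQ", 40.0)
            or below("MQRankSum", -12.5) or below("ReadPosRankSum", -8.0))


def sliding_windows(chrom_length: int, window: int = 50_000,
                    step: int = 5_000) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows along one chromosome."""
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        yield start, end
        if end == chrom_length:
            break  # the last window already reaches the chromosome end
        start += step


if __name__ == "__main__":
    print(fails_hard_filter({"QD": 1.5, "FS": 10.0, "MQ": 60.0}))
    print(list(sliding_windows(60_000)))
```

Calling `sliding_windows(length, window=10_000, step=1_000)` gives the finer-scale window layout described above.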

Statistics and reproducibility

The statistical significance of GO and KEGG terms was evaluated using Fisher’s exact test in combination with FDR correction for multiple testing (P < 0.05). For whole-genome resequencing, we analyzed 10 samples from each of the eight sites under the same conditions to ensure comprehensive and accurate detection of variation. *p-value < 0.05 and **p-value < 0.01 were considered significant and extremely significant differences, respectively.
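The FDR procedure is not named in the text. As a hedged illustration, the sketch below applies the Benjamini-Hochberg step-up adjustment, a common choice for enrichment analyses, to a list of raw p values such as those from Fisher’s exact tests. The function name and the example p values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-term p values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {q:.3f}  significant: {q < 0.05}")
```

Terms whose adjusted p value falls below 0.05 would be reported as significantly enriched under this assumption.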

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.