Background & Summary

The rockfish genus Sebastes Cuvier 1829 is the most speciose in the family Sebastidae (Actinopterygii: Scorpaeniformes)1,2. The genus contains nearly 110 species worldwide, most of which are subject to substantial commercial and recreational fisheries2. This great species diversity is likely attributable to recent diversification2,3,4, which has resulted in taxonomic confusion in some areas because of morphological similarity among species. Rockfish have provided valuable opportunities for evolutionary studies, shedding light on the origin and diversification of the genus3,5. In addition, as ovoviviparous teleosts, rockfish could provide exceptional clues for studying the evolution of reproductive ecology. Ovoviviparity is a unique mode of fish reproduction in which fertilized eggs are retained in the female ovary until the embryos are mature6. In these respects, molecular information such as whole genome data would provide more comprehensive insights into the evolutionary biology of these species.

In this study, we report whole genome data for three marine ovoviviparous fish in the genus Sebastes, viz., Sebastes schlegelii Hilgendorf 1880, Sebastes koreanus Kim and Lee 1994, and Sebastes nudus Matsubara 1943. The three rockfish are commercial species commonly distributed in Korea, Japan, and the northeast coast of China1. Three adult males (one individual per species) were collected from the coastal waters of Qingdao, China. Prior to sequencing, the genome sizes of the three species were estimated as ~800 Mb; thus nearly 40 Gb of sequencing data (about 50× genome coverage) was produced for each species on the Illumina HiSeq 2500 sequencing platform. We intend to develop genomic resources for further studies on the taxonomy, phylogenetics, conservation and evolution of these commercially important rockfish in the genus Sebastes.

The experimental design, sequencing and analysis pipeline are shown in Fig. 1. After data filtering, a total of 38.54, 41.26, and 37.43 Gb of sequence data were retained for S. schlegelii, S. koreanus, and S. nudus, respectively (Table 1). K-mer analyses gave estimated genome sizes of 846.4, 832.5, and 813.1 Mb for the three species, respectively (Table 2). The genome sequences of S. schlegelii, S. koreanus, and S. nudus were assembled into scaffolds with total sizes of 755.1, 751.7, and 748.5 Mb, respectively. The estimated genomic information of the three rockfish species is shown in Table 2.
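The figures quoted above can be related to one another with simple arithmetic. The following is a minimal sketch, not part of the original pipeline: the function and variable names are illustrative, and it assumes 1 Gb = 1,000 Mb. It computes the sequencing depth implied by the clean data and the share of the k-mer-estimated genome size captured by the scaffolds.

```python
# Values copied from the text; all names here are illustrative assumptions.
CLEAN_DATA_GB = {"S. schlegelii": 38.54, "S. koreanus": 41.26, "S. nudus": 37.43}
KMER_GENOME_MB = {"S. schlegelii": 846.4, "S. koreanus": 832.5, "S. nudus": 813.1}
ASSEMBLY_MB = {"S. schlegelii": 755.1, "S. koreanus": 751.7, "S. nudus": 748.5}


def coverage(clean_gb, genome_mb):
    """Sequencing depth = total clean bases / estimated genome size."""
    return clean_gb * 1000.0 / genome_mb  # assumes 1 Gb = 1,000 Mb


def assembled_fraction(assembly_mb, genome_mb):
    """Share of the k-mer-estimated genome size present in the scaffolds."""
    return assembly_mb / genome_mb


def summarise():
    """Return {species: (depth, assembled fraction)} for the three species."""
    return {
        sp: (
            coverage(CLEAN_DATA_GB[sp], KMER_GENOME_MB[sp]),
            assembled_fraction(ASSEMBLY_MB[sp], KMER_GENOME_MB[sp]),
        )
        for sp in CLEAN_DATA_GB
    }


if __name__ == "__main__":
    for species, (depth, frac) in summarise().items():
        print(f"{species}: {depth:.1f}x depth, {frac:.1%} of estimated size assembled")
```

With the planning figures from the text (40 Gb of data for an ~800 Mb genome), `coverage(40, 800)` gives the stated 50×.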

Fig. 1

Overview of the experimental design and analysis pipeline.

Table 1 Summary of the high-throughput sequencing in this study.
Table 2 Statistical information of k-mer analysis and genome assembly in this study.

The filtered clean data were mapped to the published S. steindachneri (GCA_001910785.2) reference genome, and the resulting BAM files were used in demographic analyses. A coalescent-based hidden Markov model, the pairwise sequentially Markovian coalescent (PSMC) model, was used to infer the history of effective population size (Ne). The PSMC results showed contrasting demographic changes during the last glacial period, with Ne increasing in S. schlegelii and decreasing in the other two species (Fig. 2). The demographic analyses suggest that closely related species can respond in drastically different ways to climate change, as reported for two closely related dolphin species7. Such contrasting demographic changes could be due to the altered ecology of competitors and the pattern of population differentiation7. Further studies are warranted to characterize the contrasting demographic patterns among closely related species. In addition, the phylogenetic relationships of species in the genus Sebastes were reconstructed based on whole genome sequences. After supplementing our data with six published genome sequences, a total of 14,821,089 single nucleotide polymorphisms (SNPs) were identified. After SNP filtering, the remaining 46,624 SNPs were used for phylogenetic reconstruction. The neighbour-joining topology showed that S. schlegelii, S. koreanus, and S. nudus are more closely related to each other than to the other rockfish species included (Fig. 3). Based on a literature survey and the authors' knowledge, the whole genome data reported here are the first made publicly available for these three rockfish; these data could therefore be valuable for further studies on the taxonomy, phylogenetics and evolutionary biology of rockfish species.

Fig. 2

Demographic history of three rockfish species in this study. PSMC estimates of demographic changes in effective population size (Ne) over time inferred from the draft genome sequences of the three rockfishes. Thick lines represent the median and thin light lines correspond to 100 rounds of bootstrapping.

Fig. 3

Phylogenetic relationships reconstructed from whole genome sequences of nine rockfish species. The whole genome sequences of 9 rockfish species (6 previously reported species and the 3 species in this study) were used for phylogenetic reconstruction with the neighbour-joining algorithm.

Methods

Sample collection

Animal experiments were conducted in accordance with the guidelines approved by the Zhejiang Ocean University Animal Ethics Committee and national legislation. The sample collection procedure followed that described in our previously published work (ref.8). To obtain enough genomic DNA for Illumina sequencing, we collected fresh epaxial white muscle tissue from Sebastes schlegelii, S. koreanus, and S. nudus sampled from Qingdao, China. The samples were quickly frozen in liquid nitrogen for 1 hour before storage at −80 °C. Genomic DNA was extracted using a standard phenol/chloroform extraction protocol. The integrity of the genomic DNA was checked using agarose gel electrophoresis, which showed a main band around 20 kb, satisfying the requirement for Illumina library construction according to the manufacturer’s protocol.

Whole-genome sequencing

Whole genome sequencing was performed commercially at Novogene Co. Ltd in Beijing. In brief, 1.0 μg of genomic DNA was fragmented using an E210 Focused-ultrasonicator (Covaris, Woburn, MA). The sheared DNA fragments were used to prepare paired-end libraries with an average insert size of 350 bp for all samples according to the manufacturer’s instructions (Illumina Inc., San Diego, CA). Each library was sequenced in two independent lanes of the HiSeq 2500 platform (Illumina Inc.) in 150-bp paired-end mode. The raw data were converted to single-sample FASTQ files through base calling, and after filtering out interfering information such as adaptors and low-quality reads, the clean FASTQ files of each sample were used for further bioinformatics analyses.

Genome assembly

The genome size, heterozygosity ratio and repeat ratio were estimated using k-mer analysis (K = 17) performed in GCE v1.0.0 (ref. 9). Paired-end reads were assembled into contigs and scaffolds in SOAPdenovo v2.01 (ref. 10) with a k-mer size of 41, using the de Bruijn graph approach.

Phylogenetic analysis

The generated genome data were supplemented with publicly available sequences of six rockfish species in the genus Sebastes, i.e. S. steindachneri (GCA_001910785.2), S. aleutianus (GCA_001910805.2), S. minor (GCA_001910765.2), S. nigrocinctus (GCA_000475235.3), S. norvegicus (GCA_900302655.1), and S. rubrivinctus (GCA_000475215.1), downloaded from the NCBI database. The clean reads were aligned to the S. steindachneri reference genome using the bwa-mem algorithm in BWA 0.7.12 (ref. 11) with default parameters. Single nucleotide polymorphism (SNP) calling was implemented in SAMtools 1.3.1 (ref. 12) with default parameters. SNP filtering was performed using VCFtools13. The SNP calling procedure and parameters are expanded versions of descriptions in our related work14. To avoid sex bias affecting the tree topology, contigs containing SNPs were cross-validated against the sex-determining loci identified in a previous study15, and sex-determining SNP loci were excluded from the phylogenetic analysis. The phylogenetic tree of the nine Sebastes species was reconstructed from the filtered SNPs using the neighbour-joining (NJ) method in TASSEL 5 (ref. 16) with default parameters. However, potential sampling bias should be noted as a caveat when performing phylogenetic analyses based on SNPs derived from a single individual per species. Further analyses that sample more individuals are warranted to obtain more robust results.
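Neighbour-joining operates on a matrix of pairwise distances between samples. As a minimal sketch of how such a matrix can be derived from SNP genotypes, the code below computes a simple allele-sharing distance. This is an illustrative assumption: the source does not state which distance TASSEL 5 computes under default parameters, and the genotype coding (0, 1 or 2 copies of the alternate allele, `None` for missing calls) and function names are hypothetical.

```python
def pairwise_distance(geno_a, geno_b):
    """Mean per-site allele-sharing distance between two individuals.

    Genotypes are coded as the number of alternate alleles (0, 1 or 2);
    None marks a missing call. Sites missing in either individual are skipped.
    """
    diffs = [
        abs(a - b) / 2.0
        for a, b in zip(geno_a, geno_b)
        if a is not None and b is not None
    ]
    if not diffs:
        raise ValueError("no sites called in both individuals")
    return sum(diffs) / len(diffs)


def distance_matrix(genotypes):
    """Build a symmetric distance matrix from {sample name: genotype list}.

    Returns the sorted sample names and the matrix as a list of rows.
    """
    names = sorted(genotypes)
    matrix = [
        [pairwise_distance(genotypes[a], genotypes[b]) for b in names]
        for a in names
    ]
    return names, matrix
```

With one individual per species, as in this study, each row of the matrix corresponds to a species, which is why the sampling caveat above applies directly to the resulting tree.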

Demographic analysis

The demographic history of all three rockfish species was analysed using the PSMC model, as implemented in the PSMC package17. The “fq2psmcfa” and “splitfa” tools from the PSMC package were used to create the input files for PSMC modelling. The PSMC analysis command included the options “-N25” for the number of iterations of the algorithm, “-t15” as the upper limit for the time to the most recent common ancestor (TMRCA), “-r5” for the initial θ/ρ, and “-p 4+25*2+4+6” for the pattern of atomic time intervals. The reconstructed population history was plotted using the “psmc_plot.pl” script with the substitution rate “-u 2.5e-8” adopted from medaka18 and a generation time of 8 years. The generation time was calculated as g = a + [s/(1 − s)] (ref. 19), where s is the expected adult survival rate, assumed to be 80%, and a is the age at sexual maturity, which is 4 years for S. schlegelii20. The generation time was therefore set to 8 years in the PSMC analysis. To assess variance in the estimated effective population size, we performed 100 bootstraps for each species.
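Two of the quantities in this paragraph can be checked with a short calculation. The sketch below, in which the function names are illustrative, evaluates the generation-time formula with the stated inputs and expands the interval pattern passed to “-p”. The pattern interpretation (a term `n*w` denotes n free parameters each spanning w atomic intervals, and a bare `w` denotes one parameter spanning w intervals) reflects our understanding of the PSMC documentation and should be treated as an assumption.

```python
def generation_time(maturation_age, adult_survival):
    """g = a + s / (1 - s), with a the age at maturity and s adult survival."""
    if not 0.0 <= adult_survival < 1.0:
        raise ValueError("adult survival must be in [0, 1)")
    return maturation_age + adult_survival / (1.0 - adult_survival)


def parse_psmc_pattern(pattern):
    """Return (atomic intervals, free parameters) for a pattern like '4+25*2+4+6'."""
    atomic = 0
    free = 0
    for term in pattern.replace(" ", "").split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
        else:
            repeats, width = 1, int(term)
        atomic += repeats * width  # atomic intervals covered by this term
        free += repeats            # one free parameter per repeat
    return atomic, free


if __name__ == "__main__":
    # a = 4 years and s = 0.8 are the values given in the text.
    print(round(generation_time(4, 0.8), 6))
    print(parse_psmc_pattern("4+25*2+4+6"))
```

With a = 4 and s = 0.8 the formula gives 4 + 0.8/0.2 = 8 years, matching the value used for plotting.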

Data Records

All raw sequencing reads for the three rockfish species have been deposited in the NCBI Sequence Read Archive21, and the assembled genome sequences (Sebastes schlegelii22, S. nudus23, and S. koreanus24) have been deposited in GenBank. In addition, the assembled genome sequences, VCF files and phylogenetic tree file are stored in Figshare25.

Technical Validation

In the present study, the sampled fish were captured by hook-and-line fishing in the coastal waters of Qingdao, China. Taxonomic identification was carried out in the laboratory based on morphological characters. DNA quality was checked using agarose gel electrophoresis (Fig. 4). The preprocessing steps, including quality evaluation and filtering of raw reads, followed the procedures of the previous study8. The quality of raw reads was evaluated using FastQC26 software and low-quality reads were filtered using HTQC27 software according to the following criteria: (1) adaptors in the reads were trimmed and removed; (2) read pairs were removed when either read had more than 10% N bases; (3) read pairs were removed if either read had more than 20% low-quality bases (phred quality score < 5); (4) ambiguous or low-quality fragments at the two ends of reads were trimmed using a window size of 5 bp and an average quality threshold of 20. Sequencing quality was also assessed by examining GC content, Q20 statistics and error rate (Table 1, Fig. 5). FastQC output files can also be viewed in the Supplementary Information. Moreover, the parameters used in the bioinformatics analyses followed either the default settings or the published literature, as described in the Methods section.
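Criteria (2) and (3) above amount to a per-pair decision rule that can be sketched as follows. This is an illustration of the stated thresholds only, not the HTQC implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and adaptor removal and end trimming (criteria 1 and 4) are not covered.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quality_string]


def keep_read_pair(seq1, qual1, seq2, qual2,
                   max_n_frac=0.10, max_lowq_frac=0.20, lowq_threshold=5):
    """Return True if the pair passes criteria (2) and (3).

    A pair is discarded when either read has more than `max_n_frac` N bases
    or more than `max_lowq_frac` bases with Phred score below `lowq_threshold`.
    `qual1` and `qual2` are lists of Phred scores, one per base.
    """
    for seq, quals in ((seq1, qual1), (seq2, qual2)):
        if not seq or len(seq) != len(quals):
            return False  # empty or malformed record
        n_frac = seq.upper().count("N") / len(seq)
        lowq_frac = sum(q < lowq_threshold for q in quals) / len(quals)
        if n_frac > max_n_frac or lowq_frac > max_lowq_frac:
            return False
    return True
```

Because the rule is applied to the pair as a whole, one poor mate removes both reads, which keeps the two FASTQ files of a paired-end library synchronized.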

Fig. 4

Agarose gel electrophoresis for DNA integrity assessment. The DNA lanes presented here have been cropped from a larger image containing multiple DNA samples. Two kinds of DNA markers (M-1 and M-2) were used for DNA integrity assessment. The numbers embedded in the image (33, 34, and 35) represent S. schlegelii, S. koreanus, and S. nudus, respectively.

Fig. 5

Quality evaluation of the sequencing data, including base composition, quality scores and error rate. Sequencing quality met the requirements for further bioinformatics analyses in all three species; S. nudus is shown here as an example.