Introduction

Genetic maps are essential tools for genomic research in aquaculture. Originally, linkage mapping studies were mainly based on microsatellite (SSR) and AFLP markers1,2; more recently, they reached a milestone with the development of cost-effective genotyping methods based on massively parallel sequencing. The genomic revolution has made single-nucleotide polymorphisms (SNPs) very popular, providing access to simple biallelic markers that are widely distributed and highly abundant across the genome. As a consequence, an increasing number of high-density genetic maps are now being reported in non-model organisms, including aquaculture fish3,4. These maps have provided new clues on genome evolution and speciation between closely related lineages, and have helped to unravel the genetic architecture of both simple Mendelian and complex quantitative traits in many fish species, thus facilitating marker-assisted selection in aquaculture5,6. More recently, the use of high-density linkage maps as backbones to anchor de novo genome assemblies into pseudo-chromosomes has become more widespread7,8. Although long-read sequencing has significantly increased the average scaffold size in de novo assembled genomes9, the total number of scaffolds is still far greater than the expected number of chromosomes. Large arrays of repeated sequences and the high degree of conservation of some tandem repeat families widely distributed across the genome remain a major obstacle for most de novo assembly algorithms, resulting in fragmented scaffolds or even misassembled sequences within chimeric contigs. Linkage maps are therefore highly valuable tools to anchor physical maps into pseudo-chromosomes, and they also enable the identification of chimeric or misassembled contigs, thereby improving the quality of new genome assemblies7.

Flatfish (Pleuronectiformes) are an attractive group of fish that have long been investigated because of the drastic morphological, physiological and behavioural remodelling that occurs during metamorphosis from a pelagic larva to a benthic juvenile. Several flatfish species are exploited worldwide in fisheries and aquaculture, representing an important resource for human consumption. This taxonomic group diverged from carangimorphs in the early Paleocene and underwent a major diversification in the middle Paleocene10. Cytogenetic studies have suggested that the Pleuronectiformes ancestor had 2n = 48 chromosomes, in agreement with the most frequent chromosome number found in the sister clade Carangidae and in the deepest-branching flatfish families (Pleuronectidae and Paralichthyidae)11. However, the chromosome number varies widely, from 2n = 26 to 2n = 50, among flatfish species11,12. An intense cascade of Robertsonian rearrangements and pericentromeric inversions seems to have shaped flatfish genome evolution, reducing the chromosome number especially in the most recently diverged families Soleidae, Cynoglossidae and Achiridae11. A recent comparison of the turbot genome with other fish assemblies clearly pointed to a high degree of conserved synteny across chromosomes in Pleuronectiformes, although with high rates of intrachromosomal reorganisation. Moreover, some chromosome fusions identified through comparative mapping are thought to have given rise to a new karyotype organization in turbot3. Hence, integrated genetic and physical maps are important genomic resources for understanding chromosome evolution in flatfish.

The Senegalese sole is an important flatfish in aquaculture and fisheries. A genetic linkage map based on 129 SSRs grouped into 27 linkage groups (LG) was previously reported13. In addition, an integrated map of BAC clones and repetitive DNA families was developed using multiple fluorescence in situ hybridization (mFISH), with at least one BAC mapped to each chromosome arm14. This cytogenetic study revealed a lack of heteromorphic sex chromosomes and showed that the largest metacentric chromosome resulted from a Robertsonian fusion of two acrocentric chromosomes during flatfish evolution15,16. Moreover, a preliminary draft genome sequence of a female Senegalese sole was reported (600.3 Mb, N50 of 85 kb) and then improved with a hybrid assembly using Nanopore and Illumina reads (608 Mb, N50 of 340 kb)17,18. This genome information was used to design whole-genome multiplex PCRs and to create a new integrated SSR map with 234 markers. Nevertheless, further efforts are required to better assemble and anchor scaffolds onto the 21 expected chromosomes and to better understand the genomic architecture of sex determination.

The aim of this study was to: (1) generate an improved de novo assembly of a male Senegalese sole genome based on a combination of long- and short-read sequencing; (2) build a high-density genetic map using ddRAD markers; (3) anchor the physical map to the genetic map in order to (4) improve the scaffolding of the reference genome assembly; (5) estimate genome-wide variation in recombination rates; and (6) carry out a GWAS to identify sex-associated markers, together with intra- and interspecific comparative mapping, to better understand the evolutionary history of chromosome rearrangements in flatfish.

Material and methods

Animals

Soles used for ddRAD library preparation and sequencing were selected from the genetic breeding program carried out by IFAPA in collaboration with a commercial aquaculture company (CUPIMAR S.A.). The production of the families used in this study, their genotyping and parentage assignment were previously published19,20. Five families (three full-sib and two maternal half-sib families) containing between 48 and 96 individuals per family (total n = 356) were selected to construct the genetic linkage map (Table 1). In addition, seven families with sex ratios close to 1:1 were selected for the genome-wide association study (GWAS). The average weight and length of each family are given in Table 1. As genotyping of the parents was also required to build the genetic map, five fathers and three mothers involved in family production were sampled for blood by puncture of the caudal vein using a heparinized syringe; heparin (100 mU) was added and the samples were kept at −20 °C until use. To obtain high-molecular-weight genomic DNA for genome sequencing, a wild male from the broodstock (weight over 2 kg; code Sse05_10M) was sampled for blood as indicated above.

Table 1 Families used to construct the genetic linkage map (LM) and association study (A).

All procedures were authorized by the Bioethics and Animal Welfare Committee of IFAPA and given the registration number 10/06/2016/101 by the national authorities for the regulation of animal care and experimentation. The study was carried out in compliance with the ARRIVE guidelines, and all procedures were performed in accordance with Spanish national (RD 53/2013) and European Union legislation for animal care and experimentation (Directive 86/609/EEC).

Genome sequencing and assembly

Methods for genome sequencing and assembly are fully described in the “Supplementary method”. Briefly, high-molecular-weight genomic DNA was prepared from heparinized whole blood using the MagAttract HMW DNA kit (Qiagen). After its quality was confirmed, four libraries were prepared for sequencing on the Oxford Nanopore Technologies (ONT) MinION platform. Overall, 19.2 Gb of sequence was generated, with an average read length of 4.3 kb. In parallel, the same sample was sequenced on a NextSeq 550 sequencer (Illumina, USA), which generated 43 Gb of sequence from 143 million reads (average length 147 nt). The main features of the libraries used for genome assembly are presented in Supplementary Table S1. The raw read data were deposited in the NCBI Sequence Read Archive (SRA) under accession number SAMN16809702. The hybrid genome assembly was carried out using MaSuRCA (v3.2.3)21,22 with the Illumina libraries (57.3× coverage) and the error-corrected Nanopore reads (25.5× coverage). The completeness of the LR-hybrid assembly was assessed with Benchmarking Universal Single-Copy Orthologs (BUSCO v3.0.2)23,24 using the 4,854 single-copy orthologs of the actinopterygii_odb9 dataset.
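As a quick check on the sequencing effort, average depth is simply the number of bases sequenced divided by the genome length. The minimal Python sketch below applies this to the raw yields quoted above and to the 609,359,514 bp male assembly. It gives raw depths of about 70.6× (Illumina) and 31.5× (Nanopore). These are higher than the values quoted for the assembly input (57.3× and 25.5×), presumably because those refer to reads after the processing and error correction described in the Supplementary method, which this sketch does not model.

```python
def sequencing_depth(total_bases: float, genome_size_bp: float) -> float:
    """Average fold coverage: total sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


# Values quoted in the text (raw yields, before any filtering or correction)
GENOME_SIZE_BP = 609_359_514   # male assembly length
ILLUMINA_BASES = 43e9          # NextSeq 550 yield
NANOPORE_BASES = 19.2e9        # MinION yield

for name, bases in [("Illumina", ILLUMINA_BASES), ("Nanopore", NANOPORE_BASES)]:
    print(f"{name}: {sequencing_depth(bases, GENOME_SIZE_BP):.1f}x raw depth")
```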

ddRAD-seq library preparation and sequencing

Genomic DNA from the caudal fin (offspring) or whole blood (parents) was purified using the Isolate II Genomic DNA Kit (Bioline). DNA was sent to LifeSequencing S.L. (Valencia, Spain), and a total of 346 samples were selected for library construction (Table 1). Libraries were constructed following the protocol described by Peterson et al.25 using the EcoRI/NcoI enzyme combination, which generated on average 24,874 SNPs per sample. Pools of libraries were loaded on a NovaSeq 6000 sequencer (Illumina) following the manufacturer's instructions and the specifications mentioned above. The total number of reads generated for each library is indicated in Supplementary Table S2.

Genetic linkage map and scaffold anchoring

Illumina reads were processed using Stacks (v2.3e)26 as indicated in the “Supplementary method”. To construct the map, SNPs were filtered using Plink (v1.9)27 to remove markers that segregated with Mendelian errors in more than 10% of individuals. In addition, individuals with Mendelian errors in more than 5% of markers were removed (Supplementary Fig. S1). The final SNP dataset contained 40,041 markers from 327 individuals (Table 1) and 8 parents, which were imported into the LepMap3 software7. The SNPs were assigned to 21 linkage groups (named SseLGs), corresponding to the expected haploid number of chromosomes (2n = 42), using the “SeparateChromosomes2” module. A LOD threshold of 11 and a size limit of 200 were selected as the most adequate parameters to keep an optimal number of markers grouped in the expected number of SseLGs (Fig. 1A,B). The JoinSingles2 module was then run to assign additional single SNPs to existing SseLGs using decreasing LOD score iterations from 10 to 5 (Fig. 1B). Finally, the genetic distances between markers on each SseLG were calculated with the OrderMarkers2 module (male, female and sex-averaged (SA) maps) using the Kosambi mapping function. The resulting genetic map was visualized using the LinkageMapView software28. Scaffold anchoring was carried out using the Lep-Anchor program following the author's recommendations29, as indicated in the “Supplementary method”.
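For reference, the Kosambi mapping function converts a recombination fraction r between two markers into a map distance d = ¼ ln[(1 + 2r)/(1 − 2r)] Morgans, allowing for partial crossover interference. The short Python sketch below implements this formula and its inverse independently of LepMap3 (it is not LepMap3 code); the function names are illustrative, and the Haldane function is included only for comparison.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    if d_cm < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * math.tanh(2 * d_cm / 100.0)


def haldane_cm(r: float) -> float:
    """Haldane distance in cM (assumes no interference), for comparison only."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


for r in (0.01, 0.1, 0.3):
    print(f"r = {r:.2f}: Kosambi {kosambi_cm(r):6.2f} cM, Haldane {haldane_cm(r):6.2f} cM")
```

For small r both functions approach 100·r cM; as r grows they diverge, with the Kosambi function giving shorter distances because it assumes some interference between crossovers.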

Figure 1

Selection of the LOD score limit (Lod) to construct the genetic map in LepMap3. (A) Average number of markers (nMarkers) positioned in linkage groups (left Y axis) and number of linkage groups (nLG; right Y axis) for Lod values from 1 to 15, as implemented in the “SeparateChromosomes2” module. Lod11 (shaded) indicates the selected value, which grouped the markers into 21 LGs. (B) Average number of markers recovered and added to the 21 LGs using decreasing LOD score iterations from 10 to 5 in the JoinSingles2 module.

Genome annotation

Genome annotation was performed by combining alignments of Danio rerio, S. maximus and C. semilaevis proteins, alignments of RNA-seq data from several tissues and developmental stages, and ab initio gene predictions. The annotation process is described in more detail in the “Supplementary method”. Functional annotation of the male proteins was performed with Blast2GO30. After an alignment-based strategy was applied to determine equivalences between the female and male genomes (see “Supplementary method”), the female proteins inherited the functional annotation of their male equivalents. Functional annotation was then performed on the female genes that remained unannotated after this step. Gene Ontology (GO) enrichment analysis was carried out with topGO on the genes unique to one of the genomes (Supplementary Table S3).

Recombination rates, association analyses and cross-species comparisons

Variation in recombination rate along the genome was evaluated by comparing the consensus linkage maps (male, female and SA) with the physical map of each pseudo-chromosome using MareyMap31. The cumulative recombination frequency (RFm) along LGs was used to infer the chromosome type as previously described32. GWAS analyses were carried out with seven families (Table 1) using a multi-step logistic mixed model approach for binary traits (female = 0, male = 1), as implemented in the R package GenABEL (v1.8-0)33. A detailed analysis of synteny across flatfish is beyond the scope of this study, but a chromosome alignment analysis was carried out with D-Genies34 to identify chromosomal rearrangements in flatfish. We then used SatsumaSynteny to compute whole-genome synteny blocks35, which were represented using shinyCircos36.
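A Marey map plots the genetic position (cM) of each marker against its physical position (Mb), and the local recombination rate is the slope of that relationship. MareyMap itself fits smoothing curves (such as loess); the minimal Python sketch below swaps in a simpler sliding-window least-squares slope to illustrate the idea. The window size, step and minimum number of markers are arbitrary assumptions, not the settings used in this study.

```python
from typing import List, Sequence, Tuple


def local_rates(phys_mb: Sequence[float], gen_cm: Sequence[float],
                window_mb: float = 2.0, step_mb: float = 0.5,
                min_markers: int = 5) -> List[Tuple[float, float]]:
    """Sliding-window recombination rates (cM/Mb) from a Marey map.

    Returns (window_centre_Mb, rate) pairs, where rate is the least-squares
    slope of genetic on physical position for the markers inside the window.
    """
    if len(phys_mb) != len(gen_cm):
        raise ValueError("position lists must have equal length")
    if window_mb <= 0 or step_mb <= 0:
        raise ValueError("window and step must be positive")
    pairs = sorted(zip(phys_mb, gen_cm))
    if not pairs:
        return []
    last = pairs[-1][0]
    rates = []
    left = pairs[0][0]
    while left + window_mb <= last + 1e-9:
        pts = [(x, y) for x, y in pairs if left <= x < left + window_mb]
        if len(pts) >= min_markers:
            n = len(pts)
            mean_x = sum(x for x, _ in pts) / n
            mean_y = sum(y for _, y in pts) / n
            sxx = sum((x - mean_x) ** 2 for x, _ in pts)
            if sxx > 0:
                sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
                # A negative slope can only reflect marker-order noise; clip at 0.
                rates.append((left + window_mb / 2, max(sxy / sxx, 0.0)))
        left += step_mb
    return rates


def average_rate(phys_mb: Sequence[float], gen_cm: Sequence[float]) -> float:
    """Chromosome-wide rate: map length divided by physical span (cM/Mb)."""
    span = max(phys_mb) - min(phys_mb)
    if span <= 0:
        raise ValueError("markers must span a positive physical distance")
    return (max(gen_cm) - min(gen_cm)) / span
```

The chromosome-wide average corresponds to the cM/Mb values reported per SseLG, whereas the windowed slopes give the local landscape along each pseudo-chromosome.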

Results

Male genome assembly and annotation

A de novo hybrid genome of a male sole was assembled using a combination of Illumina short reads and Nanopore long reads. The total number of input reads from each sequencing platform, the average read length and quality, and the total sequence information used in the assembly are given in Supplementary Table S1. The draft hybrid assembly was generated using MaSuRCA and then refined with Pilon to correct bases, fix mis-assemblies and fill gaps. The main assembly statistics are given in Supplementary Table S4. The new assembly consists of 3,403 contigs with a total length of 609,359,514 bp and an N50 of 513 kb. Overall, 49.4% of contigs were longer than 50 kb, and the largest fragment was 4.5 Mb long. Gene completeness, as estimated by BUSCO analysis, was 97.0%. For comparison, the assembly statistics of a recent female draft genome of S. senegalensis20 are also shown in Supplementary Table S4. Both genome assemblies had a similar size (608–610 Mb), although the newly assembled male genome had longer contigs and a higher N50. A dot-plot alignment of the scaffolds of both genomes showed that 92.8% of the genomic information was highly similar (> 75%) and only 5.3% had no similarity (average similarity 94%) (Fig. 2).
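N50, the contiguity statistic used above to compare the male and female assemblies, is the length L such that contigs of length ≥ L together contain at least half of the assembled bases. A minimal Python sketch of that definition, together with the fraction of contigs above a length cut-off (as in the 50 kb statistic), is shown below; the toy lengths are illustrative and are not the sole contigs.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs >= L hold at least 50% of assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable for non-empty input")


def fraction_longer_than(lengths: Sequence[int], cutoff: int) -> float:
    """Fraction of contigs strictly longer than `cutoff` bp."""
    return sum(1 for x in lengths if x > cutoff) / len(lengths)


toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy lengths, not real contigs
print(n50(toy), fraction_longer_than(toy, 5))
```

Because N50 weights contigs by length, an assembly with fewer, longer contigs (as in the male assembly) shows a higher N50 even at a similar total size.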

Figure 2

Dot plot comparison of the male assembled scaffolds (SCF) (A) or the 21 male pseudo-chromosomes (B) against the female scaffolds. The scale is indicated below.

Assembly annotation statistics are given in Table 2. The number of protein-coding genes in the male assembly (27,175) was slightly lower than in the female (28,988), but their mean length was longer (7.4 vs 6.7 kb). The percentages of annotated transcripts (69.4–72.1%) and gene density (45.03–47.68) were similar in both assemblies. Around 85% of the annotated genes in each assembly had an equivalent gene in the other assembly. However, a few genes were present in only one of the genomes (unique genes). Some of these differences might be due to genome heterozygosity and repeat content, or might even reflect sex-specific genes. A GO enrichment analysis of these unique genes indicated that categories related to cell-cycle regulation and regulation of transcription, involving the canonical histones H3.2 and H4 and the retinoid X receptor alpha (rxra), were highly significantly overrepresented in the female (p-value < 10⁻³). Mapping of these two histone genes on the female assembly showed that they were co-localized in five scaffolds (Sosen1_s0284, Sosen1_s0324, Sosen1_s1454, Sosen1_s1522, Sosen1_s1726), four of which clustered in SseLG1 and one in SseLG16. In the male, the most significantly enriched categories for unique genes were skeletal system development and morphogenesis, although with P-values > 0.001 (Supplementary Table S3). Some short, single-exon unique genes might result from scaffold splitting or from the annotation process. The non-coding gene annotation yielded 23,822 female and 21,123 male transcripts. Of these, 6,549 female and 6,007 male transcripts were long non-coding RNAs (lncRNAs), and the rest were short non-coding RNAs.

Table 2 Summary annotation statistics for male and female assemblies.

ddRAD sequencing and SNP detection for genetic linkage map

Three full-sib and two half-sib families consisting of 47 to 95 individuals were used for ddRAD analysis (Table 1). The total number of paired-end reads generated for each family ranged between 280,609,738 (F5) and 398,313,256 (F2) with an average length of 150 nt (Table 3). The average number of reads per individual in each family varied between 6,444,752 (F1) and 11,692,072 (F5) (Table 3 and Supplementary Table S2). For parents, the average number of reads was 8,847,913.

Table 3 Main statistics of ddRAD libraries, mapping and SNP detection.

The newly assembled male genome was used as the reference to map the ddRAD reads. The average fraction of primary alignments onto this reference genome ranged between 88.04% (F6) and 89.71% (F2). On average, 10.5% of reads had insufficient mapping quality or excessively soft-clipped primary alignments, while less than 0.34% were unmapped. A total of 199,188 ddRAD loci were reconstructed, with an average number of loci per sample ranging between 23,828 (F1) and 30,550 (F7) and a mean insert length of 330.7 bp. The effective coverage per sample was 193.3 ± 110.4× (ranging from 146× to 242× between families), and the estimated mean number of sites per locus was 242.8 (Table 3).

Construction of a linkage genetic map and anchoring to physical map

To construct the genetic map, only SNPs detected in at least 80% of samples with a minimum coverage of 10 reads per sample were considered. SNPs with a significant deviation from Mendelian segregation were also removed (a total of 2,439 markers, 5.7% of SNPs). By family, the percentage of markers with Mendelian errors ranged from 1.5 to 1.7% (Supplementary Fig. S1). In addition, animals with Mendelian errors in more than 5% of markers (19 specimens) were removed. Overall, the final dataset contained 40,041 SNPs segregating in eight parents and their 327 offspring.
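The two Mendelian-error thresholds described in the Methods (markers with errors in more than 10% of offspring, then offspring with errors at more than 5% of the remaining markers) can be illustrated with a minimal plain-Python sketch. It is not Plink's implementation: the 0/1/2 genotype coding (alternative-allele counts), the trio data layout and the markers-first order of the two steps are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[int]  # alternative-allele count 0, 1 or 2; None = missing
_ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def is_mendelian_error(child: int, mother: int, father: int) -> bool:
    """True if the child genotype cannot arise from one allele of each parent."""
    possible = {a + b for a in _ALLELES[mother] for b in _ALLELES[father]}
    return child not in possible


def filter_mendelian(genos: Dict[str, Sequence[Genotype]],
                     trios: List[Tuple[str, str, str]],
                     max_marker_err: float = 0.10,
                     max_ind_err: float = 0.05) -> Tuple[List[int], List[str]]:
    """Two-step filter: drop markers, then offspring, with excess Mendelian errors.

    trios holds (child, mother, father) sample names; returns the indices of
    retained markers and the names of retained offspring.
    """
    if not trios:
        raise ValueError("at least one trio is required")
    n_markers = len(genos[trios[0][0]])
    errors: Dict[str, List[bool]] = {}
    for child, mother, father in trios:
        row = []
        for m in range(n_markers):
            c, mo, fa = genos[child][m], genos[mother][m], genos[father][m]
            # Missing genotypes are never counted as errors.
            row.append(None not in (c, mo, fa) and is_mendelian_error(c, mo, fa))
        errors[child] = row
    n_children = len(errors)
    # Step 1: drop markers with errors in more than max_marker_err of offspring.
    kept_markers = [m for m in range(n_markers)
                    if sum(errors[c][m] for c in errors) / n_children <= max_marker_err]
    if not kept_markers:
        return [], []
    # Step 2: drop offspring with errors at more than max_ind_err of kept markers.
    kept_children = [c for c in errors
                     if sum(errors[c][m] for m in kept_markers) / len(kept_markers)
                     <= max_ind_err]
    return kept_markers, kept_children
```

Running the marker filter first means that a few systematically bad markers do not push otherwise good offspring over the individual-level threshold.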

For linkage analysis, the ParentCall2 module retained only 16,287 informative markers after checking for segregation distortion (P < 0.05). Markers were grouped into 21 SseLGs by the SeparateChromosomes2 module with LOD = 11 (Fig. 1), consistent with the number of chromosomes in S. senegalensis. Each SseLG contained between 530 and 1,337 markers, with an average of 21.9 markers per Mb (Fig. 3; Table 4, "Anchoring genetic map and physical map"). In total, the genetic map allowed the anchoring and positioning of 1,665 of the 3,403 contigs, with 50 to 129 contigs per SseLG. The total length of sequence positioned on the linkage map (746.3 Mb) exceeded the assembly size, mainly because of chimeric contigs (n = 133) positioned on several chromosomes.

Figure 3

Genetic distance (cM) and SNP distribution across 21 linkage groups (SseLG) of the Senegalese sole.

Table 4 Information for anchored physical map (LepMap3 step), after genome re-scaffolding (Lep-anchor3 step) and after removal of markers with discrepancies between genetic and physical maps (MareyMap step).

Rescaffolding of reference genome with the genetic map

SNP marker information was further used for fine-scale correction of genome contigs to build 21 pseudo-chromosomes. After masking of repetitive sequences, the contigs were oriented and ordered within each SseLG (Table 4, "Genome re-scaffolding"). The total number of positioned contigs decreased from 1,665 to 1,563. Lep-Anchor corrected contig errors by removing six contigs and splitting another 105 into two fragments, 20 into three fragments and two into more than four fragments. After these corrections, the total number of markers assigned to the SseLGs decreased by 1.3% (to 16,075 SNPs), and 212 markers were moved to unplaced contigs (an average of 10.3 markers per contig). Overall, 548.6 Mb of the 610.4 Mb total assembly length (89.9%) were assigned to the 21 SseLGs, and only 61.9 Mb remained unanchored (Table 4). The total map length was 2,408.1 cM; SseLG1 was the largest group (42,924,012 bp and 147.3 cM), and SseLG4 showed the highest marker density (33.1 markers per Mb). The average marker interval was 0.155 cM. Anchored markers were further refined by comparing physical and genetic distances in MareyMap. The average genome-wide recombination rate (RR) was 4.35 cM/Mb (ranging between 3.45 and 5.26 cM/Mb among chromosomes) (Table 4, "Marker refining"). An alignment of the anchored and refined male reference genome with the female scaffolds (Fig. 2B) slightly increased the proportion of regions with more than 75% similarity to 93.2% and yielded a clear alignment along the diagonal, with dispersion only in unplaced scaffolds.

Analysis of recombination rates

The consensus genetic maps for females and males were 2,698.4 cM (15,022 markers) and 2,036.6 cM (15,390 markers) long, respectively. This difference in map length was observed for all 21 SseLGs (Fig. 4A and Table 5). Overall, the female-to-male (F:M) ratio of genetic distances was 1.32, ranging from 1.08 (SseLG15) to 1.77 (SseLG5) (Table 5). The genetic length of chromosomes was positively correlated with their physical length in both males (r = 0.43) and females (r = 0.60) (Fig. 4B). The average genome-wide RR was estimated at 3.02 ± 0.37 cM/Mb in males and 4.51 ± 0.57 cM/Mb in females (Table 5). The overall F:M ratio for RR was 1.49, ranging from 1.43 to 1.90 across chromosomes. In males, SseLG12 showed the lowest (2.47 cM/Mb) and SseLG16 the highest (3.60 cM/Mb) mean RR. In females, SseLG4 had the lowest (3.57 cM/Mb) and SseLG5 the highest (5.65 cM/Mb) mean RR.

Figure 4

Comparison of male and female genetic maps. (A) Male vs female linkage group lengths (cM) for the 21 Senegalese sole chromosomes. All chromosomes exhibit female-biased recombination. (B) Correlation between recombination map and physical map lengths in males (blue) and females (orange). The coefficient of determination (R²) is shown separately for each sex.

Table 5 Refined genetic maps for male (M) and female (F).

Local RR, expressed as a function of the relative distance to the nearest telomere, differed clearly between males and females. In males, high RR values were mainly concentrated close to the telomeres (Fig. 5A), whereas in females they were more uniformly distributed, with the highest RR found at around 15% of the distance to the nearest telomere (Fig. 5B). This contrast is illustrated by the chromosomal RR landscapes of males and females shown in Fig. 5C,D for SseLG1 (landscapes for all SseLGs are shown in Supplementary Fig. S2 for males and Supplementary Fig. S3 for females). We detected some regions with very low RR within several SseLGs (e.g. 5, 11, 13, 14, 15 and 18). In SseLG18, partially restricted recombination in both sexes was detected in the region between 9.5 and 10.9 Mb. This region had very low RR in males (1.2 cM/Mb) and females (0.6 cM/Mb) compared with the SseLG18 averages (3.0 and 4.9 cM/Mb, respectively). The cumulative RR curves of the two sexes crossed at around 10 Mb, with female RR close to zero between 10.8 and 10.9 Mb (Fig. 6, Supplementary Figs. S2 and S3). Recombination frequencies were also used to describe and classify chromosome morphologies. Figure 7 depicts the typical RFm plots for an acrocentric (SseLG20) and a metacentric (SseLG1) chromosome (see Supplementary Fig. S4 for all SseLGs).
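The RFm profiles used to classify chromosome morphology take a terminal marker as reference and measure the recombination frequency between it and each successive marker along the linkage group, starting from either end. A minimal Python sketch is shown below, assuming phased gametes coded by grandparental origin (0 or 1) at each marker; this input format is hypothetical, and the published procedure32 may handle missing data and phasing differently.

```python
from typing import List, Sequence


def rfm_profile(gametes: Sequence[Sequence[int]], from_left: bool = True) -> List[float]:
    """Recombination frequency between a terminal marker and each marker.

    gametes: phased gametes, each a sequence of grandparental-origin codes
    (0 or 1) ordered along one linkage group. Values are returned in the
    order the markers are visited, starting at the reference marker.
    """
    if not gametes:
        raise ValueError("no gametes given")
    n_markers = len(gametes[0])
    ref = 0 if from_left else n_markers - 1
    order = range(n_markers) if from_left else range(n_markers - 1, -1, -1)
    profile = []
    for j in order:
        # Recombinant if origin differs between reference and marker j
        # (i.e. an odd number of crossovers in between).
        recombinant = sum(1 for g in gametes if g[j] != g[ref])
        profile.append(recombinant / len(gametes))
    return profile


toy = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 1, 1, 1]]  # toy gametes
print(rfm_profile(toy), rfm_profile(toy, from_left=False))
```

Computing the profile from both extremities, as in Fig. 7, allows the two ends of each linkage group to be compared.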

Figure 5

Recombination landscape averaged across linkage groups for (A) males and (B) females. Recombination rates (cM/Mb) are plotted against the relative distance from the nearest telomere scaled by chromosome length (f). The red dashed line indicates the observed trend. Panels (C,D) show the relationship between physical and genetic distances for SseLG1 in males and females, respectively; the insets in (C,D) show the corresponding recombination landscapes. Complete information for all SseLGs is shown in Supplementary Figs. S2 and S3.

Figure 6

Sex-associated SNPs and RR landscape for males and females in SseLG18. (A) Manhattan plot of GWAS results for sex-associated SNPs using seven families. Significant markers are indicated in green. The horizontal red line represents the Bonferroni significance threshold. (B) Distribution of all 66 sex-associated significant markers using seven families and by family (in red, Supplementary Table S5) and RR (cM/Mb) landscape of males and females. A hot region from 9.5 to 10.9 Mb containing the candidate gene fshr is indicated on the right side. Physical positions of SseLG18 in Mb are indicated in black. Black lines indicate non-significant markers in SseLG18.

Figure 7

Plots illustrating the recombination frequency estimates (RFm) for intervals between markers along SseLG1 and SseLG20 in the male and female. For each LG, RFm was calculated from both chromosomal extremities (right: red circles; left: blue circles), using each of the two terminal markers as a reference starting point. The RFm plots of SseLG1 and SseLG20 show a classical metacentric and acrocentric pattern, respectively. The RFm plots of all SseLGs are illustrated in Supplementary Fig. S4.

Association analyses for sex

To identify genome regions associated with sex, a GWAS was carried out using seven families (Table 1) and a total of 10,426 markers. RAD-seq data and marker statistics are given in Table 3. The results showed 30 markers significantly associated with sex after Bonferroni correction using the seven families (P ≤ 4.8 × 10⁻⁶; Fig. 6A and Supplementary Table S5). When the association analysis was repeated separately by family, five families yielded 36 additional significant markers (Supplementary Table S5). All of them (66 SNPs from the whole-population and per-family analyses) were located on SseLG18, with a hot region at around 9.5–10.9 Mb (Fig. 6B). RR in this region was low (see above), with partially restricted recombination associated with sex. Overall, 80.7% of the markers significant in the whole-population analysis were preferentially heterozygous in males, although penetrance was incomplete for most of them. This pattern is compatible with a nascent XY system. It should be noted that specific markers in family 4 had an unexpectedly high number of heterozygous loci in females.
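The genome-wide threshold quoted above follows from the Bonferroni correction, which divides the family-wise significance level by the number of markers tested. Assuming α = 0.05 (the level is not stated here, but it reproduces the reported value), 0.05 / 10,426 ≈ 4.8 × 10⁻⁶. A minimal Python sketch; the helper names are illustrative, and any marker names or P-values passed to `significant` would be the user's own data.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cut-off that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Names of markers whose P-value passes the Bonferroni threshold."""
    cutoff = bonferroni_threshold(alpha, len(pvalues))
    return sorted(name for name, p in pvalues.items() if p <= cutoff)


print(f"{bonferroni_threshold(0.05, 10_426):.1e}")  # prints 4.8e-06
```

Bonferroni is conservative when markers are in linkage disequilibrium, as is expected within a region of restricted recombination such as the one on SseLG18.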

To detect candidate sex-related genes, the full-length transcriptome38 was aligned by BLAST to SseLG18 (Supplementary Fig. S5), positioning a total of 229 genes. Although the significant SNPs were widely distributed along the pseudo-chromosome, the follicle-stimulating hormone receptor (fshr) gene was located within the hot region, making it a clear candidate gene for sex determination.

Interspecific chromosome rearrangements

An alignment of the SseLG pseudo-chromosomes with the chromosomes of three other Pleuronectiformes genomes (Cynoglossus semilaevis, Scophthalmus maximus and Paralichthys olivaceus) showed high similarity and conserved macrosynteny for fifteen of the 21 SseLGs (Fig. 8 and Supplementary Table S6). However, deviations from the diagonal in the dot-plot alignments indicated extensive intrachromosomal rearrangements among species. The three largest SseLGs appeared to result from total or partial chromosome fusions when compared with other flatfish genomes (Supplementary Figs. S6 and S7), and S. maximus appeared to be the species with the highest number of chromosome rearrangements among the four flatfish compared. Genome comparisons using D-Genies34 indicated that the highest similarity was with P. olivaceus (no match 57.3%), followed by S. maximus (no match 59.6%) and C. semilaevis (no match 78.4%).

Figure 8

Chromosomal alignment and synteny analysis between flatfish genomes. Top panel: dot plot comparison of the 21 pseudo-chromosomes of S. senegalensis with the genomes of the flatfish C. semilaevis (left), S. maximus (center) and P. olivaceus (right). Chromosome numbers or SseLGs are indicated. Chromosome fusions are boxed. The identity scale is indicated below. Bottom panel: syntenic comparison between flatfish genomes.

When the reduction in chromosome number was explored, three main Robertsonian fusions, in SseLG1 (Chr18–Chr11), SseLG2 (Chr14–Chr15) and SseLG3 (Chr9–Chr16), could explain the reduction from n = 24 in P. olivaceus to n = 21 in S. senegalensis (Fig. 8, Supplementary Figs. S6 and S7 and Supplementary Table S6). When compared with S. maximus (n = 22), SseLG1 appeared as a fusion of Chr7 and Chr21. Translocations of regions from Chr1, Chr4, Chr7, Chr14 and Chr16 were also observed. In the case of C. semilaevis, which has sex chromosomes (ZW) and the same number of chromosomes as S. senegalensis, a Robertsonian fusion between Chr3 and Chr20 was observed in SseLG1. Moreover, SseLG3 appeared as a new chromosome resulting from the fission of Chr1 (mainly located in SseLG16) and Chr8 (mainly located in SseLG18). Two other major features of this species with respect to S. senegalensis were: (i) a translocation of a Chr14 region to Chr16 to create SseLG2; and (ii) the ZW sex chromosomes appeared concentrated in SseLG5, although highly similar sequences were widely distributed throughout the genome. Comparison among all flatfish species (Fig. 8, Supplementary Figs. S6 and S7, Supplementary Table S6) indicated that the chromosomal regions associated with SseLG2 and SseLG3 were mainly involved in the karyotype changes of the four Pleuronectiformes species, whereas SseLG1 arose from a lineage-specific fusion event.

Discussion

Genome assemblies and genetic linkage maps provide complementary information that can be integrated to produce high-quality physical maps. The resulting accurate chromosome-level assemblies are suitable for investigating genome evolution and species diversification, the genetic architecture of QTLs and the regulation of targeted genome regions. In this study, a de novo hybrid assembly of a male sole and a high-density SNP map were generated and combined to provide a polished draft assembly of 21 pseudo-chromosomes. A genome of a female sole was previously reported17, although it was highly fragmented (N50 = 85 kb, 600.3 Mb long). This assembly was later improved by integrating Nanopore and Illumina reads, resulting in 5,748 contigs with an N50 of 339.9 kb and a total length of 608 Mb20 (Supplementary Table S4). The newly obtained male assembly has fewer contigs (3,403) and a higher N50 (512.7 kb), and confirms that the genome size of sole is around 609 Mb. This genome size is similar to, or slightly larger than, that of other flatfish39,40,41,42. A dot-plot alignment analysis indicated high similarity between the male and female genome assemblies, which aligned almost perfectly along the diagonal (Fig. 2), with a completeness similar to that of other high-quality fish assemblies (> 95.5% complete genes)40,43,44.

Characterization of the male genome identified 50,133 transcripts and 27,175 protein-coding genes, in agreement with the number of predicted transcripts in a recently assembled transcriptome38. Moreover, a small subset of unique genes was identified in both sexes, with a strong overrepresentation of cell-cycle regulation and regulation of transcription categories (mainly involving the histones H3.2 and H4) in the female. In mammals, unique histone variants are specifically expressed in spermatogenic cells45. Moreover, expansion of histone multigene clusters in scleractinians was associated with sexually dimorphic expression of some variants that play a role in the control of gene expression in female and male germ cells during gametogenesis46. In sole, at least two loci of canonical histones were reported in the largest metacentric chromosome, SseLG1, linked to dmrt1, a key sex-determination gene in other flatfish16,39,47. This chromosome arose after a Robertsonian fusion and intense reorganization events12 that could have given birth to new histone clusters under purifying selection48. Although we cannot exclude that some differences in the number of histone copies between the two genomes could be attributed to individual variation, one plausible hypothesis is that some of these histone clusters have subfunctionalized and acquired a sex-specific role in gametogenesis. This hypothesis is supported by the identification of an rxra-like receptor, also represented in these GO categories, which is able to mediate the masculinizing effects of its ligand TBT in rockfish females49.

The de novo assembled male genome was used as a reference to map the ddRAD sequences and construct a high-density genetic map. The size of the sole consensus map (Fig. 3; Table 4) was similar to those reported for turbot (2,622.09 cM)6 and flounder (3,497.29 cM)50, although marker density was higher in sole (15,511 high-quality markers, compared with only 6,647 and 12,712 SNPs in turbot and flounder, respectively). Most importantly, markers were distributed into 21 SseLGs, matching the haploid chromosome number of the species (n = 21; 2n = 42)51. Previously, two genetic maps with 129–229 microsatellites had been reported in Senegalese sole13,20. A cytogenetic map has also been published, although the available BACs did not yet cover all chromosomes14,16. This new high-density SNP map (Fig. 3) thus represents a major step forward from the information previously available for this species, supporting future genomic studies and QTL identification.

Although hybrid assemblies combining long and short sequencing reads reduce genome fragmentation and increase the average scaffold size, as observed in this study, most de novo genome assemblies still do not reach chromosome level with the expected number of chromosomes due, among other factors, to the repetitive fraction of the genome. To overcome this limitation, information from genome-wide physical maps and dense genetic linkage maps can be integrated to assign chromosomal locations to sequence contigs52. This anchoring can also remove assembly artifacts and reposition misplaced scaffolds, increasing the contiguity of the assembly. In this study, the high-density SNP genetic map was used to anchor, sort and refine the assembled contigs. Overall, 89.9% of the genome assembly was anchored to 21 pseudo-chromosomes, and a total of 102 contigs were removed or split and assigned to separate positions in the SseLGs. A similar strategy was followed in turbot using 31 families, which allowed the rearrangement of 20% of the genome assembly3. A comparison between the male and female assemblies demonstrated high collinearity between our physical map and the female scaffolds (only 5.53% mismatch). Although 10.1% of the assembly remained unplaced, the anchored physical map is essential for gene association analysis, synteny and cross-species studies, and targeted genome resequencing. Further studies will be required to anchor the remaining 61.9 Mb of unplaced sequence accurately.
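The general logic of map-based anchoring can be illustrated with a minimal sketch. It assumes that each map marker has been located both on an assembly contig and on a linkage group. The majority-vote assignment, the 0.9 support threshold and the first-versus-last-marker orientation rule are illustrative assumptions, not the exact procedure or software used in this study. Contigs whose markers split between linkage groups are only flagged as candidate chimeras; splitting them at a breakpoint is left out.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, List, Tuple

# One marker hit: (contig id, position on contig in bp, linkage group, genetic position in cM).
Hit = Tuple[str, int, str, float]


def anchor_contigs(hits: List[Hit], min_support: float = 0.9):
    """Place contigs on linkage groups from genetic-map markers.

    Each contig is assigned to the linkage group carrying most of its markers,
    ordered by the median cM of those markers and oriented by whether cM rises
    or falls along the contig. Contigs whose markers split across linkage
    groups (support < min_support) are returned as candidate chimeras.
    """
    by_contig: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        by_contig[hit[0]].append(hit)

    placed: Dict[str, List[Tuple[float, str, str]]] = defaultdict(list)
    flagged: List[str] = []
    for contig, contig_hits in by_contig.items():
        lg, count = Counter(h[2] for h in contig_hits).most_common(1)[0]
        if count / len(contig_hits) < min_support:
            flagged.append(contig)  # markers disagree on linkage group
            continue
        # (contig position, cM) for markers on the majority linkage group, in contig order.
        on_lg = sorted((h[1], h[3]) for h in contig_hits if h[2] == lg)
        first_cm, last_cm = on_lg[0][1], on_lg[-1][1]
        orient = "?" if first_cm == last_cm else ("+" if last_cm > first_cm else "-")
        placed[lg].append((median([cm for _, cm in on_lg]), contig, orient))

    # Sort each linkage group by median cM and drop the sort key.
    order = {lg: [(c, o) for _, c, o in sorted(items)] for lg, items in placed.items()}
    return order, flagged


# Hypothetical marker hits, not data from this study.
hits = [
    ("ctg2", 200, "LG1", 30.0), ("ctg2", 9000, "LG1", 25.0),
    ("ctg1", 100, "LG1", 10.0), ("ctg1", 5000, "LG1", 12.0),
    ("ctg3", 50, "LG2", 5.0),
    ("ctg4", 10, "LG1", 40.0), ("ctg4", 800, "LG2", 7.0),
]
order, flagged = anchor_contigs(hits)
print(order)    # {'LG1': [('ctg1', '+'), ('ctg2', '-')], 'LG2': [('ctg3', '?')]}
print(flagged)  # ['ctg4']
```

Contigs carrying a single marker (ctg3) can be assigned and ordered but not oriented. This is one reason why marker density matters for anchoring.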

It is well known that the genome-wide RR differs between males and females (heterochiasmy) and that the recombination landscape also varies along chromosomes. In animals and plants, females tend to have higher RR than males, which in turn results in longer maps53,54,55. In our study, the map was longer in females than in males (2,698.4 vs 2,036.6 cM; ratio 1.32). Assessment of sex-specific RR indicated female-biased heterochiasmy across all SseLGs, with an average RR of 3.02 cM/Mb in males vs 4.51 cM/Mb in females. Four species of Pleuronectidae also exhibited widespread heterochiasmy across all chromosomes, similarly to sole, with some intervals of male- and female-restricted meiotic recombination56. However, when map size is considered, the direction of these sex differences is not fully conserved in flatfish. While female maps are larger in turbot (1.36 times) and halibut (1.07 times)1,2,57, this is not the case for flounder or tongue sole, whose male maps are slightly larger (1.03–1.09 times)50,58,59. C. semilaevis is the only flatfish known to have heteromorphic sex chromosomes (ZZ/ZW). In several mammals, birds and insects, heteromorphic sex chromosomes have been described as a cause of recombination arrest in the heterogametic sex (XY males or ZW females), which could explain a shift in the direction of heterochiasmy53.
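The map-length ratio quoted above is the quotient of the sex-specific map lengths. An average recombination rate is likewise the genetic length of an interval divided by its physical length. The short sketch below reproduces the ratio from the map lengths given in the text. The 10 Mb interval is hypothetical. The study's average RR values (3.02 and 4.51 cM/Mb) are not recomputed here, because the intervals over which they were averaged are not described in this section.

```python
def map_length_ratio(female_cM: float, male_cM: float) -> float:
    """Female-to-male map length ratio; values > 1 indicate female-biased heterochiasmy."""
    return female_cM / male_cM


def recombination_rate(genetic_cM: float, physical_Mb: float) -> float:
    """Average recombination rate (cM/Mb) over an interval of known physical length."""
    if physical_Mb <= 0:
        raise ValueError("physical length must be positive")
    return genetic_cM / physical_Mb


# Sex-specific map lengths reported in the text.
ratio = map_length_ratio(2698.4, 2036.6)
print(round(ratio, 2))  # 1.32

# Hypothetical 10 Mb interval spanning 30 cM in females and 20 cM in males.
print(recombination_rate(30.0, 10.0), recombination_rate(20.0, 10.0))  # 3.0 2.0
```

The same ratio computed per linkage group or per interval, rather than genome-wide, is what reveals whether heterochiasmy is uniform along chromosomes.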

In addition to these differences in overall RR between sexes, the chromosomal recombination landscapes also differed between males and females, following typical patterns. In fish, recombination has been shown to occur at higher frequencies near telomeres in males, whereas its distribution is more uniform, or elevated near centromeres, in females54. In stickleback, centromeres and telomeres have been shown to have little or no effect on recombination in females; in males, however, recombination is suppressed near the centromeres, so crossovers localize mainly at the ends of the long arms of acrocentric chromosomes55. This feature seems to be conserved in sole, since recombination was also more frequent toward the ends of the SseLGs in males than in females (Fig. 5).
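Sex-specific landscapes such as those in Fig. 5 are commonly summarised from a Marey map, which plots the genetic position of each marker against its physical position. The following sketch computes local rates between adjacent markers and contrasts chromosome ends with the central region. It assumes one list of (Mb, cM) pairs per linkage group and sex. The 20% terminal fraction and the toy markers are hypothetical choices, not the parameters used in this study.

```python
from typing import List, Tuple


def interval_rates(markers: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Local recombination rates between physically adjacent markers.

    markers: (physical position in Mb, genetic position in cM) for one linkage group.
    Returns (interval midpoint in Mb, rate in cM/Mb) pairs.
    """
    pts = sorted(markers)
    rates = []
    for (x0, g0), (x1, g1) in zip(pts, pts[1:]):
        if x1 > x0:  # skip markers that share a physical position
            # Negative cM steps (local order conflicts) are clamped to zero.
            rates.append(((x0 + x1) / 2, max(g1 - g0, 0.0) / (x1 - x0)))
    return rates


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else float("nan")


def ends_vs_centre(rates, chrom_length_Mb: float, end_fraction: float = 0.2) -> Tuple[float, float]:
    """Mean rate within `end_fraction` of either chromosome end vs the central region."""
    edge = chrom_length_Mb * end_fraction
    ends = [r for mid, r in rates if mid < edge or mid > chrom_length_Mb - edge]
    centre = [r for mid, r in rates if edge <= mid <= chrom_length_Mb - edge]
    return _mean(ends), _mean(centre)


# Hypothetical male-like Marey map for a 10 Mb linkage group.
male_markers = [(0, 0.0), (2, 10.0), (4, 12.0), (6, 13.0), (8, 15.0), (10, 25.0)]
print(ends_vs_centre(interval_rates(male_markers), 10.0))  # ends = 5.0 cM/Mb, centre ~ 0.83 cM/Mb
```

Running the same summary on the female map of each SseLG would give the male-versus-female contrast described in the text. In practice, the rates are usually smoothed (e.g., with a sliding window or a fitted curve) before interpretation.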

Heterochiasmy is considered a major force guiding the evolution of genetic sex-determination systems and speciation56,60. Typically, genome regions with very low RR are associated with sex-determining regions in young sex chromosome systems and with sex-linked traits such as pigmentation61. In Atlantic halibut, the sex-determining gene gsdf is located in a region of chromosome 13 with restricted male and female RR56. In S. senegalensis, 30 significant sex-associated SNPs (66 when the SNPs of separate families are considered) were distributed along SseLG18, which contains a hot region of very low RR (Fig. 6 and Supplementary Figs. S2 and S3). The shift and crossing between male and female RR suggest sex-specific restriction of meiotic recombination, and that heterochiasmy might be involved in a nascent sex chromosome system.
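As a generic illustration of what "sex-associated SNP" means, the sketch below applies a Pearson chi-square test to allele counts in males versus females at a single SNP. It is not the GWAS model used in this study. A family-based design such as this one would normally also account for relatedness and multiple testing. The allele counts are hypothetical.

```python
import math


def allelic_chi2(male_counts, female_counts):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    male_counts, female_counts: (count of allele A, count of allele B).
    Returns (chi2, p_value).
    """
    table = [list(male_counts), list(female_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("monomorphic SNP or empty sex group")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts at one SNP (40 alleles per sex).
chi2, p = allelic_chi2((30, 10), (38, 2))
print(round(chi2, 3), p < 0.05)  # 6.275 True
```

A genome-wide scan repeats such a test for every marker. Clusters of significant markers in a low-recombining region, as on SseLG18, are expected because markers there remain in strong linkage with any causal locus.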

Most SNP markers in the whole population were heterozygous in males, suggesting an XX/XY system. However, high levels of incomplete penetrance were observed in the families analysed (Supplementary Table S5). The fact that this proportion was even inverted for specific markers in F4 indicates a strong effect of environmental factors on sex determination. Temperature seems to be a major factor that modifies sex ratios during larval development, generating skewed populations of neomales and neofemales62,63. Familial sex ratios in sole have been reported to range from 16% to 90% males, supporting a strong influence of environmental factors on sex differentiation and population sex ratios19.
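The reasoning in this paragraph has two parts. First, the sex that is mostly heterozygous at a sex-linked marker points to XX/XY or ZZ/ZW. Second, individuals whose phenotypic sex disagrees with their genotype reflect incomplete penetrance. The sketch below expresses both steps. The 'AA'/'AB'/'BB' genotype coding, the 0.5 threshold and the toy genotypes are hypothetical. A concordance below 1 is only one way to quantify incomplete penetrance, and it cannot by itself distinguish environmental sex reversal (neomales, neofemales) from genotyping error.

```python
from typing import Sequence

VALID_CALLS = {"AA", "AB", "BB"}


def het_fraction(genotypes: Sequence[str]) -> float:
    """Fraction of heterozygous ('AB') calls among valid biallelic calls."""
    calls = [g for g in genotypes if g in VALID_CALLS]
    if not calls:
        raise ValueError("no valid genotype calls")
    return sum(g == "AB" for g in calls) / len(calls)


def classify_marker(male_gts: Sequence[str], female_gts: Sequence[str], threshold: float = 0.5) -> str:
    """Label a sex-associated SNP by which sex is mostly heterozygous."""
    male_het, female_het = het_fraction(male_gts), het_fraction(female_gts)
    if male_het > threshold > female_het:
        return "XX/XY-like"
    if female_het > threshold > male_het:
        return "ZZ/ZW-like"
    return "unclear"


def xy_concordance(genotypes: Sequence[str], sexes: Sequence[str]) -> float:
    """Share of individuals whose phenotypic sex ('M'/'F') matches an XY prediction (AB -> male)."""
    pairs = [(g, s) for g, s in zip(genotypes, sexes) if g in VALID_CALLS]
    if not pairs:
        raise ValueError("no valid genotype calls")
    return sum((g == "AB") == (s == "M") for g, s in pairs) / len(pairs)


# Hypothetical genotypes at one sex-associated SNP.
males = ["AB", "AB", "AB", "AA"]
females = ["AA", "AA", "BB", "AB"]
print(classify_marker(males, females))  # XX/XY-like
print(xy_concordance(males + females, ["M"] * 4 + ["F"] * 4))  # 0.75
```

Computing the concordance separately for each family would show the between-family variation described above, including markers where the expected pattern is inverted.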

After analyzing the hot region in SseLG18, fshr emerged as a putative candidate gene for sex determination. The fshr locus was recently associated with male sex in flathead grey mullet, with incomplete penetrance as observed in sole64. These authors proposed that fshr might act as a proxy for the genetic transduction of environmental factors such as temperature. Under this hypothesis, sex determination would not rely on a single genetic cascade but on a continuum of environmental and genetic factors. In sole, fshr was mainly expressed in the testis65. Fshr, together with StAR, is expressed in steroidogenic Leydig cells, where Fshr acts as a promiscuous receptor that mediates the steroidogenic activity induced by both FSH and LH66,67. This dual action supports prolonged spermatogenesis and the availability of spermatids within the testis throughout the year, mediated by FSH, as well as the differentiation of spermatids into spermatozoa and subsequent spermiation, mediated by LH66. Functional studies are needed to validate this putative candidate.

A synteny comparison of SseLGs with different flatfish genomes revealed a one-to-one correspondence for 15 chromosomes, with some lineage-specific rearrangements (Fig. 8 and Supplementary Table S6). This correspondence between chromosomes was also confirmed when the turbot genome was compared with other flatfish, revealing intrachromosomal rearrangements that shaped chromosome synteny and gene organization3. In our data, the deviations from the diagonal, which were absent from the comparison between the male and female assemblies, indicate intense internal reorganization across species. Three SseLGs (SseLG1, SseLG2 and SseLG3) deserve special attention, as they can provide an evolutionary framework for understanding the history of chromosome fusions and fissions that shaped flatfish karyotypes. SseLG1, predicted to be metacentric from the analysis of recombination frequency (Fig. 6), was previously identified by cross-species genomic comparison as the largest metacentric chromosome in Senegalese sole, suggesting that it may be a proto-sex chromosome12,16. Our data support the hypothesis that this chromosome emerged primarily through a lineage-specific Robertsonian fusion, since its homologs in other flatfish have maintained their integrity across evolution (Supplementary Fig. S7). A complex series of events, including small chromosomal translocations and rearrangements, fusions and pericentric inversions, would explain its current gene content and organization12. Unlike SseLG1, SseLG2 and SseLG3 contain the chromosomes whose remodeling has shaped flatfish karyotypes, from n = 24 in P. olivaceus to n = 22 in S. maximus and n = 21 in S. senegalensis and C. semilaevis. The proposed fusion model suggests that a small number of chromosomes in the older lineage Paralichthyidae (9, 14 and 16)10, combined with other chromosomes in a lineage-specific way, could explain the major rearrangement events that shaped the karyotype of this species.

In conclusion, this study reports a new genome assembly for a male sole and a high-density SNP genetic map with 15,511 high-quality markers distributed in 21 linkage groups. The physical map was anchored to the consensus genetic map to generate 21 pseudo-chromosomes, in agreement with the number of chromosomes in this species. The longer map in females resulted from a higher RR, with distinct recombination landscapes between sexes. Recombination frequencies were used to infer the putative morphology of the SseLGs, which will need to be validated by cytogenetic studies. A GWAS identified 30 sex-associated markers, all located in SseLG18, where a low-recombining hot region hosted the putative candidate gene fshr. In silico comparison with other Pleuronectiformes genomes demonstrated high conservation of chromosome synteny, although with extensive intrachromosomal reorganization. Moreover, changes in chromosome number were associated with lineage-specific Robertsonian fusions (e.g. SseLG1 in S. senegalensis) and several other rearrangements that mainly involved three chromosomes in the ancestral lineage. The consistent physical and genetic maps reported for Senegalese sole represent a valuable genomic resource for functional and genome-wide association studies and for identifying genomic processes involved in speciation.