Introduction

Disentangling neutral from selective forces acting on the process of population divergence is a key part of population genetics, as the two forces provide different information about the mechanisms shaping population structure (Wright 1978). While neutral markers provide data about population connectivity and demographic processes (Beaumont and Nichols 1996), markers under selection can reveal the existence of locally adapted populations or cryptic speciation processes in a meta-population (Gagnaire et al. 2015; Benestan et al. 2016). Access to high numbers of genetic markers obtained from high-throughput sequencing technologies has increased the power to detect genetic markers under selection. Recently, such genomic approaches have highlighted the importance of structural variants (SVs), which have often been associated with signals of selection in natural populations (see special issue: Wellenreuther et al. 2019). SVs can originate from changes in copy number (deletion, insertion and duplication), orientation (inversion) or position (translocation and fusion) of DNA sequences resulting in reduced levels of recombination in individuals that are heterozygous for different arrangements (Kirkpatrick 2010). Reduced recombination causes such chromosomal rearrangements to follow independent evolutionary pathways, often showing higher levels of population divergence than collinear regions of the genome (Navarro and Barton 2003a, 2003b; Farré et al. 2013). The SVs can harbor hundreds of genes and thus may have functional consequences for the organism (reviewed in Wellenreuther and Bernatchez 2018; Oomen et al. 2020).

SVs have been found to be important for promoting the evolution and maintenance of locally adapted populations in the face of gene flow. In a system of interconnected populations, gene flow results in the rapid homogenization of genetic variation and is the main evolutionary force acting against the process of divergence (Slatkin 1987). In the most extreme condition, when gene flow is too high, mutations that are favorable in a local environment are likely to recombine with maladapted genomic backgrounds, and can be lost through swamping effects (Lenormand 2002). However, if two (or more) locally advantageous mutations are located within the same SV, the absence of recombination with adjacent, maladapted genomic backgrounds increases their fitness locally. In comparison to independent genetic variants, mutations within SVs are less affected by the migration load and therefore more likely to maintain locally adapted alleles in a meta-population (Kirkpatrick and Barton 2006). The strength of population differentiation will depend on the balance between genetic drift, migration and selection, the effect of the latter being inflated if both locally adapted variation and genetic incompatibilities (decreased fitness in heterozygotes) are found at the same time in the population or even within the same SV (Bierne et al. 2011; Faria et al. 2019). Co-adaptation involving positive epistatic interactions between loci is also more likely to be maintained within a SV (Dobzhansky 1970; Feldman et al. 1996), resulting in a supergene that further increases the adaptive potential of SVs (Thompson and Jiggins 2014).

SVs are often older than the contemporary populations in which they are studied, suggesting that their adaptive potential often relies on ancient polymorphisms (reviewed in Wellenreuther and Bernatchez 2018; Marques et al. 2019), representing a source of standing variation for population divergence and adaptation. For instance, ancient SVs have promoted the repeated evolution of ecotypes following the post-glacial recolonization of new environments, as described in systems undergoing parallel evolution (e.g., Jones et al. 2012; Nelson and Cresko 2018; Morales et al. 2019). It has been suggested that such adaptive genetic variation can be shaped during distinct time periods (i.e., “Evolution at two time-frames”, cf. van Belleghem et al. 2018). The first period, corresponding to the initial increase in the frequency of adaptive variants, can be facilitated by the presence of SVs (e.g., Barton and Kirkpatrick 2006) and/or a period of isolation (e.g., Feder et al. 2011; van Belleghem et al. 2018), while the second period, sometimes much later, corresponds to their association with environmental, physical and/or endogenous barriers to gene flow, observable in contemporary meta-populations (Bierne et al. 2011; van Belleghem et al. 2018; Morales et al. 2019). While the origin of many SVs remains unclear, they sometimes result from adaptive introgression that has positive fitness effects for the introgressed species or population. For example, adaptive introgression of a major inversion is responsible for mimicry of the wing color pattern between poisonous Heliconius butterflies (The Heliconius Genome Consortium et al. 2012; Jay et al. 2018).

The European plaice (Pleuronectes platessa) is a marine flatfish with biological traits, such as long pelagic egg and larval phases, that promote high levels of gene flow (Harding et al. 1978; Rijnsdorp 1991; Waples and Gaggiotti 2006). Plaice is distributed in the eastern Atlantic from the Iberian Peninsula in the South to the Barents Sea and Greenland in the North. Previous studies based on six microsatellite loci have found weak, often statistically non-significant, levels of population structure across Europe, except across the bathymetric barrier between the continental shelf and off-shelf regions (Europe vs. Iceland and Faroe Islands, respectively), where depth apparently acts as a strong physical barrier (Hoarau et al. 2002; Was et al. 2010). Although the demographic history of these populations is still poorly known, the off-shelf population shows a higher genetic diversity than most of the shelf populations (Hoarau et al. 2002), which could suggest that the divergence between the shelf and off-shelf populations is ancient, potentially linked to different glacial lineages. Plaice also has large populations established in the western parts of the Baltic Sea, a brackish environment formed 8000 years ago (8 kya). In a recent study of the genomic basis underlying the colonization of the Baltic Sea, we identified two putative SVs (Le Moan et al. 2019; Johannesson et al. 2020) showing a strong signal of linked selection over large portions of two chromosomes. These two putative SVs were associated with 66% of the top 0.1% of the North Sea–Baltic Sea FST outlier loci and showed allelic frequency clines associated with the Baltic Sea salinity gradient (Le Moan et al. 2019). However, whether these putative SVs evolved in connection with a founder event and adaptation to the Baltic Sea environment or are ancient polymorphisms remained unresolved. The European plaice is known to hybridize with its sister species, the European flounder (Platichthys flesus) (Kijewska et al. 2009).
The flounder is a euryhaline species better adapted to low salinity than the plaice, and can be found in the innermost parts of the Baltic Sea where plaice does not occur (Hemmer-Hansen et al. 2007a). Therefore, it is also possible that the European flounder could be a source of the SVs found in plaice.

In the present study, our overall objective was to increase our understanding of the origin of the putative SVs and their effects in contemporary populations of the European plaice. Specifically, the main goals of the study were to (i) test for a potential flounder origin of SVs, (ii) estimate the age of the SVs in plaice with a phylogenetic approach, (iii) re-assess the population structure in European plaice from northern Europe and from Iceland with the use of a population genomics approach, (iv) evaluate the contribution of SVs to population structure, and (v) provide relevant data to understand the extent to which selection is involved in maintaining the allelic clines observed at the SV. This work thus provides increased insight into the relative roles of demographic history, hybridization, environmental gradients and genomic structural changes in the evolution of marine species.

Materials and methods

Geographical sampling

European plaice samples were collected at seven sites distributed across Northern Europe and Iceland (Fig. 1a and Table 1) during the spawning season (Rijnsdorp 1991). Samples from the North Sea, Kattegat, the Belt Sea and the Baltic Sea were collected in 2016–2017. These samples were also analyzed by Le Moan et al. (2019) who studied the diversification process involved in the colonization of the Baltic Sea across multiple flatfish species, and where two large SVs were identified. To explore the spatial distribution of these SVs in greater detail, three additional northern sites were included in the current study. These samples were collected from the Barents Sea, Norway and Iceland in 2013. Most of the northern parts of the plaice distribution were covered with this sampling design. Analyses included 255 plaice in total, along with ten European flounders (the species hybridizing with plaice in the study area), and ten common dab (Limanda limanda), a closely related but reproductively isolated species from both European plaice and European flounder, to be used as an outgroup in a phylogenetic analysis.

Fig. 1: Population structure of European plaice in northern Europe.

Sampling design (a) and principal component analyses performed on individual diversity for the European plaice based on a genomic dataset of 3019 SNPs in the overall dataset (b), 222 SNPs on chromosome 19 (c), and 210 SNPs on chromosome 21 (d). Colors correspond to sampling sites represented on the map in (a). Insets in (c) and (d) show DAPC results grouping individuals by haplogroup, where green dots are individuals from the heterozygote haplogroup and the yellow and blue dots are individuals from the homozygote haplogroups 1 and 3, respectively.

Table 1 Details of the sampling locations, with the corresponding site ID, sample sizes, longitude, latitude and the date when the samples were collected.

ddRAD libraries and sequencing

Whole genomic DNA was extracted from gill tissue using the DNeasy Blood & Tissue kit (Qiagen). The DNA concentration was measured with the Broad Range protocol of Qubit version 2.0® following the instruction manual. DNA extractions were diluted to 20 ng/µl. Four double-digest RAD (ddRAD) libraries were constructed following Poland and Rife (2012), using the restriction enzymes PstI and MspI, which have rare and frequent cutting sites, respectively. Each library was made by randomly pooling between 60 and 75 barcoded individuals from various locations. The libraries were size-selected on agarose gels in order to retain insert sizes between 350 and 450 bp. After an amplification step (12 cycles), the libraries were purified with AMPure® beads and their quality was checked on a Bioanalyzer 2100 using the High Sensitivity DNA protocol (Agilent Technologies). The targeted size selection was successful in most libraries, except in one that was slightly shifted toward a shorter insert size (from 300 to 400 bp). The effects of different library size ranges are discussed further below. Each library was paired-end sequenced on one Illumina HiSeq4000® lane (2 × 101 bp).

Bioinformatics

Raw sequences were processed using the “ref-map” pipeline from Stacks version 2.1 (Catchen et al. 2013). Specifically, the samples were demultiplexed with “process_radtags”, removing reads with a mean sequencing quality below 10 and reads with uncalled base pairs. On average, we obtained six million reads per sample (Fig. S1). The reads were trimmed to 85 bp using Trimmomatic (Bolger et al. 2014), and aligned to the Japanese flounder (Paralichthys olivaceus) genome (Shao et al. 2017) using bwa-mem with default parameters (Li and Durbin 2009). This reference genome is from a species of the same family as the European plaice, which has a relatively conserved genome structure (Robledo et al. 2017). On average, 65% of reads per sample mapped to the reference genome (Fig. S1). SNPs were called based on the mapping results using the “gstacks” program with the “marukilow” model and an alpha parameter of 0.05. Only biallelic SNPs genotyped in at least 80% of the individuals within each sampling site and with a maximum heterozygosity of 0.80 were retained using the “populations” program. All individuals with more than 10% missing data were removed (details in Table 1). Finally, SNPs with a significant departure from Hardy–Weinberg equilibrium (p value < 0.05) in more than 60% of the sampling sites, as well as singletons, were removed using vcftools (Danecek et al. 2011). The average coverage after filtering was 29× per sample (Fig. S1). Unfortunately, size selection was slightly shifted between the first three libraries (containing North Sea and Baltic Sea samples) and the last library (containing Barents Sea, Norway and Iceland samples). This shift reduced the genomic sampling when keeping only loci sequenced for all the sampling sites.
Therefore, three datasets were constructed to take the differences in size selection into account: the “overall dataset” including all sampling sites (8587 RAD tags with 28,016 SNPs genotyped in 234 individuals), the “southern dataset” including the North Sea, Kattegat, Belt Sea and Baltic Sea (17,342 RAD tags with 56,740 SNPs genotyped in 166 individuals) and the “northern dataset” including Iceland, Norway and Barents (35,549 RAD tags with 92,706 SNPs genotyped in 68 individuals). These datasets were subsequently used for different analyses focusing on different aspects of population differentiation (see below), and further filtered to fulfill the requirements of the analyses performed (details Table S1).
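The call-rate and missing-data filters described above can be illustrated with a minimal Python sketch. This is not the Stacks or vcftools implementation used in the study. The function names, the genotype matrix layout and the use of -1 for a missing call are assumptions made for illustration only; the 80% and 10% thresholds are those stated in the text.

```python
import numpy as np

def keep_snps_by_call_rate(genotypes, site_labels, min_rate=0.80):
    """Boolean mask of SNPs genotyped in >= min_rate of individuals at EVERY site.

    genotypes: (n_individuals, n_snps) array; -1 marks a missing call
               (a convention assumed for this sketch).
    site_labels: sampling-site ID of each individual.
    """
    genotypes = np.asarray(genotypes)
    site_labels = np.asarray(site_labels)
    keep = np.ones(genotypes.shape[1], dtype=bool)
    for site in np.unique(site_labels):
        rows = genotypes[site_labels == site]
        call_rate = (rows != -1).mean(axis=0)  # per-SNP call rate within this site
        keep &= call_rate >= min_rate
    return keep

def keep_individuals_by_missingness(genotypes, max_missing=0.10):
    """Boolean mask of individuals with at most max_missing missing calls."""
    genotypes = np.asarray(genotypes)
    return (genotypes == -1).mean(axis=1) <= max_missing

# Hypothetical toy data: 4 individuals from 2 sites, 3 SNPs.
toy = np.array([
    [0,  1, -1],
    [1,  1, -1],
    [2,  0,  0],
    [0, -1,  1],
])
sites = ["A", "A", "B", "B"]
snp_mask = keep_snps_by_call_rate(toy, sites)
ind_mask = keep_individuals_by_missingness(toy)
```

In this toy example only the first SNP passes, because the second is missing in half of site B and the third is missing throughout site A.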

Population structure and demographic history

The “overall dataset” was used to describe the overall population structure and was thinned by removing loci with minor-allele frequencies (MAF) below 5% and by keeping only one random SNP per bin of 1 kb to limit the effects from physical linkage but not the linkage disequilibrium (LD) within SVs (Fig. S2). In total, we kept 3019 SNPs from which individual genetic diversity was visualized using PCA analyses, conducted with the R package adegenet (Jombart 2008). The same package was used to compute population-specific heterozygosity. Pairwise genetic differentiation (FST) between samples was estimated following the method of Weir and Cockerham (1984) using the R package StAMPP (Pembleton et al. 2013). We used 1000 bootstraps over loci to evaluate if pairwise FST values were significantly different from 0. The effect of isolation-by-distance was assessed with a Mantel test from the R base package (Ihaka and Gentleman 1996) between pairwise FST and geographical distances among sampling sites using 9999 permutations. All analyses were conducted by including and excluding the two chromosomes carrying the SVs, as well as using the information from these two chromosomes only. Additionally, the correlation between genome-wide population structure and the structure displayed by the SVs was assessed by Mantel tests on the genome-wide FST matrix without the SVs and both FST matrices of the SVs only.
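As an illustration of the isolation-by-distance test, the following is a minimal Python sketch of a one-sided permutation Mantel test between two distance matrices. It is a generic implementation, not the R routine used in the study, and the site positions and FST values are hypothetical.

```python
import numpy as np

def mantel_test(dist_a, dist_b, permutations=9999, seed=0):
    """One-sided Mantel test: Pearson r between the upper triangles of two
    symmetric distance matrices, with a permutation p-value obtained by
    jointly shuffling the rows and columns of the first matrix."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(a[upper], b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        a_perm = a[np.ix_(order, order)]  # relabel sites, keep matrix structure
        if np.corrcoef(a_perm[upper], b[upper])[0, 1] >= r_obs:
            hits += 1
    # The observed arrangement counts as one permutation.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value

# Hypothetical example: six sites along a coastline, FST proportional to distance.
positions_km = np.array([0.0, 100.0, 300.0, 600.0, 1000.0, 1500.0])
geo = np.abs(positions_km[:, None] - positions_km[None, :])
fst = 2e-5 * geo
r, p = mantel_test(fst, geo, permutations=999, seed=1)
```

Permuting whole rows and columns, rather than individual matrix entries, preserves the non-independence of pairwise distances that share a sampling site.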

We used a diffusion-approximation approach, as implemented in the software moments (Jouganous et al. 2017), to examine the demographic history associated with the major population break that separates populations from the continental shelf (Norway) and Iceland (Hoarau et al. 2002) using the “northern” dataset. The analysis was replicated with and without the two chromosomes carrying SVs. For this analysis, only polymorphic sites with a minimum allele count of 2 in at least one of the two populations were kept, which were then filtered for LD (1 SNP per 1 kb). Four basic models of demographic history were used, representing the scenarios of strict isolation (SI), isolation-with-migration (IM), ancestral migration (AM), and secondary contact (SC). To include more realistic evolutionary histories, each of these four basic scenarios was further developed into eight variants, leading to a comparison of 32 scenarios in total (Fig. 2). The 28 more elaborate models were tuned to include fluctuation of the effective population size (representing bottlenecks and/or expansions), which can happen in the ancestral population (“af” = ancient fluctuation) and/or in the Icelandic population (“rf” = recent fluctuation, Momigliano et al. 2020). Then, each of these demographic models included the possibility of heterogeneous genomic signatures involving variation across loci in effective population size (model 2N) for the SI models (e.g., Cruickshank and Hahn 2014), or in migration rate (model 2m) for the models with gene flow (IM, SC and AM, e.g., Le Moan et al. 2016; Rougemont et al. 2017). These models were compared to the data using the folded version of the Joint Allelic Frequency Spectrum (JAFS) without considering singletons (-z option). We used the optimization routine developed for δaδi by Portik et al. (2017) and adapted for moments in Momigliano et al. (2020).
This routine uses a four-step optimization procedure to infer the best parameters of each model. The four optimization steps were replicated 20, 20, 40, and 40 times, respectively, and the optimization routine was run 20 times for each model to control for convergence. We kept the 10 best runs for each model, and the best model was then selected based on the weighted Akaike Information Criterion (wAIC, e.g., Rougeux et al. 2017). The parameters inferred from the best model were subsequently converted into biologically meaningful units following Rougeux et al. (2017), using a mutation rate of 1 × 10⁻⁸ (Tine et al. 2014) and a generation time of 3.5 years (Erlandsson et al. 2017). Only the results of the best model and those with an AIC difference below 10 are shown (results for other models available in Supplementary File I).
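Model selection of this kind can be sketched in Python as follows, assuming that wAIC denotes standard Akaike weights computed from the AIC of the best run of each model. The model labels and AIC values below are hypothetical and are not results from this study.

```python
import math

def akaike_weights(aic_by_model):
    """Akaike weights: relative likelihood exp(-delta_AIC / 2) of each model,
    normalized so that the weights sum to 1 across the compared models."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(rel.values())
    return {m: v / total for m, v in rel.items()}

# Hypothetical AIC values for three of the candidate models.
example_aic = {"SC_af_2m": 1000.0, "IM_af_2m": 1004.0, "SI": 1030.0}
weights = akaike_weights(example_aic)
best_model = max(weights, key=weights.get)
```

With these illustrative values the model with the lowest AIC receives a weight of about 0.88, so a threshold such as wAIC > 0.5 indicates that a single model carries more than half of the total support.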

Fig. 2: Schematic representation of the tested models.

An ancestral population splits into two derived populations of size N1 and N2 at a time (TS) in strict isolation (SI), or connected by continuous migration (IM), by ancient migration events (AM) or by a secondary contact phase (SC). In each scenario, the population evolved without any demographic changes (a, basic), with recent demographic fluctuation in Iceland (b, rf), ancient demography fluctuation in the ancestral population (c, af) or both ancient and recent fluctuation at the time (d, af_rf). Each of these models was fitted with and without genomic heterogeneity (heterogeneous effective size “2N” for SI models, and heterogeneous migration “2m” for IM, AM, and SC models).

Genomic analyses of the structural variants

The two chromosomes (C19 and C21) carrying the putative SVs (SV19 and SV21, respectively) were extracted from the overall dataset (LD and MAF pruned) to construct two independent sub-datasets to examine population structure in the SVs alone, using a PCA approach (on 222 and 210 SNPs, respectively). Initial PCA plots showed clustering into three distinct groups (Fig. 1) and hence suggested that each SV behaves like a Mendelian character with two major divergent alleles at multiple linked loci, leading to three distinct “haplogroups” (two homozygotes and one heterozygote, e.g., Mérot et al. 2020). Consequently, we performed DAPC analyses with adegenet (Jombart and Ahmed 2011) set to three groups, to identify the haplogroup of each individual using the find.clusters function. The allele frequencies of the SVs for each population were calculated based on the DAPC clusters, using the formula

$$F \,=\, \frac{{2 \,\times\, C_1 \,+\, C_2}}{{2N}},$$

where C1 is the number of individuals assigned to one of the homozygote haplogroups (haplogroup 1 or 3), C2 the number of individuals assigned to the heterozygote haplogroup (haplogroup 2) in the DAPC, and N is the number of samples in the population. Then, we used the DAPC groups as genotype input to calculate the pairwise FST between populations at each SV using hierfstat (Goudet 2005).
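As a worked illustration of the formula above, the following minimal Python sketch computes F from a list of per-individual haplogroup assignments. The function name and the toy assignments are hypothetical; the labels 1, 2 and 3 follow the haplogroup numbering used in the text.

```python
from collections import Counter

def sv_allele_frequency(haplogroups, homozygote=1, heterozygote=2):
    """F = (2 * C1 + C2) / (2 * N), where C1 counts individuals in the chosen
    homozygote haplogroup, C2 counts heterozygotes and N is the sample size."""
    n = len(haplogroups)
    if n == 0:
        raise ValueError("at least one individual is required")
    counts = Counter(haplogroups)
    return (2 * counts[homozygote] + counts[heterozygote]) / (2 * n)

# Hypothetical population: 2 haplogroup-1 homozygotes, 3 heterozygotes,
# 3 haplogroup-3 homozygotes.
assignments = [1, 1, 2, 2, 2, 3, 3, 3]
freq_allele_1 = sv_allele_frequency(assignments, homozygote=1)  # (4 + 3) / 16 = 0.4375
freq_allele_3 = sv_allele_frequency(assignments, homozygote=3)  # (6 + 3) / 16 = 0.5625
```

Because each SV is treated as a biallelic locus, the two allele frequencies sum to 1.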

We analyzed the subsets of the overall data (“northern” and “southern” datasets combined) without pruning for LD to increase the genomic coverage and better describe the genomic heterogeneity along the chromosomes carrying SVs. Nucleotide diversity (π) and HO were calculated per SNP for each haplogroup inferred in the DAPC. The genomic distribution of differentiation was examined using SNP-specific FST values. Specifically, this differentiation was calculated between proximal sites for Norway vs. Barents Sea, Norway vs. Iceland using the northern dataset and for North Sea vs. Baltic Sea, and between homozygotes for haplogroups 1 and 3, using the southern dataset. We used a quantile regression from the quantreg R package to calculate the variation of the upper 1% FST quantile and average FST along bins of 100 kb across the chromosome in the different pairwise comparisons. We focused on the upper quantile to limit the effects from variable levels of diversity across the SVs that would tend to depress average FST in certain regions along the SVs (Fig. S3). Finally, we estimated the overall LD using the southern dataset by calculating the pairwise correlation between loci localized on chromosomes 19 and 21. These statistics were computed using vcftools (Danecek et al. 2011) and the smoothed value of the statistics was plotted with ggplot2 using local regression with loess (Wickham and Winston 2008) with a span parameter of 0.3. In addition, we used LDheatmap (Shin et al. 2006) that integrates the information about the physical position of the SNP on the chromosome to represent the heatmap of LD between every pair of SNPs.
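The idea of summarizing the upper FST quantile in 100-kb bins can be sketched in Python as below. Note that this sketch swaps in a plain per-bin empirical quantile for the quantile regression from the quantreg R package used in the study, so it illustrates the principle only. The positions and FST values are hypothetical.

```python
import numpy as np

def binned_fst_summary(positions_bp, fst, bin_size=100_000, quantile=0.99):
    """For each bin of bin_size bp, return (bin start, mean FST, upper quantile
    of FST). The 0.99 quantile corresponds to the upper 1% of values."""
    positions_bp = np.asarray(positions_bp)
    fst = np.asarray(fst, dtype=float)
    bins = positions_bp // bin_size
    summary = []
    for b in np.unique(bins):
        values = fst[bins == b]
        summary.append((int(b) * bin_size,
                        float(values.mean()),
                        float(np.quantile(values, quantile))))
    return summary

# Hypothetical SNP positions (bp) and per-SNP FST values on one chromosome.
positions = [10_000, 50_000, 90_000, 150_000, 180_000]
fst_values = [0.0, 0.1, 0.9, 0.2, 0.4]
summary = binned_fst_summary(positions, fst_values)
```

In the first bin the mean is pulled down by the two low-FST SNPs while the upper quantile stays close to the most differentiated SNP, which is the behavior that motivates focusing on the upper quantile in regions of variable diversity.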

Gene content of the structural variants

In order to understand the genetic composition of the two SVs, we extracted the filtered RAD tags from the two SVs into two individual fasta files. The genomic ranges of the SVs were defined based on visual observation of the breaking-point positions of the FST values between homozygote haplogroup 1 and 3 individuals, starting after the first and ending before the last SNP with FST > 0.9 (from 1.4 to 9.9 Mbp for SV19 and from 10.5 to 20 Mbp for SV21). The RAD tags within the SVs were then aligned back to a previous version of the Japanese flounder genome that is annotated (Shao et al. 2017) using bwa-mem set with default parameters (Li and Durbin 2009). All the genes localized between the two most distantly aligned RAD tags on the genome were then extracted from the annotation file using custom bash scripts. Finally, the gene lists were mapped to the Gene Ontology (GO) resource website (http://geneontology.org/) to test for functional enrichment using Fisher’s exact tests to find GO terms that were over-represented in the gene list. The functional enrichment analyses were performed using both gene lists independently and with the gene lists combined using the zebrafish and the human databases.

Phylogenetic analyses

The ddRAD protocol can be used to identify orthologous sequences with restriction sites conserved across distantly related species. As such, this property was used in order to estimate the age of the SV polymorphisms by building a phylogeny of European plaice and two other species of the Pleuronectidae, the European flounder and the common dab. In order to obtain the sequences of the homozygote haplogroups and infer their divergence, we only retained haplogroups 1 and 3 plaice individuals (based on the DAPC) from the southern dataset. We focused on the southern dataset because it was the only dataset including haplogroup 3 for SV21 (Fig. 1d) and with the highest number of reads overlapping between the plaice and the two outgroup species.

Three independent phylogenies were constructed using concatenated ddRAD loci, one representing each SV from chromosomes 19 and 21 and one representing loci localized outside the SVs on chromosome 19. The loci from the SVs were extracted based on the genomic ranges defined above for gene-content analyses, while the loci representative of the genome-wide divergence were selected from outside the SVs to get a sequence length similar to that of the SVs (from 15 to 25 Mbp on chromosome 19). Chromosome 19 was used to infer phylogenies both within and outside the SV in order to estimate any effects that the SVs may have on collinear regions of the same chromosome. Only ddRAD loci with sequence information for both homozygote haplogroups in plaice and in flounder and/or dab were used for the phylogeny. The full sequence of each RAD locus was extracted into individual fasta files with the populations program of Stacks (Catchen et al. 2013) using a “whitelist” (-w) option comprising the RAD-tag ID from the filtered southern dataset. For each RAD locus, one random RAD allele per individual was retained. All alleles were concatenated into one pseudo-sequence using a custom script. The three phylogenies were inferred based on orthologous sequences of 10,125, 6152, and 10,341 bp for chromosomes 19 and 21 and “collinear” (i.e., outside the SV on chromosome 19), respectively. All phylogenies were estimated in RAxML (Stamatakis 2014), under the GTR+GAMMA model with a random number set as seed. Finally, we tested for potential gene flow between species using the “f4” statistic from TreeMix (Pickrell and Pritchard 2012), evaluating the mismatch between the tree topology inferred with RAxML and individual SNP topologies.

The length of the inferred branch between each cluster represents the number of substitutions occurring after the split of the species/haplogroups, and it is directly proportional to the time of divergence under neutral processes (Kimura 1983). We applied a strict molecular clock to transform this nucleotide divergence into time since divergence in years. Specifically, we divided the substitution rate along each branch by the average SNP mutation rate per site, and multiplied this value by the average generation time described for the demographic inferences. Although this approach is commonly used to age SVs, the divergence estimates should be interpreted with caution as cross-recombination events in the center of the SV would decrease the estimate of divergence and selection acting on the haplogroup (background and disruptive) would tend to increase the estimate of divergence.
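The strict-clock conversion described above amounts to a short calculation, sketched here in Python. The mutation rate (1 × 10⁻⁸ per site per generation) and generation time (3.5 years) are the values given for the demographic inferences; the function name and the branch length in the example are hypothetical.

```python
def branch_length_to_years(subs_per_site, mutation_rate=1e-8,
                           generation_time_years=3.5):
    """Strict molecular clock: divide the branch length (substitutions per site)
    by the per-site, per-generation mutation rate to obtain generations, then
    multiply by the generation time to obtain years."""
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    generations = subs_per_site / mutation_rate
    return generations * generation_time_years

# Hypothetical branch length of 0.001 substitutions per site:
# 0.001 / 1e-8 = 100,000 generations, i.e. about 350,000 years.
example_years = branch_length_to_years(0.001)
```

The estimate scales linearly with both assumed parameters, so uncertainty in either the mutation rate or the generation time carries over directly to the inferred age.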

Results

Population structure and demographic history

The first axis of the PCA based on all markers explained 1.45% of the total inertia and distinguished Icelandic plaice from all continental shelf individuals, and to a lesser extent also separated (marine) Atlantic samples from (brackish) Baltic Sea samples (Fig. 1b). The second axis explained 1.2% of the inertia and roughly traced the gradient from the North Sea to the Baltic Sea. Observed heterozygosity was maximal in the North Sea (HO = 0.193/0.190 with/without SVs) and decreased across the North Sea–Baltic Sea transition zone (Kattegat HO = 0.185/0.184, Belt Sea HO = 0.182/0.184, and Baltic Sea HO = 0.178/0.181) as well as towards the north (Norway HO = 0.180/0.180 and Barents Sea HO = 0.179/0.180). The HO of the Icelandic population was intermediate (HO = 0.184/0.185). All pairwise FST estimates were significantly different from zero, except for the two sites in the North Sea–Baltic Sea transition zone (Kattegat vs. Belt Sea, Table S2). Pairwise comparisons including the Iceland population were consistently around FST = 0.03 (Fig. 3, yellow dots), while all other pairwise FST estimates were lower (Fig. 3, red dots). The demographic modeling revealed that the most likely scenario for the origin of the separation between Iceland and the continental shelf was a scenario of past isolation followed by a secondary phase of gene flow, including ancient fluctuation in Ne and heterogeneous migration rate during the secondary phase of gene flow (SC_af_2M model, wAIC > 0.5 with and without the SVs; Figs. S4, S5, Table 2 and Supplementary File I). The estimate of the divergence time, including the isolation phase, was two times higher than the estimated time under the SC phase (Table 2). Using a generation time of 3.5 years and a mutation rate of 10⁻⁸ per generation, the time of split was estimated at ~55–58 kya (SC estimated at 27–28 kya).
The results of the inferences were consistent with and without the chromosomes carrying the SVs (Table 2), suggesting that the demographic effects captured by our model were evident genome wide. The proportion of loci affected by reduced migration rate was nonetheless higher in the inferences performed with the SVs (1.47 vs. 1.05% with and without the SVs, Table 2).

Fig. 3: Relationships between geographic distance and genetic differentiation.

The relationships were estimated with (a; r = 0.56) and without (b; r = 0.96) structural variants. The correlation analyses were performed using only the samples from the continental shelf (red dots), thus, comparison including the Icelandic sample (yellow dots) were excluded from the analyses.

Table 2 Summary table of the divergence history between the populations from Iceland and Norway inferred with (SV incl.) and without (SV excl.) the chromosomes carrying putative SVs.

In order to exclude the effects from ancient demography on inference about contemporary neutral structure, the Icelandic population was removed from the analyses of isolation-by-distance. The pairwise FST across the continental shelf was significantly correlated with geographical distance (Mantel test: r = 0.59, p < 0.01). Interestingly, removing the SVs from the analysis reduced the variation of the pairwise FST (Fig. 3, red dots) and resulted in a stronger correlation between genetic and geographic distances (r = 0.96, p < 0.01). However, this pattern of isolation-by-distance was not detected when only the chromosomes carrying SVs were analyzed (r = 0.06, p = 0.89 for C19 and r = 0.11, p = 0.35 for C21, Figs. S6, S7, Table S3). Likewise, the genome-wide pairwise FST (without the SVs) was not correlated with the pairwise FST calculated with the chromosomes carrying SVs (r = 0.08, p = 0.77 for C19, and r = 0.11, p = 0.64 for C21). Altogether, these results suggest that the population structure found genome-wide is different from the population structure displayed by the SVs. Interestingly, the pairwise FST values calculated for the two chromosomes with SVs were also not significantly correlated (r = −0.2, p = 0.38), which suggests unique geographical patterns of substructure for each SV.

Genomic variability and differentiation of the structural variants

A high dispersion of samples from the North Sea and Kattegat was observed on the second axis of the PCA (light blue and green samples in Fig. 1b). This dispersion mostly involved SNPs from chromosomes 19 and 21 carrying SVs (Fig. S8b). The PCA based on these chromosomes showed three clusters of samples, which were also inferred in the DAPC analyses (Fig. 1c, d). In both cases, the clusters were likely due to the presence of two divergent multilocus alleles segregating in the plaice populations for each SV, resulting in three haplogroups corresponding to two homozygotes (haplogroups 1 and 3 in yellow and blue, respectively) and one heterozygote cluster (haplogroup 2, in green).

Both SVs were polymorphic at most sampling sites (Fig. 4a, b). However, whereas SV19 allele frequencies were variable across most sampling sites (Fig. 4a), only the North Sea and Kattegat showed both SV21 alleles in high frequency (Figs. 4b, S9), which confirmed the different geographical patterns for the two SVs also identified by a lack of correlation of pairwise FST estimates. The large variation in allele frequencies led to high FST between populations that are geographically close (Table S4, illustrated in Fig. 4c). This large differentiation was evident across nearly one-half of the chromosomes, from 1.4 to 9.9 Mbp for SV19 (Fig. 4c), and from 10.5 to 20 Mbp for SV21 (Fig. 4b). The genome-wide differentiation outside the SVs was lower (e.g., mean FST North Sea vs. Baltic Sea = 0.004, sd = 0.031) than inside the SVs (mean FST SV19 = 0.203, sd = 0.177 and mean FST SV21 = 0.167, sd = 0.142, Figs. 4b, S10). The individuals from haplogroups 1 and 3 were differentially fixed for eight and 30 SNPs within SV19 and SV21, respectively (also represented by the black dashed line in Fig. 4b). Strong LD occurred along the entire SVs, confirming that recombination between the SV alleles is rare (Fig. 4d). Only haplogroup 1 in each SV (in yellow in Fig. 4) showed reduced genetic diversity, whereas haplogroup 3 (in blue in Fig. 4) had a higher genetic diversity than the average observed across the genome. As expected, the individuals from haplogroup 2 showed the highest diversity of the three DAPC groups. The two chromosomes carrying SVs showed higher overall levels of LD than the genome-wide average (0.2 vs. 0.07), but lower levels than the average LD within the SVs (>0.35, Fig. S11). Chromosome 21 showed two peaks of LD separated by a distance of 5 Mbp of limited LD, and the LD within the two regions was equivalent to the LD between these regions (Fig. 4d).

Fig. 4: Detailed analyses of structural variants.

Sample-specific data for the European plaice structural variants SV19 (a, c, e, g, i) and SV21 (b, d, f, h, j): a, b sample-specific allele frequencies (yellow=allele found twice in haplogroup 1 and once in haplogroup 2, and blue=allele found twice in haplogroup 3 and once in haplogroup 2); c, d differentiation between different pairs of populations and between haplogroups 1 and 3 along the two chromosomes carrying the SVs, each dot is the FST value of an individual locus and the lines represent the smoothed upper 1% quantile (one color is one comparison, alpha=0.3); e, f smoothed average π for the three haplogroups identified in the DAPC (span=0.3); g, h variation of LD along the chromosomes and i, j LD heatmaps for pairwise SNP comparisons.
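
The allele-frequency definition in panels a and b can be written out explicitly. The following is a minimal Python sketch, assuming diploid individuals that have each been assigned to one of the three DAPC haplogroups; the function name and the example counts are hypothetical and are not data from the study.

```python
def sv_allele_frequencies(n_hap1, n_hap2, n_hap3):
    """Return (freq_yellow, freq_blue) for one sampling site.

    Haplogroup 1 individuals carry two copies of the 'yellow' allele,
    haplogroup 3 individuals carry two copies of the 'blue' allele,
    and haplogroup 2 individuals (heterozygotes) carry one of each.
    """
    n_individuals = n_hap1 + n_hap2 + n_hap3
    if n_individuals == 0:
        raise ValueError("no individuals sampled at this site")
    n_chromosomes = 2 * n_individuals  # diploid assumption
    freq_yellow = (2 * n_hap1 + n_hap2) / n_chromosomes
    freq_blue = (2 * n_hap3 + n_hap2) / n_chromosomes
    return freq_yellow, freq_blue


# Hypothetical counts for illustration only
yellow, blue = sv_allele_frequencies(n_hap1=12, n_hap2=6, n_hap3=2)
print(f"yellow = {yellow:.2f}, blue = {blue:.2f}")
```

The two frequencies always sum to one, so plotting either of them per site conveys the same information.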

Gene content and functional enrichment within the structural variants

We identified 200 and 256 genes located within SV19 and SV21, respectively (Supplementary File II). Of these genes, 48/66 in SV19/SV21 had a known function in zebrafish, and 145/191 in humans. For SV21, we detected functional enrichment linked to physiological or morphological membrane disruption (>100% enrichment, p = 3.1 × 10^−10), immune and defense response to bacteria (16%, p = 1.6 × 10^−5) and chromosome/chromatin organization (10%, p = 6.5 × 10^−6) using the zebrafish database, and for purinergic receptor (40%, p = 4.52 × 10^−7) using the human database. However, no functional enrichment was identified for genes in SV19. When both gene lists were pooled, only the enrichment already found for SV21 was detected with the zebrafish database, without any additional function. However, using the human database, we identified functional enrichment for pre-B-cell allelic exclusion (62%, p = 7.5 × 10^−5), a set of genes involved in the immune system. Moreover, when looking specifically for known candidate genes involved in local adaptation in other marine fishes (e.g., Hemmer-Hansen et al. 2014), we found two genes linked to heat-shock proteins (heat-shock factor-binding proteins 1 and 4) within SV19.

Phylogenetic analyses

The common dab was the most divergent species in all phylogenetic trees, with 0.024–0.025 substitutions per site compared with European flounder and European plaice (Fig. 5), which corresponds to a split time of 9 million years ago (Mya). All plaice individuals were equally distant from flounder individuals based on the concatenated ddRAD loci representative of the genome-wide divergence, with an average of 0.0110 substitutions per site (4 Mya, Fig. 5). However, the two haplogroups were clearly divergent in both SV phylogenies, with an average distance of 0.0020 and 0.0017 substitutions per site for SV19 and SV21, respectively (Fig. 5). In each case, the deepest branches were observed for haplogroup 1 (yellow), which led to a different estimate of divergence for each branch, around 460 (504–434) and 200 (224–179) kya for haplogroups 1 and 3, respectively. The longest branch of the SVs resulted in an average plaice–flounder divergence slightly higher than outside the SVs (0.0120 and 0.0115 substitutions per site, respectively).

Fig. 5: Phylogenies of European plaice, European flounder and common dab.

Phylogenetic trees are represented for the ddRAD loci within SV19 (a), SV21 (b), and outside the SVs (c). The lengths of the branches reflect the numbers of substitutions per site (multiplied by 100 in the figures).
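
The relation between branch lengths and the genome-wide split times reported above can be illustrated with a simple proportional scaling. The sketch below assumes a strict molecular clock and uses the midpoint of the reported dab distance (0.0245 substitutions per site) as the calibration value; both are simplifying assumptions made here, and the function name is hypothetical. The haplogroup-specific estimates depend on the length of each individual branch and are not reproduced by this scaling.

```python
def scale_split_time(distance, calib_distance, calib_time_mya):
    """Scale a calibrated split time linearly with pairwise distance.

    Assumes a strict clock: time is proportional to substitutions per site.
    """
    if calib_distance <= 0:
        raise ValueError("calibration distance must be positive")
    return calib_time_mya * distance / calib_distance


# Calibration: dab vs. plaice/flounder, 0.024-0.025 substitutions per site at 9 Mya
CALIB_DISTANCE = 0.0245  # midpoint of the reported range (assumption)
CALIB_TIME_MYA = 9.0

# Genome-wide plaice-flounder distance reported in the text
plaice_flounder_mya = scale_split_time(0.0110, CALIB_DISTANCE, CALIB_TIME_MYA)
print(f"plaice-flounder split: {plaice_flounder_mya:.1f} Mya")
```

Under these assumptions the scaled value is close to the 4 Mya given for the plaice–flounder split.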

We found a low, but statistically significant, signal of introgression between the plaice and the flounder (f4 = 0.004, p value < 0.05). However, this signal was mostly carried by three loci from SV19 and one from SV21 that departed from the tree topology of the SVs. In all cases, the flounder and haplogroup 1 of the plaice were nearly fixed for the same allele, which was different from the major allele observed in the common dab and within haplogroup 3 of the plaice. Removing these SNPs led to f4 values not significantly different from 0 (i.e., indicating no effect of introgression).
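
To make the logic of the f4 test concrete, the sketch below computes the statistic as the mean over loci of (pA − pB)(pC − pD), which is its standard definition from population allele frequencies. The ordering of the four groups (dab, flounder; haplogroup 3, haplogroup 1) and all allele frequencies are assumptions chosen for illustration; the configuration and software used in the study are not specified in this section, and the significance test (typically a block jackknife) is omitted.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """Mean over loci of (pA - pB) * (pC - pD).

    Expected to be zero when the tree ((A, B), (C, D)) fits the data
    without gene flow between the two pairs.
    """
    n_loci = len(freq_a)
    if n_loci == 0 or not all(len(f) == n_loci for f in (freq_b, freq_c, freq_d)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)]
    return sum(products) / n_loci


# Toy allele frequencies (hypothetical). The last locus mimics the discordant
# SNPs: flounder and haplogroup 1 share an allele absent from dab and haplogroup 3.
dab      = [0.0, 0.1, 1.0, 0.0]
flounder = [0.0, 0.1, 1.0, 1.0]
hap3     = [1.0, 0.5, 0.0, 0.0]
hap1     = [1.0, 0.9, 0.2, 1.0]

f4_all = f4(dab, flounder, hap3, hap1)
f4_filtered = f4(dab[:3], flounder[:3], hap3[:3], hap1[:3])
print(f4_all, f4_filtered)
```

In this toy data set the whole signal comes from the single discordant locus, so dropping it returns the statistic to zero, which mirrors the behaviour described above.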

Discussion

The two SVs previously identified in Le Moan et al. (2019) along the North Sea–Baltic Sea transition zone were polymorphic across most of the north-eastern Atlantic distribution range of European plaice (Fig. 4a, b). Our analyses confirm the clear isolation of the Icelandic population and identify this population as likely originating from a different glacial refugium than the continental shelf populations. This knowledge has important implications for our understanding of the origin and evolution of SVs in the species. We have also confirmed weak but significant genetic structure among continental shelf samples. Removing the SVs from the analyses leads to a stronger pattern of isolation-by-distance among European plaice samples (from r = 0.56 to r = 0.96), which was not detected in previous European plaice studies based on six microsatellite markers (Hoarau et al. 2002; Was et al. 2010). Thus, the results highlight the power of increasing genomic resolution to detect subtle population structure for species with high gene flow. Below, we discuss the evolution and effects of SVs in European plaice in the context of population history and structure of the species in the northern Atlantic.

Origin of structural variants in European plaice

Type of the structural variants

The two SVs covered nearly half of chromosomes 19 and 21 of the Japanese flounder genome, where a strong LD was maintained over 9 Mbp. These large linkage blocks are expected with chromosomal rearrangements, such as inversions and translocations, which can formally be distinguished by use of a linkage map or genome sequencing (e.g., Faria et al. 2019), but not with the reduced representation approach used in this study. However, assuming a high degree of synteny between European plaice and Japanese flounder genomes, the size of the LD blocks and their central position are consistent with the presence of at least two major inversions in the genome (Kirkpatrick 2010). The observed second peak of LD on chromosome 21 could be due to the presence of a third and smaller inversion. However, the similar value of LD within and between the two LD blocks on this chromosome suggests that they may be part of the same inversion, and that a lack of synteny between the plaice and the Japanese flounder reference genome results in two distant peaks of LD on chromosome 21.
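
The comparison of LD within and between blocks relies on pairwise LD between SNPs. As an illustration, the sketch below computes r² as the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of an allele), a common estimator for unphased data. The LD statistic and software actually used in the study are not specified in this section, and the genotype vectors are hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two vectors of genotype dosages."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two genotype vectors of equal length >= 2")
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var1 * var2)


# Hypothetical dosages for six individuals
snp_in_sv_a = [0, 0, 1, 1, 2, 2]  # tracks the SV haplogroup
snp_in_sv_b = [0, 0, 1, 1, 2, 2]  # second SNP on the same SV allele
snp_outside = [0, 2, 1, 1, 2, 0]  # unlinked SNP

r2_within = genotype_r2(snp_in_sv_a, snp_in_sv_b)
r2_outside = genotype_r2(snp_in_sv_a, snp_outside)
print(r2_within, r2_outside)
```

If the two LD peaks on chromosome 21 belonged to a single rearrangement, SNP pairs taken from different peaks would be expected to behave like the first pair rather than the second.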

Age of the structural variants

The low genetic diversity of haplogroup 1 (yellow, Fig. 4c) and its long branches in the phylogenies (Fig. 5) suggest that it represents the derived form of the SVs. The derived alleles of both SVs were found in both Iceland and continental shelf samples, presumably representing different glacial refugia, which were estimated to have diverged ~57 kya. Moreover, SV19 was polymorphic in both lineages, which could suggest that the SV polymorphism has been present as standing variation in European plaice since at least the time of divergence between the two glacial lineages. This hypothesis is also in agreement with the deep haplogroup divergence observed in both SV phylogenies. For the shortest branch, the split of the main haplotypes was estimated at 200 kya (470 kya for the longest branch). Although these split dates should be interpreted with caution, since they do not take effects from selection or recombination between haplogroups into account, the age of the alleles and the fact that the derived alleles of both SVs are present in both glacial lineages suggest that the SVs are older than the populations in which they currently segregate.

Introgression of the structural variants

Adaptive introgression from sister taxa can sometimes explain the presence of ancient “supergenes” adapted to specific environmental conditions, such as those observed in the Heliconius butterflies (The Heliconius Genome Consortium et al. 2012; Jay et al. 2018). In order to test this hypothesis in the European plaice, we used the European flounder, a euryhaline species adapted to low salinity and known to hybridize with the European plaice, as a candidate for the source of the SVs. Under this hypothesis, the introgressed haplogroup in plaice should be less divergent from flounder than the rest of the genome on average. However, we found the opposite pattern, with each haplogroup being at least as divergent from the European flounder as the plaice–flounder divergence inferred from outside the SVs. Thus, a potential flounder origin of the SVs was rejected by the phylogeny. Introgression was also rejected by the f4 statistics, which showed limited evidence of mismatches between the phylogenetic tree and individual SNP topologies. More data are necessary to assess whether the few mismatches that were observed are due to a random process of allele sorting or represent a case of more ancient introgression, pre-dating the formation of the SVs. Hence, our data suggest that the SVs originated after the split of European flounder and European plaice (estimated at 4 Mya from the phylogeny, Fig. 5). However, other potential introgression sources, involving species more closely related to plaice than flounder or ghost species, cannot be ruled out. Further analyses should thus be performed to fully understand the origin of the reported SVs in European plaice.

The population structure in European plaice and the contribution of the SVs

Population structure in European plaice

As mentioned above, we have confirmed the isolation of the Icelandic population, which has also been described in previous work (Hoarau et al. 2002). It has been hypothesized that this differentiation and the genetic diversity data of the Icelandic population (lower number of microsatellite alleles) could reflect the effects of isolation at the edge of the distribution area and/or a recent bottleneck in the population (Hoarau et al. 2002). Our analyses provide additional information about the history of the Icelandic population, as the demographic analyses performed here suggest that the Icelandic population represents an old population, established from a different glacial refuge that diverged ~57 kya from the other populations of European plaice sampled in this study. This scenario would also explain why the observed heterozygosity is higher in the Icelandic population than in populations located at other distributional edges of the species (Barents Sea or Baltic Sea), as observed with both the present genomic data and microsatellite data in previous work (Hoarau et al. 2002). In fact, Iceland itself may have been the glacial refuge where a relatively high diversity has been preserved (Maggs et al. 2008). Nevertheless, the physical barrier represented by deep oceanic regions may still be an important factor for maintaining the genetic differences that have evolved during the last glacial maximum, as suggested by Hoarau et al. (2002).

The reductions of diversity found from the North Sea to the Baltic Sea and from the North Sea to the Barents Sea were also reported by the previous studies with six microsatellite loci (Hoarau et al. 2002; Was et al. 2010). However, the numbers of markers used at the time were not sufficient to reliably detect any population differences within this geographical region. The populations studied here were sampled along the continental shelf coastlines, resembling a stepping-stone model of isolation, which represents an ideal condition for generating patterns of isolation-by-distance (Kimura and Weiss 1964). However, other processes may also be involved in maintaining this pattern (Jenkins et al. 2018), such as the effect of living on the edge of the species distribution range after a post-glacial recolonization (Hewitt 2000). Similar patterns have been reported for various species in association with the salinity gradient of the Baltic Sea (Johannesson and André 2006; Cuveliers et al. 2012), and along the South–North coast of Norway in taxa with lower dispersal capacities than European plaice (Hoarau et al. 2007; Morvezen et al. 2016).

Implication of structural variants for population structure of European plaice

In contrast to genome-wide patterns of weak population structure, the SVs were responsible for population differences much higher (mean FST on haplogroup frequencies over all pairwise comparisons for SV19 = 0.23 and SV21 = 0.12) than the average genome-wide FST (estimated at 0.01). Haplogroup 1, identified as the derived allele of the SVs (with the lowest diversity and the highest divergence), reached near-fixation along the environmental gradient of the Baltic Sea in both SVs (fSV19 derived = 0.94 and fSV21 derived = 1.0). However, the same allele was also found in high frequency towards the northern margin of the plaice distribution, in the Barents Sea (fSV19 derived = 0.52 and fSV21 derived = 0.97), which was the most distant site from the Baltic Sea. This variation in allele frequency resulted in a geographical structure of the SVs that was different from the genome-wide population structure. The distribution edge of the plaice represents a common feature associated with the increase of the derived SV allele frequencies (Fig. 4a, b), potentially resulting from allele-surfing effects in the marginal populations (Excoffier and Ray 2008). Selection may also explain the increased frequencies of haplogroup 1 towards the distribution edge, supported by a net divergence that was 2.4 times higher than the net divergence of haplogroup 3 in both SV phylogenies. This result could be explained by the accumulation of deleterious mutations and background selection in the derived haplogroup (Duranton et al. 2018; Perrier and Charmantier 2018). However, the accumulation of deleterious mutations alone would make the near-fixation of the derived allele by drift, as observed in several populations, unlikely, especially in species with a large effective size (Ohta 1973), like plaice.
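
How strongly such frequency differences translate into FST can be illustrated by treating each SV as a single biallelic locus. The sketch below uses a Nei-style formulation, FST = (HT − HS)/HT, for two equally weighted populations; this is an assumption made for illustration, and the estimator used in the study may differ (for example, by correcting for sample size). The input values are the derived-allele frequencies of SV19 reported above for the Baltic Sea and the Barents Sea.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(p1, p2):
    """Nei-style FST = (HT - HS) / HT for two equally weighted populations."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        return 0.0  # both populations fixed for the same allele
    return (h_t - h_s) / h_t


# Derived-allele frequencies of SV19: Baltic Sea (0.94) vs. Barents Sea (0.52)
fst_sv19 = pairwise_fst(0.94, 0.52)
print(f"FST (SV19, Baltic vs. Barents) = {fst_sv19:.3f}")
```

A single pairwise value of this kind is not directly comparable to the means over all pairwise comparisons given in the text.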

Evidence for local adaptation

The derived allele of the SVs occurred under various habitat conditions, ranging from brackish to marine environments and along temperature and daylight/seasonal gradients within the Atlantic. It is possible that these associations are explained by selection along multiple gradients and on several genes within the SVs. The SNPs carried by the SVs were previously detected as strong candidates for selection and with significant association with the salinity gradient of the North Sea–Baltic Sea transition zone (Le Moan et al. 2019; Johannesson et al. 2020). In the present study, the highest allele-frequency differences at the SVs were found along this transition zone (Fig. 4a). Interestingly, SV19 carried two heat-shock protein genes, which have important functions for cellular stress response (Sørensen et al. 2003), and have already been identified as candidate loci for local adaptation of Baltic Sea populations in several other marine fishes (Hemmer-Hansen et al. 2007b in European flounder; Nielsen et al. 2009 in Atlantic cod; Limborg et al. 2012 in Atlantic herring). Moreover, several biological functions seem to be enriched within the SVs, notably linked to the immune system, which may also be candidates for local adaptation associated with multiple environmental gradients. Further work focusing on the functional consequences of these SVs will help to understand their effects on the biology of European plaice.

Structural variants promoting evolution at two time frames

“Evolution at two time-frames” has been coined to describe the process whereby several ancient, locally adapted alleles can be quickly reassembled during more recent colonization of similar environmental conditions. This was initially described by van Belleghem et al. (2018) to explain the rapid parallel evolution of post-glacial ecotypes of saltmarsh beetle, which have repeatedly evolved after the last glacial maximum, but for which the divergence of most alleles under selection can be traced back to a single origin about 190 kya. Our results suggest that similar processes may be involved in shaping the population structure of the European plaice. Indeed, the derived allele frequencies of the SVs were strongly associated with the North Sea–Baltic Sea transition zone that was connected to the Atlantic Ocean 8 kya (Le Moan et al. 2019), but the divergence of the alleles was estimated to be at least 25 times older than the age of the Baltic Sea itself. Moreover, the two SVs have similar estimated divergence times and show similar diversity patterns, suggesting that they share an evolutionary history and potentially have a common origin. Although our analyses did not completely clarify this origin, our study highlights an additional mechanism promoting the rapid reassembling of adaptive variation, i.e., the presence of SVs that maintain adapted alleles together.

Conclusions and perspectives

The SVs in European plaice are among a few examples of large SVs involved in the maintenance of population structure in marine fishes (e.g., Atlantic cod, Kirubakaran et al. 2016; capelin, Cayuela et al. 2020; Atlantic salmon, Lehnert et al. 2019; Atlantic herring, Pettersson et al. 2019; Atlantic silversides, Therkildsen et al. 2019; lingcod, Longo et al. 2020), but may be present in many other species (e.g., Australasian snapper, Catanach et al. 2019; lesser sandeel, Jimenez-Mena et al. 2019; sea horse, Riquet et al. 2019). Our data show the genomic heterogeneity of the European plaice population structure. While the differentiation outside the SVs was associated with demographic history and current connectivity patterns, the two SVs showed high variation in allele frequencies and were associated with more complex patterns of structure, which may be influenced by selection from standing genetic variation. Interestingly, two other fish species, the Atlantic cod and the Atlantic herring, show similar variation in SV frequencies, with one haplogroup increasing in frequency both in the Baltic Sea and the northern Atlantic (Barth et al. 2017; Kess et al. 2020 for cod and Pettersson et al. 2019 for herring, reviewed in Johannesson et al. 2020).

Additional experimental work focusing on the potential fitness effect of these SVs holds exciting perspectives for understanding their evolution and the role they may play in local adaptation and population structuring. Moreover, longer genomic sequencing reads, as provided by PacBio or nanopore technologies, could confirm if these SVs are chromosomal inversions. In addition, deeper sequencing coverage would allow an exploration of evolutionary signatures within the SV alleles that evolved in different geographical contexts. Such studies will provide an interesting framework to assess the evolutionary pathways involved in maintaining structure in this species where dispersal should normally limit population divergence.