Admixture between old lineages facilitated contemporary ecological speciation in Lake Constance stickleback

Ecological speciation can sometimes rapidly generate reproductively isolated populations coexisting in sympatry, but the origin of genetic variation permitting this is rarely known. We previously explored the genomics of very recent ecological speciation into lake and stream ecotypes in stickleback from Lake Constance. Here, we reconstruct the origin of alleles underlying ecological speciation by combining demographic modelling on genome-wide single nucleotide polymorphisms, phenotypic data and mitochondrial sequence data in the wider European biogeographical context. We find that parallel differentiation between lake and stream ecotypes across replicate lake-stream ecotones resulted from recent secondary contact and admixture between old East and West European lineages. Unexpectedly, West European alleles that introgressed across the hybrid zone at the western end of the lake, were recruited to genomic islands of differentiation between ecotypes at the eastern end of the lake. Our results highlight an overlooked outcome of secondary contact: ecological speciation facilitated by admixture variation.


Supplementary Fig. 2 | Evidence for introgression of West European alleles into all Lake
Constance populations. D-statistics show significant excess allele sharing of all Lake Constance population with West European lineages from the Rhine (AGS1) and the Upper Rhone (FRS4) compared to the more closely related East European lineage from the Vistula (PLS1). Significance is marginal to absent for lake populations (L1, L2, ROM), but strong for stream populations (S1, S2, NID). Colors indicate the Lake Constance population (P2: L1, L2, NID, S1, ROM, S2), while three neighboring points in turn use three different outgroups, ninespine stickleback (Pungitius pungitius), Black-spotted stickleback (Gasterosteus wheatlandi) and Japan Sea stickleback (G. nipponicus). Error bars indicate ±3 standard deviations around D-estimates. Source data are provided as a Source Data file.

Supplementary Fig. 3 | Admixture evidence from additional lake populations and outgroups.
Colors indicate the stream population (S: NID, S2, S1), three points in a row correspond to the same lake population (L: L1, L2, ROM), while three neighboring points in turn correspond to three different outgroups, ninespine stickleback (Pungitius pungitius), Black-spotted stickleback (Gasterosteus wheatlandi) and Japan Sea stickleback (Gasterosteus nipponicus). Error bars indicate ±3 standard deviations around D-estimates. Source data are provided as a Source Data file.

Supplementary Fig. 4 | Maximum-likelihood clustering of Lake Constance stickleback and
European lineages (SbfI RAD-seq SNPs). The best number of clusters (K=2) as assessed from crossvalidation is highlighted with a blue box. The y-axis shows the estimated cluster membership of each individual. Source data are provided as a Source Data file.

Supplementary Fig. 5 | Bayesian clustering of Lake Constance stickleback and European lineages (microsatellites).
The best number of clusters (K=2) as assessed using the Evanno method (Evanno et al. 2005) is highlighted with a blue box. The y-axis shows the estimated cluster membership of each individual. Source data are provided as a Source Data file.

Supplementary Fig. 6 | Parameter estimates for model 3a.
Maximum-likelihood parameter estimates for model 3a including West (Wi: Rhine, green; Wo: Rhone, blue) and East European (E: Vistula, red) populations. The left panel shows population sizes (colored, white numbers), and time estimates for population splits (black numbers). The right panel shows block-bootstrap parameter confidence intervals for the best models, with population sizes (Ni) in units of 2Ne and time (Ti) in numbers of generations. Maximum likelihood parameter estimates are indicated in blue and black error bars show 95%-confidence intervals (95%-CI). Source data are provided as a Source Data file. Fig. 7 | Alternative demographic models tested. The first three models (3a1, 3b1, 3c1) introduce an additional parameter for the population size of the West and East European lineage ancestor, respectively, but did not considerably improve the likelihood ( Supplementary Fig. 8). The latter five models test for admixture between the three ancestral lineages. While these more complex models performed equally well to the simpler model 3a without any admixture ( Supplementary Fig. 8), the admixture proportions in all of these models were very close to zero (<0.5%) making them equivalent to model 3a, which all downstream modelling was based on (Fig. 2 b -d). Source data are provided as a Source Data file. Fig. 8 | Likelihood distributions of each tested demographic model. Violin plots show the distributions of -log10 likelihoods of 100 site-frequency spectra simulated under the maximum likelihood parameters inferred for each demographic model using the observed data. If likelihood distributions of two models overlap, this indicates that the two models fit the observed data similarly well. The best and similarly well-fitting models are both highlighted in Table 1. Blue horizontal bars indicate median likelihoods, black boxes delineate the 1 st and 3 rd quartile and error bars extend these by max. 1.5 times the interquartile range. Source data are provided as a Source Data file.

Supplementary Fig. 9 | Correlation of genome-wide differentiation (FST), diversity (π), divergence (dXY) and recombination rate (cM Mbp -1 ).
Genome-wide differentiation is weakly correlated between allopatric populations, but not associated with recombination rate, suggesting that background selection does not play a major role in shaping differentiation. In Lake Constance, only differentiation between lake and stream ecotypes is negatively associated with recombination rate, but not differentiation between allopatric streams, suggesting that divergent selection between habitats rather than background selection drives differentiation. Shown are distributions of Pearson's correlation coefficients between statistics, for allopatric European population comparisons (yellow: AGS1, FRS4, PLS1, SOR, CHA), Lake Constance lake vs. stream ecotype comparisons (dark blue: L1 vs. S1 / S2, ROM vs. NID / GRA / BOH), Lake Constance stream vs. stream population comparisons (light blue: L1, S2, NID, GRA, BOH) and for allopatric European populations vs. Lake Constance lake vs. stream ecotypes or stream vs. stream populations (pink, red). Only comparisons of non-overlapping population pairs, e.g. FST(AGS1,FRS4) vs. FST(PLS1,SOR) are shown for FST and dXY comparisons to avoid pseudoreplication. Statistics were computed in windows of >2,500 sequenced base pairs. Mean recombination rate per window was estimated at 10 regularly spaced positions in each window. Black box plots delineate the 1 st and 3 rd quartile, with whiskers extending these by max. 1.5 times the interquartile range. The white dot shows the median. Source data are provided as a Source Data file.  . Source data are provided as a Source Data file.

Supplementary Fig. 12 | Different levels of allele imbalance in two individuals of the NsiI dataset.
Allele imbalance is an indicator of PCR errors, contamination or other artefacts. PCR errors can have a strong effect on the site frequency spectrum, in particular singletons -we thus excluded individuals with strong allele imbalance (b) and only included individuals with good balance between major and minor alleles (a). Supplementary Table 2 | Fit of additional demographic models and datasets. log10 likelihood difference between observed and expected site-frequency spectra (ΔLL) and difference in Akaike information criterion (ΔAIC) between the best and all models for a given dataset are shown. Bestfitting models and models with very similar likelihood are marked with an asterisk. For all five population models, the three sister lineages were modeled as unsampled ('ghost': º) populations with parameter search ranges for these sister populations either constrained to 95% confidence intervals of model 3a or parameters fixed to the maximum likelihood estimates of model 3a (see Supplementary  Fig. 6). Source data are provided as a Source Data file.