Adaptive radiations represent some of the most remarkable explosions of diversification across the tree of life. However, the constraints to rapid diversification and how they are sometimes overcome, particularly the relative roles of genetic architecture and hybridization, remain unclear. Here, we address these questions in the Alpine whitefish radiation, using a whole-genome dataset that includes multiple individuals of each of the 22 species belonging to six ecologically distinct ecomorph classes across several lake systems. We reveal that repeated ecological and morphological diversification along a common environmental axis is associated with both genome-wide allele frequency shifts and a specific larger-effect locus associated with the gene edar. Additionally, we highlight the possible role of introgression between species from different lake systems in facilitating the evolution and persistence of species with unique trait combinations and ecology. These results highlight the importance of both genome architecture and secondary contact with hybridization in fuelling adaptive radiation.
Understanding the genetic basis of speciation and adaptive radiation without geographic isolation, and determining how and when such diversification is possible, is a key aim of evolutionary biology. Speciation with gene flow often occurs in the form of ecological speciation. During this process, reproductive isolation results from divergent ecological selection or ecologically mediated sexual selection1,2. Despite the supposed prevalence of ecological speciation in adaptive radiation, the factors that constrain or facilitate speciation and the mechanisms by which speciation proceeds during the adaptive radiation of lineages are still not well understood3,4. Studying the identity and genomic distribution of loci involved in ecological speciation, particularly in cases where similar ecomorphological contrasts have arisen multiple times in parallel, is one way to address these questions.
Using such approaches, studies have already highlighted the prevalence of either a few strongly differentiated genomic ‘islands’5,6,7,8,9,10 or genome-wide polygenic architectures of phenotypic differentiation and ecological speciation11,12,13,14,15. Both of these architectures on their own constrain speciation with gene flow in different ways. In the former scenario, reproductive isolation may be unlikely to evolve because the chance that loci under divergent selection will be linked to a trait that causes reproductive isolation is slim, and a genome-wide correlated response to divergent selection is lacking16,17. Polymorphism may therefore be a more likely outcome than speciation. In the polygenic scenario, the strength of per-locus divergent selection is likely to be small and insufficient to overcome the homogenising effects of gene flow18. Combinations of a larger number of genome-wide small-effect loci and some large-effect loci may therefore be most conducive to overcoming the constraints to speciation in the face of gene flow that result from either one of these architectures alone4,7,19.
In addition to specific genetic architectures, empirical20,21,22,23,24, experimental25,26 and theoretical27,28 investigations have implicated introgression between non-sister species as a process that may also facilitate diversification and adaptive radiation. Since introgression generates novel combinations of haplotypes, combining those from the distinct parental species, it may be possible that hybrid populations are able to span fitness valleys and, in turn, occupy ecological niches that would otherwise be inaccessible via stepwise adaptation27. However, few studies have been able to link ecological novelty with empirical signatures of introgression in the wild (but see refs. 22,29).
The Alpine whitefish radiation contains over 30 species of the genus Coregonus, which have evolved in small species flocks across multiple lake systems in the last 10–15 thousand years30,31,32,33,34,35,36. Although whitefish have speciated in many postglacial lakes across the Northern hemisphere, species flocks in pre-Alpine lakes are particularly diverse, and up to six whitefish species, with different ecological strategies, exist in sympatry and exhibit considerable phenotypic variation, including body size, spawning depth and season, gill-raker count and length and diet (Fig. 1a and refs. 30,31,32,33,36). Across multiple lake systems, species from different species flocks have evolved similar ecological strategies and phenotypes and as such have been categorised into ecomorphs37 based on these similarities. Interestingly, a number of traits are correlated across the Alpine whitefish radiation, particularly amongst widespread ecomorphs that have evolved in most species flocks. For example, deeper spawning species tend to have higher gill-raker counts and smaller bodies than shallower spawning species (Fig. 1b, Supplementary Fig. 1 and ref. 38). However, in addition to these widespread ecomorph trends, there are a number of less-widespread ecomorphs that have evolved just in one or few lakes and exhibit different trait combinations, decoupled from this trend (for example, deep spawning, small bodied, species with few gill-rakers). The fact that sympatric whitefish species flocks are thought to have evolved independently within each lake system35 provides the opportunity to identify overarching genomic features that may have facilitated rapid diversification, including the rapid and repeated evolution of similar ecomorphs, and the origin and persistence of species with new trait combinations.
Here, we build on past work on adaptive radiations39,40 to investigate the genetic basis of diversification, and the ways in which, in the absence of geographical isolation, constraints to speciation may have been overcome, within the Alpine whitefish radiation. We compiled whole-genome sequences for 99 whitefish individuals, spanning 22 species belonging to six distinct ecomorphs, from five pre-Alpine lake systems (putative species flocks), and four outgroup species (Fig. 1a, b, Supplementary Fig. 1 and Supplementary Data 1). We show that phenotypic diversification along water depth gradients, which is independently repeated across five lake systems41, is underpinned by a mixed genetic architecture that comprises both genome-wide differentiation and one locus with a larger effect on phenotype. In addition, our results suggest that secondary contact between species from different species flocks, followed by interspecific hybridization, may have helped overcome constraints to the evolution of additional niche specialists.
Phylogeny and population structure
To understand how the Alpine whitefish radiation evolved and how the relationships between sympatric species within flocks, and between ecologically similar species (belonging to the same ecomorph) in different flocks, are structured, we produced a genomic PCA (Fig. 1c) and constructed a phylogenetic tree (Fig. 1d). Our PCA and phylogeny confirm and expand upon results of earlier work35 in demonstrating that the Alpine whitefish radiation is monophyletic with respect to non-Alpine whitefish and European Cisco (Coregonus albula; not plotted), and, in general, each of the pre-Alpine lake systems sampled constitutes a reciprocally monophyletic species flock. Both the branching patterns in the phylogeny and the results of our clustering analysis (Fig. 1d, e; K = 7; Supplementary Figs. 2 and 3) are concordant with the independent evolution of sympatric species flocks within lakes or lake systems, and hence the parallel evolution of species with similar ecological strategies, i.e. ecomorphs in different lake systems. The one substantial deviation from this pattern of reciprocal monophyly amongst lake-system species flocks is the placement of C. acrinasus, which phylogenetically belongs to the Lake Constance clade despite being endemic to Lake Thun (discussed below; also noted in ref. 35; in addition to a number of individuals with putative hybrid signatures).
Parallel allele-frequency shifts underpin repeated ecological differentiation
Of the six whitefish ecomorph classes, the most widely distributed are the large, deep-bodied and macro-invertivorous ‘Balchen’, the smaller, shallower bodied, and zooplanktivorous ‘Albeli’, and the ‘Felchen’, which have intermediate characteristics between these two ecomorphs (across these three widespread ecomorphs, whitefish species exhibit correlated trait variation; Fig. 1b and Supplementary Fig. 1). Our phylogeny indicates that within each lake, two genetically distinct lineages typically emerged first, separating a ‘Balchen’ species from an ‘Albeli’ species or, if ‘Felchen’ species are present, from the common ancestor of ‘Albeli’ species and ‘Felchen’ species (with the exception of Lake Constance where no ‘Albeli’ species is present). These divergence events occurred separately within each lake system, suggesting that species belonging to these widespread ecomorphs evolved independently in different lake systems. To identify whether this parallel phenotypic differentiation was underpinned by parallel allele-frequency shifts, we first investigated four sympatric pairs of ‘Balchen’ and ‘Albeli’ species from lakes Brienz, Lucerne, Walen and Neuchâtel. We subsetted our full dataset to include three ‘Balchen’ and three ‘Albeli’ individuals from each of these four lakes and calculated F4 statistics, which corroborated our phylogenetic tree by indicating that each sympatric species pair evolved independently. Topologies placing sympatric ‘Balchen’ and ‘Albeli’ species as sister taxa in a four-taxon tree had consistently lower F4 statistics, indicative of a more accurate topology, than topologies where the species of the same ecomorph from different lakes were sister taxa, supporting the idea that diversification occurred within each lake (Supplementary Fig. 4 and Supplementary Table 1).
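The logic of this topology comparison can be illustrated with a minimal sketch, assuming the standard allele-frequency form of the statistic (often written f4), in which F4(A, B; C, D) is the mean over SNPs of (pA − pB)(pC − pD). If (A, B) and (C, D) are true sister pairs without gene flow between them, the two frequency differences are uncorrelated and the statistic is close to zero. The function name and all allele frequencies below are hypothetical and are not taken from the study.

```python
def f4(p_a, p_b, p_c, p_d):
    """Mean over SNPs of (pA - pB) * (pC - pD) for four populations."""
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all frequency lists must cover the same SNPs")
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs.
# SNPs 1-2 differ between lakes; SNPs 3-4 differ between ecomorphs,
# but at a different SNP in each lake.
balchen_lake1 = [0.9, 0.1, 0.5, 0.8]
albeli_lake1 = [0.9, 0.1, 0.5, 0.2]
balchen_lake2 = [0.1, 0.9, 0.7, 0.5]
albeli_lake2 = [0.1, 0.9, 0.3, 0.5]

# Topology pairing the sympatric species as sisters.
within_lake = f4(balchen_lake1, albeli_lake1, balchen_lake2, albeli_lake2)
# Topology pairing species of the same ecomorph as sisters.
by_ecomorph = f4(balchen_lake1, balchen_lake2, albeli_lake1, albeli_lake2)

print(abs(within_lake), abs(by_ecomorph))
```

In this toy example the topology that pairs sympatric species yields the value nearer zero, which is the pattern the text interprets as support for within-lake diversification.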
Then we calculated the cluster separation score (CSS) between the ecomorph groups (i.e., individuals of the four ‘Balchen’ species were grouped together and individuals of the four ‘Albeli’ species were grouped together42,43; Fig. 2a), allowing the detection of signals of parallel allele-frequency differences between ecomorphs. The resulting 1659 50 kb CSS outlier windows, which represented parallel allele frequency shifts between the ‘Balchen’ and ‘Albeli’ species from different lakes (identified by running a permutation test which shuffled the assignment of individuals to each ecomorph group whilst maintaining population structure and then identifying windows with an FDR corrected P-value of <0.01), were distributed genome-wide (Fig. 2a). These 1659 parallel-differentiated windows overlapped with 1702 genes in total, which were significantly enriched for a number of gene ontology terms including those related to neurons, cell signalling, and fatty acid metabolism (Supplementary Data 2 contains a full list of significantly enriched gene ontology terms; Supplementary Fig. 5 shows the length distribution of these genes compared to all annotated genes).
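The permutation scheme described above, in which ecomorph labels are shuffled while population structure is maintained, can be sketched as follows. This is an illustration under stated assumptions: labels are permuted only among individuals from the same lake, and the per-window statistic is a simple difference in mean allele dosage standing in for the CSS, which is not reimplemented here. All function names and genotype values are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def group_difference(dosages, labels):
    """Absolute difference in mean allele dosage between the two ecomorph groups."""
    balchen = [d for d, lab in zip(dosages, labels) if lab == "Balchen"]
    albeli = [d for d, lab in zip(dosages, labels) if lab == "Albeli"]
    return abs(mean(balchen) - mean(albeli))


def shuffle_within_lakes(labels, lakes, rng):
    """Permute ecomorph labels only among individuals from the same lake."""
    shuffled = list(labels)
    indices_by_lake = {}
    for i, lake in enumerate(lakes):
        indices_by_lake.setdefault(lake, []).append(i)
    for indices in indices_by_lake.values():
        lake_labels = [labels[i] for i in indices]
        rng.shuffle(lake_labels)
        for i, lab in zip(indices, lake_labels):
            shuffled[i] = lab
    return shuffled


def permutation_p(dosages, labels, lakes, n_perm=999, seed=1):
    """Empirical P-value: share of permutations at least as extreme as observed."""
    rng = random.Random(seed)
    observed = group_difference(dosages, labels)
    hits = sum(
        group_difference(dosages, shuffle_within_lakes(labels, lakes, rng)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy design mirroring the sampling: 3 'Balchen' + 3 'Albeli' in each of four lakes.
lakes = [lake for lake in ("Brienz", "Lucerne", "Walen", "Neuchâtel") for _ in range(6)]
labels = (["Balchen"] * 3 + ["Albeli"] * 3) * 4
dosages = ([2, 2, 2] + [0, 0, 0]) * 4  # hypothetical SNP fixed between ecomorphs

p_value = permutation_p(dosages, labels, lakes)
print(p_value)
```

Because labels never move between lakes, every permuted dataset keeps three individuals per ecomorph per lake, so differences driven purely by lake membership cannot produce a small P-value. In a genome scan, the resulting per-window P-values would then be corrected for multiple testing, as described in the text.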
Genetic variation across CSS outlier regions not only differentiated ‘Balchen’ and ‘Albeli’ species from each other but also allowed the separation of species belonging to the four other whitefish ecomorphs within each lake system (Fig. 2b and Supplementary Fig. 6). We further show that genomic variation across these parallel-differentiated regions (captured by CSS PC1; Fig. 2b) correlated with body size (standard length; Fig. 2c; total R2 = 0.498, P = 8.06 × 10−15; see Supplementary Table 2 for lake-system-specific statistics) and gill-raker count (Fig. 2d and Supplementary Table 2), suggesting that in addition to explaining variation between ‘Balchen’ and ‘Albeli’ species, these genomic regions might contribute to broader phenotypic differences between other ecomorphs, including intermediate ‘Felchen’ species and to some degree the less-widespread ecomorphs, ‘large-pelagic’, ‘benthic-profundal’, and ‘pelagic-profundal’ (additionally, when the original 24 samples used for the CSS analysis were excluded prior to PCA construction these correlations between PC1 and gill-raker count, R2 = 0.114, P = 0.005667, and standard length R2 = 0.208, P = 1.183 × 10−4 were still significant; Supplementary Fig. 7; Supplementary Table 2). These results are concordant with a scenario of polygenic differentiation between sympatric species, with many loci affected by divergent selection and potentially associated with ecological and phenotypic differences and each contributing a small amount to a broader overall pattern of divergence.
Parallelism in gene functional pathways between independent ecomorph contrasts
In addition to patterns of genetic parallelism between species of the widespread ‘Balchen’ and ‘Albeli’ ecomorphs, we also investigated each of the four independently evolved ‘Balchen’ and ‘Albeli’ species pairs separately, to identify whether, despite the presence of parallel allele-frequency shifts, the most strongly differentiated genomic regions between ecomorphs are species-pair-specific or shared among replicate pairs from different lakes. Species-pair-specific patterns of strong differentiation may be indicative of subtle differences in selection regimes between lakes and hint at the degree to which genetic redundancy, where different genotypes can result in similar phenotypes, underpins parallel ecomorph differentiation. As such, we assessed whether genomic differentiation between each independently evolved ‘Balchen’ and ‘Albeli’ species pair involved the same set of alleles, genes, or gene pathways, hinting at the commonality of ecomorph evolution across lake systems. To understand the genome-wide landscape of differentiation across the four independent ‘Balchen’ and ‘Albeli’ species pairs, we first carried out separate pairwise FST scans in 50 kb windows (each with >10 SNPs) for each sympatric species pair (resulting in ~34,000 windows for each species pair; Supplementary Fig. 8). This window-based approach, which averages FST estimates based on only 12 alleles across multiple loci, may result in some observed frequency differences arising from sampling, limiting us to the detection of strong selection and near fixation regimes44,45, but allows us to explore the degree of genomic redundancy across scales. The most differentiated regions of the genome between sympatric ‘Balchen’ and ‘Albeli’ species (outlier windows within the top FST percentile for a given species pair) have a genome-wide distribution (with mean genome-wide background FST across the four species pairs ranging from 0.06 in Neuchâtel to 0.12 in Brienz; Supplementary Fig. 8), and are species-pair-specific, with no outlier windows shared across all four lakes (6 outlier windows were shared between three contrasts, and 63 shared between two; in keeping with findings from North American whitefish ecomorph contrasts where observed genetic differentiation is not parallel across all lakes46). These species-pair-specific patterns were also reflected at the gene level (i.e., regardless of window boundaries), where, out of 1130 genes that overlapped with FST outlier windows in at least one of the four sympatric ‘Balchen’ and ‘Albeli’ species contrasts (out of the 42,695 genes that sit on scaffolds that were annotated in the reference genome), none overlapped with an outlier window in all four lakes (Supplementary Table 3).
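The outlier definition and the tally of shared windows described above can be sketched as follows, assuming per-window FST values have already been computed for each species pair. The window identifiers, the FST values and the function names are hypothetical, and the toy example uses an enlarged top fraction so that four windows suffice.

```python
import math
from collections import Counter


def top_windows(fst_by_window, top_fraction=0.01):
    """Return the IDs of the most differentiated windows (top fraction by FST)."""
    n_keep = max(1, math.ceil(len(fst_by_window) * top_fraction))
    ranked = sorted(fst_by_window, key=fst_by_window.get, reverse=True)
    return set(ranked[:n_keep])


def sharing_spectrum(outliers_by_lake):
    """Map number of lakes -> number of windows that are outliers in that many lakes."""
    lakes_per_window = Counter()
    for windows in outliers_by_lake.values():
        lakes_per_window.update(windows)
    return Counter(lakes_per_window.values())


# Hypothetical per-window FST for four sympatric species pairs.
fst = {
    "Brienz": {"w1": 0.60, "w2": 0.10, "w3": 0.55, "w4": 0.12},
    "Lucerne": {"w1": 0.58, "w2": 0.50, "w3": 0.09, "w4": 0.11},
    "Walen": {"w1": 0.08, "w2": 0.07, "w3": 0.40, "w4": 0.45},
    "Neuchâtel": {"w1": 0.05, "w2": 0.44, "w3": 0.06, "w4": 0.41},
}

# top_fraction=0.5 is used only because the toy data has four windows.
outliers = {lake: top_windows(values, top_fraction=0.5) for lake, values in fst.items()}
spectrum = sharing_spectrum(outliers)
print(dict(spectrum))
```

In this toy data every window is an outlier in exactly two lakes and none in all four, which mirrors in miniature the kind of species-pair-specific pattern reported in the text.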
The lack of overlap in genes associated with outlier windows across the four species pairs may also suggest that genetic redundancy is at play. To test whether genetic redundancy may help explain species-pair-specific differentiation patterns, we investigated whether the same set of four species pairs exhibit parallelism at the functional level rather than at the gene level by comparing gene orthology terms and pathways associated with each gene that overlapped with FST outlier windows between sympatric ‘Balchen’ and ‘Albeli’ species. For the 1130 genes overlapping FST outlier windows, we identified 660 KEGG orthology terms, of which two were associated with outlier windows in the species pairs of all four lakes (Supplementary Table 3). For both orthology terms (K07526 and K12959), we found that in Lake Neuchâtel one paralog overlapped an outlier window on chromosome WFS12, and in the remaining three lakes the other paralog overlapped an outlier window located on chromosome WFS10 (K07526 is also associated with an additional gene overlapping with another outlier window on chromosome WFS19 in Lake Lucerne). For K07526, both paralogous genes, despite being located on different chromosomes, had BLAST hits to different isoforms of the protein SRGAP3 (SLIT-ROBO Rho GTPase-activating protein 3; Supplementary Data 3). Similarly, for K12959 both genes hit to caveolin and caveolin-like proteins in other salmonids (Supplementary Data 3). WFS12 and WFS10 are homeologous chromosomes47, supporting the idea that genomic redundancy, in this case across homeologous chromosomes, may be involved in ecomorph differentiation. This finding furthermore supports the idea that the ancient salmonid-specific whole-genome duplication facilitated diversification by increasing the number of possible adaptive combinations of alleles48. 
In addition, around one-third (111/315) of the KEGG pathways that the 660 KEGG orthology terms belonged to were associated with outlier windows in all four independent species pairs (Supplementary Table 3). This shared differentiation at the metabolic pathway level, across independent speciation events with similar phenotypic outcomes, without parallelism at the gene level, suggests the role of genetic redundancy. As such, parallel ecomorphological divergence across the radiation may be underpinned by a polygenic adaptive architecture featuring redundancy, as reflected by the many parallel frequency shifts detected (using CSS), the lack of widely shared regions of strong differentiation (as indicated by FST), and the evidence for genetic redundancy at the gene pathway level49.
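The contrast between parallelism at the gene level and at the functional level can be made concrete with a small sketch. The two KEGG orthology terms (K07526 and K12959) and the chromosomes come from the text; the gene identifiers are hypothetical placeholders, and the mapping is deliberately reduced to the two paralog pairs discussed above.

```python
def shared_by_all(sets_by_lake):
    """Items present in every lake's set."""
    return set.intersection(*sets_by_lake.values())


# Hypothetical gene IDs; each orthology term has one paralog on WFS10 and one on WFS12.
gene_to_ko = {
    "gene_WFS10_a": "K07526",
    "gene_WFS12_a": "K07526",
    "gene_WFS10_b": "K12959",
    "gene_WFS12_b": "K12959",
}

# Genes overlapping FST outlier windows in each species pair (toy version of the
# pattern in the text: WFS12 paralogs in Neuchâtel, WFS10 paralogs elsewhere).
outlier_genes = {
    "Brienz": {"gene_WFS10_a", "gene_WFS10_b"},
    "Lucerne": {"gene_WFS10_a", "gene_WFS10_b"},
    "Walen": {"gene_WFS10_a", "gene_WFS10_b"},
    "Neuchâtel": {"gene_WFS12_a", "gene_WFS12_b"},
}

# Translate each lake's outlier genes into orthology terms before intersecting.
outlier_kos = {
    lake: {gene_to_ko[gene] for gene in genes}
    for lake, genes in outlier_genes.items()
}

shared_genes = shared_by_all(outlier_genes)
shared_kos = shared_by_all(outlier_kos)
print(shared_genes, shared_kos)
```

No gene is shared by all four lakes, yet both orthology terms are, which is the signature of redundancy across homeologous chromosomes that the text describes. The same intersection applied to pathway membership would give the pathway-level comparison.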
Large-effect loci underpin a key ecological trait
We also identify the genetic basis of variation in gill-raker count in whitefish, a key ecological trait that often differs between species occupying different niches because of its role in determining feeding efficiency on different prey items, i.e., trait utility50,51. Fish with fewer gill-rakers feed most efficiently on benthic macroinvertebrates52 whilst fish with many gill-rakers feed most efficiently on zooplankton50. We tested associations between gill-raker counts and SNPs (those polymorphic within the Alpine whitefish radiation). Using data from all 90 Alpine whitefish individuals with recorded gill-raker counts, we identified a single significantly associated SNP on WFS23 (−log10(P) = 8.1; LD-considerate significance threshold −log10(P) = 7.96; Fig. 2f) that explains 31% of the variation observed in gill-raker counts and displays highly correlated allele frequencies with mean gill-raker counts across all ecomorphs and species (Fig. 2e and Supplementary Fig. 9). This candidate SNP fell within an annotated whitefish gene on WFS23, which, when aligned with other salmonid assemblies using BLAST, hit with high confidence against the edar gene (ectodysplasin-A receptor). This gene is known to be involved in gill-raker development in zebrafish, where edar knockouts exhibit a loss of gill-rakers53, and is in the same protein family as the gene eda, which is known to underpin a number of ecologically important features in other fish species, most notably plating in stickleback54. Using a similar approach, we also identified a number of significant sex-associated peaks, with the most significantly associated SNP (−log10(P) = 15.93), explaining 54% of the variation in sex across the radiation, located on WFS04 (Fig. 2g; see 'Methods' for more information).
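What it means for a single SNP to explain a given share of trait variation can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of trait value on genotype dosage (0, 1 or 2 copies of an allele) as a stand-in; the association model actually used in the study is described in its Methods and is not reproduced here. The genotypes and gill-raker counts below are hypothetical.

```python
def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Proportion of trait variance explained by genotype.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: allele dosage at one SNP and gill-raker counts for six fish.
dosage = [0, 0, 1, 1, 2, 2]
gill_rakers = [24, 26, 30, 32, 36, 38]

slope, intercept, r_squared = simple_regression(dosage, gill_rakers)
print(slope, intercept, r_squared)
```

Here the slope is the average change in gill-raker count per additional allele copy, and the R2 value is the quantity that corresponds to the "variation explained" figure quoted in the text.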
Hybridization facilitates ecological diversification
Although species of the geographically widespread ‘Balchen’, ‘Felchen’ and ‘Albeli’ ecomorphs repeatedly diverge from one another along the common ecological axis of water depth with correlated phenotypic differentiation in several traits (including standard length and gill-raker count; Fig. 1b and Supplementary Fig. 1), likely the result of similar selection pressures along water depth gradients in different lakes, some lakes additionally harbour species of less-widespread ecomorphs, with distinctive ecological strategies. These include ‘large-pelagic’, ‘benthic-profundal’ and ‘pelagic-profundal’ species. These species have combinations of traits that contrast with the direction of correlation among traits seen in the widespread ecomorphs. For example, whereas species that spawn deeper typically have higher gill-raker counts, reflective of the transition from feeding on benthic macroinvertebrates to zooplankton, the ‘benthic-profundal’ C. profundus spawns very deep but has very few gill-rakers. Interestingly, our admixture analysis highlighted that a number of species that belong to these less-widespread ecomorphs, including two of the three ‘large-pelagic’ species, and both profundal species, show evidence of genetic admixture between species flocks from different lakes (Fig. 1e). To investigate these signals further, and determine whether secondary contact and introgression were associated with the evolution and maintenance of less-widespread ecomorphs with distinct trait combinations, explaining their heterogeneous distribution across the Alpine whitefish radiation, we calculated excess allele-sharing between species across our dataset. 
Excess allele-sharing was computed using the f-branch statistic fb(C), which was calculated from f4 admixture ratios, f(A, B; C, O), for all combinations of species (or clades in cases where sister species belong to the same ecomorph but are not reciprocally monophyletic) within and between lakes that fit the relationships ((A, B), C), according to our phylogeny (Fig. 1d).
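As a rough illustration of how many f4 admixture ratios are condensed into one value per branch, the sketch below assumes one published formulation of the f-branch statistic: for a branch b and a candidate donor C, take the minimum of f(A, B; C, O) over all species B descending from b, then the median of those minima over all species A descending from the sister branch of b. Whether the study applied exactly this summary is an assumption, and the species labels and ratio values are hypothetical.

```python
from statistics import median


def f_branch(f4_ratio, branch_b_species, sister_species):
    """Median over A (sister branch) of the minimum over B (branch b) of f(A, B; C, O).

    f4_ratio maps (A, B) -> admixture ratio for one fixed donor C and outgroup O.
    """
    per_a_minimum = [
        min(f4_ratio[(a, b)] for b in branch_b_species)
        for a in sister_species
    ]
    return median(per_a_minimum)


# Hypothetical f4 admixture ratios for a fixed donor C.
f4_ratio = {
    ("A1", "B1"): 0.08, ("A1", "B2"): 0.05,
    ("A2", "B1"): 0.07, ("A2", "B2"): 0.09,
    ("A3", "B1"): 0.02, ("A3", "B2"): 0.04,
}

fb = f_branch(f4_ratio, branch_b_species=["B1", "B2"], sister_species=["A1", "A2", "A3"])
print(fb)
```

Taking the minimum over B makes the summary conservative, since a signal is attributed to the branch only if every descendant of that branch shows it, while the median over A reduces sensitivity to any single comparison species.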
When considering the three ‘large-pelagic’ species (C. wartmanni in Lake Constance, C. acrinasus in Lake Thun, and C. suspensus in Lake Lucerne), the most striking significant introgression (indicated by a high, and significant, fb(C) value) reflects excess allele-sharing between Lake Constance and C. suspensus from Lake Lucerne, particularly with the Constance ‘large-pelagic’ species C. wartmanni (Fig. 3, black box). This result is concordant with our admixture analysis, which indicated that C. suspensus indeed looks admixed between species of Lake Lucerne and Lake Constance. The Lucerne ‘large-pelagic’ C. suspensus also appears to have significant, but less substantial, excess allele-sharing with a number of other Lucerne species. Our results also suggest, as supported by our phylogeny and admixture analysis, that the ‘large-pelagic’ species in Lake Thun, C. acrinasus, is genetically admixed, with alleles from Lake Constance and Lake Thun (indicated by significant excess allele-sharing with all Brienz/Thun branches in our tree; Fig. 3). This also confirms, and clarifies, the results of other studies which suggested that the evolution of C. acrinasus involved the historical anthropogenic translocation of fish from Lake Constance into Lake Thun34,38. Despite this extensive gene flow in the recent past, C. acrinasus now appears to persist as a stabilised hybrid species, demonstrated by its monophyly in our phylogeny (Fig. 1d) and distinct placement in our PCA (Fig. 1c). Together, these patterns suggest that the ‘large-pelagic’ ecomorph may have originally evolved in Lake Constance, and that fish of this species from Lake Constance subsequently colonised, or were translocated to, other lake systems where hybridization with native species then occurred and hybrid species became established (as suggested by historical records for lakes Thun38 and Lucerne55).
Interestingly, a modest amount of excess allele-sharing was observed between the ‘benthic-profundal’ species from Lake Thun C. profundus and the ‘large-pelagic’ species C. acrinasus (from Lake Thun; likely the result of within-lake gene flow), and the ‘pelagic-profundal’ species C. nobilis from Lake Lucerne (Fig. 3, green box), despite the previously implied admixture with the Biel/Neuchâtel system (shown in Fig. 1e). However, more substantial signals of excess allele-sharing were observed between other non-profundal ecomorphs of Lakes Thun/Brienz and C. nobilis (Fig. 3, pink box). The strongest signals of excess allele-sharing with C. nobilis came from the ‘Albeli’ species C. albellus in lakes Thun and Brienz, and the ‘Felchen’ species from Lake Brienz. The ‘pelagic-profundal’ C. nobilis may therefore constitute a stabilised hybrid between other whitefish species from Lake Lucerne and some from Lakes Thun or Brienz. These system-wide f-branch statistics highlight that significant signals of excess allele-sharing between lake systems are less commonly associated with species of widespread ecomorphs which exhibit correlated trait divergence (although many of these species exhibit within-lake signals of introgression; Fig. 3 and Supplementary Fig. 10), but are prevalent when considering species of less-widespread ecomorphs, which have trait combinations that are discordant with these correlations.
Adaptive radiations provide a valuable opportunity to identify constraints to diversification and to disentangle the ways in which lineages may overcome some of these constraints. In this study, we addressed these outstanding questions using radiation-wide whole-genome sampling. We found that the genetic basis of rapid parallel evolution of widespread Alpine whitefish ecomorphs comprises both a locus of large effect, implicating the gene edar in underpinning gill-raker variation, and many allele-frequency shifts distributed across the length of the genome. We were also able to detect parallelism in gene pathways differentiating species of the ecologically contrasting and widespread ‘Balchen’ and ‘Albeli’ ecomorphs across different lake systems. Our data also suggest that the evolution and maintenance of less-widespread ecological strategies and unique trait combinations are often associated with introgression upon secondary contact between species of different species flocks.
Previous empirical and theoretical work had suggested that the genetic basis of differentiation of ecologically contrasting sympatric sister species can comprise few large-effect loci5,6,7,8,9,10 or many genome-wide small-effect loci11,12,13,14,15. However, mounting empirical evidence suggests that polygenic architectures with a combination of these two, a so-called ‘mixed’ genetic architecture comprising many small-effect loci and a few large-effect loci, may also be present14,19,56. Such ‘mixed’ architectures may provide the ideal substrate for rapid speciation in the absence of geographical isolation, since they may better facilitate the build-up of linkage disequilibrium in the face of gene flow than either very few key loci or highly polygenic architectures. This is because large-effect loci can act as ‘visible’ targets for selection, and additional genome-wide modifier loci increase the chances of the accumulation of reproductive isolation via linked selection18. Our data suggest that such an architecture, a combination of large and small-effect loci, indeed underpins variation among species in the Alpine whitefish radiation.
Our results also highlight the potential role of genetic redundancy in facilitating the repeated evolution of ecologically similar species within adaptive radiations. Genetic redundancy can act at many scales and describes the scenario in which various alleles both within and between genes, and even gene pathways, result in similar phenotypes49,57. Such genetic redundancy may help explain rapid and repeated instances of evolution, since subtly different environment-specific selection regimes acting on different regions of the genome can still drive parallel phenotypic change. It may be possible that the prevalence of duplicated genes (ohnologs) after whole-genome duplication, and the possible relaxation of selection acting on these ohnologs58, may facilitate both the de novo evolution of novel alleles (and thus phenotypes) and increase the likelihood that different populations can evolve and reach the same fitness optimum in a genetically non-parallel but redundant way. Whole-genome duplication is thought to have facilitated adaptation in a diverse array of clades (including plants59, fungi60 and animals61), and our observations that different ohnologs underpin differentiation between ecologically similar, independent, whitefish species pairs support the idea that the ancient salmonid-specific whole-genome duplication facilitated diversification by increasing the number of possible adaptive combinations of alleles48.
Whilst our data show that highly replicated ecomorphological differentiation along similar ecological (water depth) gradients in different lake systems is underpinned by a mixed genetic architecture, hybridization upon secondary contact between species from different lake systems seems to facilitate the growth of species flocks through addition of species with trait combinations that are decoupled from those associated with speciation on depth gradients. Whilst a mixed genetic architecture promotes the rapid and repeated diversification of ecologically similar whitefish, there are likely constraints to the phenotypic divergence that can be achieved simply by the shuffling of existing alleles. As a result, the occupation of vacant niches may require new combinations of alleles that result in new, discordant, combinations of traits. Hybridization between distantly related species, e.g., non-sister whitefish species from different species flocks, results in the coming together of adaptive alleles or haplotypes which have each been tested by selection on their own, but have not previously existed in these combinations. This gene flow upon secondary contact between separate species flocks within a single large radiation may therefore provide a mechanism by which constraints to diversification may be overcome, allowing evolution into new niche space without having to persist through low-fitness intermediate states27,62. The specific genetic architecture of introgressed regions might also play a crucial role in determining the potential to overcome constraints, since large introgressed haplotypes can rapidly reach substantial frequencies following hybridization63. Our results suggest that both the genetic architecture of traits under divergent selection and the opportunity for secondary contact and hybridization between non-sister species are important for rapid adaptive radiation.
Sampling the radiation
To understand the phylogenetic relationships between Alpine whitefish, we carried out whole-genome resequencing on 96 previously collected whitefish (with associated phenotypic measurements including standard length and gill-raker counts; collected in accordance with permits issued by the cantons of Zurich (ZH128/15), Bern (BE68/15), and Lucerne (LU04/14); these fish were previously caught within the context of an assessment of the Swiss fish fauna, Projet Lac64, and documentation of endemic whitefish species in Switzerland31), in addition to three previously sequenced whitefish (discussed below). Fish were selected from lakes Constance, Lucerne, Thun, Brienz, Biel, Neuchâtel, Zurich, and Walen, which make up five separate lake systems (Constance, Lucerne, Thun/Brienz, Biel/Neuchâtel, and Walen/Zurich; Fig. 1a and Supplementary Data 1). Individuals from each whitefish species within each lake, representing the phenotypic diversity of Swiss Alpine whitefish, were sampled, including three species from Lake Constance, six from Lake Lucerne, six from Lake Thun, four from Lake Brienz, two from Lake Biel, two from Lake Neuchâtel, three from Lake Zurich, and two from Lake Walen. In addition to these Swiss whitefish, a number of outgroup individuals were also sampled, including two Coregonus albula (European cisco), and a number of members of the European C. lavaretus species complex including two Norwegian C. lavaretus, and four samples of North German whitefish thought to be the closest relatives of the Alpine whitefish radiation members: two German C. holsatus (from Lake Drewitz) and two German C. maraenoides (from Lake Schaal).
The whitefish species we sampled spanned a range of six different ecomorphs that differ in their morphology, including body length, depth, and feeding morphology, as well as spawning depth and time, and diet (sampled species in each lake and the ecomorphs to which these species belong were plotted according to their distribution; Fig. 1a, b). Species in this study were assigned to each ecomorph based on their phenotype by whitefish taxonomic experts and co-authors Oliver M. Selz and Ole Seehausen. The ‘Balchen’ whitefish ecomorph is characterised by large-bodied, shallow-spawning species which predominantly feed on benthic macroinvertebrates. Conversely, the ‘Albeli’ ecomorph is characterised by small species which spawn deeper (intermediate depth to very deep) and feed on zooplankton in the pelagic zone of lakes. The third ecomorph is the ‘Felchen’ type, which grow to larger sizes than the ‘Albeli’ ecomorph but not as large as the ‘Balchen’, feed on zooplankton, and feed and spawn from an intermediate depth to very deep. In addition to these three widespread ecomorphs, three less-widespread ecomorphs occur in three or fewer lake systems. These include two variations of profundal ecomorphs: a ‘benthic-profundal’ species, C. profundus from Lake Thun (an additional, now extinct, ‘benthic-profundal’ species C. gutturosus was also once present in Lake Constance), which has few gill-rakers but spawns at intermediate to great depth, and a ‘pelagic-profundal’ species, C. nobilis in Lake Lucerne, which spawns deep but has a high number of gill-rakers. The final ecomorph we sampled was the ‘large-pelagic’, which included the species C. wartmanni from Lake Constance, C. acrinasus from Lake Thun, and C. suspensus from Lake Lucerne, which, although they are large-bodied, have a high gill-raker count and feed predominantly on zooplankton. C. wartmanni has a well-described pelagic spawning behaviour, while the other two ‘large-pelagic’ species are so far less well characterised in that respect. A full breakdown of the fish included in this study, their gill-raker counts, standard-length measurements, and the ecomorph assignment of each species can be seen in Supplementary Data 1.
DNA for each individual was extracted from either fin or muscle tissue from each fish that had been stored at −80 °C using Qiagen DNeasy extraction columns, quantified using a Qubit 2.0, and run on a 1% agarose gel to assess DNA quality. DNA was then sequenced on the Illumina NovaSeq 6000 with a 550 bp insert size (Next Generation Sequencing Platform, University of Bern). To this data, we added Illumina HiSeq 3000 data sequenced from one Coregonus sp. “Balchen” (ENA accession: GCA_902810595.1; now re-classified as C. steinmanni31) from Lake Thun (Switzerland) that was previously used to polish and validate the Alpine whitefish reference genome assembly47.
Genotyping and loci filtering
After sequencing, all fastq files were quality checked using FastQC65 before being mapped to the Coregonus sp. “Balchen” Alpine whitefish reference genome (ENA accession: GCA_902810595.1; ref. 47; with additional un-scaffolded contigs (https://datadryad.org/stash/dataset/doi:10.5061/dryad.xd2547ddf;66) to ensure accurate mapping) using bwa-mem v.0.7.1767, changing the ‘r’ setting to 1 to allow more accurate, albeit more time-consuming, alignment. Mosdepth v.0.2.868 was used to calculate mean sequencing coverage from the BAM files for each of the 97 individuals, which ranged from 15.32× to 41.69× (an additional two individuals were added to this dataset after genotype calling, as discussed below). Picard-tools (Version 2.20.2; http://broadinstitute.github.io/picard/) was then used to mark duplicate reads (MarkDuplicates), fix mate information (FixMateInformation), and replace read groups (AddOrReplaceReadGroups). Genotypes were then called across the 40 chromosome-scale scaffolds included in the Coregonus sp. “Balchen” Alpine whitefish assembly (ENA accession: GCA_902810595.1; ref. 47) using HaplotypeCaller in GATK v.126.96.36.199 using a minimum mapping quality filter of 30. The resulting VCF file was then filtered using vcftools v.0.1.1470 to remove indels (--remove-indels) and include biallelic loci (--min-alleles 2 --max-alleles 2) which have a minor allele count ≥ 3 (--mac 3), no missing data (--max-missing 1), a minimum depth ≥ 3 (--min-meanDP 3 --minDP 3), a maximum depth ≤ 50 (--max-meanDP 50 --maxDP 50), and a minimum quality of 30 (--minQ 30), to leave 16,926,710 SNPs. Loci that fell within potentially collapsed regions of the genome assembly (as identified in ref. 47) were removed using BEDTools v.2.28.0 (ref. 71; bedtools subtract), and any loci with duplicate IDs, which were identified with PLINK v.1.9072, were removed with VCFtools70, resulting in 15,841,979 SNPs. To increase our sampling of the species C. macrophthalmus from Lake Constance from one individual to three, we added sequencing data from an additional two individuals (previously sequenced by Frei et al.73; Supplementary Data 1). To avoid the downstream impacts of combining sequencing data from different runs (which can result from different biased nucleotide calls and introduce erroneous signals of genetic differentiation; as outlined in ref. 74), we mapped these two samples as above (resulting in a mean genome-wide coverage of 9.32× and 16.58×) and called genotypes again for all samples (including the two additional C. macrophthalmus individuals) at each of the original 15,841,979 SNP positions. Following this genotype calling, which resulted in 15,521,925 SNPs, SNP filtering was repeated as before, leaving 14,313,952 SNPs with no missing data across the dataset of 99 individuals.
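As a rough illustration of the site filters listed above, the sketch below assembles a single vcftools call in Python. The filter flags are those quoted in the text; the input and output handling (`--gzvcf`, `--recode`, `--recode-INFO-all`, `--out`) and the file names are assumptions added for illustration, and the authors may have applied the filters in several steps.

```python
import shlex
from typing import List


def vcftools_filter_args(in_vcf: str, out_prefix: str) -> List[str]:
    """Build a vcftools command using the site filters quoted in the text."""
    return [
        "vcftools", "--gzvcf", in_vcf,               # input handling: assumption
        "--remove-indels",                           # drop indels
        "--min-alleles", "2", "--max-alleles", "2",  # biallelic sites only
        "--mac", "3",                                # minor allele count filter
        "--max-missing", "1",                        # no missing genotypes
        "--min-meanDP", "3", "--minDP", "3",         # minimum depth
        "--max-meanDP", "50", "--maxDP", "50",       # maximum depth
        "--minQ", "30",                              # minimum site quality
        "--recode", "--recode-INFO-all",             # output handling: assumption
        "--out", out_prefix,
    ]


# Hypothetical file names, for illustration only.
command = vcftools_filter_args("raw_calls.vcf.gz", "filtered_snps")
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run` without shell quoting issues.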
PCA, phylogenetics and admixture analysis
PLINK v.1.9072 was used to produce a genomic PCA of all 91 Alpine whitefish genomes with the aim of understanding how the individuals, species, and lakes were differentiated from one another. All eight outgroup individuals were removed from the full dataset, leaving only Alpine whitefish from the five lake systems. Loci were then filtered based on linkage disequilibrium using PLINK v.1.90 (ref. 72; 50 kb windows with a step size of 10 bp and filtering for an R² > 0.1). This resulted in 1,133,255 loci which were processed by PLINK to calculate eigenvector distances between individuals. PCAs were plotted using R75.
We took a phylogenetic approach to understand the relationships between each of the Alpine whitefish species we sampled. First, the full VCF file was thinned to include only SNPs which were at least 500 bp apart using VCFtools (ref. 70; --thin 500). The thinned SNP dataset containing 2,039,744 SNPs was then filtered using bcftools (part of SAMtools v.1.876; bcftools view -i 'COUNT(GT="RR")>0 & COUNT(GT="AA")>0') to leave only SNPs that were present at least once in our dataset as homozygous for the reference allele and at least once as homozygous for the alternative allele, as required by RAxML. This reduced the dataset to 1,692,559 SNPs (specifically compiled for RAxML and not used in any other analysis). This filtered VCF file was then converted to a PHYLIP file using vcf2phylip v.277 before RAxML v.8.2.1278 was run with the ASC_GTRGAMMA substitution model (-m ASC_GTRGAMMA --asc-corr=lewis -k -f a) with 100 bootstraps and specifying the C. albula samples as outgroups to produce the maximum likelihood tree. The phylogenetic tree, excluding the long branch leading to C. albula, was then plotted using Figtree v.1.4.479.
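The homozygote requirement applied before running RAxML can be sketched as a per-site check. This is a minimal Python illustration of the filter logic only, assuming diploid biallelic genotype strings in VCF notation; it is not the bcftools implementation.

```python
def has_both_homozygotes(genotypes):
    """Return True if a site has at least one 0/0 and at least one 1/1 genotype.

    genotypes: iterable of VCF-style strings such as '0/0', '0/1' or '1|1'.
    """
    # Treat phased ('|') and unphased ('/') genotypes alike.
    normalised = [g.replace("|", "/") for g in genotypes]
    return "0/0" in normalised and "1/1" in normalised


# Hypothetical sites, for illustration only.
print(has_both_homozygotes(["0/0", "0/1", "1/1"]))  # True: site is kept
print(has_both_homozygotes(["0/0", "0/1", "0/1"]))  # False: no homozygous-alternative call
```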
The same linkage-pruned dataset of 1,133,255 SNPs that was used to produce the full genomic PCA was used to calculate admixture proportions. The .bed file from PLINK resulting from the PCA was analysed using admixture v.1.3.080 to estimate admixture for values of K between 2 and 14, specifying 20 cross-validations (--cv=20). As the CV error increased with the range of K that we tested (Supplementary Fig. 2), we selected the K which helped to resolve the lake systems and deep clade splits best, K = 7, and plotted admixture barplots in R (additional admixture plots for K = 2 to K = 10 can be found in Supplementary Fig. 3).
To identify the degree of genetic parallelism between ‘Balchen’ and ‘Albeli’ whitefish species from across the radiation, we subsetted 24 individuals from our full 99-individual dataset, representing three ‘Balchen’ individuals and three ‘Albeli’ individuals from each of four of the lakes we sampled: Lake Brienz, Lake Lucerne, Lake Walen and Lake Neuchâtel. ‘Albeli’ species included C. candidus, C. albellus, C. muelleri and C. heglingus (for lakes Neuchâtel, Brienz, Lucerne, and Walen), and ‘Balchen’ species included C. palea, C. alpinus, C. litoralis, and C. duplex (for lakes Neuchâtel, Brienz, Lucerne, and Walen). To first confirm the independent evolution of each ‘Balchen’ and ‘Albeli’ species pair within each of these four lakes, as indicated by the phylogeny, F4 statistics were calculated across a four-taxon tree (as used in ref. 81), allowing us to estimate the degree of correlated allele frequencies between ‘Balchen’ and ‘Albeli’ individuals within and between lake systems. First, loci were pruned based on linkage disequilibrium using the script ldPruning.sh (https://github.com/joanam/scripts/raw/master/ldPruning.sh), resulting in 1,315,105 SNPs. Then the script plink2treemix.py (from https://speciationgenomics.github.io/Treemix/) was used to convert data into the treemix format before F4 calculations were implemented using f4.py (https://raw.githubusercontent.com/mmatschiner/F4/master/f4.py). We calculated F4 for two different topologies, placing ‘Balchen’ and ‘Albeli’ species from all pairwise combinations of the four lakes on a four-taxon tree. In the first four-taxon tree ((A,B),(C,D)) we placed sympatric ‘Balchen’ and ‘Albeli’ species from a first lake as A and B, and ‘Balchen’ and ‘Albeli’ species from a second lake as C and D. 
In this context, the resulting F4 (F4₁) represents the correlated allele frequency between A or B and C or D that would indicate introgression or, in our case, a single origin of ‘Balchen’ and ‘Albeli’ followed by sorting into lakes. We then calculated F4 where allopatric ‘Balchen’ species from two different lakes were placed as A and B and allopatric ‘Albeli’ species from the same two lakes as C and D (F4₂). F4 in this second arrangement represents the correlated allele frequencies of sympatric species, again between A or B and C or D. Where F4₁ < F4₂ there is stronger support for the scenario in which ‘Balchen’ and ‘Albeli’ are truly sympatric species pairs that originated independently in each lake, rather than for a single origin of the two ecomorphs.
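To make the two arrangements concrete, the sketch below computes F4 from per-population allele frequencies using the common textbook definition, the mean over SNPs of (pA − pB)(pC − pD). This is an assumption about the general form of the statistic, not a reproduction of f4.py, and the allele frequencies are invented toy values.

```python
def f4(p_a, p_b, p_c, p_d):
    """Mean over SNPs of (pA - pB) * (pC - pD) for four populations."""
    n = len(p_a)
    if n == 0 or not (n == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all four frequency lists must be non-empty and equal in length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n


# Toy allele frequencies at three SNPs (hypothetical values).
balchen_lake1 = [0.9, 0.1, 0.5]
albeli_lake1 = [0.8, 0.2, 0.5]
balchen_lake2 = [0.2, 0.8, 0.5]
albeli_lake2 = [0.1, 0.9, 0.5]

# First arrangement: sympatric pairs as (A,B) and (C,D).
f4_first = f4(balchen_lake1, albeli_lake1, balchen_lake2, albeli_lake2)
# Second arrangement: same-ecomorph, different-lake pairs as (A,B) and (C,D).
f4_second = f4(balchen_lake1, balchen_lake2, albeli_lake1, albeli_lake2)

print(f4_first, f4_second)
```

In this toy case the frequencies differ mostly between lakes rather than between ecomorphs, so the first value is smaller than the second, which is the pattern the text interprets as support for independent origins within lakes.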
To explore whether ‘Balchen’ and ‘Albeli’ species of whitefish show a parallel genetic basis of evolution in different lakes, regardless of lake structure, we used the cluster separation score (CSS; introduced in ref. 42 and the therein reported formula corrected in ref. 43), a measure of genomic differentiation between individuals assigned to two groups. Here we assigned individuals from the four ‘Balchen’ species to one group and those from the four ‘Albeli’ species to another. When calculated in windows of the genome, the CSS score quantifies the genetic distance between these ecomorph groups relative to the overall genetic variance in this particular window42. We calculated CSS in 50 kb windows using a custom R script (https://github.com/marqueda/PopGenCode/blob/master/CSSm.R) where the 24 whitefish individuals were split into two groups according to ecomorph (i.e., ‘Balchen’ or ‘Albeli’). A stratified permutation test which reshuffles the assignment of individuals to each of the ecomorph groups within each lake to test the statistical significance of the CSS score for each window, whilst maintaining population structure, was then carried out 100,000 times using a custom R script (https://github.com/marqueda/PopGenCode/blob/master/CSSm_permutation.R). Windows with fewer than 24 SNPs were removed (in accordance with ref. 43) and outlier windows were identified based on a false discovery rate adjusted P-value cut-off of P < 0.01, using ‘fdr.level = 0.01’ in the R package ‘qvalue’ (ref. 82; similarly to ref. 43). The median CSS score across all 34,102 windows with ≥24 SNPs was 0.0083 and the median CSS score across all 1659 outlier windows was 0.0973. A PCA was then produced for all 91 Alpine whitefish (excluding our outgroup samples) using PLINK v.1.90 starting with only the 690,101 SNPs that fell within these 1659 CSS outlier windows. 
Filtering for linkage disequilibrium was carried out as above, resulting in 56,127 SNPs that were then used to determine the genomic variation between whitefish species within these genomic regions. Correlations between PC1, which separated out species, and traits (gill-raker count and standard length) were carried out using the linear model function (lm) in R.
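The stratified permutation scheme described above, in which ecomorph labels are reshuffled only within each lake, can be sketched as follows. The statistic used here (an absolute difference in group means) is a simple stand-in chosen for illustration and is not the CSS; the per-individual values are invented.

```python
import random
from collections import defaultdict


def shuffle_within_strata(labels, strata, rng):
    """Permute labels only among individuals that share a stratum (e.g. a lake)."""
    by_stratum = defaultdict(list)
    for index, stratum in enumerate(strata):
        by_stratum[stratum].append(index)
    permuted = list(labels)
    for indices in by_stratum.values():
        pool = [labels[i] for i in indices]
        rng.shuffle(pool)
        for i, label in zip(indices, pool):
            permuted[i] = label
    return permuted


def group_difference(values, labels, group_a, group_b):
    """Stand-in statistic: absolute difference between the two group means."""
    a = [v for v, lab in zip(values, labels) if lab == group_a]
    b = [v for v, lab in zip(values, labels) if lab == group_b]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def stratified_permutation_p(values, labels, strata, n_perm=1000, seed=1):
    """Permutation P-value with label shuffling restricted to within strata."""
    rng = random.Random(seed)
    group_a, group_b = sorted(set(labels))  # expects exactly two groups
    observed = group_difference(values, labels, group_a, group_b)
    hits = 0
    for _ in range(n_perm):
        permuted = shuffle_within_strata(labels, strata, rng)
        if group_difference(values, permuted, group_a, group_b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 4 lakes x (3 'Balchen' + 3 'Albeli'), with a lake-specific offset.
lakes = ["Brienz", "Lucerne", "Walen", "Neuchatel"]
strata = [lake for lake in lakes for _ in range(6)]
labels = ["Balchen"] * 3 + ["Albeli"] * 3
labels = labels * 4
values = [0.5 * lakes.index(lake) + (1.0 if lab == "Balchen" else 0.0)
          for lake, lab in zip(strata, labels)]

print(stratified_permutation_p(values, labels, strata))
```

Because labels never move between lakes, each permuted dataset keeps the same number of individuals per ecomorph in every lake, so differences between lakes cannot inflate the test statistic.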
To confirm that this pattern is not driven simply by the inclusion of individuals used to define the outlier CSS windows, we produced a second PCA as above but excluding the original 24 individuals prior to the calculation of the PCA (Supplementary Fig. 7). In this instance, CSS PC1 was still significantly correlated with standard length (R² = 0.2081, P = 1.183 × 10⁻⁴) and gill-raker count (R² = 0.1135, P = 5.667 × 10⁻³ when including the outlier C. profundus; R² = 0.2201, P = 1.05 × 10⁻⁴ when excluding the outlier C. profundus), albeit, unsurprisingly, to a lesser extent.
We also identified genes that were annotated on chromosome-scale scaffolds in the whitefish reference genome47 which overlapped with the 1659 CSS outlier windows by a minimum of 1 bp using ‘bedtools intersect’71. We then used the topGO package v.2.46.083 in R to identify significantly enriched gene ontology terms (P-values < 0.05 according to both the ‘weight’ and ‘elim’ algorithms) associated with these outlier windows (Supplementary Data 2). A non-parametric Mann–Whitney U test showed that the 1702 genes that overlapped with our 1659 CSS outlier windows were significantly longer than non-overlapping genes (40,993 of the full 42,695 gene set; P-value = 2.775 × 10⁻⁸); however, the difference was small in absolute terms (mean gene length was 13,648 bp for outlier genes and 11,638 bp for non-outlier genes; Supplementary Fig. 5).
We then calculated pairwise genome-wide relative divergence between sympatric ‘Balchen’ and ‘Albeli’ species for each lake separately. Weir and Cockerham FST was calculated between ‘Balchen’ and ‘Albeli’ species in each lake after filtering out loci which had a minor allele count <1 between the two using vcftools v.0.1.14 (ref. 70; --weir-fst --mac 1) specifying a window size of 50 kb. Windows with fewer than 10 SNPs were removed. The mean FST of all windows along the genome was then calculated for each species pair to determine the total extent of differentiation between sympatric ‘Balchen’ and ‘Albeli’ species. To identify regions of the genome which underpin the phenotypic contrast between ecomorphs, we identified the top percentile of most differentiated windows in each lake and species pair using R, and those outlier windows which were shared between two or more species pairs were noted. Since this 1% FST value cut-off was used to identify outlier windows in each of the four independent FST scans, by chance we would expect 20.7234 windows to be shared across two lakes (0.01² × 34,539 windows × 6 combinations of two lakes), 0.1381 across three lakes (0.01³ × 34,539 windows × 4 combinations of three lakes), and 0.0003 across four lakes (0.01⁴ × 34,539 windows × 1 combination of four lakes). As with CSS outlier windows, genes that overlapped with the top 1% outlier windows from each of the four species pairs were identified using ‘bedtools intersect’71. KEGG orthology was identified for 28,673 of the 46,397 annotated genes in the whitefish Coregonus sp. “Balchen” assembly using BlastKOALA (https://www.kegg.jp/blastkoala/; using the taxon id 861768 and selecting the genus_eukaryotes database) and, as a result, the genes overlapping the FST outlier windows and their associated KEGG orthology terms were identified for each of the four species-pair comparisons. 
For each species pair, the KEGG gene pathways that were associated with KEGG orthology terms associated with lake-specific FST outlier windows were also identified using the KEGG orthology database (https://www.kegg.jp/kegg/ko.html). The genes, KEGG orthology terms and KEGG gene pathways that were associated with each species-pair-specific set of FST outlier windows were then compared to identify any features that were associated with ‘Balchen’-‘Albeli’ differentiation across all lake systems. Full protein sequences for genes associated with the shared KEGG orthology terms K07526 (augustus_masked-PGA_scaffold11__203_contigs__length_63881516-processed-gene-394.0 and maker-PGA_scaffold9__196_contigs__length_60468309-snap-gene-345.2) and K12959 (maker-PGA_scaffold11__203_contigs__length_63881516-snap-gene-396.10 and maker-PGA_scaffold9__196_contigs__length_60468309-snap-gene-342.13) were BLASTed using blastp (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) and the resulting best hits, those with the lowest E-value and an annotated gene name in a salmonid species, were noted (Supplementary Data 3). As with CSS, a non-parametric Mann–Whitney U test showed that the 1130 genes that overlapped with FST outlier windows across the four comparisons were significantly longer than non-overlapping genes (41,565 of the full 42,695 gene set; P-value = 2.693 × 10⁻⁶). Again, the absolute difference between groups was small (mean gene length was 14,084 bp for outlier genes and 11,654 bp for non-outlier genes; Supplementary Fig. 5).
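The chance expectation for shared FST outlier windows given above follows from multiplying the per-scan outlier fraction raised to the number of lakes, the number of windows, and the number of lake combinations. A minimal Python sketch of that arithmetic, assuming the scans are independent (function and parameter names are illustrative):

```python
from math import comb


def expected_shared_windows(n_windows, n_scans, k, tail=0.01):
    """Expected number of windows that are outliers in a given set of k scans,
    summed over all combinations of k scans out of n_scans."""
    return tail ** k * n_windows * comb(n_scans, k)


# 34,539 windows, four lake-specific scans, 1% outlier cut-off (values from the text).
for k in (2, 3, 4):
    print(k, expected_shared_windows(34539, 4, k))
```

Up to floating-point rounding, this gives 20.7234 for two lakes, 0.138156 for three and 0.00034539 for four, which the text reports as 20.7234, 0.1381 and 0.0003.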
Genome-wide association mapping
To identify the genetic basis of gill-raker variation across the Alpine whitefish radiation, we used a mixed model approach implemented in EMMAX v.2012021084 (as in ref. 14). First, ‘emmax-kin’ was used to produce a Balding-Nichols kinship matrix between all 90 Alpine whitefish samples for which we had gill-raker counts, using only the 9,120,498 SNPs that were polymorphic within the Alpine whitefish radiation. We then used EMMAX to calculate the association of each SNP marker with gill-raker count. Two significance thresholds were determined. A strict Bonferroni multiple-testing P-value threshold was calculated using the total number of SNPs tested: −log10(0.05/9120498) = 8.26, in addition to an LD-considerate threshold of −log10(0.05/4536915) = 7.96, which was calculated by removing linked markers (R² > 0.95) in 50 kb sliding windows across the genome using PLINK72. One SNP on WFS23 had an association above the LD-considerate threshold, and the allele frequencies within each of the six ecomorph groups were calculated for this SNP using vcftools --freq on each subset of ecomorphs separately (Fig. 2e; in addition to each ecomorph within each lake separately; Supplementary Fig. 9). The gene that overlapped with this SNP was identified with BEDTools71 and the full protein sequence from the gene that overlapped with the SNP (maker-PGA_scaffold22__199_contigs__length_52020451-snap-gene-302.9) was BLASTed using Ensembl TBLASTN against the Atlantic Salmon, Rainbow Trout, Brown Trout and Coho Salmon genomes, hitting with high confidence against the ectodysplasin-A receptor (edar) gene (E-value 1e-20; ID% 97.62 in Brown Trout fSalTru1.1; ENSSTUG00000036900 and E-value 7e-20; ID% 100 in Atlantic Salmon ICSASG_v2; ENSSSAG00000053655). 
The variance in gill-raker count across our samples explained by the most significantly associated SNP was calculated using the equation: $\mathrm{PVE} = \dfrac{2\beta^{2}\,\mathrm{MAF}\,(1-\mathrm{MAF})}{2\beta^{2}\,\mathrm{MAF}\,(1-\mathrm{MAF}) + \mathrm{se}_{\beta}^{2}\cdot 2N\,\mathrm{MAF}\,(1-\mathrm{MAF})}$, where $N$ = the sample size (90), $\mathrm{se}_{\beta}$ = the standard error of the effect size of the SNP, $\beta$ = SNP effect size, and MAF = SNP minor allele frequency (from the Supplementary Information S1 associated with ref. 85).
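The two significance thresholds and the PVE equation above reduce to a few lines of arithmetic. The sketch below is a minimal Python illustration; the effect size, standard error and allele frequency passed to `pve` are invented placeholder values, not estimates from this study.

```python
from math import log10


def bonferroni_threshold(n_tests, alpha=0.05):
    """Genome-wide significance threshold on the -log10(P) scale."""
    return -log10(alpha / n_tests)


def pve(beta, se_beta, maf, n):
    """Proportion of variance explained by one SNP, following the equation in the text."""
    genetic = 2 * beta ** 2 * maf * (1 - maf)
    residual = se_beta ** 2 * 2 * n * maf * (1 - maf)
    return genetic / (genetic + residual)


# SNP counts taken from the text.
print(round(bonferroni_threshold(9120498), 2))  # strict threshold
print(round(bonferroni_threshold(4536915), 2))  # LD-considerate threshold

# Placeholder inputs (hypothetical), with N = 90 as in the text.
print(pve(beta=1.0, se_beta=0.1, maf=0.5, n=90))
```

Note that MAF(1 − MAF) appears in every term, so the PVE depends on the allele frequency only through the estimates of the effect size and its standard error.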
This EMMAX association mapping was repeated using sex as a binary trait for 90 Alpine whitefish individuals. The most substantial associated peak was observed on WFS04. As above, genes that overlapped with these SNPs were identified with BEDTools71, and the protein sequence from the single gene that overlapped with this peak of SNPs on WFS04, maker-PGA_scaffold3__454_contigs__length_92224161-snap-gene-551.2, was BLASTed using Ensembl TBLASTN against the Atlantic Salmon, Rainbow Trout, Brown Trout and Coho Salmon genomes. However, no annotated genes were hit with high confidence using this approach.
To calculate excess allele-sharing across the dataset, and test whether species of the less-widespread ecomorphs with unique trait combinations (i.e., combinations of traits that contrast with the direction of correlation among combinations of traits seen in the widespread ecomorphs) have evolved as a result of gene flow between lake systems, we used the f-branch statistic fb(C) as calculated by the package Dsuite v.0.386 as in ref. 87. First, a simplified version of the full RAxML phylogenetic tree was prepared. To make use of the multiple samples per species in our dataset and get robust estimates of excess allele-sharing both within and between lake systems, we collapsed nodes in the phylogenetic tree using the R package ‘ape’88 where possible. Individuals which looked like potential F1 hybrids, as indicated by close to 50/50 splitting in the admixture analysis, or which were placed discordantly in our genome-wide PCA and phylogeny (including the C. alpinus 0129 and C. zuerichensis 099), and individuals which did not sit in the same clade as other individuals of the same species in the same lake system, were kept separate so as not to skew species-wide estimates of excess allele-sharing from single, potentially recent introgression events, and were thus not included in node collapsing. Nodes were then collapsed, and the individuals within that clade assigned as a single tree tip, if all individuals within the clade belonged to the same species or species of the same ecomorph from a single lake or, where possible, single lake system (excluding potential F1 individuals). All outgroup individuals in the tree were collapsed into a single outgroup tip. Dsuite86 was then run specifying Dtrios, DtriosCombine, and finally f-branch, each time specifying the collapsed tree. 
Dsuite was used to first calculate f4 admixture ratios f(A,B;C,O) across the dataset where combinations of taxa fit the necessary relationship ((A, B), C) in our phylogenetic tree, with the 8 non-Alpine whitefish set as the outgroup. The f-branch statistic fb(C) was then calculated from these f4 statistics using the phylogenetic tree to identify excess allele-sharing between any taxa into any other taxon or node in the phylogeny. fb(C) is particularly powerful for complex systems such as the Alpine whitefish radiation since, unlike Patterson’s D, it provides branch-specific estimates of excess allele-sharing, meaning that specific instances of gene flow do not skew excess allele-sharing estimates across multiple nodes or branches, providing a phylogenetically-guided and robust estimate of excess allele-sharing87. F-branch statistics plotted in Fig. 3 are provided in Supplementary Data 4 along with a version of the figure highlighting within-lake introgression in Supplementary Fig. 10. Significant instances of excess allele-sharing were identified by calculating a stringent Bonferroni multiple-testing significance threshold, which involved dividing the p-value threshold of P < 0.01 by the number of cells in the f-branch matrix for which fb(C) could be calculated (1910) and converting this to a Z-score using R. All cells with Z-scores higher than this threshold i.e., Z > 4.41 represented significant excess allele-sharing between taxa in the tree and were indicated as such.
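The conversion from the Bonferroni-corrected P-value to a Z-score threshold can be reproduced with the standard normal quantile function. The sketch below assumes a one-sided test, which is the assumption that recovers the reported Z > 4.41; the text states only that the conversion was done in R.

```python
from statistics import NormalDist


def bonferroni_z_threshold(alpha, n_tests):
    """One-sided Z-score corresponding to a Bonferroni-corrected P-value."""
    corrected_p = alpha / n_tests
    # The lower-tail quantile is negative; flip the sign for the upper tail.
    return -NormalDist().inv_cdf(corrected_p)


# alpha = 0.01 and 1910 testable f-branch cells, as stated in the text.
print(round(bonferroni_z_threshold(0.01, 1910), 2))
```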
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The raw sequencing files are accessible on ENA (PRJEB47792 and PRJEB43605) and additional source data (genotype file and corresponding metadata file along with figure-specific data) are accessible on the Eawag research data institutional collection [https://doi.org/10.25678/0005S0]89. Source data are provided with this paper.
Scripts for all analyses are available on GitHub: https://github.com/RishiDeKayne/Alpine_whitefish_WGS (archived at Zenodo [https://doi.org/10.5281/zenodo.6807278]90 and on the Eawag research data institutional collection [https://doi.org/10.25678/0005S0]89).
Nosil, P. Ecological Speciation (Oxford University Press, 2012).
Schluter, D. The Ecology of Adaptive Radiation (OUP Oxford, 2000).
Nosil, P., Feder, J. L. & Gompert, Z. How many genetic changes create new species? Science 371, 777–779 (2021).
Seehausen, O. et al. Genomics and the origin of species. Nat. Rev. Genet. 15, 176–192 (2014).
Gavrilets, S. Fitness Landscapes and the Origin of Species (Princeton University Press, 2004).
Yeaman, S. & Whitlock, M. C. The genetic architecture of adaptation under migration-selection balance: the genetic architecture of local adaptation. Evolution 65, 1897–1911 (2011).
Flaxman, S. M., Wacholder, A. C., Feder, J. L. & Nosil, P. Theoretical models of the influence of genomic architecture on the dynamics of speciation. Mol. Ecol. 23, 4074–4088 (2014).
Cruickshank, T. E. & Hahn, M. W. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23, 3133–3157 (2014).
Via, S. & West, J. The genetic mosaic suggests a new role for hitchhiking in ecological speciation. Mol. Ecol. 17, 4334–4345 (2008).
Wu, C.-I. The genic view of the process of speciation. J. Evol. Biol. 14, 851–865 (2001).
Lawniczak, M. K. N. et al. Widespread divergence between incipient Anopheles gambiae species revealed by whole genome sequences. Science 330, 512–514 (2010).
Michel, A. P. et al. Widespread genomic divergence during sympatric speciation. Proc. Natl Acad. Sci. USA 107, 9724–9729 (2010).
Riesch, R. et al. Transitions between phases of genomic differentiation during stick-insect speciation. Nat. Ecol. Evol. 1, 82 (2017).
Kautt, A. F. et al. Contrasting signatures of genomic divergence during sympatric speciation. Nature 588, 106–111 (2020).
Todesco, M. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020).
Felsenstein, J. Skepticism towards Santa Rosalia, or why are there so few kinds of animals? Evolution 35, 124–138 (1981).
Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434 (2006).
Nosil, P., Harmon, L. J. & Seehausen, O. Ecological explanations for (incomplete) speciation. Trends Ecol. Evol. 24, 145–156 (2009).
Feller, A. F., Haesler, M. P., Peichel, C. L. & Seehausen, O. Genetic architecture of a key reproductive isolation trait differs between sympatric and non-sympatric sister species of Lake Victoria cichlids. Proc. Biol. Sci. 287, 20200270 (2020).
Lamichhaney, S. et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518, 371–375 (2015).
Meier, J. I. et al. Ancient hybridization fuels rapid cichlid fish adaptive radiations. Nat. Commun. 8, 14363 (2017).
Richards, E. J. & Martin, C. H. Adaptive introgression from distant Caribbean islands contributed to the diversification of a microendemic adaptive radiation of trophic specialist pupfishes. PLoS Genet. 13, e1006919 (2017).
Svardal, H. et al. Ancestral hybridization facilitated species diversification in the Lake Malawi cichlid fish adaptive radiation. Mol. Biol. Evol. 37, 1100–1113 (2020).
Meier, J. I. et al. The coincidence of ecological opportunity with hybridization explains rapid adaptive radiation in Lake Mweru cichlid fishes. Nat. Commun. 10, 5391 (2019).
Feller, A. F. et al. Rapid generation of ecologically relevant behavioral novelty in experimental cichlid hybrids. Ecol. Evol. 10, 7445–7462 (2020).
Selz, O. M. & Seehausen, O. Interspecific hybridization can generate functional novelty in cichlid fish. Proc. Biol. Sci. 286, 20191621 (2019).
Kagawa, K. & Takimoto, G. Hybridization can promote adaptive radiation by means of transgressive segregation. Ecol. Lett. 21, 264–274 (2018).
Kagawa, K. & Seehausen, O. The propagation of admixture-derived adaptive radiation potential. Proc. Biol. Sci. 287, 20200941 (2020).
Richards, E. J. et al. A vertebrate adaptive radiation is assembled from an ancient and disjunct spatiotemporal landscape. Proc. Natl Acad. Sci. USA 118, e2011811118 (2021).
Hudson, A. G., Lundsgaard-Hansen, B., Lucek, K., Vonlanthen, P. & Seehausen, O. Managing cryptic biodiversity: Fine-scale intralacustrine speciation along a benthic gradient in Alpine whitefish (Coregonus spp.). Evol. Appl. 10, 251–266 (2017).
Selz, O. M., Dönz, C. J., Vonlanthen, P. & Seehausen, O. A taxonomic revision of the whitefish of lakes Brienz and Thun, Switzerland, with descriptions of four new species (Teleostei, Coregonidae). Zookeys 989, 79–162 (2020).
Steinmann, P. Monographie der schweizerischen Koregonen. Schweiz. Z. Hydrol. 12, 340–491 (1950).
Kottelat, M. & Freyhof, J. Handbook of European Freshwater Fishes (Publications Kottelat, 2007).
Douglas, M. R., Brunner, P. C. & Bernatchez, L. Do assemblages of Coregonus (Teleostei: Salmoniformes) in the Central Alpine region of Europe represent species flocks? Mol. Ecol. 8, 589–603 (1999).
Hudson, A. G., Vonlanthen, P. & Seehausen, O. Rapid parallel adaptive radiations from a single hybridogenic ancestral population. Proc. Biol. Sci. 278, 58–66 (2011).
Vonlanthen, P. et al. Eutrophication causes speciation reversal in whitefish adaptive radiations. Nature 482, 357–362 (2012).
Williams, E. E. The origin of faunas. Evolution of lizard congeners in a complex island fauna: A trial Analysis. In Evolutionary Biology: Volume 6 (eds Dobzhansky, T., Hecht, M. K. & Steere, W. C.) 47–89 (Springer US, 1972).
Doenz, C. J., Bittner, D., Vonlanthen, P., Wagner, C. E. & Seehausen, O. Rapid buildup of sympatric species diversity in Alpine whitefish. Ecol. Evol. 8, 9398–9412 (2018).
Gillespie, R. G. et al. Comparing adaptive radiations across space, time, and taxa. J. Hered. 111, 1–20 (2020).
Martin, C. H. & Richards, E. J. The paradox behind the pattern of rapid adaptive radiation: how can the speciation process sustain itself through an Early Burst? Annu. Rev. Ecol. Evol. Syst. 50, 569–593 (2019).
Ingram, T., Hudson, A. G., Vonlanthen, P. & Seehausen, O. Does water depth or diet divergence predict progress toward ecological speciation in whitefish radiations? Evol. Ecol. Res. 14, 487–502 (2012).
Jones, F. C. et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61 (2012).
Miller, S. E., Roesti, M. & Schluter, D. A single interacting species leads to widespread parallel evolution of the stickleback genome. Curr. Biol. 29, 530–537.e6 (2019).
Udpa, N. et al. Whole genome sequencing of Ethiopian highlanders reveals conserved hypoxia tolerance genes. Genome Biol. 15, R36 (2014).
Feulner, P. G. D. et al. Genomics of divergence along a continuum of parapatric population differentiation. PLoS Genet. 11, e1004966 (2015).
Laporte, M. et al. RAD-QTL mapping reveals both genome-level parallelism and different genetic architecture underlying the evolution of body shape in Lake Whitefish (Coregonus clupeaformis) Species Pairs. G3 5, 1481–1491 (2015).
De‐Kayne, R., Zoller, S. & Feulner, P. G. D. A de novo chromosome‐level genome assembly of Coregonus sp. ‘Balchen’: one representative of the Swiss Alpine whitefish radiation. Mol. Ecol. Resour. 20, 1093–1109 (2020).
Macqueen, D. J. & Johnston, I. A. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc. Biol. Sci. 281, 20132881 (2014).
Barghi, N., Hermisson, J. & Schlötterer, C. Polygenic adaptation: a unifying framework to understand positive selection. Nat. Rev. Genet. 21, 769–781 (2020).
Roesch, C., Lundsgaard-Hansen, B., Vonlanthen, P., Taverna, A. & Seehausen, O. Experimental evidence for trait utility of gill raker number in adaptive radiation of a north temperate fish. J. Evol. Biol. 26, 1578–1587 (2013).
Amundsen, P.-A., Knudsen, R., Klemetsen, A. & Kristoffersen, R. Resource competition and interactive segregation between sympatric whitefish morphs. Ann. Zool. Fennici 41, 301–307 (2004).
Lundsgaard-Hansen, B., Matthews, B., Vonlanthen, P., Taverna, A. & Seehausen, O. Adaptive plasticity and genetic divergence in feeding efficiency during parallel adaptive radiation of whitefish (Coregonus spp.). J. Evol. Biol. 26, 483–498 (2013).
He, S. et al. Mandarin fish (Sinipercidae) genomes provide insights into innate predatory feeding. Commun. Biol. 3, 361 (2020).
Colosimo, P. F. et al. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307, 1928–1933 (2005).
Svarvar, P.-O. & Müller, R. Die Felchen des Alpnachersees. Schweiz. Z. Hydrol. 44, 295–314 (1982).
Sinclair-Waters, M. et al. Beyond large-effect loci: large-scale GWAS reveals a mixed large-effect and polygenic architecture for age at maturity of Atlantic salmon. Genet. Sel. Evol. 52, 9 (2020).
Láruson, Á. J., Yeaman, S. & Lotterhos, K. E. The importance of genetic redundancy in evolution. Trends Ecol. Evol. 35, 809–822 (2020).
Ohno, S. Evolution by Gene Duplication (Springer, 1970).
Soltis, P. S. & Soltis, D. E. Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30, 159–165 (2016).
Merico, A., Sulo, P., Piskur, J. & Compagno, C. Fermentative lifestyle in yeasts belonging to the Saccharomyces complex. FEBS J. 274, 976–989 (2007).
Meyer, A. & Van de Peer, Y. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27, 937–945 (2005).
Marques, D. A., Meier, J. I. & Seehausen, O. A combinatorial view on speciation and adaptive radiation. Trends Ecol. Evol. 34, 531–544 (2019).
Feulner, P. G. D. et al. Introgression and the fate of domesticated genes in a wild mammal population. Mol. Ecol. 22, 4210–4221 (2013).
Alexander, T. & Seehausen, O. Diversity, distribution and community composition of fish in perialpine lakes—“Projet Lac” synthesis report. Eawag: Swiss Federal Institute of Aquatic Science and Technology (2021).
Andrews, S. et al. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
De-Kayne, R., Zoller, S. & Feulner, P. G. D. Data from: a de novo chromosome-level genome assembly of Coregonus sp. “Balchen”: one representative of the Swiss Alpine whitefish radiation. https://doi.org/10.5061/dryad.xd2547ddf (2020).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at https://www.biorxiv.org/content/10.1101/201178v3 (2017).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Quinlan, A. R. BEDTools: the Swiss-army tool for genome feature analysis. Curr. Protoc. Bioinforma. 47, 11–12 (2014).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Frei, D., De-Kayne, R., Selz, O. M., Seehausen, O. & Feulner, P. G. Genomic variation from an extinct species is retained in the extant radiation following speciation reversal. Nat. Ecol. Evol. 6, 461–468 (2022).
De-Kayne, R. et al. Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets. Mol. Ecol. Resour. 21, 653–660 (2021).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2022).
Li, H. et al. The sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ortiz, E. M. vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis. https://doi.org/10.5281/zenodo.2540861 (2019).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Rambaut, A. FigTree v1.4. http://tree.bio.ed.ac.uk/software/figtree/ (2012).
Alexander, D. H. & Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinforma. 12, 246 (2011).
Louis, M. et al. Selection on ancestral genetic variation fuels repeated ecotype formation in bottlenose dolphins. Sci. Adv. 7, eabg1245 (2021).
Storey, J. D., Bass, A. J., Dabney, A. & Robinson, D. qvalue: Q-value estimation for false discovery rate control. R package version 2.26.0. http://github.com/jdstorey/qvalue (2021).
Alexa, A. & Rahnenfuhrer, J. topGO: Enrichment analysis for gene ontology. R package version 2.46.0. https://bioconductor.org/packages/release/bioc/html/topGO.html (2021).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10, e0120758 (2015).
Malinsky, M., Matschiner, M. & Svardal, H. Dsuite—Fast D-statistics and related admixture evidence from VCF files. Mol. Ecol. Resour. 21, 584–595 (2021).
Malinsky, M. et al. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat. Ecol. Evol. 2, 1940–1955 (2018).
Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
De-Kayne, R. et al. Data for: genomic architecture of adaptive radiation and hybridization in Alpine whitefish. https://doi.org/10.25678/0005S0 (2022).
De-Kayne, R. et al. All code associated with: De-Kayne et al. Genomic architecture of adaptive radiation and hybridization in Alpine whitefish. https://doi.org/10.5281/zenodo.6807278 (2022).
We thank Anna Feller and Adam Ciezarek for comments on earlier versions of the manuscript. We acknowledge Verena Kälin for the whitefish illustrations. The data produced and analysed in this paper were generated in collaboration with the Next Generation Sequencing Platform, University of Bern, and the Genetic Diversity Centre (GDC), ETH Zurich. This work was supported by the Swiss National Science Foundation (SNSF project 31003A_163446/1 awarded to P.G.D.F.).
The authors declare no competing interests.
Peer review information
Nature Communications thanks Valentina Burskaia, Claire Mérot, Hannes Svardal and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
De-Kayne, R., Selz, O.M., Marques, D.A. et al. Genomic architecture of adaptive radiation and hybridization in Alpine whitefish. Nat Commun 13, 4479 (2022). https://doi.org/10.1038/s41467-022-32181-8