Introduction

Intraspecific resource polymorphisms are widespread in numerous taxa and have had an important role in understanding the roles of phenotypic plasticity and natural selection in generating biodiversity, including the process of speciation (Skúlason and Smith, 1995; Smith and Skúlason, 1996). In particular, fishes demonstrate extensive polymorphism in morphology and resource use both in the marine and freshwater realms and from the tropics to the Arctic (Wimberger, 1994; Taylor, 1999). Often, such polymorphisms occur as discrete morphological and behavioural forms that specialize on alternative resource types in distinct habitats (for example, ‘pelagic’, ‘benthic’, ‘limnetic’, ‘piscivorous’, ‘insectivorous’ morphotypes or ecotypes). A key feature of the environment strongly associated with resource polymorphisms is the novel ecological opportunity, coupled with low interspecific competition, which is often provided by the emergence of new habitats or when existing habitats are altered or disturbed (Schluter, 1996).

In north-temperate regions, recolonization of newly formed freshwater habitats after the most recent glaciation, beginning about 15 000 years ago (Lindsey and McPhail, 1986; Hewitt, 1996), has apparently provided such novel ecological opportunities, resulting in the description of a number of sympatric ecotypes in a variety of freshwater fishes (reviewed in Schluter, 1996; Taylor, 1999). In many cases, there are also examples of repeated occurrences of divergent ecotypes within a taxon that appear to have evolved in parallel (Schluter and Nagel, 1995; Pigeon et al., 1997; Østbye et al., 2006), and there is evidence both for evolution of ecotypes in allopatry followed by secondary contact (Bernatchez and Dodson, 1990; Fraser and Bernatchez, 2005) and for sympatric divergence (Taylor and Bentzen, 1993; Præbel et al., 2013). Phenotypic plasticity, the capability of a genotype to exhibit variable phenotypes as influenced by the environment (Whitman and Agrawal, 2009), has also been important in generating morphological and ecological diversity in post-glacial habitats (Robinson and Parsons, 2002).

Where sympatric ecotypes exist, there are several hypotheses that may explain their nature and origin. First, and perhaps most fundamental, is that ecotypes within a locality represent genetically discrete populations and not plastic responses to a heterogeneous environment. The most direct test of this hypothesis would involve multi-generation breeding and common garden experiments (Robinson and Parsons, 2002; Lundsgaard-Hansen et al., 2013), but genetic distinctiveness can also be indirectly examined by assessing the level of divergence at neutral genetic markers to determine if ecotypes represent distinct gene pools (for example, Gowell et al., 2012; Stafford et al., 2014). If the latter is the case, then a second fundamental hypothesis concerns the geographic origin of the ecotypes (that is, whether they evolved in allopatry and now exist in sympatry following secondary contact or whether they evolved in sympatry). Testing these hypotheses with molecular markers, however, can often be challenging if processes such as contemporary gene flow after secondary contact are influencing population structure (Lu et al., 2001; Turgeon and Bernatchez, 2001), or if historical gene flow before isolation contributes to a blurring of population structure (Harris and Taylor, 2010; Harris et al., 2013).

Salmonine fishes offer some of the best-known examples of morphological and ecological diversity within sympatric populations (Schluter, 1996; Taylor, 1999). One of the most remarkable examples is the Arctic Char (Salvelinus alpinus) in Lake Thingvallavatn, Iceland, where four sympatric morphotypes (hereafter ‘morphs’) occur (Skúlason et al., 1989a, 1989b; Kapralova et al., 2011). Similarly, sympatric ecotypes have been documented in the Brook Trout (S. fontinalis, for example, Dynes et al., 1999; Fraser and Bernatchez, 2005). The Lake Trout (S. namaycush) also exhibits morphological and ecological variation across its range, but the nature and origins of such variability have been less well studied than in its congeners. For example, in the Laurentian Great Lakes (LGLs), up to three depth-segregated morphs differing in morphology and ecology have been reported (Krueger and Ihssen, 1995; Moore and Bronte, 2001). The origin of these morphs is not completely understood, but there appears to be a genetic component to these differences (Page et al., 2004; Goetz et al., 2010). Sympatric morphs in the LGLs have been heavily impacted, even extirpated in some cases, owing to overharvesting, invasive Sea Lamprey (Petromyzon marinus) predation and the long history of Lake Trout hatchery supplementation (Krueger and Ihssen, 1995; Page et al., 2004), making the study of their evolutionary origin problematic.

More recently, however, additional examples of within-lake morphological divergence in Lake Trout have been described throughout the geographical range of this species (Zimmerman et al., 2006, 2007; Northrup et al., 2010) including relatively unperturbed systems from the Canadian Arctic. For example, in Great Bear Lake (GBL; Figure 1), Canada’s largest, virtually pristine system, Blackie et al. (2003) showed that Lake Trout characterized as ‘piscivorous’ and ‘insectivorous’ based on diet that were captured in shallow-water habitats (<20 m) could be differentiated morphologically based largely on differences in upper and lower jaw length, pectoral fin length and caudal peduncle depth. Alfonso (2004) also resolved two groups (called ‘redfin’ and ‘normal’ forms) that were morphologically distinct in body and caudal peduncle depth, as well as a suite of other characters. More recently, Chavarie et al. (2013) used a suite of morphological measurements to resolve four shallow-water (<30 m) morphs (referred to as groups 1–4), equalling or exceeding the diversity of the LGLs, but without being related to depth. The levels of ecological and genetic differentiation among these shallow-water forms, and the geography of their origin in GBL, however, remain unknown.

Figure 1

Map of the study area showing sampling locations among arms within GBL including the additional sampling locations throughout Canada. Numbers refer to the locations listed in Table 1. The maximal extent of glacial Lake McConnell that once covered the areas now occupied by GBL, GSL and Lake Athabasca is indicated by the shaded area (modified from Smith, 1994).

In this study, we evaluated several hypotheses concerning the nature and origin of morphological variation in Lake Trout from GBL. First, we used microsatellite and mitochondrial DNA (mtDNA) to assess levels of genetic divergence among morphs. If the different morphs represent distinct gene pools, rather than solely the result of phenotypic plasticity within a single gene pool, we expected to detect significant divergence between distinct morphs within the same arm, or across arms (Kapralova et al., 2011). Second, we used these data to assess genetic interrelationships among morphs within and among arms of GBL and to test alternative hypotheses on their geographic origins. Under an allopatric model of divergence, we expected that similar morphs found in different arms of GBL would be more closely related to each other than divergent morphs within the same arm and that each such ‘morph-cluster’ would be genetically similar to Lake Trout sampled from outside GBL that originated from distinct glacial refugia (for example, Bernatchez and Dodson, 1991; Lu et al., 2001; Fraser and Bernatchez, 2005; Figure 2a). Alternatively, under a sympatric model of divergence, distinct Lake Trout morphs within GBL should be genetically more similar to each other than to any Lake Trout found outside GBL (the ‘intra-lacustrine’ model; Figure 2b). This scenario would suggest that Lake Trout morphs in GBL diverged from a common ancestor in situ soon after colonizing this system post-glacially and subsequently each morph dispersed to the different arms of GBL (Eshenroder, 2008). It is also possible that the multiple occurrences of different morphs within GBL across arms could have originated by multiple bouts of sympatric divergence (the ‘intra-arm’ model). If so, we would anticipate that divergent morphs within the same arm would be each other’s closest relative and that all such arm-specific clusters would form a monophyletic GBL cluster relative to Lake Trout from outside of this system (Hudson et al., 2007; Butlin et al., 2008; Figure 2c). If this model were supported, it would provide evidence for parallel divergence of distinct morphs in each arm (see Schluter and Nagel, 1995). Overall, our study strives to better understand the evolution and maintenance of the immense phenotypic and ecological divergence exhibited by fishes occupying post-glacial habitats and to provide insights into whether such variability is a result of genetic discreteness, phenotypic plasticity or combinations of both (Bernatchez and Dodson, 1990; Taylor and McPhail, 2000; Crispo, 2008; Lundsgaard-Hansen et al., 2013).

Figure 2

Hypothesized evolutionary scenarios for explaining the origin of morphological variation among shallow-water morphotypes (represented here as ‘A’ and ‘B’) of Lake Trout from GBL, Canada. Shown are (a) an allopatric model of divergence, where morphotypes diverged in discrete refugia and then subsequently colonized GBL followed by the colonization of shallow-water habitats within each arm, (b) an ‘intra-lacustrine’ model of divergence, where the evolution of morphological variation occurred in situ soon after colonizing GBL; subsequently, discrete morphs dispersed to occupy the shallow-water habitats within each arm and (c) an ‘intra-arm’ model of divergence, where the evolution of morphological variation ensued in situ within each arm from a common ancestor within GBL (that is, parallel evolution of morphs across arms). Shown below each colonization map are the anticipated phylogenetic relationships.

Materials and methods

Study system, sample collection and previous morph identification

GBL is located in the Northwest Territories of Canada and is the largest lake contained wholly within the country (Figure 1). As the Laurentide and Cordilleran ice sheets receded, numerous glacial lakes were formed throughout much of glaciated North America (Pielou, 1991). One such body of water, glacial Lake McConnell, covered the three basins presently occupied by GBL, Great Slave Lake (GSL) and Lake Athabasca (Craig, 1965; see Figure 1). Glacial Lake McConnell formed around 12 000 ybp as the Laurentide ice sheet retreated and existed until around 8500 ybp when isostatic rebound caused it to become separated into GSL, GBL and Lake Athabasca (Lemmen et al., 1994; Smith, 1994). During its existence, glacial Lake McConnell was also impacted by a major flood from glacial Lake Agassiz approximately 9900 ybp that drastically altered the hydrography of this system (Smith, 1994). Presently, GBL is divided into five distinct arms (Dease, Keith, McTavish, McVicar and Smith arms) that are all connected to a central basin. The above description of glacial Lake McConnell raises several important points. First, Lake Trout from what are now separate GBL and GSL would have once existed sympatrically before glacial Lake McConnell became subdivided approximately 8500 ybp. Second, the relatively recent subdivision of glacial Lake McConnell into GBL, GSL and Lake Athabasca implies that Lake Trout from GBL have only been isolated from Lake Trout in other freshwater systems for approximately 8500 years or 567 generations (based on a 15-year generation time). Finally, as a result of flood water from glacial Lake Agassiz, Lake Trout once isolated in the Mississippian glacial refuge were able to colonize as far north as GBL (see also Rempel and Smith, 1998).

We sampled GBL Lake Trout using paired bottom sets of 140 mm and multi-mesh (38–140 mm) gill nets set at depths of ≤30 m (Chavarie et al., 2013). Owing to the logistical challenges of sampling a large Arctic lake, sampling was localized to one arm per year, rotating each year since 2002. Consequently, the 10-year data set has a full spatial representation of the lake, including temporal coverage in some arms (Table 1).

Table 1 Sampling locations and sample sizes for microsatellite and mtDNA sequencing analyses

Four divergent shallow-water (≤30 m) sympatric morphs of Lake Trout have been identified, and at least three occur within each arm (Chavarie et al., 2013; Figure 3). Briefly, a digital image of each fish was used to obtain morphological measurements based on 23 landmarks (homologous points), 20 semi-landmarks (used to compare homologous curves), and 12 linear measurement traits. These measurements summarized variation in body shape, head shape, and head and fin lengths of Lake Trout, resulting in the resolution of four morphs: groups 1, 2, 3 and 4. Morphs 1 and 2 were assessed from all five arms, whereas morphs 3 and 4 were assessed from only two arms (Dease and McVicar) and one arm (Keith), respectively (Table 1), because of low sample sizes of the latter.

Figure 3

Shallow-water morphotypes of Lake Trout resolved from GBL assessed in this study (from Chavarie et al., 2013). Also shown are the body shape and head shape landmarks used to delineate morphotypes from this system.

We also obtained tissue samples of Lake Trout from lakes thought to contain lineages that originated from distinct glacial refugia (Wilson and Hebert, 1998): Sandy (Mackenzie River drainage) and Jayko (Victoria Island) lakes (putative Northern Beringian refuge populations); Atlin (Pacific coast drainage) and Nakinlerak (Fraser River drainage) lakes in British Columbia (putative Nahanni or Southern Beringian refuge populations); Peter (Hudson Bay drainage) and Nipigon (Great Lakes drainage) lakes (putative Mississippian refuge populations); and GSL (putative Beringian, Southern Beringian (Nahanni) or Mississippian refuge populations; Table 1, Figure 1). Unfortunately, both genetic and morphological data were only available for Lake Trout samples from GBL (but see Northrup et al., 2010).

Molecular methods

Microsatellite DNA data collection

DNA was extracted using Qiagen DNeasy tissue extraction kits (Qiagen Inc., Valencia, CA, USA). Samples were assayed for variation at 24 microsatellite loci amplified in four multiplexes (Supplementary Appendix 1). An automated sequencer (ABI 3130xl Genetic Analyzer; Applied Biosystems, Foster City, CA, USA) was used for microsatellite analysis using the LIZ 600 size standard. GeneMapper (ver. 4.0, Applied Biosystems) software was used to score all microsatellite data. All scoring data were also visually assessed.

MtDNA data collection

MtDNA data collection methods are described in Harris et al. (2013). Briefly, the left domain of the control region (d-loop) was amplified with primers tPro2 (Brunner et al., 2001) and ARCH1 (Alekseyev et al., 2009). The target amplification product was then sequenced with primer tPro2 using the Applied Biosystems BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). Sequencing products were run on an Applied Biosystems 3130xl Genetic Analyzer and aligned to haplotype Snam01 (GenBank accession number: JQ772460) using SeqScape ver. 2.5 (Applied Biosystems).

Statistical analyses

Genetic variation and Hardy–Weinberg and linkage disequilibrium

The program MICRO-CHECKER (ver. 2.2.3; Van Oosterhout et al., 2004) was used to assess the quality of the microsatellite markers by testing for null alleles and large allele dropout. The program FSTAT (ver. 2.9.2.3; Goudet, 2002) was used to compile descriptive statistics (number of alleles (NA), expected (HE, Nei’s unbiased gene diversity) and observed (HO) heterozygosities and the fixation index (FIS)) for each locus within each sample. In addition, the program HP-RARE (Kalinowski, 2005) was used to calculate allelic richness at a common sample size (AR, minimum 22 genes from each sample) and private allelic richness (PAR, 22 genes from each sample). To qualitatively visualize allele frequency differences among all samples and morphs within GBL, bubble plots of allele frequency variation were created for each locus in R (R Core Team, 2013). Differences in AR, HE, HO and FST when GBL samples were grouped by arm and when samples were grouped by morph were also compared using FSTAT permutation tests. Deviations from Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium were assessed using GENEPOP (ver. 4.0; Rousset, 2008). First, we tested for deviations from HWE among all samples and then among only those from GBL when grouped by either arm or morphotype. If large deviations from HWE were observed for one of the grouping scenarios (that is, either by arm or morph), for example, owing to a reduction in heterozygosity caused by a Wahlund effect, then that grouping hypothesis would be poorly supported. The significance of simultaneous comparisons was initially assessed against a nominal alpha of 0.05 and then against an adjusted alpha following the false discovery rate procedure (FDR, Narum, 2006).
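As an illustration of the per-locus descriptive statistics named above, the following minimal Python sketch computes NA, HO and Nei’s unbiased HE for one locus in one sample. The function name and the genotype input format are hypothetical, and FIS is approximated here as 1 − HO/HE, which is simpler than, and may differ slightly from, the estimator reported by FSTAT.

```python
from collections import Counter


def locus_stats(genotypes):
    """Descriptive statistics for one locus in one sample.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Returns the number of alleles (NA), observed heterozygosity (HO),
    Nei's unbiased gene diversity (HE) and a simple FIS = 1 - HO/HE.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    # Count every gene copy (two per individual).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Nei's small-sample correction: 2n / (2n - 1).
    he = (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs))
    fis = 1 - ho / he if he > 0 else float("nan")
    return {"NA": len(counts), "HO": ho, "HE": he, "FIS": fis}


# Illustrative (made-up) genotypes scored as allele sizes in base pairs.
example = [(100, 100), (100, 102), (102, 102), (100, 102)]
print(locus_stats(example))
```

A positive FIS in such a calculation indicates a heterozygote deficit relative to Hardy–Weinberg expectations, which is the pattern a Wahlund effect would produce.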

Genetic population structure among morphs and arms

Global values of FST (θ, Weir and Cockerham, 1984) were generated using FSTAT and were calculated among all samples and then among samples in GBL grouped either by arm or by morph. Pairwise estimates of FST among all samples were compared in ARLEQUIN (ver. 3.1; Excoffier et al., 2005). To further resolve how population structure is best explained in GBL, we also calculated pairwise FST for samples when grouped by arm and when grouped by morph. The significance of all pairwise estimates was assessed using 10 000 permutations.

We used several methods to visualize population structure among all samples and then among only those samples from GBL to determine if genetic structure was most evident by morph or by arm. First, a factorial correspondence analysis was performed using GENETIX (ver. 4.02; Belkhir et al., 2004), which graphically represents the genetic distances between individual multilocus genotypes. Second, using Cavalli-Sforza and Edwards (1967) chord distance (DCE), bootstrapped neighbour-joining trees were built using PHYLIP (ver. 3.6.9; Felsenstein, 2009). Finally, we used Bayesian clustering implemented in the program STRUCTURE (ver. 2.3.4; Pritchard et al., 2000) to estimate the number of putative populations or clusters (K). For this analysis, we used the admixture model with independent allele frequencies while varying K from 1 to 20. We ran 10 independent runs for each value of K to assess variability of obtained log-likelihood values using a burn-in of 100 000 iterations followed by 100 000 Markov chain Monte Carlo iterations. We performed STRUCTURE analyses for two data sets: (1) one consisting of all samples; and (2) one containing only the GBL samples. Given the limited genetic structure that has previously been resolved among Lake Trout from GBL (Harris et al., 2013), we used the LOCPRIOR option in STRUCTURE for the latter analysis to enhance the likelihood of detecting existent population structure in this system (detailed by Hubisz et al., 2009). To avoid biasing the analysis toward clustering by morph or arm, LOCPRIOR information was included at the sample level (that is, 13 distinct entities were included in our LOCPRIOR information, with the 13 localities representing various positions within arms). The program STRUCTURE HARVESTER (ver. 06.6.92; Earl and vonHoldt, 2012) was first used to visualize and compile the results based on both the posterior probability of the data (ln P[D]) and the post hoc ΔK statistic of Evanno et al. (2005). The best alignment of replicate runs was assessed using the program CLUMPP (ver. 1.1; Jakobsson and Rosenberg, 2007) using 1000 permutations and the LargeKGreedy algorithm. The program DISTRUCT (ver. 1.1; Rosenberg, 2004) was then used to produce plots of the best alignments for average memberships calculated using CLUMPP. For STRUCTURE analyses, we report results based on both the posterior probability of the data (ln P[D]) and the post hoc ΔK statistic.
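To make the ΔK criterion concrete, the sketch below computes it from replicate ln P[D] values in Python. It assumes the commonly used mean-based form, ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], where L(K) is the mean ln P[D] across replicate runs and s is their standard deviation; the function name and the input values are hypothetical and are not output from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style delta K from replicate STRUCTURE runs.

    lnp_by_k: dict mapping K -> list of ln P(D) values (one per replicate run).
    Returns a dict mapping K -> delta K for every K that has both neighbours
    (K - 1 and K + 1) and a non-zero standard deviation across replicates.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / sd
    return result


# Made-up ln P(D) values for K = 1..4 with two replicates each.
runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because ΔK is undefined for the smallest and largest K tested, it cannot by itself favour K=1, which is one reason to report ln P[D] alongside it.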

To determine the extent to which genetic variation was partitioned across samples, we conducted a hierarchical analysis of molecular variance (AMOVA; Excoffier et al., 1992) using ARLEQUIN. Using an AMOVA, the percentage of the total genetic variation is partitioned within populations (that is, individuals, Vc), among populations within groups (Vb) and by differences among groups (Va). We first tested a grouping hypothesis based on putative refuge of origin (as described above) to assess large-scale partitioning of genetic variation. We grouped samples by putative refugial origin based on previous assessments for this species (for example, Wilson and Hebert, 1998, see above) and the results of this study. Then, within GBL two grouping hypotheses were tested in which samples were treated by grouping distinct morphs by arm within GBL (‘by arm’ grouping) and then by grouping similar morphs among arms (‘by morph’ grouping). If the distinct morphs had arisen by parallel sympatric divergence across each arm of GBL, we expected the greatest amount of variation in allele frequencies to be explained using the ‘by arm’ grouping scenario.
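The partitioning described above can be sketched numerically. The Python example below converts the three variance components (Va, Vb, Vc) into percentages of total variation and into the hierarchical fixation indices as they are conventionally defined for AMOVA (FCT = Va/VT, FSC = Vb/(Vb+Vc), FST = (Va+Vb)/VT). The function name and the input values are hypothetical; the sketch does not estimate the components themselves, which ARLEQUIN derives from the genotype data.

```python
def amova_summary(va, vb, vc):
    """Summarize hierarchical AMOVA variance components.

    va: variance among groups
    vb: variance among populations within groups
    vc: variance within populations
    Components may be slightly negative in practice, so only the total
    (and vb + vc) are required to be positive.
    """
    total = va + vb + vc
    if total <= 0 or (vb + vc) <= 0:
        raise ValueError("variance components must sum to a positive value")
    return {
        "pct_among_groups": 100 * va / total,
        "pct_among_pops_within_groups": 100 * vb / total,
        "pct_within_pops": 100 * vc / total,
        "FCT": va / total,
        "FSC": vb / (vb + vc),
        "FST": (va + vb) / total,
    }


# Made-up variance components, for illustration only.
summary = amova_summary(va=0.5, vb=0.3, vc=4.2)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under this framing, the grouping scenario that better reflects the true structure is the one that assigns the larger share of variation to the among-group component.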

Finally, to estimate the timing of divergence between the two most common morphotypes in GBL (that is, groups 1 and 2) we used the isolation with migration model (Nielsen and Wakeley, 2001; Hey and Nielsen, 2004) as implemented in the program IMa (Hey and Nielsen, 2007). IMa uses Markov chain Monte Carlo sampling to obtain maximum-likelihood estimates of six parameters, including the timing of divergence (t, as scaled by mutation rate, μ). Initial runs incorporated upper bounds for each parameter (q1=q2=qA=10, m1=m2=10, t=10) as recommended in the IMa manual. We used heated chains under Metropolis coupling incorporating a geometric heating scheme, and parameter estimates were generated under the stepwise mutation model (SMM). Preliminary runs were used to optimize settings (for example, heating parameters and number of chains) and to ensure that the parameter space was fully explored, that the stationary distribution was adequately sampled and that there was adequate mixing (based on low autocorrelation of parameters, high acceptance rates and visualization of plots to ensure no trends were apparent). Five replicate simulations were then conducted, each varying the random seed to assess consistency among IMa runs. The final runs included a burn-in of 1 × 10⁶ steps followed by 1 × 10⁷ Markov chain Monte Carlo steps, with 30 heated chains.

MtDNA genetic analyses

We used MEGA (ver. 5.0; Tamura et al., 2011) to find the best nucleotide substitution model for our mtDNA sequence data as assessed using the Akaike Information Criterion and the Bayesian Information Criterion. ARLEQUIN was used to compute descriptive statistics (that is, haplotype frequency, haplotype diversity (h) and nucleotide diversity (π)), Tajima’s D (a test for deviations from neutral expectations; Tajima, 1989) and pairwise FST between sampling locations (with significance tested using 10 000 permutations). In addition, using ARLEQUIN, AMOVAs were performed to determine the extent to which genetic variation (based on mtDNA sequence data) is partitioned according to the grouping hypotheses described above. Evolutionary relationships among haplotypes were visualized by constructing a haplotype network following the statistical parsimony method of Templeton et al. (1992) as implemented in the program TCS (ver. 1.20; Clement et al., 2000).
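For readers unfamiliar with the two mtDNA diversity measures, the Python sketch below computes haplotype diversity as h = n/(n−1) × (1 − Σp²) and nucleotide diversity as the mean number of pairwise differences per site. Both function names are hypothetical, and the nucleotide diversity here uses uncorrected pairwise differences, so it is a simplification that will not exactly match values computed under a substitution model such as the one selected in MEGA.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Haplotype diversity h for a list of haplotype labels (one per fish)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return (n / (n - 1)) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(sequences):
    """Mean uncorrected pairwise differences per site among aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_differences / n_pairs / length


# A sample fixed for a single haplotype has zero haplotype diversity.
print(haplotype_diversity(["Snam06"] * 10))
# Toy aligned sequences, for illustration only.
print(nucleotide_diversity(["ACGT", "ACGT", "ACGA"]))
```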

Results

Microsatellite DNA

Genetic variation and HWE and linkage disequilibrium

The locus OMM1128 was monomorphic and the program MICRO-CHECKER consistently identified SnaMSU9 as a locus containing null alleles. Removing these two loci resulted in 22 informative loci that were used in all subsequent analyses across 811 samples. Genetic variation was relatively high with the number of alleles ranging from 4 (Smm21) to 66 (SnaMSU10) alleles per locus and averaging 25.73 alleles per locus across all loci (Supplementary Appendix 2). Bubble plots of allele frequency variation were characterized by differences according to region (that is, putative refugia; Supplementary Appendix 3). Among GBL samples, bubble plots revealed that most samples (morphs) were not characterized by marked differences in allele frequencies or the presence of unique alleles and, qualitatively, no morph-specific differences were noticeable at any locus (Supplementary Appendix 3). Per locus observed heterozygosity ranged from 0.14 (Sco102) to 0.94 (SnaMSU6), averaging 0.74 across all loci and expected heterozygosity ranged from 0.14 (Sco102) to 0.96 (SnaMSU6 and SnaMSU10), while averaging 0.80 across all loci (Supplementary Appendix 2). Allelic richness ranged from 1.84 (Sco102) to 14.29 (SnaMSU6) and averaged 8.30 across all loci (Supplementary Appendix 2). There were no significant differences (P>0.05) in HO, HE, AR or FST when comparing GBL samples grouped by arm (Supplementary Appendix 4). Alternatively, when GBL samples were grouped by morph there were significant differences in HO, HE and AR (P<0.05) between morphs, but not in FST (P>0.05; Supplementary Appendix 4).
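The allelic richness values above are standardized to a common sample size of 22 genes. A minimal Python sketch of the standard rarefaction estimator is given below: the expected number of distinct alleles in a random subsample of g gene copies, AR = Σ [1 − C(N − Ni, g)/C(N, g)], where N is the total number of gene copies and Ni the count of allele i. The function name is hypothetical, and the sketch is meant to illustrate the idea rather than reproduce HP-RARE output.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g=22):
    """Expected number of distinct alleles in a subsample of g gene copies.

    allele_counts: list with the number of copies observed for each allele
                   at one locus in one sample.
    g: common subsample size in gene copies (22 in this study).
    """
    n_genes = sum(allele_counts)
    if g > n_genes:
        raise ValueError("subsample size exceeds the number of gene copies")
    denominator = comb(n_genes, g)
    # For each allele, 1 minus the probability that it is absent from the subsample.
    return sum(1 - comb(n_genes - count, g) / denominator for count in allele_counts)


# Made-up allele counts for one locus (40 gene copies, five alleles).
print(rarefied_allelic_richness([20, 10, 6, 3, 1], g=22))
```

Rarefaction of this kind allows samples of unequal size to be compared, because rare alleles are more likely to be detected simply by genotyping more fish.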

When all samples were assessed, HWE was rejected in 61 of a possible 440 population–locus comparisons (P<0.05) but subsequent to adjustments of alpha based on the FDR procedure only 21 deviations were detected (P<0.0075). All deviations were the result of heterozygote deficiencies. When samples from GBL were grouped by arm, 19 of a possible 110 population–locus comparisons (17.2%) were significant after adjustments for multiple comparisons (P<0.0095), whereas when samples were grouped by morph only 12 of a possible 88 comparisons (13.6%) remained significant after FDR adjustments (P<0.0099). Virtually all deviations when samples were grouped by arm or by morph were the result of heterozygote deficiencies. Significant linkage disequilibrium was detected in 188 of 4620 tests (P<0.05), but after using the FDR procedure, it was detected in only 45 comparisons (P<0.0055).
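The adjusted thresholds quoted above can be checked with a short calculation. Assuming the FDR correction applied was the Benjamini–Yekutieli form, in which the critical value is α divided by the sum of 1/i over the k tests (an assumption, as the exact variant is not stated in the text), the Python sketch below reproduces the reported values for 440, 110, 88 and 4620 comparisons. The function name is hypothetical.

```python
def by_critical_alpha(n_tests, alpha=0.05):
    """Benjamini-Yekutieli style critical value: alpha / sum_{i=1..k} (1 / i)."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    harmonic_sum = sum(1.0 / i for i in range(1, n_tests + 1))
    return alpha / harmonic_sum


# Numbers of simultaneous tests taken from the text:
# 20 samples x 22 loci, 5 arms x 22 loci, 4 morphs x 22 loci,
# and 231 locus pairs x 20 samples for linkage disequilibrium.
for k in (440, 110, 88, 4620):
    print(k, round(by_critical_alpha(k), 4))
```

With these inputs the thresholds round to 0.0075, 0.0095, 0.0099 and 0.0055, matching the adjusted alpha values given for the HWE and linkage disequilibrium tests.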

Population structure

Moderate overall differentiation among all samples was resolved (global FST estimate of 0.071, 95% confidence interval (CI)=0.056–0.087) but not when comparing only those samples from GBL in which overall differentiation was low (global FST=0.008, 95% CI=0.005–0.011). Pairwise estimates of FST ranged from virtually no apparent differentiation (among some of the samples from GBL) to 0.435 between lakes from distinct drainages covering putative distinct refugia (NAK and SAN; Supplementary Appendix 5). The majority of these comparisons were significant (P<0.05), even after adjusting for multiple comparisons using the FDR (P<0.0086). Among GBL samples, however, results differed when samples were grouped by arm or by morph. When samples were grouped by arm, global FST was 0.002 (95% CI=0.001–0.003). In contrast, differentiation was almost four times higher when samples were grouped into the four morphs (global FST=0.007, 95% CI=0.004–0.011). When comparing samples grouped by arm, pairwise FST ranged from 0.0009 (between Keith and Dease arms) to 0.0061 (between McTavish and Smith arms located on opposite sides of GBL; Table 2a). Among morphs, pairwise estimates were higher, ranging from 0.0063 (between GRP 1 and GRP 2 morphs) to 0.0174 (between GRP 2 and GRP 4 morphs; Table 2b). Finally, when GBL samples were combined and compared with those grouped into putative refugia, FST ranged from 0.038 (between GBL and the Mississippian refuge group) to 0.176 (between the Beringian and Southern Beringian (Nahanni) refugial groups; Table 2c). All of these comparisons were significant before (P<0.05) and after adjusting for multiple comparisons (P<0.0219).

Table 2 Pairwise FST (θ) values based on microsatellite (below diagonal) and mtDNA sequence (above diagonal) data between samples from GBL when they are grouped by arm within this system (a), when grouped by morphotype (b) and when all GBL samples were combined and compared with those grouped into putative refugia (that is, Northern Beringian (N-BER), Southern Beringian (S-BER) or Mississippian (MIS) (c))

The factorial correspondence analysis grouped individuals into clusters suggestive of distinct refugial origins (Figure 4a). Samples from GBL, GSL, PET and NIP (putative Mississippian refuge origin) clearly grouped together, while samples from SAN and JAY (putative Beringian refuge origin) also formed a distinct cluster. Alternatively, the samples from NAK and ATL (putative Southern Beringian refuge origin) were divergent from one another and showed no clear genetic affinity to any of the other samples. Within GBL, there was some clustering of samples based on morph (Figure 4b) and when only the GRP 1 and GRP 2 morphs were assessed, the genetic distinction between them was even clearer (Supplementary Appendix 6). The neighbour-joining tree also grouped samples by the refuge from which they potentially dispersed and again, GSL, PET and NIP grouped closer in the tree to GBL than any of the other samples (Figure 4c). Some separation of GBL samples by morph was also indicated based on the neighbour-joining tree. Although bootstrap support levels were modest, GRP 1 morphs tended to cluster together and separately from GRP 2 morphs (Figure 4d). The GRP 3 morph was more associated with the GRP 1 morph, whereas the GRP 4 morph was more closely associated with the GRP 2 morph (Figure 4d). The genetic distinction between GRP 1 and GRP 2 morphs was highlighted even further when only those samples were included in the analysis (Supplementary Appendix 6).

Figure 4

Factorial correspondence analysis (FCA) of microsatellite DNA variation for (a) all samples included in the study and (b) only those from GBL. Also shown is a neighbour-joining (NJ) tree based on Cavalli-Sforza and Edwards (1967) chord distance (DCE) for microsatellite data shown for (c) all samples in the study and (d) only those from GBL. Sample codes are shown in Table 1.

Over the entire data set, Bayesian clustering implemented in STRUCTURE suggested the existence of 9 (ΔK=4.67) or 10 (ln P[D]=−70 744.18) genetic clusters (Supplementary Appendix 7). When admixture plots were visualized assuming K=9, distinct clusters that were associated with putative refugial origins were apparent for some samples (Figure 5a). For example, GSL and GBL (putative Mississippian refuge) samples clustered relatively closely together as did the SAN and JAY (putative Beringian refuge) samples. Within GBL, the STRUCTURE analysis suggested the existence of two genetic clusters based on the ln P[D] values, whereas results based on ΔK suggested four clusters as the most likely population structure (ln P[D] and ΔK of −43 695.97 and 5.75, respectively; Supplementary Appendix 7). Under either of the models of population structure explored (that is, K=2 or 4 as the best models), admixture plots indicated strong differentiation in the genetic compositions of GRP 1 and GRP 2 morphs, and that GRP 3 and GRP 4 morphs have greater similarity to the GRP 1 morph (Figure 5b, Supplementary Appendix 6). When only the GRP 1 and GRP 2 morphs were analyzed, both statistics provided support for the existence of two genetic clusters within GBL (ln P[D] and ΔK of −35 263.93 and 6.08, respectively; Supplementary Appendix 7).

Figure 5

The results of Bayesian clustering analysis implemented in STRUCTURE showing the proportion of the genome (q, admixture coefficient on the y axis) assigned to one of the most likely inferred clusters. Shown are the results when (a) all samples were assessed (shown for K=9) and then when (b) only those from GBL were included (shown for K=2 and 4, see Results section). Each column represents a different individual. Sample codes refer to those outlined in Table 1.

AMOVA of microsatellite DNA allele frequencies indicated that 9.5% of the variation was explained by variation among groups when populations were grouped into putative refugial origins, 6.3% of the variation was attributed to variation among populations within putative refugia and 84.2% was explained by individuals within populations (Table 3). Alternatively, the AMOVA revealed that very little of the genetic variation could be explained by the grouping scenarios for samples within GBL; virtually all of the variation was attributed to variation among individuals within each sample (>99%) for both the grouping scenarios (P<0.001) and only 0.54% (P>0.05) and −0.18% (P>0.05) of the variation was attributed to the groupings based on morph and arm within GBL, respectively (Table 3). When samples were grouped into either GRP 1 or GRP 2 morph, results were virtually identical; most of the variation (98.9%) was explained by variation among individuals within populations (Table 3).

Table 3 Results of the hierarchical AMOVA showing the grouping hypotheses tested in this study

Among the five independent runs performed, the posterior probability distribution for divergence time (t) between groups 1 and 2 peaked at 0.195 (0.095–0.255, 90% highest posterior density) and averaged 0.141 (0.077–0.211, 90% highest posterior density; Supplementary Appendix 8) across all five runs. Assuming a microsatellite mutation rate of 1 × 10⁻⁴ (Jarne and Lagoda, 1996) resulted in a divergence time of 1950 years (950–2550) using the peak estimate of t and 1410 years (770–2110) using the divergence estimate averaged across all five independent runs. Regardless, both estimates are much less than the predicted age of GBL of 8500 ybp.
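The conversion from the scaled divergence parameter to years can be checked with simple arithmetic. The sketch below assumes that t is scaled by the mutation rate, so that elapsed time is t divided by the rate, and that the quoted rate of 1 × 10⁻⁴ yields years directly, which is what the reported figures imply. The function name is ours, for illustration only.

```python
MU = 1e-4  # microsatellite mutation rate quoted in the text


def scaled_to_years(t, mu=MU):
    """Convert a mutation-scaled divergence time t to years, assuming years = t / mu."""
    return round(t / mu)


# Peak estimate of t with its 90% highest posterior density bounds.
peak = [scaled_to_years(t) for t in (0.195, 0.095, 0.255)]
# Estimate of t averaged across the five runs, with its bounds.
mean = [scaled_to_years(t) for t in (0.141, 0.077, 0.211)]

print(peak)  # [1950, 950, 2550]
print(mean)  # [1410, 770, 2110]
```

Under these assumptions, every value falls well below the 8500 ybp age given for GBL.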

MtDNA variation

A 468-base pair region of the mtDNA control region was sequenced for 302 individuals and nine variable sites were found. A total of 12 haplotypes were resolved, 7 of which were new (Snam07-13) and deposited into GenBank under accession numbers KF951407-13. Haplotype Snam01 was the most common haplotype, found in 39.2% of all samples and represented in all sampling locations with the exception of ATL and NAK (Supplementary Appendix 9). Snam06, Snam07 and Snam08 were the second, third and fourth most common haplotypes, found in 29.2%, 14.6% and 7.3% of all the samples, respectively. These haplotypes were found in most sampling locations (and morphs within GBL). All other haplotypes were relatively uncommon and were found in only a handful of samples. The NAK samples were represented exclusively by the Snam06 haplotype, and Snam10 was unique to the ATL samples, being found in >50% of these. The Tamura 3-parameter model (Tamura, 1992) was found to be the most appropriate model of nucleotide substitution and was used in all subsequent analyses that required prior substitution information. Estimates of Tajima’s D suggest that DNA polymorphism within each sample was consistent with neutral expectations (all P>0.05; Supplementary Appendix 9). Haplotype diversity ranged from 0.00 in the NAK sample to 0.90 in the DES-GRP 3 sample. Nucleotide diversity ranged from 0.00 in the NAK sample to 0.01 in the DES-GRP 3 sample.
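To illustrate why a sample fixed for a single haplotype, such as NAK, has a haplotype diversity of 0.00, the sketch below uses a commonly used unbiased estimator, h = n/(n − 1) × (1 − Σp²), where p is the frequency of each haplotype. This section does not state which estimator was used, so the formula is an assumption here. The sample sizes and the mixed sample are hypothetical and are not data from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# A sample fixed for one haplotype (as reported for NAK); n = 10 is hypothetical.
fixed_sample = ["Snam06"] * 10
# Hypothetical mixed sample, for illustration only (not data from the study).
mixed_sample = ["Snam01"] * 4 + ["Snam06"] * 3 + ["Snam07"] * 3

print(haplotype_diversity(fixed_sample))
print(round(haplotype_diversity(mixed_sample), 3))
```

The fixed sample returns zero regardless of its size, whereas diversity rises as haplotypes become more numerous and more evenly represented.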

Pairwise FST ranged from −0.162 between samples from two distinct arms in GBL (VIC-GRP 3 and DES-GRP 3) to 0.863 between samples from distinct drainages outside GBL (ATL and NAK; Supplementary Appendix 5). Of these comparisons, 57 of 190 were significant (P<0.05) and 40 of these remained significant after adjusting alpha for multiple comparisons (P<0.0086; Supplementary Appendix 5). Very few comparisons among GBL samples were significant (4 of 58, P<0.0086) and comparisons between GBL samples and GSL were usually nonsignificant. Both the NAK and ATL sampling locations were highly differentiated from all other samples. When samples were grouped by arm, only two comparisons were significant (P<0.05), both of which involved McTavish Arm (Table 2a), and when assessing differentiation among GBL samples grouped by morph, no significant pairwise comparisons of FST were found (P>0.05; Table 2b). Finally, when GBL samples were grouped and compared with those grouped into putative refugia, FST ranged from 0.0278 (between GBL and the Mississippian refuge group) to 0.444 (between the Beringian and Southern Beringian (Nahanni) refugial groups; Table 2c) and all but the GBL-Mississippian refugial group comparison were significant before (P<0.05) and after adjusting for multiple comparisons (P<0.0219).
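This section does not say how alpha was adjusted for multiple comparisons. One adjustment that reproduces the reported threshold divides alpha by the harmonic number of the comparison count, a form associated with the Benjamini-Yekutieli false discovery rate correction. The sketch below is our inference from the reported numbers, not a statement of the method actually used, and the function name is hypothetical.

```python
def harmonic_adjusted_alpha(alpha, k):
    """Divide alpha by the k-th harmonic number (sum of 1/i for i = 1..k)."""
    return alpha / sum(1.0 / i for i in range(1, k + 1))


# 190 pairwise comparisons are reported in the text.
print(round(harmonic_adjusted_alpha(0.05, 190), 4))  # 0.0086
# Five comparisons is an assumption; the count behind this threshold is not stated.
print(round(harmonic_adjusted_alpha(0.05, 5), 4))    # 0.0219
```

With 190 comparisons the formula gives 0.0086, matching the threshold reported for the full set of pairwise tests. It gives 0.0219 for five comparisons, although the number of comparisons behind that second threshold is not stated in this section.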

Similar to the AMOVA on microsatellite data, very little of the variation could be explained by our grouping hypotheses for samples from GBL (Va<1.0% for all hypotheses; Table 3). More of the variation, however, was attributed to populations within groups (Vb=3.5–3.7%), but again most of the variation was attributed to variation among individuals within each sample (>95%, P<0.05 for all scenarios). When grouping populations by putative refugial origin, 8.8% of the variation was attributed to this grouping hypothesis (Va, P<0.05), whereas 11.3% and 79.9% were attributed to populations within groups (Vb, P<0.01) and individuals within populations (Vc, P<0.01), respectively.

The haplotype network based on mtDNA sequences highlighted the close interrelationships among haplotypes, which differed from each other by a maximum of seven mutational steps (Figure 6a). Although the frequencies of haplotypes varied across sampling locations, the three most common haplotypes (Snam01, Snam06 and Snam07) were distributed throughout most of the study area (Figure 6b). With the exception of ATL (the only location in which Snam10 was present) and NAK (fixed for Snam06), no regions or sampling locations were characterized by unique assemblages of highly divergent haplotypes (Supplementary Appendix 10).

Figure 6

(a) Unrooted haplotype network, produced using TCS (Clement et al., 2000), based on sequencing of 468 base pairs within the mtDNA control region (d-loop) and showing the geographic distribution of haplotypes. Each circle represents a unique haplotype and lines separating haplotypes represent mutational steps. Small black nodes between haplotypes represent inferred missing haplotypes. The numbers on the lines between haplotypes indicate the base pair position for the difference. The size of the circle indicates the number of individuals with that haplotype and colours correspond to the sample codes shown in Table 1. (b) The proportion of each haplotype shown per morphotype in GBL (GRP 1, GRP 2, GRP 3 and GRP 4) and within other sampling locations (see Table 1). Coloured bars under sample codes correspond to samples from GBL (black), and putative Mississippian (blue), Northern Beringian (red) and Southern Beringian (Nahanni) (green) glacial refugia origins.

Discussion

Divergence and origin of GBL S. namaycush morphs

A remarkable feature of post-glacial, north-temperate freshwater fishes is their extensive phenotypic and ecotypic diversity (Schluter, 1996; Taylor, 1999; Robinson and Parsons, 2002). Lake Trout are well known for polymorphism in the southern latitudes (that is, the LGLs, Moore and Bronte, 2001) but, in general, little is known regarding the evolution and maintenance of morphological and ecological diversity in this species, particularly in the Arctic. Our study has provided evidence that there is an association between morphologically and genetically distinct groups of Lake Trout in Canada’s GBL and that the latter appear to represent distinct gene pools. Within GBL, inter-arm population structure was low, but when pairwise comparisons were structured by morph, differentiation was much stronger and most analyses appear to genetically group samples by morph better than by arm. Model-based clustering suggested that the greatest degree of genetic divergence in GBL is between GRP 1 and GRP 2, with the other two morphs appearing to be most similar to the GRP 1 morph. Allopatric isolation in, and dispersal from, distinct refugia has been important for generating biodiversity in north-temperate faunas (for example, Bernatchez and Dodson, 1990; April et al., 2012), but our data were consistent with an intra-lacustrine model of morph divergence, as (1) all morphs were genetically more similar to one another than to Lake Trout sampled from other areas likely representative of populations that originated in distinct glacial refugia and (2) maximum-likelihood estimates of divergence are younger than the formation of GBL, suggesting that Lake Trout morphological variation evolved in situ in this system. Notwithstanding our evidence of intra-lacustrine divergence of GBL Lake Trout, our data suggested that GBL Lake Trout diversity probably ultimately originated by dispersal from a Mississippian refuge.

Few studies have tested for associations between morphology and genetic divergences among morphs in Lake Trout. Goetz et al. (2010) assessed growth, morphometry, lipid content and differences in liver transcriptomics among laboratory-reared lean and siscowet Lake Trout morphs from Lake Superior and concluded that differences in these traits were genetic in nature. Earlier, Burnham-Curtis and Smith (1994) drew similar conclusions when comparing osteological characters. In contrast, Northrup et al. (2010) found no association between morphologically and genetically distinct groups of Lake Trout from another northern system (Atlin Lake, BC) and Stafford et al. (2014) did not detect any genetic differentiation between Lake Trout morphs from Montana, although this latter case involved a non-native population introduced <120 years ago. Thus, in addition to showing that different morphs may represent distinct gene pools, our work represents one of the first studies to use genetic data to evaluate different geographic models for their evolution (but see Krueger and Ihssen, 1995). Specifically, our work supports the idea that post-glacial populations of fishes may diverge into discrete ecomorphological forms in sympatry subsequent to colonization of novel, heterogeneous post-glacial habitats. Although empirical examples of divergence occurring sympatrically are still relatively rare (Bolnick and Fitzpatrick, 2007), more studies supporting this scenario are becoming available (for example, Barluenga et al., 2006), including an increasing number involving north-temperate fishes (Taylor and Bentzen, 1993; Gíslason et al., 1999; Lu et al., 2001; Alekseyev et al., 2002; Præbel et al., 2013).

Ecological sympatric speciation, in which barriers to gene flow are driven by divergent selection acting in contrasting directions between ecological niches (Rundle and Nosil, 2005; Nosil, 2012), has been implicated in the evolution of reproductive isolation, genetic differentiation and the evolution of ecological/morphological variation in a variety of fish species (Schluter, 1996; Hendry et al., 2007; Langerhans et al., 2007; Præbel et al., 2013). Under an ecological speciation model, upon colonization of GBL or once Lake Trout became isolated in this system, founding populations can be hypothesized to have exploited discrete niches within the same geographical area (Rundle and Nosil, 2005). Divergent selection may have then acted to drive the fixation of advantageous alleles in each morph (Schluter and Conte, 2009) or selection may have resulted in the evolution of so-called ‘magic traits’ that pleiotropically promote reproductive isolation (Servedio et al., 2011). Given that unique alleles were not common in any of the Lake Trout morphs, these populations likely represent incomplete (or initial) stages of reproductive isolation (Nosil et al., 2009). In GBL Lake Trout, natural selection is likely acting on traits important for resource/food exploitation and the subsequent adaptations to these new ecological niches may have promoted some degree of assortative mating and reproductive isolation (Hendry et al., 2007). Reinforcement of mate choice from selection against hybrids or immigrant phenotypes that are less fit than the parental morphs (Vamosi and Schluter, 1999) could have then contributed to the maintenance of this divergence. Indeed, in GBL, Blackie et al. (2003) identified two morphs that differed in diet (insectivorous vs piscivorous) and suggested that the morphological differences among shallow-water forms were likely the result of divergent foraging strategies.
Furthermore, the morphological variation used to delineate the morphs in this study was associated with food acquisition (Chavarie et al., 2013). Thus, the divergence of GBL Lake Trout into discrete morphs is likely, at least partially, related to selection acting on traits important for food exploitation (or habitat use associated with prey distribution). Variation in depth-related habitat use and diet has also been associated with ecotypic/morphological variation in this species in other northern systems (Zimmerman et al., 2007) including GSL (Zimmerman et al., 2006, 2009). GSL has likely shared a similar glacial history to that of GBL, and Lake Trout from these two systems have likely been isolated for approximately 8500 years (Smith, 1994). Unfortunately, no genetic data are available for the previously identified Lake Trout morphs from GSL and we did not have morphological data for our GSL samples that could be compared directly with those from GBL. Given that all morphotypes in GBL were more closely related to one another than to samples from GSL (or any other system) and that GBL appears to exhibit much more diversity (at least four versus two forms, the latter of which were primarily captured in depths >50 m), it is unlikely that the morphological diversity observed in GBL is the result of colonization of divergent morphs from GSL. Furthermore, our estimates of divergence between the morphs for which we had representative samples from all arms suggest that the Lake Trout morphs assessed in the present study evolved in situ after colonization of GBL.

Interestingly, in the LGLs, sympatric evolutionary hypotheses for divergence have also been suggested for deep-water Lake Trout morphs. For example, Burnham-Curtis (1993) suggested that the siscowet morph diverged from the lean morph in Lake Superior after the Pleistocene (8000 ybp) and that the humper morph was the result of hybridization between the two. Eshenroder (2008) reiterated this idea and proposed that the humper morph diverged post-glacially in sympatry from an ancestral shallow-water form. A common theme in these studies is that divergence evolved in situ as a result of divergent natural selection acting upon traits important for resource exploitation. Regardless of the mechanism responsible for the divergence, our results are consistent with hypotheses proposed for explaining morphological variation among sympatric Lake Trout populations in other large lake systems where divergence appears to have occurred post-glacially in sympatry (Eshenroder, 2008).

Finally, although allopatric divergence is widely considered the most important mechanism for the evolution of biodiversity (Coyne and Orr, 2004, see also April et al., 2012), our genetic results argue against a fully allopatric model of divergence, despite this mechanism being described for the evolution of divergence and morphological variation in numerous north-temperate species (Pigeon et al., 1997; Lu et al., 2001; Fraser and Bernatchez, 2005; April et al., 2012). Under the allopatric hypothesis, we anticipated that similar morphs of Lake Trout across different arms of GBL would be more closely related to each other than to divergent morphs within the same arm and that similar morphs would then be associated with Lake Trout from outside of this system that originated in distinct refugia. This, however, was not the case because all morphs from GBL had a genetic affinity to samples that likely originated from the Mississippian refuge (for example, GSL, Peter and Nipigon lakes), suggesting that all morphs of GBL Lake Trout had similar refugial origins. This was also supported by the lack of morph-specific mtDNA haplotypes and by low values of differentiation among morphs from GBL, suggesting that they have not spent long periods of time isolated from each other. The microsatellite-based divergence time estimates (<2000 years) also suggest that the two genetic groups of Lake Trout in GBL are of relatively recent, post-glacial origin. It is, however, impossible to discount the possibility that divergent lineages did, in fact, colonize GBL, but that any historical signature of divergence resulting from isolation in distinct glacial refugia is obscured by recent and/or contemporary gene flow among morphs.

Lake Trout phylogeography

During the height of the Wisconsinan glaciation, from 85 000 to 10 000 ybp (Lindsey and McPhail, 1986; Pielou, 1991), North American Lake Trout survived in several distinct refugia spanning the continent (Wilson and Hebert, 1998). Lake Trout dispersed from these refugia to occupy their contemporary distribution and, although their sample size was small (N=5), Wilson and Hebert (1998) hypothesized that Lake Trout colonized GBL from two distinct Beringian refugia (both a southern and northern refuge) to form an area of secondary contact. By contrast, our results suggest that Lake Trout in GBL did not colonize this system from multiple refugia within Beringia, but rather dispersed post-glacially to GBL from one distinct refuge, the Mississippian. This was evidenced by the clustering of all GBL samples with GSL in the STRUCTURE plot and with the GSL, PET and NIP samples in the factorial correspondence analysis.

In addition, the lack of lake-specific haplotypes and the low differentiation among samples from GBL suggest that these populations/morphs were not at one time isolated in distinct glacial refugia (cf. Bernatchez and Dodson, 1990). This is consistent with the glacial history of the region, which suggests that GBL Lake Trout once occupied glacial Lake McConnell and thus have only been isolated from GSL Lake Trout (and other populations of Lake Trout from glacial Lake McConnell) for 8500 years. Furthermore, microsatellite and mtDNA differentiation between GBL samples and those from populations that likely originated from the Mississippian refuge was lower than that between GBL and putative Beringian (JAY and SAN) or southern Beringian (ATL and NAK) samples. On balance, given the clear affinity of GBL samples to those from GSL, PET and NIP, in terms of both microsatellites and mtDNA, it seems most plausible that this system was founded by one lineage of fishes that most recently inhabited glacial Lake McConnell, which was colonized by Lake Trout of Mississippian origin (Wilson and Hebert, 1998). This dispersal likely proceeded through the vast system of post-glacial lakes (that is, glacial Lakes Agassiz, Peace, McConnell and others) that permitted dispersal of Mississippian refuge fish to northern and Arctic Canada beginning about 10 000 years ago (Rempel and Smith, 1997; Wilson and Hebert, 1998). This inference, combined with our analysis of genetic relationships within GBL and the shallow divergence time estimates for the most common morphs, suggests that the discrete morphs did not arise by allopatric origin in distinct refugia, but rather by intra-lacustrine divergence following colonization by a single glacial lineage of Lake Trout.

Conclusions

We have shown that morphs of Lake Trout from GBL are genetically differentiated from one another, yet overall they exhibit strong genetic similarities to one another relative to those from outside of GBL. This, combined with divergence time estimates between morphs and the clustering of all GBL Lake Trout samples with samples thought to represent a Mississippian glacial lineage, suggests that processes not necessarily involving isolation in allopatry have been important for generating morphological and genetic diversity in this system. Thus, our results provide additional insights into the origin and maintenance of biodiversity, including ecotypic and genetic variation among freshwater fishes, in recently deglaciated, north-temperate aquatic habitats, which will be vital for monitoring and assessing potential changes to intraspecific variation if conditions or habitats are altered (for example, Taylor et al., 2006). Furthermore, our data suggest that GBL is an excellent system for further studying the mechanisms involved in divergence that has ensued in situ, including ongoing contemporary gene flow and the maintenance of reproductive isolation among closely related populations in the initial stages of speciation.

Data archiving

Data available from the Dryad Digital Repository: doi:10.5061/dryad.1368p.