Introduction

Range expansion is a critical step in which evolutionary forces act to shape the genetic and phenotypic diversity in natural communities (Taylor and Keller, 2007). Expansion is generally thought of as a series of successive population founder events along an axis of colonization which leads to the most recently colonized area or ‘wave-front’ (Excoffier et al., 2009; Slatkin and Excoffier, 2012). Repeated bottlenecks during expansion are expected to result in genetically impoverished populations in the wave-front. This prediction is based on a model of expansion common to invasive species that lack extrinsic barriers to dispersal, where the rate of expansion is limited only by the traits intrinsic to the colonizing populations on the range margin (for example, reproduction rate, dispersal and so on). Many native species are also undergoing range expansion due to human-induced climate change (Hickling et al., 2006; Chen et al., 2011). However, in contrast to invasive species, populations undergoing climate-driven expansions may not suffer dramatic losses in genetic diversity (Pluess, 2011; Nullmeier and Hallatschek, 2013; Dai et al., 2014; Garnier and Lewis, 2016; Monzón et al., 2016). Genomic tools can provide the resolution needed to directly disentangle the evolutionary and demographic history of natural populations of non-model organisms (Shafer et al., 2015).

The process of expansion has both stochastic and deterministic elements that can be viewed as a spatial analog to genetic drift (Slatkin and Excoffier, 2012) and natural selection (Shine et al., 2011). Individuals colonizing a new area will likely carry mutations present at low frequency in the source population that can drift (‘surf’) to high frequency in subsequent generations in the newly colonized area. This spatial version of genetic drift, or ‘allele surfing’, can lead to genetic structuring and differentiation of wave-front populations (Hallatschek et al., 2007; Swaegers et al., 2015). Deleterious mutations can also surf due to low effective population size and accumulate as a genetic load of expansion (Peischl et al., 2013), a prediction empirically verified in humans (Henn et al., 2015; Peischl et al., 2016).

Another consequence of range expansions is hybridization. Introgression has long been recognized as an important potential source of genetic variation for adapting to a new environment (Lewontin and Birch, 1966; Pfennig et al., 2016). Although theory indicates that introgression is likely to occur during expansion (Currat et al., 2008; Excoffier et al., 2009), genomic studies (for example, White et al., 2013; Swaegers et al., 2015; Monzón et al., 2016) have tended to overlook the possibility of gene flow between closely related species (but see Mastrantonio et al., 2016). Species’ boundaries can be continuous and gene pools porous (Harrison, 1990; Wu, 2001); however, few empirical studies on the genomics of range expansion have looked for evidence of hybridization when it was not suspected a priori. As genome-wide markers become increasingly available, long-standing questions regarding the frequency and consequences of hybridization and range expansion (for example, Seehausen, 2004; Lowe et al., 2015; Abbot et al., 2016; Canestrelli et al., 2016; Gompert and Buerkle, 2016) in natural populations are becoming increasingly tractable.

Hybridization between divergent lineages is more likely following habitat disturbances, for example during range expansion, when low population density results in relatively few conspecific mates and less competition against hybrids (Seehausen, 2004). Mammalian examples include canids during the colonization of North America (Reich et al., 1999; von Holdt et al., 2016), polar bears with ongoing climate change (Kelly et al., 2010; Cahill et al., 2015) and humans, who are thought to have hybridized with at least three hominin species while expanding out of Africa (Huerta-Sánchez et al., 2014; Fu et al., 2015; Mondal et al., 2016).

In many cases, interspecific gene flow can lower the mean fitness of a population through the introduction of maladapted alleles and deleterious epistasis (Lynch, 1991). Sometimes, however, outcrossing with a divergent lineage can lead to local adaptation through the introgression of pre-adapted alleles (Hedrick, 2013). One example is the Western Mediterranean mouse (Mus spretus), a rodent with populations that adapted to rodenticide through interspecific introgression of resistance genes from the house mouse, M. musculus (Song et al., 2011). Introgression of Denisovan DNA is thought to have resulted in adaptation to Arctic conditions in the Inuit from Greenland (Racimo et al., 2016) and to high altitude in Tibetans (Huerta-Sánchez et al., 2014). Paralleling the history of Tibetan people, genomic evidence indicates that modern Tibetan Mastiffs adapted to high altitude after hybridizing with gray wolves native to the Tibetan Plateau (Miao et al., 2016). Thus, interspecific gene flow on the wave-front may provide the pre-adapted genetic substrate needed to uncouple gene pools from a local adaptive maximum and reach a higher global peak, further facilitating establishment and expansion.

P. leucopus populations in southern Quebec provide an opportunity to investigate the genomic consequences of a range expansion in a small mammal. P. leucopus is native to eastern North America (King, 1968) and is one of many species undergoing a range expansion, likely driven by anthropogenic climate change (Myers et al., 2009; Chen et al., 2011; Roy-Dufresne et al., 2013). Dispersal of individual P. leucopus varies widely, although average movement of individual mice has been estimated at about 330 m in Indiana (Khrone et al., 1984) and 930 m in Quebec (Gaitan and Millien, 2016). Over the past few decades, the northern range limit of this species has shifted northward at an estimated 10 km per year in southern Quebec (Roy-Dufresne et al., 2013). P. leucopus is sympatric with the closely related P. maniculatus across most of its distribution, and both species are key components of small mammal communities in North America. P. leucopus is typically found in deciduous forests (Vessey and Vessey, 2007), while P. maniculatus is more associated with conifer forests and, being adapted to colder winter temperatures (Pierce and Vogt, 1993), reaches higher latitudes where P. leucopus is absent. However, as it expands, P. leucopus is ecologically replacing P. maniculatus (Wan, 2014) and has become the dominant Peromyscus in some regions of southern Quebec (for example, Grant, 1976; Roy-Dufresne et al., 2013).

P. leucopus and P. maniculatus belong to sister clades that, according to the fossil record, diverged ~500 000 years ago (Hibbard, 1968; but see Fiset et al., 2015). Each species has a more closely related sister species with which it can hybridize and produce fertile offspring, P. gossypinus and P. polionotus, respectively (Dice, 1937; Maddock and Dawson, 1974). After unsuccessful attempts to experimentally hybridize P. leucopus and P. maniculatus (Dice, 1933), Dice (1968) concluded that complete reproductive isolation had occurred. Decades later, through hormonally induced ovulation and artificial insemination, Dawson et al. (1972) and Maddock and Dawson (1974) were able to produce live hybrids, although these were deemed inviable. Following this work, the two species were assumed to be reproductively isolated across their entire range. However, Leo and Millien (2016) recently used microsatellite markers and found putative hybrids of both species in natural populations from southern Quebec.

We used individual-based, genome-wide data (Andrews et al., 2016) to investigate the demographic history of P. leucopus in southern Quebec during its northward expansion. We looked for the signal of range expansion and investigated evidence of genetic bottlenecks and allele surfing (namely, decreased genetic diversity and divergent allele frequencies), as well as potential introgression, among northern populations. We used this species pair as a case study to examine whether sympatric species that are likely reproductively isolated through reinforcement (Doty, 1972, 1973) are more likely to hybridize during range expansion.

Materials and methods

Study system and DNA extraction

We used P. leucopus samples from 13 forested sites (n=229) and P. maniculatus samples from one forested site (‘Pm’, n=11) in southern Quebec, collected during the summers of 2013 and 2014. Sites were grouped into three transects, the northern (N), south-eastern (SE) and south-western (SW) transects, separated by the St. Lawrence and Richelieu Rivers (Figure 1). Studies of P. leucopus using microsatellites and mitochondrial markers have indicated that the St. Lawrence and Richelieu Rivers inhibit gene flow, with differentiated populations on either side (Ledevin and Millien, 2013; Rogic et al., 2013; Fiset et al., 2015), while agricultural fields are relatively ineffective barriers (Marrotte et al., 2014). We thus suggest that our three transects represent distinct expansion routes that may provide independent genetic patterns of range expansion, taking the northern-most sites in each transect (that is, N4, SW5/SW4 and SE4) as the approximate range margin. A search of capture records on VertNet revealed that a putative P. leucopus individual had been caught farther north than our northern-most sample site. However, this individual was a juvenile and had been identified visually, which makes it especially difficult to distinguish from P. maniculatus. In fact, according to the measurements taken by the collector (for example, ear and tail length), this individual more closely resembles P. maniculatus (Lindquist et al., 2003). We extracted DNA from liver and muscle tissue using a 3-day phenol-chloroform protocol (Sambrook et al., 1989). Species were identified via PCR amplification of a mitochondrial COIII sequence, as described in Rogic et al. (2013).

Figure 1

Study area and the geographic location of 14 genotyped populations in southern Quebec. Major barriers to gene flow (rivers) are labeled.

RAD-seq library

Extracted DNA was sent to the Institut de biologie intégrative et des systèmes (IBIS) at Université Laval for individual-based library preparation using a modified version of the original genotyping-by-sequencing protocol (Elshire et al., 2011). DNA from each individual was digested with the rare-cutting (1 cut/4096 bp) restriction enzyme PstI (CTGCAG) to seed markers, followed by MspI (CCGG), a frequent cutter (1 cut/256 bp) used to define read size. Next, a barcoded adapter with an end complementary to the PstI site and a common adapter with an end complementary to the MspI site were ligated to the DNA fragments. Following ligation, sets of 48 samples were pooled and PCR was applied to the barcoded DNA fragments using primers that hybridize to the adapters. Each library of 48 samples was sequenced (100 bp paired-end reads) on a single Illumina HiSeq 2000 lane at the Genome Quebec Innovation Center.
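The cut frequencies quoted above follow from a simple expectation: in random sequence with equal base frequencies, a recognition site of length n occurs on average once every 4^n bp (4^6 = 4096 for PstI, 4^4 = 256 for MspI). The sketch below, written for illustration only, computes that expectation and lists fragments bounded by a PstI site at one end and an MspI site at the other in a toy sequence. It is a simplification: the function names are hypothetical, each site's start position stands in for the actual cut point, and real genomes deviate from equal base composition.

```python
from __future__ import annotations

import re

PSTI_SITE = "CTGCAG"  # PstI recognition site (6 bp)
MSPI_SITE = "CCGG"    # MspI recognition site (4 bp)


def expected_cut_spacing(site: str) -> int:
    """Mean spacing (bp) between occurrences of a site in random sequence
    with equal base frequencies: 4 ** len(site)."""
    return 4 ** len(site)


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def pst_msp_fragments(seq: str) -> list[tuple[int, int]]:
    """(start, end) of fragments bounded by a PstI site on one side and an MspI
    site on the other. Simplification: site start positions stand in for cut points."""
    cuts = sorted(
        [(p, "PstI") for p in site_positions(seq, PSTI_SITE)]
        + [(p, "MspI") for p in site_positions(seq, MSPI_SITE)]
    )
    return [
        (p1, p2)
        for (p1, e1), (p2, e2) in zip(cuts, cuts[1:])
        if e1 != e2  # keep only mixed PstI-MspI fragments
    ]
```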

SNP genotyping and population genetic summary statistics

We used the Stacks pipeline (Catchen et al., 2013a) to process raw data and identify single-nucleotide polymorphisms (SNPs). Raw reads were demultiplexed and quality filtered using the process_radtags script of Stacks 1.37. Following the recommendations of recent studies (Nam et al., 2016; Shafer et al., 2017), we aligned reads to a reference genome for SNP calling. We aligned all processed reads to the scaffold-level assembly of the P. maniculatus bairdii reference genome (NCBI assembly accession: GCF_000500345.1) with Bowtie2 v. 2.2.8 (Langmead and Salzberg, 2012) using local alignment. We removed one P. maniculatus and six P. leucopus individuals that had relatively few reads sequenced, leaving 223 P. leucopus and 10 P. maniculatus samples (Table 1). We then ran the ref_map.pl wrapper in Stacks (v. 1.44) to build a catalog of RAD loci with a minimum depth of coverage of 4 reads (-m 4). The populations script was then used to obtain two SNP data sets for population genomic analysis: one consisting only of the 13 P. leucopus populations (n=223, ‘P. leucopus-only’ data set hereafter), and a second that also included the 10 P. maniculatus individuals (n=233, ‘complete’ data set hereafter). SNPs were filtered to a minor allele frequency (MAF) of at least 2.5% (--min_maf 0.025) and an observed heterozygosity of 0.50 or less to avoid paralogs (--max_obs_het 0.50). In addition, each locus had to be found in every population, that is, in 13 populations for the P. leucopus-only data set (-p 13) and 14 populations for the complete data set (-p 14), and in at least 60% of individuals in a given population (-r 0.60).
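The populations filters above can be read as a per-SNP predicate. The following is a minimal sketch, not Stacks' implementation: genotypes are assumed to be coded as alternate-allele counts (0/1/2, None for a missing call), a locus is assumed to count as present in a population when at least the required fraction of its individuals is genotyped, and MAF and observed heterozygosity are assumed to be computed over all genotyped individuals pooled. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional

Genotype = Optional[int]  # alternate-allele count 0/1/2; None = missing call


def passes_filters(
    geno_by_pop: dict[str, list[Genotype]],
    min_pops: int,              # analogous to -p
    min_ind_frac: float = 0.60, # analogous to -r
    min_maf: float = 0.025,     # analogous to --min_maf
    max_obs_het: float = 0.50,  # analogous to --max_obs_het
) -> bool:
    """Simplified per-SNP filter; MAF and heterozygosity are pooled over all calls."""
    pops_ok = 0
    calls: list[int] = []
    for genos in geno_by_pop.values():
        observed = [g for g in genos if g is not None]
        if genos and len(observed) / len(genos) >= min_ind_frac:
            pops_ok += 1  # locus counts as present in this population
        calls.extend(observed)
    if pops_ok < min_pops or not calls:
        return False
    alt_freq = sum(calls) / (2 * len(calls))
    maf = min(alt_freq, 1 - alt_freq)
    obs_het = sum(1 for g in calls if g == 1) / len(calls)
    return maf >= min_maf and obs_het <= max_obs_het
```

A SNP where every individual is heterozygous fails the heterozygosity filter, which is the pattern expected when reads from two paralogous loci collapse onto one locus.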

Table 1 Site and genetic diversity summary of P. leucopus individuals sampled from 13 sites on one northern and two southern transects

The following population genetic summary statistics were then calculated for each data set: nucleotide diversity (Pi), homozygosity (Homobs, Homexp) and heterozygosity (Hetobs, Hetexp), inbreeding coefficient (FIS), private alleles and differentiation (FST). Nucleotide diversity is related to expected heterozygosity and is an overall measure of genetic variation (Catchen et al., 2013b). FIS measures the degree of inbreeding due to non-random mating, that is, the change in observed homozygosity relative to the expected value. FST represents the amount of inbreeding due to random mating in a finite population and is used as a measure of population subdivision and genetic drift (Crow and Kimura, 1970; summarized in Ewen et al., 2012). Thus, FIS measures changes in genotype frequencies, while FST measures changes in allele frequencies. We applied an FST correction, using Fisher’s exact test to assess whether FST at a given locus is significantly different from zero; loci whose FST estimates fail to reach statistical significance (α=0.05) are set to zero (--fst_correction p_value). To estimate ancestry proportions and construct a population tree, the complete data set was thinned to the first SNP of each RAD locus to reduce linkage disequilibrium (--write_single_snp). The ordination methods used (discriminant analysis of principal components (DAPC) and principal component analysis (PCA)) do not assume independence among loci, and thus all SNPs identified in each RAD locus were used in those analyses.
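The relationships among these statistics can be made concrete for a single biallelic SNP. The sketch below is illustrative and assumes genotypes coded as alternate-allele counts; it computes observed and expected heterozygosity (He = 2pq), a per-site nucleotide diversity that rescales He by 2n/(2n − 1) for n sampled individuals, and FIS = 1 − Ho/He. For differentiation it uses a simple two-population Wright-style FST, (HT − HS)/HT, which is not the estimator Stacks reports, and mimics the correction described above by setting FST to zero unless a two-sided Fisher's exact test on the 2×2 population-by-allele count table is significant at α = 0.05. All function names are hypothetical.

```python
from __future__ import annotations

from math import comb


def diversity_stats(genos: list[int]) -> dict[str, float]:
    """Ho, He, per-site pi and FIS for one biallelic SNP in one population.
    `genos` holds alternate-allele counts (0, 1, 2) of genotyped individuals."""
    n = len(genos)
    p = sum(genos) / (2 * n)                  # alternate-allele frequency
    ho = sum(1 for g in genos if g == 1) / n  # observed heterozygosity
    he = 1 - p**2 - (1 - p) ** 2              # expected heterozygosity (2pq)
    pi = 2 * n / (2 * n - 1) * he             # sample-size-corrected diversity
    fis = 1 - ho / he if he > 0 else 0.0      # heterozygote deficit
    return {"Ho": ho, "He": he, "pi": pi, "FIS": fis}


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:  # hypergeometric probability of the table with a = x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def corrected_fst(counts1: tuple[int, int], counts2: tuple[int, int],
                  alpha: float = 0.05) -> float:
    """Wright-style FST for two populations from (alt, ref) allele counts,
    set to 0 unless Fisher's exact test on the allele counts has p < alpha."""
    (a, b), (c, d) = counts1, counts2
    p1, p2 = a / (a + b), c / (c + d)
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    fst = (ht - hs) / ht if ht > 0 else 0.0
    return fst if fisher_exact_two_sided(a, b, c, d) < alpha else 0.0
```

With small allele counts, even a sizeable raw FST (0.25 for counts (3, 1) versus (1, 3)) is not significant and is therefore set to zero, which is the intended effect of the correction.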

To examine patterns of genetic diversity, we followed White et al. (2013) and regressed measures of heterozygosity and nucleotide diversity onto ‘relative expansion distance’. Sites on the northern transect are aligned in a north-eastern direction (Figure 1). Because we hypothesize that northward range expansion along this transect is occurring in a north-eastern direction, we used the Euclidean distances (km) of sites N2, N3 and N4 to the southern-most site, N1 (Table 1). South of the St. Lawrence River, the Richelieu River is aligned in a north-south direction with no major barrier to direct northward expansion. For the south-western and south-eastern transects, we therefore used the distance to the southern-most latitude, that is, the Euclidean distance to the latitude of the southern-most site (SW1 and SE1, respectively). We regressed genetic diversity measures for each transect separately. To evaluate evidence of increased drift among northern populations, we looked for a northward gradient of increasing FST. We used the two-dimensional isolation-by-distance model of Rousset (1997) and regressed pair-wise FST estimates, transformed as FST/(1−FST), between each population and the southern-most site of its transect against the natural log of relative expansion distance. We excluded the southern-most sites (FST=0) and pooled the remaining 10 populations.
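The distance definitions and the isolation-by-distance regression above reduce to a few lines. In this sketch, site coordinates are assumed to be in a projected system with units of km (x = easting, y = northing), so the distance to the latitude of the southern-most site is simply the difference in northing; the function names and coordinates are hypothetical. Regressions use ordinary least squares and report the adjusted R², and the Rousset regression transforms each pair-wise FST to FST/(1 − FST) and takes the natural log of distance.

```python
from __future__ import annotations

import math


def distance_to_site(xy: tuple[float, float], ref_xy: tuple[float, float]) -> float:
    """Northern transect: Euclidean distance (km) to the southern-most site."""
    return math.hypot(xy[0] - ref_xy[0], xy[1] - ref_xy[1])


def distance_to_latitude(xy: tuple[float, float], ref_xy: tuple[float, float]) -> float:
    """SW and SE transects: northward distance (km) to the southern-most site's latitude."""
    return xy[1] - ref_xy[1]


def ols(x: list[float], y: list[float]) -> tuple[float, float, float, float]:
    """Ordinary least squares: slope, intercept, R^2 and adjusted R^2 (needs n > 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2


def rousset_regression(fst: list[float], dist_km: list[float]):
    """Regress FST / (1 - FST) on ln(distance), as in Rousset (1997)."""
    y = [f / (1 - f) for f in fst]
    x = [math.log(d) for d in dist_km]
    return ols(x, y)
```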

Genotype imputation

Next-generation sequencing, and especially the RAD-seq protocol, produces a patchy genotype matrix with a considerable amount of missing data. Imputation of missing data has been shown to improve allele frequency estimates and the power of genomic studies (Li et al., 2009). For this reason, we used the software LinkImpute (Money et al., 2015) to estimate missing genotypes in PLINK files (Purcell et al., 2007) for all data analyses. LinkImpute uses a k-nearest neighbor genotype imputation method and was designed for RAD-seq data from non-model organisms.
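The idea behind k-nearest neighbor imputation can be illustrated with a simplified sketch. Following the general approach of LinkImpute's LD-kNNi method, for each SNP with missing calls it ranks the other SNPs by linkage disequilibrium (squared genotype correlation, r²), measures the distance between individuals over the l most correlated SNPs, and fills the missing call with the distance-weighted majority genotype of the k closest individuals. This is not LinkImpute's code: the 1/distance weighting, the fallback to the most common genotype, the function names and the parameter values are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def _r2(x: list[Optional[int]], y: list[Optional[int]]) -> float:
    """Squared Pearson correlation over individuals genotyped at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def impute_knn(G: list[list[Optional[int]]], k: int = 3, l: int = 5) -> list[list[int]]:
    """G[i][j] is the genotype (0/1/2) of individual i at SNP j, or None if missing."""
    n_ind, n_snp = len(G), len(G[0])
    cols = [[G[i][j] for i in range(n_ind)] for j in range(n_snp)]
    out = [row[:] for row in G]
    for j in range(n_snp):
        missing = [i for i in range(n_ind) if G[i][j] is None]
        if not missing:
            continue
        # the l SNPs in highest LD (r^2) with the target SNP j
        linked = sorted((m for m in range(n_snp) if m != j),
                        key=lambda m: _r2(cols[j], cols[m]), reverse=True)[:l]
        for i in missing:
            neighbors = []
            for h in range(n_ind):
                if h == i or G[h][j] is None:
                    continue
                shared = [m for m in linked if G[i][m] is not None and G[h][m] is not None]
                if shared:  # mean absolute genotype difference over linked SNPs
                    dist = sum(abs(G[i][m] - G[h][m]) for m in shared) / len(shared)
                    neighbors.append((dist, G[h][j]))
            neighbors.sort(key=lambda t: t[0])
            votes: Counter = Counter()
            for dist, g in neighbors[:k]:
                votes[g] += 1.0 / (dist + 1e-6)  # closer neighbors weigh more
            if not votes:  # fallback: most common observed genotype at SNP j
                votes = Counter(g for g in cols[j] if g is not None)
            out[i][j] = votes.most_common(1)[0][0] if votes else 0
    return out
```

Restricting the distance to SNPs in high LD with the target is what distinguishes this approach from a plain nearest-neighbor search over all loci.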

Genomic structure and admixture

To summarize overall genetic variation among individuals, we performed a PCA on both data sets (with and without P. maniculatus samples) using the package LEA (Frichot and François, 2015) in R. To look for major lineages within P. leucopus, we further analyzed P. leucopus population structure by transforming genomic variation into PCs and applying a discriminant analysis (Jombart et al., 2010) in the R package adegenet (Jombart, 2008). DAPC describes genomic clusters using synthetic variables and focuses on the genetic variation observed between groups, while minimizing within-group variance. Genomic clusters (K) were identified using a k-means algorithm, and the number of clusters was evaluated with the Bayesian information criterion (BIC; Supplementary Figure 1a). DAPC was performed for the two K values with the lowest BIC (K=2 and K=3). We used alpha-score optimization to examine the trade-off between over-fitting and power to discriminate. This procedure identifies the number of PCs with the highest alpha-score using spline interpolation. The number of PCs with the highest (‘optimal’) alpha-score provides an approximation, and neighboring values with similar alpha-scores define a range from which it is adequate to choose the number of PCs to retain (Jombart and Collins, 2015). We performed this procedure on 110, 10 and 6 PCs (Supplementary Figure 2), which showed that the first six PCs have very similar high alpha-scores. We thus kept the first six PCs (12.1% of variance) for the DAPC presented here. Individual ancestry proportions of P. leucopus and P. maniculatus were estimated using the model-based method implemented in ADMIXTURE with default settings (Alexander et al., 2009; Alexander and Lange, 2011). ADMIXTURE uses cross-validation to evaluate models with K ancestral source populations and maximum-likelihood (ML) algorithms to estimate ancestry proportions. We define a hybrid as any individual with detectable heterospecific ancestry.
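The PCA and k-means steps can be sketched with NumPy alone. This illustration is not the LEA or adegenet implementation: PCA is computed by singular value decomposition of the column-centered genotype matrix, k-means uses Lloyd's algorithm with random restarts, and the BIC is taken as n ln(WSS/n) + k ln(n), a common approximation for k-means that is an assumption here (adegenet's exact criterion may differ). Function names are hypothetical.

```python
import numpy as np


def pca_scores(G: np.ndarray, n_pc: int):
    """PCA of an individuals x SNPs genotype matrix (0/1/2) via SVD of the
    column-centered matrix. Returns PC scores and proportion of variance."""
    X = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    var_frac = S**2 / np.sum(S**2)
    return U[:, :n_pc] * S[:n_pc], var_frac[:n_pc]


def _assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_init: int = 10, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with random restarts; returns (labels, within-cluster SS)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _assign(X, centers)
            new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = _assign(X, centers)
        wss = float(((X - centers[labels]) ** 2).sum())
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best


def kmeans_bic(X: np.ndarray, k: int, **kw) -> float:
    """Approximate BIC for a k-means fit: n * ln(WSS / n) + k * ln(n)."""
    n = len(X)
    _, wss = kmeans(X, k, **kw)
    return n * np.log(wss / n) + k * np.log(n)
```

Comparing kmeans_bic over a range of k and keeping the lowest values mirrors the model selection described above; the discriminant step of DAPC and the alpha-score optimization are not included.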

Population splits and test of bifurcating model

We used the software TreeMix 1.13 (Pickrell and Pritchard, 2012) to build an ML tree and analyze the demographic history of our P. leucopus populations. To construct a bifurcating tree, this method uses a Gaussian approximation of genetic drift and the covariance in allele frequencies between population pairs. We used PLINK 1.9 (Purcell and Chang, 2015) to compute allele frequencies stratified by population and then the plink2treemix.py script (Pickrell and Pritchard, 2012) to convert the stratified file (.frq) to TreeMix format. The ML tree was rooted using P. maniculatus. The amount of genetic drift estimated to have occurred along a lineage is proportional to the horizontal length of its branch.
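TreeMix takes as input a header line of population names followed by one line per SNP, giving for each population the counts of the two alleles separated by a comma. The sketch below converts rows of a PLINK 1.9 .frq.strat file (whose columns include SNP, CLST, MAC and NCHROBS) into that layout. It is written to illustrate the conversion performed by plink2treemix.py, not to reproduce that script; the function name and the choice to write 0,0 for a population lacking a row are assumptions.

```python
from __future__ import annotations


def frq_strat_to_treemix(lines: list[str]) -> list[str]:
    """Convert PLINK 1.9 .frq.strat lines (header first) to TreeMix input lines."""
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    pops: list[str] = []                    # populations in order of appearance
    counts: dict[str, dict[str, str]] = {}  # SNP -> population -> "a1,a2"
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        snp, pop = fields[col["SNP"]], fields[col["CLST"]]
        mac, nchrobs = int(fields[col["MAC"]]), int(fields[col["NCHROBS"]])
        if pop not in pops:
            pops.append(pop)
        # minor-allele count, then the count of the other allele
        counts.setdefault(snp, {})[pop] = f"{mac},{nchrobs - mac}"
    rows = [" ".join(pops)]
    for per_pop in counts.values():
        rows.append(" ".join(per_pop.get(p, "0,0") for p in pops))
    return rows
```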

Unlike the clustering methods described above, TreeMix explicitly tests for the presence of gene flow by identifying population pairs that poorly fit a bifurcating evolutionary history, and then models gene flow events between these populations to improve the fit. Migration events are added in order of their statistical significance, such that the first gene flow event added to the tree is the one that most increases the likelihood of the model. The direction of gene flow is estimated from asymmetries in the covariance matrix of allele frequencies, as shown in Figure 1 of Pickrell and Pritchard (2012). Here, we modeled the first two ML trees: a tree based on a strict bifurcating model and a tree with one gene flow event.
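The covariance matrix that TreeMix fits can be illustrated in simplified form: for each SNP, allele frequencies are centered on their mean across populations, and the centered values are averaged as cross-products over SNPs. Residuals are then the observed minus the fitted covariances, scaled by a standard error; positive values flag population pairs more closely related than the tree allows, as in Figure 6b. This sketch omits TreeMix's sample-size corrections and block-jackknife standard errors, and the function names are hypothetical.

```python
import numpy as np


def allele_freq_covariance(F: np.ndarray) -> np.ndarray:
    """F: populations x SNPs matrix of sample allele frequencies.
    Returns the populations x populations covariance of frequencies centered,
    SNP by SNP, on their mean across populations (simplified; no bias correction)."""
    centered = F - F.mean(axis=0, keepdims=True)
    return centered @ centered.T / F.shape[1]


def standardized_residuals(W_obs: np.ndarray, W_fit: np.ndarray, se: float) -> np.ndarray:
    """Observed minus fitted covariance in units of a supplied standard error.
    Positive entries: pairs more closely related than the fitted tree implies."""
    return (W_obs - W_fit) / se
```

Because frequencies are centered across populations, each row of the covariance matrix sums to zero, so a pair of populations sharing drift shows up as a positive entry balanced by negative entries elsewhere.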

Results

Data sets

Sequencing our 240 samples generated 1 117 579 057 raw paired-end reads. After removing seven individuals with few reads sequenced and filtering for population genomic analyses, we obtained 38 144 bi-allelic SNPs for the P. leucopus-only data set and 33 919 SNPs for the complete data set. Thinning the complete data set to the first SNP per RAD locus yielded 12 507 SNPs for analysis of ancestry proportions and construction of a ML tree.

Population genetic statistics

The mean observed heterozygosity (Hetobs) in P. leucopus is 0.145 and the mean inbreeding coefficient (FIS) is 0.074 (s.d.=0.056, Table 1). Populations along the northern transect show a significant linear decrease in nucleotide diversity with relative expansion distance (P≤0.002; adjusted R²=0.993; n=4, Supplementary Figure 3b). In the south-western transect, populations from sites SW4 and SW5 show higher levels of diversity than sites further south. The northern-most site in the south-eastern transect (SE4) also shows decreased diversity, although the relationship is not significant (Supplementary Figure 3e and f). Interestingly, the northern-most population, N4, shows exceptionally high observed heterozygosity (Hetobs=0.169) and low inbreeding (FIS=0.026) relative to other sites in the northern transect. By comparison, the population on Montreal island (N2) has an almost 8-fold higher FIS (FIS=0.204). This excess heterozygosity in the northern-most site also contrasts with the nearest population to the south, N3, which shows a 6-fold higher inbreeding coefficient (FIS=0.157). The number of private alleles ranged from 0 (in 8 of 13 sites) to 14 (in the northern-most site, N4).

Concordant with post-glacial expansion, we found a significant reduction in observed heterozygosity relative to the levels expected under Hardy–Weinberg equilibrium in two of three P. leucopus transects, the northern and south-western transects (one-tailed paired t-test; northern transect: P≤0.031; south-western transect: P≤0.017; south-eastern transect: P≤0.104). The reduction in observed heterozygosity was also significant when all populations were pooled (P≤0.003).
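The one-tailed paired t-test used here compares, population by population, observed with expected heterozygosity and asks whether the differences are on average negative. In practice a library routine such as scipy.stats.ttest_rel with alternative='less' would normally be used; to keep this sketch dependency-free, the t distribution's CDF is instead obtained by Simpson's-rule integration of its density. The function names and the toy values in the test are hypothetical, not data from this study.

```python
import math
import statistics


def t_cdf(t: float, df: int, n_steps: int = 2000) -> float:
    """P(T <= t) for Student's t with `df` degrees of freedom, integrating the
    density from 0 to |t| with Simpson's rule (n_steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x: float) -> float:
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / n_steps
    s = pdf(0.0) + pdf(a)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def paired_t_test_less(observed: list[float], expected: list[float]):
    """One-tailed paired t-test of H1: observed < expected.
    Returns (t statistic, degrees of freedom, one-tailed P)."""
    d = [o - e for o, e in zip(observed, expected)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1, t_cdf(t, n - 1)
```

With only three to five populations per transect the test has few degrees of freedom, which helps explain why a consistent but modest deficit can remain non-significant in one transect.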

To further investigate the history of recent and post-glacial range expansion, we analyzed patterns of allele frequency divergence among populations. Overall, there were relatively low levels of differentiation among P. leucopus populations, with a mean FST of 0.037 (95% confidence interval [CI]: 0.033–0.041). However, the 78 pair-wise FST estimates (Supplementary Table S1) within and between transects show relatively high divergence between P. leucopus south and north of the St. Lawrence River (Figure 2a). Population divergence within transects is also higher along the northern transect, consistent with populations having experienced greater amounts of drift. Populations south of the St. Lawrence River are less differentiated from each other than from populations north of this river, a pattern that supports our expectation of distinct post-glacial expansion routes separated by the St. Lawrence River.

Figure 2

(a) Summary of 78 pair-wise FST values of populations within and between transects. The plot summarizes data presented in Supplementary Table S1. (b) Isolation-by-distance model of P. leucopus (Rousset, 1997), plotting differentiation (FST/(1−FST)) between 10 populations and their southern-most site against the natural log of the distance. The regression line is y = 0.0089x − 0.0149.

Increased genetic drift due to allele surfing during range expansion is expected to result in wave-front populations with high FST. Consistent with this expectation, the northern-most site, N4, shows exceptionally high FST (N4: FST=0.059, CI: 0.050–0.068) relative to the average. We detected a significant linear relationship between differentiation relative to the southern-most sites and distance to these sites (Figure 2b; P≤0.006; adjusted R²=0.58). The slope of the regression is 0.0089; its inverse (1/0.0089 ≈ 112) gives a rough approximation of the number of south-to-north effective migrants, about 112 individuals. The linear increase in FST remained significant when site N4 was removed (P≤0.02).
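The step from the slope to the figure of 112 can be written out. Under Rousset's (1997) two-dimensional model, FST/(1 − FST) increases approximately linearly with the natural log of distance r, with a slope b that is inversely proportional to the product of effective density D and the variance of parent-offspring dispersal distance σ². The inverse slope therefore estimates 4πDσ², the quantity often called Wright's neighborhood size, and this is the quantity behind the rough approximation above (a ≈ −0.0149 and b ≈ 0.0089 from Figure 2b):

```latex
\frac{F_{ST}}{1 - F_{ST}} \approx a + b \ln r, \qquad b = \frac{1}{4\pi D \sigma^{2}}
\quad\Longrightarrow\quad
4\pi D \sigma^{2} = \frac{1}{b} = \frac{1}{0.0089} \approx 112,
\qquad D\sigma^{2} \approx \frac{112}{4\pi} \approx 8.9
```

This relationship assumes migration-drift equilibrium, which an expanding population may not have reached, so the estimate is best read as an order of magnitude.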

A substantially greater divergence is expected between species than between populations within species. Accordingly, FST estimates between P. leucopus and P. maniculatus (mean FST=0.384; s.d.=0.018) are approximately an order of magnitude greater than the mean FST observed within P. leucopus in southern Quebec.

Population genomic structure in P. leucopus

Overall genetic variation in P. leucopus summarized with PCA shows that population structure is associated with geography. Lineages on either side of the St. Lawrence River are separated by PC1 (8.1% of variance), while individual populations tend to differentiate along PC2 (2.6% of variance; Figure 3). A gradient in population structure is observed within some populations, including SE4, SW4, N3 and especially N4. These populations are spread more widely across PC space than more southern populations, such as SE1, N1, SE2 and N2, whose individuals cluster more tightly. The northern site SW5, however, did not display this gradient, although it is apparent in more southern sites such as SW1 and SW2.

Figure 3

PCA on genome-wide variation of 13 P. leucopus populations (n=223) from southern Quebec.

Analyzing between-group differences with DAPC supports two main lineages (K=2) north and south of the St. Lawrence River (Supplementary Figure 4a and b), with more subtle variation separating populations east and west of the Richelieu River (Figure 4). These results provide additional evidence that the Richelieu River and the preceding waterways split a lineage expanding northward from an eastern glacial refugium. The results also show evidence of migration across major barriers. Both PCA (Figure 3) and DAPC (Figure 4) show one individual from the south-western transect (SW1) with genetic variation more typical of the genomic background found in the northern transect. Similarly, a few individuals from sites SE3 and SE4 have genotypes more similar to those found in the south-western transect.

Figure 4

Population genomic structure inferred by DAPC on 6 retained PCs (inset plot: 12.1% of variance) shows three (K=3) P. leucopus lineages separated by the St. Lawrence and Richelieu Rivers. In the main plot, each individual is represented by a row, transects are delimited by arrows, and color (red) denotes which of the three clusters each individual is assigned to. Differences between individuals east and west of the Richelieu River (clusters 2 and 3) are subtle compared with their differences from individuals on the northern transect (as shown by Supplementary Figure 4). A full color version of this figure is available at the Heredity journal online.

We used ADMIXTURE to analyze the ancestry of P. leucopus from southern Quebec and to look for evidence of hybridization with P. maniculatus. After cross-validating models with up to 12 ancestral populations, our results indicate up to nine relatively well-supported source populations contributing to ancestry (Supplementary Figure 1b). Ancestry proportions under the well-supported model of K=4 show P. leucopus population genomic structure associated with extrinsic barriers to dispersal (rivers), as well as admixture from different transects in SW5 and SW4 (Figure 5), which is consistent with the relatively high nucleotide diversity and heterozygosity observed at these two sites. Consistent with the ordination results, the most ancestral P. leucopus clusters (K=3) consist of individuals on either side of the St. Lawrence River (Supplementary Figure 5), while the Richelieu River separates more subtle sub-structure, consistent with a more recent divergence. We find additional sub-structure in the northern-most site, N4, and admixture in N3 (K=6).

Figure 5

Admixture proportions for two, four and nine putative ancestral populations for P. leucopus and P. maniculatus. Each column represents an individual and the length of colored segments denote the proportion of an individual’s genome inherited from one of K ancestral populations. K=4 was strongly supported by BIC (Supplementary Figure 1). A full color version of this figure is available at the Heredity journal online.

An admixture pattern consistent with range expansion (Falush et al., 2016) is observed in the two northern sites (for example, K=9, Figure 5). The second northern-most population, N3, is admixed with ancestry common to the south (for example, N1) and to the north (for example, N4). Falush et al. (2016) showed through simulations that this type of pattern would be consistent with gene flow, in this case from N1 and N4 to N3. However, such patterning of population genomic structure is also consistent with a bottleneck during expansion of the N4 lineage subsequent to its split from N3 (Excoffier and Ray, 2008; Falush et al., 2016), or perhaps with introgression into N4 from an unsampled (‘ghost’) population (Currat et al., 2008; Falush et al., 2016). Indeed, the relatively high observed heterozygosity and low inbreeding (FIS) in N4 suggest the latter. Further sampling of P. leucopus and P. maniculatus north of the St. Lawrence River is warranted.

Ancestry proportions also show evidence of ancestral gene flow across the St. Lawrence River (for example, K=3, Supplementary Figure 5). Consistent with older admixture, the shared ancestry proportions are well homogenized within the populations. More recent admixture can be inferred when admixture proportions vary within a population because recombination has not had time to distribute genomic blocks across chromosomes. We find hierarchical structure in SW5 and SE1 (K=6), likely due to isolation. Site SW5 is situated in a park surrounded by the city of Longueuil, which has experienced a 58-fold increase in census human population during the most recently documented 143-year period (~286 mice generations), increasing from 3977 in 1871 to 231 409 in 2014 (Statistique Canada, 2014). Site SE1 is surrounded, other than to the north-east, by Lake Champlain and the Richelieu River. Finer population genetic structure is apparent in SW1 (K=9).

Admixture with P. maniculatus

Notably, our ancestry results show evidence of hybridization with P. maniculatus. A predicted consequence of expansion in populations with low effective population size is potential introgression from closely related local species. ADMIXTURE results at K=2 show the presence of one putative P. leucopus hybrid at site SE3 with 15.5% of its ancestry inferred as coming from P. maniculatus (Figure 5). Two additional individuals have 1.6% and 1% inferred P. maniculatus ancestry, at least three orders of magnitude greater than in non-admixed individuals (0.001%). Given that most individuals at this site showed little to no interspecific ancestry, we suggest this result may be explained by recent hybridization. Specifically, these ancestry proportions are consistent with those expected in F3 (12.5%) and F6 (1.6%) hybrids backcrossed with P. leucopus. Our results also show that two of the 10 P. maniculatus sampled have 18.8% and 7.5% inferred P. leucopus ancestry. Further support for hybridization comes from a PCA (McVean, 2009) on the complete data set, which shows the two species separated by PC1 (22% of variance) and the putative hybrids with more than 10% heterospecific ancestry as genotypes intermediate between the two species (Supplementary Figure 6).
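The backcross expectations quoted above follow from halving heterospecific ancestry with each generation of backcrossing: counting the F1 (50%) as generation 1, as the text does, expected ancestry in generation g is 0.5^g, so 12.5% corresponds to g = 3 and about 1.6% (1.5625%) to g = 6. The sketch below, with hypothetical function names, computes these expectations and finds the generation whose expectation is closest to an observed proportion. Realized ancestry varies around these expectations because of Mendelian segregation and recombination, so the match is only indicative.

```python
def expected_backcross_ancestry(generation: int) -> float:
    """Expected heterospecific ancestry after repeated backcrossing into one
    parental species, counting the F1 (50%) as generation 1."""
    return 0.5 ** generation


def closest_generation(proportion: float, max_generation: int = 10) -> int:
    """Generation whose expected ancestry is closest to an observed proportion."""
    return min(range(1, max_generation + 1),
               key=lambda g: abs(expected_backcross_ancestry(g) - proportion))
```

For instance, the 15.5% estimate for the SE3 individual is closer to the 12.5% expectation (generation 3) than to the 25% expectation of generation 2.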

Bifurcating P. leucopus population tree

In line with our analyses above, the ML tree supports an older divergence of P. leucopus on either side of the St. Lawrence River and a more recent split caused by the Richelieu River (Figure 6a). Interestingly, however, the ML tree places populations N2 and SW5 as basal lineages. The ML tree indicates that the inbred population on Montreal island (N2) derives from an earlier split, sharing a common ancestor with a lineage that later diverged into one that expanded north (N3 and N4) and another that remained at a more southern latitude (N1). Consistent with being on the wave-front, the northern-most site shows substantial accumulated drift since splitting from N3, as indicated by its longer branch. The ML tree indicates that the population partially isolated by the city of Longueuil (SW5) is a basal P. leucopus lineage, one likely historically present in that area. Surprisingly, however, the ML tree indicates that SW5 diverged from other P. leucopus in Quebec prior to the split of populations on either side of the Richelieu River. This would suggest that other south-western populations (for example, SW3) are more closely related to mice in the south-eastern transect than to mice in SW5, a hypothesis not supported by DAPC (Figure 4) or by the ancestry proportions (K=4, Figure 5), which reveal shared genomic structure among individuals from south-western populations. Indeed, allele frequencies in SW5 are significantly less diverged from other south-western populations (FST=0.017) than from south-eastern populations (FST=0.026; one-tailed paired t-test: P≤0.016). A possible explanation for the placement of SW5 as a unique lineage south of the St. Lawrence River is the complex evolutionary history of mice at this site, which show ancestral admixture from the northern and south-eastern transects (for example, K=4; Figure 5), as well as sub-structure built up through more recent isolation.

Figure 6

Population relationships inferred with TreeMix (a) under a bifurcating model without gene flow. Horizontal branch length represents the amount of evolutionary change according to the drift parameter (related to FST). (b) The residual fit of model (a), a scenario without gene flow after divergence. The scale bar represents 10 × the average standard error of the entries in the sample covariance matrix (Pickrell and Pritchard, 2012). Residual values are indicated by the color palette. Positive residuals (green/blue/black) represent population pairs that are more closely related than presented on the tree, and thus candidates for admixture. Residuals below zero (yellow/orange/red) represent population pairs that are less closely related to each other than shown on the tree. (c) Population tree showing gene flow between P. maniculatus and P. leucopus from SE3. Direction of gene flow is shown by the arrow. The color of the arrow denotes the weight of migration (proportional to the amount of genetic ancestry contributed by the immigrant). A full color version of this figure is available at the Heredity journal online.

Test for introgression

Methods that calculate ancestry proportions, like ADMIXTURE, can give misleading inferences of population structure when sampling is incomplete or uneven (Falush et al., 2016; Wang, 2016). We therefore used TreeMix to formally test the bifurcating model of genealogical evolution. A residual fit of the bifurcating tree without gene flow (Figure 6b) shows that this model does not fit the evolutionary history of the Peromyscus populations in our study area, and instead supports a reticulate event between P. leucopus from SE3 and out-group P. maniculatus (Figure 6c). The direction of gene flow from demographically stationary P. maniculatus to colonizing P. leucopus (Fiset et al., 2015) is also consistent with predictions of introgression during range expansion (Currat et al., 2008). Allowing for two more migration events supports past gene flow from N1 to SW1 (Supplementary Figure 7a and b), and from SE1/SE2 to P. maniculatus (Supplementary Figure 7c and d). However, a model with three migration events may over-fit the data, as the topology of the ML tree is altered under this model.
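The residual fit in Figure 6b compares the observed covariance in allele frequencies between populations with the covariance implied by the fitted tree, scaled by the average standard error. A minimal sketch of that comparison, assuming the observed and fitted covariance matrices and the mean standard error have already been read from TreeMix output into nested dictionaries; both helper functions are hypothetical and are not part of TreeMix, and the threshold of 3 standard errors is an arbitrary illustrative choice.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]


def standardized_residuals(
    observed: Matrix, fitted: Matrix, mean_se: float
) -> Dict[Tuple[str, str], float]:
    """(observed - fitted) covariance for each population pair, in units of the mean SE."""
    pops = sorted(observed)
    return {
        (a, b): (observed[a][b] - fitted[a][b]) / mean_se
        for a, b in combinations(pops, 2)
    }


def admixture_candidates(
    resid: Dict[Tuple[str, str], float], threshold: float = 3.0
) -> List[Tuple[str, str]]:
    """Pairs more closely related than the tree allows (large positive residuals)."""
    return sorted(pair for pair, r in resid.items() if r > threshold)
```

Pairs returned by `admixture_candidates` correspond to the positive (green/blue/black) cells of the residual plot; adding a migration edge between them is the next step TreeMix takes when the `-m` option allows it.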

Discussion

We used genomic tools to investigate the history of range expansion of P. leucopus in southern Quebec. We predicted that allele surfing and genetic bottlenecks during recent northward expansion would lead to northern populations with reduced genetic diversity, divergent allele frequencies and potential admixture. Consistent with these processes, we described a northern-most population with reduced nucleotide diversity, divergent allele frequencies, a high number of private alleles and heterozygosity levels that suggest admixture with an unsampled population. We detected two main lineages of P. leucopus on either side of the St. Lawrence River and additional sub-structure in populations east and west of the Richelieu River. We argue that this structure reflects the evolutionary history of post-glacial expansion of two lineages. We also documented gene flow between P. maniculatus and P. leucopus, undermining the notion of complete reproductive isolation between these two species.

Spatial genetic gradients in P. leucopus

We found clines of genetic diversity consistent with the northward range expansion of P. leucopus. In two of the three transects, northern populations had the lowest genetic diversity by at least one measure. The northern transect showed a significant linear decrease, although the overall difference in diversity was relatively small considering the distance between the southern-most and northern-most populations (~150 km). Contrary to our expectations, a positive trend was seen in the south-western transect due to higher levels of genetic diversity in admixed northern locations. Our isolation-by-distance model estimated gene flow of 112 effective migrants along a south-north axis. However, this estimate rests on several assumptions that are unlikely to hold (Whitlock and McCauley, 1999), and gene flow likely differs among transects; it should therefore be viewed only as a rough approximation and interpreted with caution.
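To see why such estimates are rough, consider the most familiar conversion from differentiation to migrants per generation, Wright's infinite-island approximation FST = 1/(4Nm + 1), which gives Nm = (1 − FST)/(4FST). We do not claim this is the exact model used in our isolation-by-distance analysis; the sketch (with a hypothetical function name) only shows how strongly Nm depends on small changes in FST when differentiation is low, and which idealized assumptions the conversion requires.

```python
def island_model_nm(fst: float) -> float:
    """Effective migrants per generation under Wright's infinite-island model.

    Inverts F_ST = 1 / (4Nm + 1). Assumes equal deme sizes, symmetric migration,
    neutrality and migration-drift equilibrium, which natural populations rarely meet.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must be in (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


# Halving a small F_ST roughly doubles the implied number of migrants
for fst in (0.02, 0.01, 0.005):
    print(f"F_ST = {fst:.3f} -> Nm ~ {island_model_nm(fst):.1f}")
```

Because the implied Nm grows roughly as 1/(4FST) when FST is small, modest sampling error in FST translates into large differences in the number of migrants.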

Various factors, such as differences among sites, likely limited our power to detect latitudinal gradients. For example, larger forest patches contain more individuals and are therefore likely to harbor greater genetic diversity. Partial barriers to dispersal between sites (for example, roadways) in the south-western transect (Rogic et al., 2013; Marrotte et al., 2014) may further blur any relationship between genetic diversity and expansion distance at the spatial scale and with the sampling design used here. Wave-front dynamics may also play a role, as simulations have shown that climate-driven range expansions can maintain genetic diversity on the expanding range margin (Nullmeier and Hallatschek, 2013; Dai et al., 2014), although diversity may still decrease under models of rapid climate change (Garnier and Lewis, 2016).

Further limiting our ability to detect consistent spatial gradients in genetic diversity, our results indicate that some populations are partially isolated and therefore unlikely to have undergone recent expansion. For example, the relatively high level of inbreeding in mice on Montreal island (N2) suggests this population has been isolated for some time. It is also likely that the response of P. leucopus to climate change, particularly in the south-western transect, has been an increase in abundance rather than northward expansion. Although P. maniculatus was the more commonly found Peromyscus species as recently as 40 years ago, P. leucopus was historically present at low abundance (Grant, 1976) and may have contributed genetic variation during more recent migration from the south.

P. leucopus is one of many North American mammals (Lessa et al., 2003) that expanded north during interglacial periods of the Pleistocene. Mitochondrial DNA data indicate that at the start of the current interglacial period, two P. leucopus lineages expanded north from eastern and western refugia ~17 000 and 15 000 years BP, respectively (Rowe et al., 2006; Fiset et al., 2015). The genetic legacy of this older and more geographically extensive post-glacial expansion may be visible as a deficit of observed heterozygosity relative to Hardy–Weinberg expectation in P. leucopus populations sampled in southern Quebec, a region covered by the Laurentide Ice Sheet 20 000 years ago. However, a heterozygote deficit can also arise when a sample pools sub-populations that differ in allele frequency, even if each sub-population is itself at Hardy–Weinberg equilibrium; this is known as the ‘Wahlund effect’. P. leucopus in our study area consist of three meta-population lineages separated by geographic barriers. Because our comparison of observed and expected heterozygosity was based on estimates from individual sub-populations rather than meta-populations, the Wahlund effect likely plays a relatively minor role.
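The Wahlund effect can be made concrete with two sub-populations that are each at Hardy–Weinberg equilibrium but differ in allele frequency. Pooling them yields fewer heterozygotes than the pooled allele frequency predicts, and the deficit equals twice the variance in allele frequency among sub-populations. A minimal sketch for a biallelic locus, assuming equal-sized sub-populations (the function name is hypothetical):

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def wahlund_deficit(subpop_freqs: Sequence[float]) -> Tuple[float, float, float]:
    """Heterozygote deficit from pooling equal-sized sub-populations, each at HWE.

    Returns (observed H in the pooled sample, H expected from the pooled
    allele frequency, deficit). The deficit equals 2 * variance of p.
    """
    p_bar = fmean(subpop_freqs)
    # Each sub-population contributes its own Hardy-Weinberg heterozygosity
    h_obs = fmean([2 * p * (1 - p) for p in subpop_freqs])
    # Expectation if the pooled sample were treated as one random-mating unit
    h_exp = 2 * p_bar * (1 - p_bar)
    return h_obs, h_exp, h_exp - h_obs


h_obs, h_exp, deficit = wahlund_deficit([0.2, 0.8])
print(h_obs, h_exp, deficit, 2 * pvariance([0.2, 0.8]))
```

With allele frequencies of 0.2 and 0.8, the pooled sample has an observed heterozygosity of about 0.32 against an expectation of 0.50, a deficit of about 0.18 that arises without any inbreeding. Estimating heterozygosity within each sub-population, as done here, avoids this artifact.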

Population genetic structure in P. leucopus

Geographic isolation is generally thought to be a common precursor to the generation of genetic diversity and, ultimately, new species (Sobel et al., 2010). We found that genomic variation in P. leucopus is spatially structured and associated with geographic barriers. We identified two putative post-glacial lineages of P. leucopus north and south of the St. Lawrence River. These lineages likely began diverging during the Late Pleistocene in western and eastern glacial refugia, respectively (Rowe et al., 2006; Fiset et al., 2015). More subtle genetic structure differentiates P. leucopus on either side of the south-north oriented Richelieu River. This supports a scenario in which ancestral populations from the eastern glacial lineage were separated during northward post-glacial expansion. Populations expanding from an eastern refugium would have been separated by the ancient Champlain Sea left by the retreating ice sheet, which eventually gave way to Lake George and Lake Champlain in New York and Vermont and to the Richelieu River in Quebec (Rogic et al., 2013; Fiset et al., 2015). P. leucopus populations from the western refugium, on the other hand, colonized the Great Lakes area during post-glacial expansion (Rowe et al., 2006). Ongoing climate change is thought to be responsible for the rapid expansion of P. leucopus across the Upper Michigan Peninsula (Myers et al., 2009), where it is ecologically replacing P. maniculatus (Wan, 2014), a process that can be facilitated by hybridization (Rhymer and Simberloff, 1996). P. leucopus north of the St. Lawrence River belong to the western lineage and represent an extension of the expansion detected across the Upper Michigan Peninsula. Indeed, the higher levels of drift and inbreeding observed north of the St. Lawrence River than south of it suggest that the north shore may represent a more recently established colonization route.

Populations on an expanding range margin are expected to become differentiated through the spatial accumulation of genetic drift and selection during expansion. As new territory is colonized, low-frequency variants may become established in newly colonized areas, where they can rise to high frequency in subsequent generations and lead to ‘genetic revolutions’ in wave-front population structure (Excoffier and Ray, 2008). Consistent with processes associated with expansion, including genetic bottlenecks, allele surfing and potentially introgression, we showed that the northern-most population had relatively low nucleotide diversity, the most differentiated allele frequencies, the highest number of private alleles and heterozygosity estimates that suggest outcrossing with an unsampled population. In addition, ancestry proportions showed a pattern of population structure consistent with a northward bottleneck, although similar structure may also arise from admixture with an unsampled population during expansion (Falush et al., 2016), which our population genetic statistics suggest may have occurred. Additional sampling north of the St. Lawrence River will help unravel the evolutionary history of this P. leucopus lineage.
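The expectation that repeated founder events erode diversity toward the wave-front can be illustrated with a textbook serial-founder approximation: if each newly colonized site is founded by N diploid individuals sampled from the previous site, expected heterozygosity is multiplied by (1 − 1/(2N)) at each step. The sketch pairs that expectation with a toy stochastic simulation in which a neutral allele's frequency is resampled at each founder event, so that it can drift ('surf') to high frequency or be lost. The founder number, the number of steps, and the absence of migration and of drift during population growth are all simplifying assumptions, not estimates for P. leucopus; the function names are hypothetical.

```python
import random
from typing import List


def expected_heterozygosity(h0: float, n_founders: int, steps: int) -> List[float]:
    """Expected heterozygosity at each site along a chain of serial founder events."""
    loss = 1.0 - 1.0 / (2 * n_founders)
    return [h0 * loss ** k for k in range(steps + 1)]


def surf(p0: float, n_founders: int, steps: int, rng: random.Random) -> List[float]:
    """Allele frequency at each site when every site is founded by 2N sampled gene copies."""
    freqs = [p0]
    copies = 2 * n_founders
    for _ in range(steps):
        # Binomial sampling of founder gene copies from the previous site
        k = sum(rng.random() < freqs[-1] for _ in range(copies))
        freqs.append(k / copies)
    return freqs


rng = random.Random(1)
# A rare variant (5%) carried through 10 serial founder events of 5 diploid founders each
trajectories = [surf(0.05, 5, 10, rng) for _ in range(1000)]
n_surfed = sum(t[-1] >= 0.5 for t in trajectories)
print(expected_heterozygosity(0.3, 5, 10)[-1], n_surfed)
```

Most rare variants are lost along the chain, but a few reach high frequency at the front, which is the surfing pattern expected to produce divergent allele frequencies and private alleles in wave-front populations.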

We found evidence that some populations are at least partially isolated. For example, P. leucopus on Montreal island have higher levels of inbreeding than populations on the mainland. We also found population structure in P. leucopus from an area (city of Longueuil) whose census human population increased 58-fold over the most recently documented 143-year period, roughly equivalent to 286 generations of P. leucopus (Statistique Canada, 2014). Urbanization can have a major effect on population genetic structure through isolation and likely imposes novel selective pressures. Munshi-South et al. (2016) used human population size and percent impervious surface cover in New York City as proxies to study the effects of urbanization on genomic variation in P. leucopus, and found that these proxies best explained genome-wide variation in urban P. leucopus. Harris et al. (2013) identified candidate genes potentially under selection in an urban environment, many of which were associated with the immune system. The isolated P. leucopus populations identified here thus provide an opportunity to study evolutionary processes such as adaptation in response to recent human expansion and the effects of inbreeding on fitness in an island population.

Population splits

Our reconstruction of the demographic history of P. leucopus in southern Quebec supports a scenario of post-glacial and ongoing expansion. Our analysis of population splits recovered the putative glacial lineages as clades separated by the St. Lawrence River, and a post-glacial divergence of populations on either side of the Richelieu River. Our results also suggest that P. leucopus was historically present in the region and that one lineage became isolated on Montreal island, splitting off from a lineage that later gave rise to a population that remained on the mainland (N1) and populations that expanded north (N3 and N4). Indeed, our results showed that the northern-most site has accumulated substantial genetic drift. We also found evidence suggesting that the population surrounded by recent urbanization (SW5) has been historically present. Its placement as basal relative to all other populations south of the St. Lawrence River may be due to a complex evolutionary history of ancient admixture and recent isolation in this lineage.

Secondary contact between P. leucopus and P. maniculatus

One of the most neglected consequences of range expansion is the potential for local introgression. We hypothesized that northern expansion of P. leucopus populations into territory historically occupied by P. maniculatus would lead to introgression. We used genome-wide data to test this prediction, and our results showed the presence of a few putative recombinant hybrids. However, these putative hybrids were identified using clustering methods that are sensitive to sampling design and can produce results that are easily misinterpreted (Novembre and Stephens, 2008; Schwartz and McKelvey, 2009; Falush et al., 2016; Puechmaille, 2016; Wang, 2016). Such clustering methods also do not directly test for admixture between divergent gene pools. To address this, we used TreeMix to explicitly test for gene flow; this analysis rejected the bifurcating model of genealogical evolution, confirming secondary contact between P. leucopus and P. maniculatus.

Only five individuals showed hybrid ancestry, likely reflecting ecological differences between the two species. P. leucopus was sampled mostly from small woodlots scattered across an agricultural matrix, where P. leucopus, a generalist, thrives. In contrast, there is evidence of more introgression in larger, more continuous forests where P. maniculatus and P. leucopus can both be found locally (Leo and Millien, 2016). In particular, Leo and Millien (2016) sampled both species where P. maniculatus was more abundant and found putative P. leucopus hybrids with mostly P. maniculatus ancestry, a result consistent with predictions (Currat et al., 2008). However, the results of Leo and Millien (2016) are subject to the caveats of clustering methods noted above, as well as to the lower resolution of microsatellites. Taking a genomic approach, sampling broadly in sites where both species are found, including north of the St. Lawrence River, and testing for gene flow will help reveal the frequency and extent of hybridization between these two Peromyscus species in Quebec.

Although P. maniculatus was represented by only 10 individuals in our study, we believe our inference of interspecific admixture is robust. For example, a mammalian population genomic study showed that sufficient genetic variation to construct a complete evolutionary history could be sampled from a population with as few as 10 individuals (Trask et al., 2011). The putative hybrids found by Leo and Millien (2016) were also sampled from a more evenly distributed data set of 69 P. leucopus and 84 P. maniculatus. Furthermore, the accuracy of hybrid identification increases with the number of markers and the level of divergence (Vaha and Primmer, 2006), reaching 90% accuracy with fewer than 50 microsatellite markers in populations one third as diverged as the species in this study. Importantly, Vaha and Primmer (2006) also showed that unsampled source populations have negligible effects on hybrid identification, and McVean (2009) demonstrated that the positions of genotypes in PC space relative to non-admixed individuals can also identify hybrids (for example, Supplementary Figure 6), even when source populations are missing.
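The PCA argument can be illustrated with a simple calculation: McVean (2009) showed that an admixed individual's expected position in PC space is the ancestry-weighted average of the positions of its source populations, so its relative position between the two parental clusters along PC1 gives a rough ancestry estimate. A minimal sketch on PC1 scores alone; the function name and toy scores are hypothetical, and a real analysis would work from the full PCA output.

```python
from statistics import fmean
from typing import Sequence


def pc_ancestry(score: float, source_a: Sequence[float], source_b: Sequence[float]) -> float:
    """Approximate ancestry from source B for a PC1 score lying between two source clusters.

    Linear interpolation between the mean PC1 scores of non-admixed individuals,
    clipped to [0, 1] because sampling noise can push scores past a cluster mean.
    """
    mean_a, mean_b = fmean(source_a), fmean(source_b)
    if mean_a == mean_b:
        raise ValueError("source clusters are not separated on this axis")
    q = (score - mean_a) / (mean_b - mean_a)
    return min(1.0, max(0.0, q))


leucopus_pc1 = [-0.10, -0.12, -0.09, -0.11]  # hypothetical non-admixed P. leucopus scores
maniculatus_pc1 = [0.30, 0.29, 0.31]         # hypothetical non-admixed P. maniculatus scores
print(round(pc_ancestry(-0.04, leucopus_pc1, maniculatus_pc1), 3))
```

Because the interpolation only needs the centroids of non-admixed individuals, it does not depend on how many individuals are sampled from each species, which is why PC position can flag intermediate genotypes even with an uneven sampling design.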

Our results challenge the idea of complete isolation between P. leucopus and P. maniculatus, supporting an emerging view that reproductive isolation can vary depending on individual genotype and demographic context (Kozlowska et al., 2012; Chunco, 2014; Mandeville et al., 2015; Araripe et al., 2016; Gompert and Buerkle, 2016; Senerchia et al., 2016). The presence of hybrids on the range margin also supports the idea that pre-mating barriers (for example, Doty, 1972, 1973) between sympatric species may be altered during climate-driven range shifts; in essence a displacement of co-evolved genotypes driven by anthropogenic climate change (Crispo et al., 2011; Chunco, 2014).

Hybridization is expected to reduce fitness through intrinsic and extrinsic selective pressures. However, some of the fitness costs of hybridization during range expansion may be mitigated by evolutionary phenomena associated with expansion in small populations, including softened selection (Peischl et al., 2013), fixation of compensatory adaptive alleles (Poon and Otto, 2000), or a decrease in the rate of small-effect deleterious mutations (LaBar and Adami, 2016). Selection against, and purging of, polymorphic incompatibility loci (Cutter, 2012) may also result in a rebound of hybrid fitness (Araripe et al., 2016). These processes may contribute to the conversion of a climate-driven wave-front of a single species into a moving hybrid zone (for example, Chunco, 2014; Taylor et al., 2014, 2015).

Conclusion

We showed that genetic variation in P. leucopus from southern Quebec is associated with geography and shaped by post-glacial and recent range expansion. Our results of genome-wide population structure and genetic diversity indicate that Quebec was colonized by two putative glacial lineages, one of which was further subdivided by the Richelieu River during post-glacial expansion. Consistent with the predicted consequences of range expansion (Currat et al., 2008; Excoffier and Ray, 2008; Excoffier et al., 2009), we found a northern-most population showing low nucleotide diversity, the lowest effective population size, divergent allele frequencies, the highest number of private alleles and heterozygosity levels that suggest introgression from an unsampled local population. Analysis of ancestry identified putative P. maniculatus and P. leucopus hybrids. Testing the bifurcating model of genealogical evolution confirmed a reticulate phylogeny and past gene flow between P. maniculatus and P. leucopus in Quebec. This result adds to the rapidly growing list of natural hybrids discovered between species pairs previously thought to be isolated, supporting the idea that reproductive isolation between closely related species can vary with genotype and demographic conditions. More generally, our study supports the view that species boundaries not only change over time but also vary across space.

Data archiving

The raw sequence data has been deposited at the NCBI Sequence Read Archive (SRA) repository and can be accessed under accession number PRJNA397983.