Introduction

Most species experience shifts in their distribution ranges during their evolutionary history (Sexton et al. 2009; Gaston 2009), but contemporary large-scale shifts are considered a pervasive consequence of climate change (Parmesan 2006; Lenoir and Svenning 2015), with far-reaching implications for biodiversity conservation (Bellard et al. 2012; Pauls et al. 2013), human well-being and ecosystem function (Pecl et al. 2017).

Studies dealing with geographic range shifts aim to elucidate their ultimate causes (Rowe et al. 2015; Pacifici et al. 2020) and explore their impact on local ecosystems (Brodie 2016; Wallingford et al. 2020). In this context, biological invasions provide a framework to investigate the processes underlying successful establishment in new regions (Sherpa and Després 2021) and how populations may respond during range expansions (Hastings et al. 2005). However, few empirical studies focus on the genetic outcomes of contemporary range shifts in native species (Sánchez-Guillén et al. 2016; Gilbert et al. 2017). Understanding the genetic patterns and processes underlying expansions and colonization events in free-living organisms is increasingly necessary to accurately predict range shifts in response to climatic changes (Excoffier et al. 2009). This knowledge may inform management responses for the monitoring and control of invasive species (Mooney and Cleland 2001; Hulme et al. 2008), reducing the health and environmental costs of globalization due to the spread of pathogens (Crowl et al. 2008; Engering et al. 2013), disease vectors (Fuller et al. 2012) and agricultural pests (Suckling et al. 2017).

Population genetics can help determine the origin of expansions, colonization routes (Hewitt 2000; García et al. 2020), and changes in effective population size (Norén et al. 2015) as well as their impact on genetic diversity and population structure (Swaegers et al. 2013). For example, genetic diversity may decrease along an expanding range due to consecutive bottlenecks and serial founder events (Slatkin and Excoffier 2012). At the expanding front, the random effects of genetic drift will add up at each new colonization event, reducing genetic diversity and effective population size and increasing the frequency of rare alleles due to surfing in the direction of that expansion (Austerlitz et al. 1997; Excoffier et al. 2009). These patterns may be more dramatic when expansions proceed slowly under long-distance colonization events (Ibrahim et al. 1996). However, when expansions occur rapidly and/or between large demes, more genetic diversity will be preserved due to weaker genetic drift (Heppenheimer et al. 2018). Spatial expansions can thus leave a genetic footprint at the metapopulation level by generating allelic gradients (Conord et al. 2012), promoting the surfing of rare alleles (Paulose and Hallatschek 2020), and structuring a genetic mosaic of populations.

Multiple interacting factors should be considered in the study of range expansions. On one hand, individual traits affect local abundances and population growth and, together with species’ life-history traits, determine the ability to colonize new areas and establish viable populations (Angert et al. 2011; Estrada et al. 2016). On the other hand, interspecific interactions modulate local population growth rates and dispersal success (Svenning et al. 2014; Legault et al. 2020). In addition to biotic factors, the most recognizable limitations to the geographic expansion of populations are natural barriers like mountains, rivers or environmental heterogeneity (Stojak et al. 2019a, b). More recently, human development (e.g., roads, agriculture, and urbanization) has created new barriers to dispersal (Caplat et al. 2016). For example, linear transportation infrastructures like roads or railways can mimic the effects of natural barriers (D’Amico et al. 2018), which, combined with changes in land use (Brodie 2016), may alter species distributions.

In this study, we explore the factors shaping a contemporary and sudden range expansion in the Iberian common vole (Microtus arvalis asturianus) across a patchy, agricultural landscape, and its genetic consequences. This species offers unique opportunities to understand the consequences of range expansions. First, M. a. asturianus is a small rodent endemic to the Iberian Peninsula (García et al. 2020) representing one of the few well-documented cases of a mammal naturally expanding its range over a large area (ca. 75,000 km² in roughly two decades; Luque-Larena et al. 2013; Jareño et al. 2015). Until the 1970s, the species was restricted to mountain areas (between 800 and 1200 m above sea level) in the northern half of the Iberian Peninsula (Niethammer and Winking 1971; Rey 1973; Fig. 1). Since then, studies have recorded the progressive presence of the species in the Duero river basin (Delibes 1989; Luque-Larena et al. 2013; Jareño et al. 2015), considerably extending the boundaries of the originally known range in Spain. Second, M. a. asturianus in the newly colonized area has a cyclic nature, with large and regular fluctuations in population size and mass dispersals associated with high-density phases (Luque-Larena et al. 2013; Mougeot et al. 2019). Finally, the newly colonized region is a highly heterogeneous landscape, ideal for studying both natural and human-related barriers to dispersal.

Fig. 1: Sampling sites and main rivers in the study area.

a Geographical location of the sampling localities included in this study, and original distribution range of M. a. asturianus in the 1970s (grey-shaded area) (distribution adapted from Rey 1973, Fig. 1, and Niethammer and Winking 1971, Fig. 1). Sampling localities are coloured according to the groups inferred in the SAMOVA analysis (see Methods section): group A (purple), group B (black), group C (light blue), group D (violet), group E (yellow). Numbers correspond to the IDs of localities in Table S1. A shaded relief image of Castilla y León and surroundings is used as background, and the main mountain ranges are indicated: Cantabrian Mountains (CM), Sistema Central (SC), Sierra de la Demanda (SD). b Map of the main rivers in the study area and delimitation of sub-basins (represented by colours; modified from Morales and Ortega Villazán 2002). The inset map shows the geographical location of Castilla y León within Spain.

We analysed mitochondrial DNA (mtDNA) and nuclear genetic data (short tandem repeats, STR) to reconstruct the geographic expansion of M. a. asturianus in the Iberian Peninsula across two different timescales: mtDNA for a broader evolutionary timescale and STR for a fine-scale resolution. Specifically, we asked (i) whether the extant populations show a genetic footprint of recent expansion that fits theoretical expectations (directional gradients in genetic structure and diversity from source to recently colonized sites); (ii) whether the range expansion is a consequence of migration from one or more source populations, or of local demographic expansions from remnant populations (“colonization from within”; Huth et al. 2015); and (iii) whether environmental, ecological or geographical factors, or a combination thereof, explain the demographic history and genetic structure of M. a. asturianus along its recently invaded range.

Methods

Sampling and laboratory procedures

We collected 1119 voles between 2012 and 2017 from 57 localities throughout most of the species’ range in Spain (Fig. 1, Table S1). We live-trapped voles using LFATDG Sherman Live Traps (7.62 cm × 8.89 cm × 22.86 cm, H. B. Sherman Traps, Inc., Tallahassee, Florida, USA), and collected ear tissue samples using a sterile 2 mm tissue biopsy punch; samples were stored in 99% ethanol. We used only one individual per sampling location at each locality to exclude the potential presence in the sample of closely related individuals and avoid the misrepresentation of genotype proportions at each locality. All handling procedures were approved by the UCLM Ethics Committee (reference number CEEA: PR20170201) and followed the Spanish and European policies for animal protection and experimentation.

We successfully genotyped 885 samples for 9 neutral short tandem repeat (STR) loci (Ma09, Mar012, Ma54, Ma66, MM6, Mar063, Mar003, Mar016, Mar102; Ishibashi et al. 1997; Gauffre et al. 2007), following PCR conditions in Domínguez et al. (2021). We also PCR-amplified a fragment of the mitochondrial cytochrome b (Cyt b) gene (928-bp) for all 1119 samples using primers L7 and H6 (Tougard et al. 2008) and conditions described in García et al. (2020). Further details are given in Supplementary text S1.

Population structure and differentiation

To infer the most plausible number of genetic clusters (K) to which localities can be assigned based on their STR genotypes, we used the spatially explicit Bayesian clustering method in BAPS 6.0 (Corander and Marttinen 2006). We ran the “spatial clustering of groups” model 10 times between K = 1 and K = 20. After finding the K value with the highest likelihood, we ran the Admixture model with 100 iterations to estimate the admixture coefficients (200 reference individuals from each cluster, 20 iterations). To complement the results from BAPS with a method that does not assume Hardy-Weinberg equilibrium (HWE) or linkage equilibrium (LE), we used a discriminant analysis of principal components (DAPC) in the R package “adegenet” (Jombart 2008). This method reduces the dimensionality of the genetic variation between groups by combining principal components analysis (PCA) and linear discriminant analysis (LDA). Using the PCs produced, the method creates discriminant functions that represent linear combinations of correlated alleles describing the greatest amount of variation in the genetic dataset. We used a cross-validation function (xvalDapc, 100 replicates) to determine the optimal number of PCs to be retained for the DAPC. We then ran the DAPC using 9 (a priori groups minus one) discriminant functions and assessed the proportion of successful reassignment of individuals to pre-defined groups using the clusters identified in BAPS.

We calculated FST values (Weir and Cockerham 1984) as a measure of population differentiation between pairs of genetic clusters using the R package “diveRsity” v1.1.9 (Keenan et al. 2013). Statistical significance was based on 9999 permutations for all estimates. To test whether genetic differentiation followed a pattern of isolation by distance (IBD), we performed a Mantel test (Mantel 1967) using linearized FST and Euclidean geographic distances between sampling sites for the expanded and original ranges separately (as depicted in Fig. 1, see also Table S1). In addition, we performed a spatial autocorrelation analysis using 10 km distance classes to test up to which distance the genotypes showed a positive correlation (i.e., they were more similar to each other than expected by chance). We performed both analyses using GENALEX v6.5 (Peakall and Smouse 2012) and assessed significance with 9999 permutations and 9999 bootstraps.
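As an illustration of the IBD test described above, the following Python sketch linearizes a pairwise FST matrix and runs a permutation Mantel test against Euclidean distances. It is not the GENALEX implementation used in this study: the function names (linearize_fst, mantel_test), the site coordinates and the FST values are hypothetical and serve only to show the logic of the procedure.

```python
import numpy as np

def linearize_fst(fst):
    """Return FST / (1 - FST), the linearized form used in IBD tests."""
    fst = np.asarray(fst, dtype=float)
    return fst / (1.0 - fst)

def mantel_test(x, y, permutations=9999, seed=0):
    """Permutation Mantel test between two symmetric distance matrices.

    Returns the Pearson correlation between the upper triangles and a
    one-tailed p-value (share of permutations with r >= observed r).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    upper = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(x[upper], y[upper])[0, 1]
    rng = np.random.default_rng(seed)
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        p = rng.permutation(n)
        y_perm = y[np.ix_(p, p)]  # permute rows and columns together
        if np.corrcoef(x[upper], y_perm[upper])[0, 1] >= r_obs:
            count += 1
    return r_obs, count / (permutations + 1)

# Hypothetical site coordinates (km) and toy FST values that grow with distance.
coords = np.array([[0, 0], [10, 0], [25, 5], [40, 20],
                   [60, 10], [80, 30], [100, 50]], dtype=float)
geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
fst = geo / 1000.0
r_obs, p_value = mantel_test(geo, linearize_fst(fst), permutations=999)
```

Rows and columns are permuted jointly because the pairwise entries of a distance matrix are not independent observations; shuffling individual cells would break the matrix structure and invalidate the test.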

Recent evidence suggests that major rivers have played a significant role in shaping the spatial genetic structure and historical dispersal pathways of the common vole across Europe (García et al. 2020). Therefore, we performed an Analysis of Molecular Variance (AMOVA) to test which of the following structure scenarios explained a higher percentage of genetic variance: (a) among BAPS-inferred clusters, (b) between localities north and south of the Duero river, (c) between localities in the original and expanded ranges, and (d) among localities in major river basins (according to Morales and Ortega Villazán 2002) (Fig. 1).

For mtDNA, we used AMOVA implemented in SAMOVA 2.0 (Dupanloup et al. 2002) to identify groups of localities that were geographically homogeneous and that maximized genetic differentiation relative to the other groups. We selected the partition scheme (from K = 2 to K = 12 groups) that maximized differences between FCT (i.e. the among-group variance component) and FSC (i.e. the among-populations within groups variance component). We replicated the analyses five times for each K value with 1000 independent annealing processes. We calculated pairwise genetic differences between the inferred mitochondrial groups using FST.

A median-joining network was created from the Cyt b haplotypes using Haploviewer (Salzburger et al. 2011) under the best tree topology inferred in RAxML-NG (Kozlov et al. 2019). The ML tree was inferred using the TIM2 + I substitution model, as determined by jModelTest 2.1.4 (Darriba et al. 2012) under the Akaike Information Criterion (AIC), and nodal support was assessed by 100 bootstrap replicates. The ingroup included all the haplotypes of M. a. asturianus, and the outgroup was an individual collected in west France that belongs to the sister lineage of the Iberian one (García et al. 2020), both included within the European Western-South lineage (GenBank ID: MG874879).

We further explored detailed patterns of spatial genetic structure across the range of M. a. asturianus using a Genetic Landscape Shape Interpolation (GLS) analysis implemented in the software Alleles in Space (AIS) v1.0 (Miller 2005). We estimated the genetic structure across the landscape on the basis of mitochondrial genetic distances, using an inverse distance weighted interpolation across a uniform grid laid over the entire sampling area. We used a 35 × 35 grid surface with a distance weighting parameter (α) of 0.5 for an optimal visualization of diversity patterns.
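The interpolation step can be sketched in Python as follows. This is a generic inverse distance weighted interpolation with weights 1/d^α on a 35 × 35 grid, written under our own assumptions; it does not reproduce the internals of AIS (for example, how genetic distances are assigned to locations before interpolation). The function name idw_grid and the input points and values are hypothetical.

```python
import numpy as np

def idw_grid(points, values, grid_size=35, alpha=0.5):
    """Inverse distance weighted interpolation on a uniform grid.

    points : (n, 2) array of x, y coordinates
    values : (n,) array of values attached to those coordinates
    alpha  : distance weighting exponent (weights are 1 / d**alpha)
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    xs = np.linspace(points[:, 0].min(), points[:, 0].max(), grid_size)
    ys = np.linspace(points[:, 1].min(), points[:, 1].max(), grid_size)
    gx, gy = np.meshgrid(xs, ys)
    nodes = np.column_stack([gx.ravel(), gy.ravel()])
    # Distance from every grid node to every data point.
    d = np.linalg.norm(nodes[:, None, :] - points[None, :, :], axis=2)
    exact = d == 0
    with np.errstate(divide="ignore"):
        w = 1.0 / d ** alpha
    # A node that coincides with a data point takes that point's value.
    w[exact.any(axis=1)] = 0.0
    w[exact] = 1.0
    z = (w * values).sum(axis=1) / w.sum(axis=1)
    return z.reshape(grid_size, grid_size)

# Hypothetical locations (km) and genetic distance values.
points = [[0, 0], [10, 0], [0, 10], [10, 10]]
values = [0.1, 0.2, 0.3, 0.4]
surface = idw_grid(points, values, grid_size=35, alpha=0.5)
```

A small α such as 0.5 lets distant points retain considerable weight, which yields a smooth surface; larger values make the surface follow the nearest data points more closely.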

Genetic diversity

For STRs, we used the R package “diveRsity” to estimate the observed (Ho) and unbiased expected heterozygosity (uHe), allelic richness (Ar), and inbreeding coefficient (FIS) in each locality and each of the genetic clusters inferred in BAPS. For mtDNA, we used Arlequin 3.5 (Excoffier and Lischer 2010) to calculate the number of haplotypes and polymorphic sites (nH), number of private haplotypes, and haplotype (Hd) and nucleotide diversity (π), per locality and for the SAMOVA-inferred clusters. To explore clines in genetic diversity, we tested for correlations between diversity estimators and latitude and longitude, both across the original and the recently expanded ranges.
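For reference, the diversity indices named above can be written out with their standard textbook estimators, as in the Python sketch below. These formulas are assumed here for illustration and may differ in detail (e.g., handling of missing data or averaging across loci) from the calculations in “diveRsity” and Arlequin; function names and example data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def observed_heterozygosity(genotypes):
    """Ho: proportion of heterozygous diploid genotypes at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def unbiased_expected_heterozygosity(genotypes):
    """uHe = 2n / (2n - 1) * (1 - sum(p_i^2)) for n diploid individuals."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n_copies = len(alleles)
    freqs = [c / n_copies for c in Counter(alleles).values()]
    return n_copies / (n_copies - 1) * (1 - sum(p * p for p in freqs))

def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum(p_i^2)) over haplotype frequencies."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site (aligned sequences)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])

# Hypothetical STR genotypes (allele sizes) at one locus and toy sequences.
genotypes = [(100, 102), (100, 100), (102, 104), (100, 104)]
seqs = ["ACGT", "ACGT", "ACGA", "TCGA"]
ho = observed_heterozygosity(genotypes)
uhe = unbiased_expected_heterozygosity(genotypes)
hd = haplotype_diversity(seqs)
pi = nucleotide_diversity(seqs)
```

The sample-size corrections (2n/(2n − 1) and n/(n − 1)) matter most for localities with few individuals, where uncorrected estimates would be biased downwards.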

Gene flow between genetic clusters

We used STRs and two different approaches to investigate gene flow between the genetic clusters previously inferred in BAPS. We used divMigrate-online (Sundqvist et al. 2016) to calculate a directional relative migration network between the clusters. This method provides a scaled value from 0 to 1, where higher values indicate greater gene flow compared with the rest of the data set. We used Nm (Alcala et al. 2014) as an inverse measure of genetic differentiation between clusters and 9999 bootstrap iterations to assess the significance of asymmetric gene flow.

We also used the coalescent method from Migrate-n 4.2.14 (Beerli and Felsenstein 2001) implemented in CESGA (Galician Supercomputer Center, Spain). We selected triads of clusters and ran all possible combinations between pairs of clusters. We replicated all analyses to ensure convergence. We used the Bayesian inference model with the Brownian mutation model, slice sampling, and uniform distribution for Θ and M. Prior distributions for Θ and M were defined after exploratory runs: Θ (min = 0, max = 100, δ = 10), M (min = 0, max = 100, δ = 10). We performed the MCMC search with 1 × 10⁵ burn-in steps followed by 1 × 10⁶ steps with parameters recorded every 100 steps. We employed a static heating scheme with four different temperatures (1.0, 1.5, 3.0, and 1 × 10⁶), where acceptance–rejection swaps were proposed at every step.

Demographic history

We used STR genotypes to test for a genetic signal of a recent bottleneck following the heterozygosity-excess method (Luikart and Cornuet 1998), as implemented in BOTTLENECK v1.2.02 (Piry et al. 1999). This method compares expected heterozygosity (HE) in an empirical sample with the heterozygosity (HEq) that is expected at mutation-drift equilibrium, given the number of alleles observed in the sample. A sharp reduction in effective population size (Ne) may result in a transitory heterozygosity excess at selectively neutral loci. We used a mixed Two-Phase Mutational Model (TPM), which best fits the mutation processes of STRs (Di Rienzo et al. 1994). We considered 95% single-step mutations, 5% multi-step mutations, and variance among multiple steps of 12, following the recommendations of Piry et al. (1999). We used a one-tailed Wilcoxon test to test for an excess of heterozygosity in each of the BAPS-inferred clusters.

For mtDNA, we explored the demographic history of each group inferred in SAMOVA using two neutrality tests: Tajima’s D test (Tajima 1989) and Fu’s Fs test (Fu 1997). The significance of these statistics was assessed with 5000 coalescent simulations in DnaSP 6.0 (Rozas et al. 2017). Only significantly negative Fu’s Fs and Tajima’s D values were retained as evidence of population expansion (Tajima 1989; Fu 1997; Pilkington et al. 2008). For those groups that significantly deviated from neutrality, we examined the frequency distribution of the number of pairwise differences between sequences (mismatch distribution). The goodness-of-fit between the observed and expected distributions under a scenario of population growth was tested by calculating the Ramos-Onsins and Rozas’s R2 statistic (Ramos-Onsins and Rozas 2002) and its associated p-value with 5000 coalescent simulations in DnaSP. We tested for signatures of demographic expansion by exploring temporal changes in Ne using the Bayesian Skyline Plot (BSP) method in BEAST v.2.5 (Drummond et al. 2012). We used a coalescent prior of exponential growth or constant population size, the HKY model of sequence evolution and a strict molecular clock with an average of µ = 3.27 × 10⁻⁷ mutations/site/year (García et al. 2020; Baca et al. 2020). We ran four independent analyses for 3 × 10⁷ generations, sampling every 3000 generations. Convergence was assessed in TRACER v.1.6 (Rambaut et al. 2016) by confirming that the effective sample size for all parameters was higher than 200 and that independent runs yielded similar posterior distributions. We combined the four runs using LogCombiner v1.8.1 (http://beast.bio.ed.ac.uk/logcombiner) after discarding the first 10% of samples as burn-in and obtained estimates and credible intervals for each parameter and demographic reconstruction using TRACER v.1.6 (Rambaut et al. 2016). We then checked if the 95% highest posterior density (HPD) of the growth rate (exponential.growthRate parameter) included zero, in which case a scenario of constant population size cannot be rejected.
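To make the logic of the first neutrality test explicit, the Python sketch below computes Tajima’s D from a set of aligned sequences using the formulas in Tajima (1989). It is an illustration only: significance in this study was assessed with coalescent simulations in DnaSP, which the sketch does not attempt. The function name tajimas_d and the toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for aligned sequences of equal length (needs n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are required")
    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # k: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignments: a star-like sample (one common haplotype plus singletons)
# and a sample split evenly between two divergent haplotypes.
star_like = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
balanced = ["AAAA"] * 3 + ["TTTT"] * 3
d_star = tajimas_d(star_like)
d_balanced = tajimas_d(balanced)
```

An excess of rare variants, as expected after a recent expansion, lowers the mean number of pairwise differences relative to the number of segregating sites and pushes D below zero; the star-like toy sample behaves this way, whereas the balanced sample gives a positive value.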

Contribution of geography and environment to the expansion

To study the impact of landscape features on genetic differentiation during the range expansion, we used STR markers and performed both marginal and conditional (partial) distance-based redundancy analysis (dbRDA, Legendre and Anderson 1999). As a response variable, we used a linearized FST matrix (FST/(1 − FST)) that was tested against the following predictors: i) geographic distance between sampling sites (GGD), ii) elevation of the sampling sites (ELE), iii) elevation difference between sampling sites (ELEDIF), iv) number of permanent rivers between sampling sites (RIVN), v) presence/absence of rivers (RIVBIN) between sampling sites, and vi) presence/absence of railways (RWSBIN) between sampling sites. The conditional test estimates the relative contribution of each predictor while controlling for the effect of Euclidean distances (see Supplementary text S2). All analyses were performed in the R package “STATS”, except the FST pairwise matrices, which were calculated in “hierfstat” (Goudet 2005) using the pairwise.fst function. All predictors were standardised to have zero mean and unit variance, and the dbRDAs were performed following the guidelines provided in the R package “vegan” (Oksanen et al. 2019; see Supplementary text S2 for further details). Finally, we used the varpart function to obtain the percentage of explained variance after grouping our predictors according to the following criteria: (i) LTIS: linear transportation infrastructures, including the number of roads (RDSN) and railways binary predictor (RWSN), (ii) NBAR: natural barriers, containing the presence of rivers (RIVN) and differences in elevation between sampling locations (ELEDIF), (iii) IBD: isolation by distance (GGD), which represents Euclidean geographic distances between sampling locations.

Results

We did not find consistent signs of allele dropout or null alleles in Micro-Checker across the 57 sampling localities, and the frequency for null alleles across all loci was lower than 0.25 (Dakin and Avise 2004). Similarly, we did not find consistent departures from the expected HWE for any locus, and we retained all STR loci for further analysis (Tables S2, S3, and S4).

Population structure and differentiation

The most likely number of clusters inferred by BAPS was K = 10 (Fig. 2). The 57 localities were arranged in a large central cluster including 34 localities (cluster VIII), a central-east cluster including 13 localities (cluster VII), two clusters south of the Duero River including two and four localities (clusters IX and X, respectively), and six peripheral clusters formed by single localities (clusters I to VI) (Fig. 2 and Table S1). Clusters VII and VIII coincide closely with the expanded range of M. a. asturianus, and all the other clusters fall within the original range (Fig. 1). Most individuals (88.4%) were assigned with a membership coefficient ≥0.9 to their inferred cluster. The DAPC results largely agreed with the BAPS clusters, with 88% of individuals correctly assigned (membership coefficient ≥0.9) to the cluster in which they were a priori grouped (Fig. 2a).

Fig. 2: Genetic clusters and migration networks.

a Bayesian analysis of population structure (BAPS; K = 10). The polygons are the result of Voronoi tessellation and each corresponds to a population (white dots). Polygons with the same colour belong to the same BAPS cluster, numbered from I to X as in Table S1. The table on the right shows the results of the discriminant analysis of principal components (DAPC), detailing the proportions of successful reassignment (pSR) of individuals to the clusters identified in BAPS. b Relative migration network between all clusters, calculated based on Nm (Alcala et al. 2014). c Significant asymmetric migration values after 9999 bootstraps. Colour and thickness of the arrows increase with the relative magnitude of gene flow.

The AMOVA revealed significant genetic structure in three out of the four scenarios tested. In all cases, most of the variance was explained among individuals within the total population (ΦIΤ, Table S5). The only scenario that showed greater differentiation among regions than between populations within regions was when localities were arranged based on the BAPS clusters (ΦCT = 0.064, ΦSC = 0.036, Table S5). Pairwise FST values between clusters were all significant and ranged from 0.01 between clusters VII and VIII to 0.18 between clusters I and VI (Table S6).

The SAMOVA analysis using mtDNA estimated an optimum structure of five mitochondrial genetic demes (groups A–E) (Fig. 1). The proportion of variation among groups (FCT) reached a plateau at K = 5 groups (FCT = 0.466, 46.6% of the total variation, P < 0.001), whereas FSC, the proportion of variation among populations within groups, dropped notably for the same value of K (FSC = 0.129, 6.8% of the variation, P < 0.001). At K = 5, the variation within populations was 46.4%. The geographic distribution of the mtDNA groups was partially consistent with the Bayesian clustering results using STR data (Fig. 2). Group A was almost entirely distributed to the north of the Duero river, occupying the entire agricultural plain, and eastern and north-western mountain areas (Sierra de la Demanda and Cordillera Cantábrica) of the original distribution of M. a. asturianus (Fig. 1). This group encompasses clusters III, VII and VIII inferred in BAPS (Fig. 2), and roughly corresponds to the recently expanded range. Group B included locality 1 in the Ebro river basin (Fig. 1), in the easternmost part of the range, and corresponds to cluster I in BAPS (Fig. 2). Groups C and E were found on the northern slope of the western part of the Sistema Central, south of the Duero river, and, like BAPS clusters IX and X, contain localities 54, 55, 56 and 57 (Figs. 1 and 2). Group D was found scattered across the mountains bordering the Duero basin: Cordillera Cantábrica to the north, the eastern part of the Sistema Central to the south, and Sistema Ibérico to the east (Fig. 1). Group D corresponds to BAPS clusters IV, V, VI, and locality 53, which was included in BAPS cluster IX (Fig. 2). Pairwise FST values between the five SAMOVA groups were all significant and ranged from 0.29 (between groups B and C) to 0.65 (between groups B and D) (Table S7).

The haplotype network showed shallow divergences among the 46 haplotypes found, most of which differed by 1–2 mutational steps from their closest relative (Fig. 3a). All the haplotypes found represent a monophyletic set of representatives of the Iberian lineage (García et al. 2020), which is considered within the European Western-South lineage. The most common haplotype (Haplotype 1, 34% of all individuals) belonged to SAMOVA group A, whereas the second most frequently observed haplotype (Haplotype 3, 17% of individuals) was found in two groups: A and D. Group A contained 28 private haplotypes, 21 of which were singletons. Group B contained four private haplotypes, groups D and E contained two, and group C had only one private haplotype. The network showed a simple structure, with a few central, highly frequent haplotypes from SAMOVA groups A, D, and E, and many low-frequency haplotypes radiating from them in a star-like fashion. A minimum of 12 mutations separated the outgroup (European SW lineage) from any Iberian haplotype (Fig. 3a).

Fig. 3: Haplotype network, patterns of spatial genetic differentiation and demographic changes over time.

a RAxML haplotype network based on Cyt b sequences in the Spanish M. a. asturianus. The number of the haplotype is indicated. Colours correspond to SAMOVA groups as in Fig. 1. Node size is proportional to haplotype frequency, and the length of the branch to the number of substitutions separating haplotypes. The branch leading to the group of M. a. arvalis sampled from France (outgroup) indicates the ancestral node of the asturianus clade. b Hillshade map of the study area overlaid with values of the genetic landscape shape plot interpolation (GLSI), showing the patterns of spatial genetic differentiation among 57 localities of Iberian common vole (black dots). Dark blue indicates areas with high pairwise genetic distances and red indicates areas of low pairwise genetic distance. Main rivers in the study area are colour-coded as in Fig. 1. c Mismatch distribution of pairwise differences for SAMOVA group A. The red line represents the observed frequencies. The dotted line corresponds to the expected distribution fitted to the data under a model of constant population size and the green line to that under a model of population expansion (Ramos-Onsins and Rozas’s R2 value and p-value are indicated). d Reconstructed demographic history of M. a. asturianus obtained from the Bayesian skyline plot (BSP) in the five mtDNA SAMOVA groups (A to E) showing changes in effective population size over time. The purple solid line and the margins of the shaded area represent the median estimate and the 95% highest posterior density for SAMOVA group A.

The genetic landscape analysis showed a clear geographic pattern, with the highest genetic differentiation among sites in the eastern part of the expanded range and lower differentiation in the western part (Fig. 3b). The greatest genetic distances were found among localities along the tributaries of the Carrión and Pisuerga rivers in the northeast (BAPS clusters VII and VIII, and SAMOVA group A). Genetic distances were smaller among localities in the south-western part of the expanded range, along the River Esla, north of the Duero River (see also Fig. 1), suggesting more gene flow among localities in that area. We obtained qualitatively similar results regardless of the grid size or distance weighting parameters used.

An overall pattern of IBD was detected by the spatial autocorrelation analysis using STRs and mtDNA (Fig. S1), with a decline in the average relatedness as a function of linear geographic distance, with positive and significant values up to 30 km (STRs) and 70 km (mtDNA). The x-intercept provides an estimate of the extent of non-random (positive) genetic structure. The Mantel test performed on STRs revealed that the IBD slope for the expanded range (i.e. sampling sites belonging to clusters VII and VIII) was smaller (Mantel r = 0.35, p = 0.006) than that for the original range (Mantel r = 0.41, p = 0.029).

Clines in genetic diversity

Genetic diversity was high across the 9 STR loci (Table S8). The number of alleles per locus ranged from five (locus Mar063) to 27 (locus Mar012), and uHe ranged from 0.41 to 0.92 (Table S8). The average number of alleles across all loci in each locality ranged from 3.9 (locality 6) to 10.9 (locality 43; Table S9), and uHe varied from 0.64 (locality 6) to 0.85 (localities 7 and 8). Ar showed more similar values, ranging from 3.6 (locality 6) to 6.5 (locality 35). Values of uHe (0.71–0.83) and Ar (4.83–7.17) were more similar among genetic clusters (Table S10) than among localities (Table S9). FIS was overall low for all localities and clusters (Tables S9 and S10). The highest values of Ar were observed in the central plateau, north of the Duero river, and the greatest FIS values were found in the western part of the range and in a few populations to the south of the Duero river (Fig. 4). Ho was similar to uHe with only a few exceptions, indicating that most localities did not deviate from HWE (Table S2). In the expanded range, genetic diversity (Ar, Ho, uHe) was significantly correlated with longitude, decreasing towards the west and the leading edge of the expansion front (Table 1). Ar also showed a significant correlation with latitude and decreased southwards in the expanded range (Table 1). In the original range of M. a. asturianus, diversity (Ho, uHe, and Ar) significantly decreased northwards, but no longitudinal correlation was observed.

Fig. 4: Map showing the spatial distribution of genetic diversity parameters based on STR and mtDNA markers.

a Allelic richness; b inbreeding coefficient; c observed heterozygosity; d unbiased expected heterozygosity; e haplotype diversity, f nucleotide diversity.

Table 1 Latitudinal and longitudinal clines (Spearman correlation coefficients) of diversity indices based on STRs and mtDNA across the original M. a. asturianus range and for the recently expanded range.

The number of mitochondrial haplotypes in each of the SAMOVA groups ranged from 5 to 36 (see also Table S11 for values of each sampling site). Both haplotype and nucleotide diversity differed among SAMOVA groups (Table 2). Four groups (A, B, D, E) showed similar values of haplotype diversity, ranging from 0.70 to 0.74, whereas group C showed the lowest value (Hd = 0.55). Nucleotide diversity values ranged from 0.0015 to 0.0021 (Table 2). Furthermore, within the expanded range we found a significant longitudinal cline of genetic diversity (Hd and π, see Table 1). As with STRs, diversity decreased towards the west and the leading edge of the expansion front. Nucleotide diversity also decreased towards the south of the expanded range (see also Fig. 4). In contrast, within the original range, we found no evidence for such longitudinal or latitudinal clines (Table 1).

Table 2 Molecular diversity and neutrality tests for the five mtDNA groups of M. a. asturianus defined by SAMOVA analysis (see also Fig. 1).

Migration patterns and gene flow

Using divMigrate we estimated that the greatest relative migration rate occurred symmetrically between clusters VII and VIII, in the expanded range (Fig. 2b). These two clusters showed intermediate to low migration rates with the remaining genetic clusters. We only estimated weak signals of asymmetric gene flow from some peripheral clusters (II, III, IV, IX) to the central ones (VII, VIII) (Fig. 2c).

Results from Migrate-n showed that the effective number of migrants per generation was low, except for clusters VII and VIII, which received the greatest number of migrants (Table S12). Moreover, Bayesian posterior estimates of migration rates were the highest, and had the narrowest 95% CI, for clusters VII and VIII.

Demographic patterns

Our bottleneck analysis did not detect any departure from mutation-drift equilibrium in the expanded range (one-tailed Wilcoxon tests, p > 0.6 and p > 0.9 for clusters VII and VIII, respectively). Considering the whole range, only cluster III showed a significant genetic signal of a recent bottleneck (significant excess of heterozygosity) (one-tailed Wilcoxon tests, p = 0.019; Table S10).

Our mtDNA data suggest that SAMOVA group A has undergone a recent population expansion, as evidenced by the “star-like” network (Fig. 3a) and the significant values of the neutrality tests (negative values for Tajima’s D and Fu’s Fs tests, Table 2). The mismatch distribution is unimodal and does not deviate from the modeled distribution under a population growth scenario (Ramos-Onsins and Rozas’s R2 = 0.021, p = 0.199, Fig. 3c), supporting population expansion in this group. The Bayesian Skyline Plot (BSP) revealed a dramatic increase in the effective population size of this group in the recent past, starting around 125 generations ago (~125 years) and persisting until a few generations ago (Fig. 3d). In all other SAMOVA groups the neutrality tests were non-significant (all p > 0.21), and the 95% HPD of the growth rate parameter in BSP included zero, so a hypothesis of constant population size for SAMOVA groups B, C, D, and E cannot be rejected.
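
To make the link between a “star-like” genealogy and a negative Tajima’s D concrete, the sketch below implements the standard formula of Tajima (1989) for a small alignment. It is an illustration under stated assumptions rather than the pipeline used here: the toy sequences are hypothetical, each polymorphic site is counted once regardless of the number of alleles, missing data are not handled, and no significance test is performed.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length sequences without missing data."""
    n = len(sequences)
    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(len(set(column)) > 1 for column in zip(*sequences))
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    # Mean number of pairwise differences (per sequence, not per site).
    pairs = list(combinations(sequences, 2))
    mean_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    ) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_diffs - seg_sites / a1) / sqrt(variance)

# Hypothetical star-like sample: one central haplotype plus four haplotypes
# that each differ from it by a single private mutation.
star_like = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA"]
d_star = tajimas_d(star_like)
print(f"Tajima's D (star-like toy sample) = {d_star:.2f}")
```

An excess of singleton mutations, as expected after rapid population growth, inflates the number of segregating sites relative to the mean pairwise difference and therefore pushes D below zero.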

Genetic-environment associations during the expansion

Both marginal and conditional tests showed a significant relationship between all the environmental predictors and our response variable (FSTLIN). The presence/absence of rivers (RIVBIN) explained the largest proportion of variance (20.94%), followed by the number of rivers (RIVN, 19.93%) and the presence/absence of railways (RWSBIN). These predictors remained significant after accounting for the Euclidean geographical distances among sampling sites (GGD) in partial (conditional) dbRDA tests (Table 3). According to AIC, the simplest model included ELEDIF, RDSN, and RIVBIN (20% of variance). Including additional variables (ELE, RWSBIN, and GGD) neither increased the variance explained nor substantially reduced the AIC value (Table 3). The variance inflation factors of all predictors were below five, so multicollinearity was not a major concern. The variance partitioning analysis for the best model (22.78%) showed that the largest share of the genetic variance in our dataset (11.6%) was attributable to natural barriers (NBAR, Fig. S2), followed by linear transportation infrastructures (LTIS, 8.1%), and the interaction between these two predictors explained 2.1% of the variance in the response variable. Isolation by distance (IBD) explained only 1.8% of the genetic differences.

Table 3 Distance-based redundancy analysis (dbRDA) results for all 57 localities.

Discussion

The Iberian common vole M. arvalis asturianus is native to the mountain areas of northern Spain and the Cantabrian region and has recently and vastly expanded its range. We found genetic evidence that the expanded range originated from a single-source area in the Sierra de la Demanda and surroundings, with subsequent dispersal towards the west and the south. The genetic homogeneity across the colonized range is compatible with a short time lag, large Ne, and high gene flow that could quickly erase previous signals of genetic divergence. Both historical and inherent natural landscape features, together with anthropogenic disturbances, explain a large portion of genetic variance in contemporary populations and have had a significant impact on the colonization pathways used by voles.

Evolutionary background and historical population demography

Previous studies employing mtDNA have proposed that M. a. asturianus is an ancient endemism of the Iberian Peninsula that remained isolated from other European lineages during the Last Glacial Maximum (Tougard et al. 2008; Baca et al. 2020; García et al. 2020) and suffered a severe reduction of effective population size around the Pleistocene/Holocene transition (Baca et al. 2020). The time to the most recent common ancestor of the M. a. asturianus clade (~9000 ybp), and its subsequent diversification (8500–4000 years ago) suggest that a historical expansion occurred after the Younger Dryas presumably related to an increase in habitat availability (Baca et al. 2020, 2023; García et al. 2020). Although the diversification of the arvalis lineages occurred thousands of years ago, the rapid range expansion in recent decades represents an additional step in the evolutionary history of the population, probably in response to new environmental changes related to climate and land use, such as increased fall precipitation and winter temperature, and increases in irrigated and green crops, that have occurred in the Iberian northern plateau creating a favourable habitat for the expansion and settlement of the species (Jareño et al. 2015).

The populations that remained confined to the southern part of the original range, on the northern slope of the western part of the Sistema Central (Fig. 1), likely gave rise to the mitochondrial SAMOVA groups C and E, which would explain the little gene flow inferred with other SAMOVA groups. The lack of a significant signature of population expansion suggests long-term isolation and restricted geographic distribution. In contrast, the populations in the north-eastern region (Sierra de la Demanda and surroundings, see Fig. 1), represented by SAMOVA group A, showed clear evidence of expansion and a high frequency of the most common and central haplotypes in the phylogenetic network, which supports this group being ancestral. We also found that the populations in the northern and eastern mountain ranges and the eastern part of the Ebro Valley (SAMOVA groups D and B) carried the most derived haplotypes in the haplotype network. This pattern could reflect regional differences in population histories and/or in the origin of these two sets of SAMOVA groups (SAMOVA groups A, B, and D vs. SAMOVA groups C and E). The lack of a genetic signature of demographic and/or spatial expansion in all the M. a. asturianus SAMOVA groups, except group A, suggests an overall pattern of demographic stasis during the last few thousand years. These pieces of evidence are compatible with a “refugia within refugia” scenario (Gómez and Lunt 2007), where populations of M. a. asturianus remained isolated in at least two glacial refugia associated with mountain systems of the Iberian Peninsula, one located in the south (Sistema Central), and another located in the north/northeast (Cordillera Cantábrica/Sierra de la Demanda).

Recent population expansion

Neutrality tests and BSP analysis indicate that a group of voles experienced a remarkable increase in population size and geographical range in the last century. This very recent increase in effective population size fits well with the spatial expansion reported by other authors based mainly on trapping and occurrence data (Luque-Larena et al. 2013; Jareño et al. 2015). In a short time, the species has experienced an astonishing geographical expansion from its original range in the mountains and across the whole agricultural plain of the Duero river (~75,000 km2 in about 30 years), which matches the area occupied by the expanding group (SAMOVA group A, and BAPS clusters VII and VIII). It is also remarkable that since the early 1980s, shortly after the colonization of the new area, some authors have reported the occurrence of cyclical demographic patterns in the newly established populations of M. a. asturianus (Delibes and Brunett-Lecomte 1980; Delibes 1989; González-Esteban et al. 1995; Luque-Larena et al. 2013). It has been suggested that recurrent population outbreaks coupled with high local population densities may have contributed to the fast colonization of the whole region.

Overall, our analyses support a pattern of IBD across the distribution range of the Iberian common vole, which is steeper in the original than in the newly colonized range. This pattern could simply reflect the older occupation of the original range and/or the spatial limitation of dispersal during colonization. Assuming a generation time of 1 year (García et al. 2020), populations would have expanded between 30 and 50 generations ago, which can be enough time for IBD to develop in species with restricted dispersal capabilities, such as M. arvalis (Gauffre et al. 2008; Domínguez et al. 2021). Therefore, our results are congruent with a fine-scale dispersal limitation during the colonization of Iberian voles, which is also in agreement with the significant spatial autocorrelation suggesting that gene flow per generation is spatially restricted enough to generate local genetic structure.

Range expansions mediated by short-distance dispersal will result in the gradual loss of diversity, increasing differentiation, allele surfing, and the establishment of clines in the frequencies of neutral alleles (Austerlitz et al. 1997; Excoffier and Ray 2008; Goodsman et al. 2014). This process can be counteracted by a high growth rate and gene flow between populations, which will reduce the potential effect of drift and genetic differentiation (Austerlitz et al. 1997). The clines of STR genetic variation in the recently colonized area are consistent with the clines in mtDNA diversity, which we interpreted as a signal of range expansion (Swaegers et al. 2013). There is also lower genetic differentiation between genetic clusters in the newly colonized area than among sites in the original range. Generally, genetic structure develops when populations differentiate via local adaptation and/or genetic drift, both of which are long-lasting processes. We expect lower population structure in newly colonized areas than in the original range because (i) there has not been enough time for local adaptation or drift to change allele frequencies, (ii) the spread of individuals in a novel range is associated with high gene flow that has a genetic homogenising effect, and (iii) the founder effect implicit in the colonization process will result in a reduction of genetic diversity, slowing down both the addition of new allelic variants to the gene pool and the accumulation of genetic differences in the established population (Bohonak 1999; Domínguez et al. 2021). In this case, we hypothesize that the short time since the onset of the expansion (~20 generations) together with the cyclic nature of the species, characterized by high Ne during the peak phases and recurrent and massive dispersal events (high gene flow), may have genetically homogenized populations (Domínguez et al. 2021) and prevented genetic differentiation in the expanded range.

Firstly, we found that the recent expansion is not associated with a genetic bottleneck but that both nuclear and mitochondrial genetic diversity decrease in populations that are farther from the presumed source of the expansion. The geographic distribution of haplotypes, nucleotide diversity and private haplotypes suggests that the surroundings of Sierra de la Demanda-La Rioja (locality 3 in this study) are the most likely source of the recent range expansion of M. a. asturianus, confirming previous suspicions (Delibes 1989). This directionality in the expansion implies that our results do not support the “colonization from within” hypothesis, that is, an expansion from reduced or unnoticed populations within the expanded range (Huth et al. 2015). Secondly, migration and Genetic Landscape Shape (GLS) analyses support high genetic exchange at the edge of the expansion front, with small genetic distances and high rates of nuclear gene flow among localities on the western side of the expanded area, and large genetic distances between localities that are closer to the putative source of the expansion (i.e. the earliest colonized sites in the east). This high genetic exchange at the edge of the expansion front may be related to habitat-related factors or to selection on dispersal-enhancing phenotypes. For example, a better-quality and less fragmented agricultural matrix may facilitate natural movements between patches and locations in the most recently colonized area. Another possibility is that individuals that are better dispersers are positively selected during the colonization process. Studies in different species, including voles (Forsman et al. 2011), show that colonization of new habitats may be dominated by highly dispersive individuals, whereas in older patches individuals are more philopatric (Duckworth 2008). If dispersal traits are heritable, spatial sorting (Shine et al. 2011) could increase the frequency of high-dispersal alleles at the expansion front, thus favouring the evolution of higher dispersal ability there (Forsman et al. 2011; Miller et al. 2020). This process would lead to higher genetic exchange and lower differentiation between nearby, expanding populations, a pattern that agrees with our results.

Factors affecting the recent expansion

The geographic expansion of this species resembles a process of continuous filling of vacant territories, in which individuals colonize available and suitable areas in their surroundings. The rapid spread of Iberian voles can be attributed to short-range human-aided dispersal. It is possible that the profound changes that have occurred in Spanish agricultural landscapes since the 1960s (e.g. land concentration, abandonment of low-yielding croplands and grasslands, irrigation expansion) (González de Molina et al. 2017) have provided voles with a favourable and highly productive landscape (e.g. alfalfa crops, Delibes 1989; Jareño et al. 2015; Rodríguez-Pastor et al. 2016). This might have allowed a notable expansion in only 20–30 generations while reaching large effective population sizes following the emergence of cyclical dynamics (see below).

We also find evidence of natural barriers (rivers) that may have impacted genetic processes such as gene flow during expansion. The boundaries of nearby river basins coincide with a sharp shift from localities with low to high gene flow (i.e. high and low pairwise genetic distances, respectively, in the GLS). Large rivers are efficient barriers to gene flow that drive genetic differentiation in many terrestrial animals (e.g. Gerlach and Musolf 2000; Ledevin and Millien 2013), including the common vole (Stojak et al. 2016). Large rivers in Spain have been proposed both as a physical and ecological barrier for mammal species (O’Regan 2008), but few studies have evidenced their role as a barrier to gene flow in terrestrial mammals (but see Alda and Doadrio 2014). The major role of rivers during the expansion process is further supported by dbRDA analysis, demonstrating that rivers are the most important single predictor of genetic differentiation of Iberian common voles.

Our results from the dbRDA analysis showed that, in addition to natural barriers like rivers, the presence of pervasive infrastructures in anthropogenic landscapes, such as roads and railways, explains a large part of genetic variance among populations, despite the short geographical distance between them. Several studies have revealed that human landscape disturbances have significant genetic effects on wild species at different temporal and spatial scales (Benítez-López et al. 2010; García et al. 2021). Because the expansion occurred when most linear infrastructures were already built, the system provides a natural experiment for studying the impact of human-driven habitat fragmentation on a species’ genetic variation. Recent studies have shown that Iberian voles are genetically unaffected by roads at a fine (home-range) scale (Domínguez et al. 2021). This finding agrees with simulation studies demonstrating that species with small territories, small movement ranges, and high reproductive rates are generally unaffected by roads (Rytwinski and Fahrig 2013). The high population densities and recurrent episodes of massive dispersal associated with the cyclic nature of voles may genetically homogenize their populations and erase the potential barrier effect of roads (Domínguez et al. 2021). However, the emergence of “boom-bust” dynamics in the Iberian vole postdates the first expansion in the late 1970s (Jareño et al. 2015), when the species had already colonized the central agricultural lowlands of the Duero basin (Delibes 1989; González-Esteban et al. 1995; Luque-Larena et al. 2013). We hypothesize that the relative role of anthropogenic barriers was more important during the early phases of the range expansion because colonizers were probably few and had less demographic capacity (absence of cycles and low effective population size) to overcome these barriers.

A large number of species are currently showing range expansions in response to climate change and human-related factors (Ledevin and Millien 2013; Pecl et al. 2017), but studies investigating the genetic consequences of such range expansions are still scarce. The life history strategies of some species make them strong candidates for rapid expansion over novel habitats and, together with biogeography, are the most important predictors of expansions (Pacifici et al. 2020). We here propose and document that a range expansion is associated with a reduction of genetic diversity but also with genetic homogeneity across the colonized range, favoured by the life-history features of the (cyclic) species, such as high reproductive rates, short-distance dispersal, and large Ne and extensive gene flow during peaks that could quickly erase previous signals of genetic divergence. Recent anthropogenic factors, in the form of genetic barriers, continue to reshape the genetic structure of expanding species at the regional scale. More studies on the genetic effects of range expansions in models with different life-history strategies would help to understand the relative importance of deterministic and stochastic processes during range expansions.

Data archiving

The data used in this paper can be accessed at the Dryad repository https://doi.org/10.5061/dryad.j9kd51ch9.