Introduction

The reconstruction of invasion histories is crucial for understanding the ecological and evolutionary processes underlying invasions (Estoup and Guillemaud, 2010). One of the main features to have emerged from several well-documented examples is that invasion histories involving human activities are often far more complex than initially thought, with multiple introductions, bridgehead effects and stochastic processes leading to the development of a genetic structure within invaded areas that is difficult to predict (see, for example, Lombaert et al., 2014). Although human travel and trade have always facilitated the dispersal of other organisms, most of these case studies have concerned recent introductions (Dlugosch and Parker, 2008).

Commensal rodents are ideal models for studying the complexity of invasion histories over different timescales, as these animals have been dispersing with humans since Neolithic times (Jones et al., 2013). The house mouse, Mus musculus domesticus, in particular, is recognized as a major invasive taxon (http://www.issg.org/database/) with dramatic impacts on biodiversity, human health and human activities (Singleton et al., 2003). This subspecies of the Mus musculus complex originates from South-West Asia (Suzuki et al., 2013), and became commensal during the initial human settlements in the Middle East around 10 000 BC (Cucchi et al., 2012). The distribution range of M. m. domesticus then expanded around the Mediterranean Sea during the Iron Age, probably thanks to increasing human trade (Cucchi et al., 2012). The subspecies subsequently spread to North-West Europe during the Viking era, and then to much of the rest of the world following the Age of Discovery (Jones et al., 2013).

Recent phylogeographic studies have described the past and recent colonization histories of M. m. domesticus in Europe (see, for example, Bonhomme et al., 2011) and on some islands (see, for example, Gray et al., 2014). However, only a few historical records (Dalecky et al., 2015 and references therein) and limited genetic data (a single population sample from Cameroon (Ihle et al., 2006); 12 individuals from Senegal (Bonhomme et al., 2011)) were available to document the evolutionary history of M. m. domesticus in Africa. The house mouse may have been present in West Africa since the arrival of Portuguese sailors in the fifteenth century (Rosevear, 1969). In Senegal, which was a focal area for European settlers, large and stable populations of house mice have been described in the colonial cities along the Atlantic coast since the middle of the nineteenth century (Dalecky et al., 2015 and references therein). Following the development of human transport, the subspecies has spread further inland since the twentieth century. Its range now covers the northern half of the country, and is still expanding (Granjon and Duplantier, 2009; Dalecky et al., 2015; BPM Database: http://vminfotron-dev.mpl.ird.fr/bdrss/index.php).

The aim of this study was to decipher the invasion history and spatial demographic dynamics of M. m. domesticus in Senegal, and to assess the consequences of human history in shaping neutral genetic variation of this subspecies in its expanding range. We used two different types of genetic markers to characterize the genetic variation of the house mouse: sequences from the mitochondrial DNA control region (D-loop) and 16 nuclear microsatellites. The D-loop is the only molecular marker for which substantial data are available over the entire distribution of the house mouse (see, for example, Bonhomme et al., 2011). It is therefore a useful marker for investigating the exogenous origin of this subspecies. Microsatellites provide more detail about introduction history and the spatial expansion processes at work within the invaded area. We first carried out classical phylogenetic and population genetics analyses on an extensive sample set covering the entire distribution area in Senegal. We placed the D-loop data in a wider context by including a large set of previously published sequences in the phylogenetic analyses. Approximate Bayesian computation (ABC) methods were then applied to the microsatellite data in order to compare different introduction scenarios and estimate several parameters of interest, such as introduction time.

Materials and methods

Sample collection and laboratory analyses

In Senegal, the distribution of the house mouse is restricted to villages and towns along four main roads (Dalecky et al., 2015): in the north (the northern road), in the centre of the country (the central road), within the Ferlo region (the Ferlo road) and along the coast (the coastal road; Figure 1). Between 2011 and 2013, house mice (target sample size: 20 individuals per site) were sampled by live trapping in 36 human settlements (villages or cities, hereafter referred to as sites) along these main roads (Figure 1), according to a standardized protocol described by Dalecky et al. (2015) (see also Supplementary Table S1). Fieldwork was carried out under the framework agreement established between the Institut de Recherche pour le Développement and the Republic of Senegal, as well as with the Senegalese Head Office of Waters and Forests. Handling procedures were performed under our lab agreement for experiments on wild animals (no. 34-169-1), and followed the official guidelines of the American Society of Mammalogists (Sikes et al., 2011). Trapping campaigns within houses were systematically performed with prior explicit agreement from relevant local authorities. Each captured mouse was killed by cervical dislocation, weighed, measured and necropsied. A piece of liver was stored in 95% ethanol for molecular analyses.

Figure 1

Geographic origin and genetic clustering of sampling sites (code names in Table 1) for Mus musculus domesticus in Senegal. (a) Geographic distribution of the two main genetic groups (K=2) obtained using STRUCTURE. For each site, colours in pie charts indicate the proportions of house mice assigned to each genetic group. (b) Individual ancestry estimates assuming two or three genetic groups in STRUCTURE. Each vertical line represents an individual, and each colour represents a genetic group. Individuals are grouped by site and sites are ordered along each sampled road according to a west–east gradient. Clustering patterns obtained with TESS were similar (Supplementary Figure S4).

The complete D-loop sequence was amplified for 119 mice sampled from distant houses within each of the studied sites (from 2 to 10 mice per site, Table 1), with the PCR primers and conditions described in Rajabi-Maham et al. (2008). PCR products were sequenced in both directions by Eurofins MWG (Ebersberg, Germany).

Table 1 Genetic estimates of key statistics within sampling sites of Mus musculus domesticus in Senegal

We genotyped 16 nuclear microsatellite loci (D1Mit291, D2Mit456, D3Mit246, D4Mit17, D4Mit241, D6Mit373, D7Mit176, D8Mit13, D9Mit51, D10Mit186, D11Mit236, D14Mit66, D16Mit8, D17Mit101, D18Mit8 and D19Mit30: available from the MMDBJ database: http://www.shigen.nig.ac.jp/mouse/mmdbj/top.jsp) for all 763 mice sampled (including the 119 individuals for which the D-loop was sequenced). The selected loci had perfect dinucleotide motifs and flanking sequences suitable for primer binding, and were located on different chromosomes (except D4Mit17 and D4Mit241). They were amplified in three multiplex PCRs (Supplementary Table S2). PCR products were separated and detected with an ABI 3130 automated sequencer (Applied Biosystems, Foster City, CA, USA) and analysed with GeneMapper v.3.7. For each mouse successfully genotyped at some loci but not at others, each failed locus was reamplified by simplex PCR (to prevent primer competition).

Sequence analyses

All 119 D-loop sequences from Senegal were aligned with 1673 sequences retrieved from GenBank (1313 published in Bonhomme et al. (2011) and 361 obtained from house mice sampled in Europe, Asia, Oceania and Africa: see references in Figure 3). We used MAFFT v.7 (Multiple Alignment using Fast Fourier Transform; Katoh and Standley, 2013) for the alignment. Haplotypes were identified with FaBox v1.41 (Villesen, 2007). Bayesian phylogenetic reconstruction (four chains, burn-in=2 × 10³ iterations; chain length=2 × 10⁷ iterations) was performed on all haplotypes with MRBAYES v.3.2 (Ronquist and Huelsenbeck, 2003), with a Hasegawa–Kishino–Yano (Hasegawa et al., 1985) mutational model previously identified as the best model by JmodelTest v.2.1.3 (Darriba et al., 2012) using the Akaike information criterion. A phenogram was then constructed with NETWORK v.4.6.11 (Bandelt et al., 1999) on haplotypes from Senegal in order to illustrate their relative frequencies within the invaded area.
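As a rough illustration of the haplotype-identification step, the Python sketch below groups identical aligned sequences into haplotypes and counts variable sites. It is not FaBox: it assumes all sequences are already aligned to the same length, treats gaps as ordinary characters when grouping, and ignores gaps and ambiguity codes when counting variable sites. The function names are hypothetical.

```python
def collapse_haplotypes(aligned):
    """Group identical aligned sequences into haplotypes.

    aligned: dict mapping sample ID -> aligned sequence (all the same length).
    Returns a dict mapping haplotype label -> list of sample IDs, with labels
    H1, H2, ... assigned in order of first appearance.
    """
    seq_to_samples = {}  # dicts keep insertion order (Python 3.7+)
    for sample, seq in aligned.items():
        seq_to_samples.setdefault(seq.upper(), []).append(sample)
    return {f"H{i}": samples
            for i, samples in enumerate(seq_to_samples.values(), start=1)}


def variable_sites(seqs):
    """Count alignment columns holding more than one unambiguous nucleotide."""
    columns = zip(*(s.upper() for s in seqs))
    return sum(1 for col in columns if len({c for c in col if c in "ACGT"}) > 1)


aligned = {"m1": "ACGT", "m2": "ACGT", "m3": "ACTT", "m4": "acgt"}
print(collapse_haplotypes(aligned))  # {'H1': ['m1', 'm2', 'm4'], 'H2': ['m3']}
print(variable_sites(aligned.values()))  # 1
```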

Microsatellite population diversity and structure

Deviations from Hardy–Weinberg equilibrium within loci and sites, and genotypic linkage disequilibrium between pairs of loci, were assessed using GENEPOP v.4 (Rousset, 2008). We corrected for multiple testing by the false discovery rate approach (Benjamini and Hochberg, 1995) implemented in the R package QVALUE (Dabney et al., 2011). Heterozygote deficiencies are often found in house mouse populations and are classically attributed to their social system, which results in subpopulation structuring (Ihle et al., 2006). We analysed the subpopulation structure by calculating the kinship coefficient (ρ) of Loiselle et al. (1995) between all pairs of individuals at each site, with SPAGeDI v.1.4 (Hardy and Vekemans, 2002), using the genotype data for each site as the reference for allelic frequencies.
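The false discovery rate correction can be sketched with the Benjamini–Hochberg step-up procedure below. Note that the QVALUE package estimates Storey's q-values, which also estimate the proportion of true null hypotheses (π0); the adjusted values computed here correspond to the special case π0 = 1. This is a stdlib-only Python sketch with made-up p-values, not the package's implementation.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(pvalues, fdr=0.05):
    """Flag tests whose adjusted p-value does not exceed the chosen FDR."""
    return [q <= fdr for q in bh_adjust(pvalues)]


# e.g. four Hardy-Weinberg tests (made-up p-values)
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```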

Genetic diversity at each site was estimated with FSTAT v.2.9.3 (Goudet, 2001) by calculating the allelic richness ar (rarefaction procedure; minimum sample size of 14 diploid individuals) and Nei’s unbiased genetic diversity (HS; Nei, 1987). The mean M index (Garza and Williamson, 2001), an indicator of past demographic changes, was calculated across loci for each site with DIYABC v.2.1 (Cornuet et al., 2014). Genetic differentiation between sites was summarized by calculating pairwise FST estimates (Weir and Cockerham, 1984) with FSTAT.
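The two per-locus diversity indices can be sketched as follows. Allelic richness is computed here with the classical rarefaction formula (the expected number of distinct alleles in a random subsample of g genes); with a minimum of 14 diploid individuals, g would be 28 genes. The M index is k/(r + 1), where k is the number of alleles and r the allele size range in repeat units; the sketch assumes allele sizes given in base pairs and a dinucleotide motif (2 bp). Site-level values would then be averaged over loci. Function names are illustrative, and FSTAT and DIYABC may differ in details such as the handling of missing data.

```python
from collections import Counter
from math import comb


def rarefied_allelic_richness(alleles, g):
    """Expected number of distinct alleles in a random subsample of g genes.

    alleles: list of allele labels (two entries per diploid individual).
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    if g > n:
        raise ValueError("g exceeds the number of genes sampled")
    total = comb(n, g)
    # An allele is missed only if all g genes come from the other n - ni genes.
    return sum(1 - comb(n - ni, g) / total for ni in counts.values())


def m_ratio(allele_sizes_bp, motif_bp=2):
    """Garza-Williamson M = k / (r + 1) for one locus."""
    sizes = set(allele_sizes_bp)
    r = (max(sizes) - min(sizes)) / motif_bp  # size range in repeat units
    return len(sizes) / (r + 1)


print(rarefied_allelic_richness([100, 100, 102, 104], g=2))  # 1.8333...
print(m_ratio([100, 102, 104, 110]))  # 4 alleles, range of 5 repeats -> 0.666...
```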

We characterized the spatial genetic structure of mice in Senegal using two approaches. First, we used the clustering approach implemented in STRUCTURE v.2.3.4 (Pritchard et al., 2000) in order to estimate the number of homogeneous genetic groups (K) in the data set. The analyses were performed with a model including admixture and correlated allele frequencies (Falush et al., 2003). We performed 20 independent runs for each K value (from K=1 to 20). Each run included 500 000 burn-in iterations followed by 1 000 000 iterations. The number of genetic groups was inferred by the deltaK method applied to the log probability of the data (Evanno et al., 2005). We checked that a single mode was obtained in the results of the 20 runs for all K-values explored, using the Greedy Algorithm implemented in CLUMPP v.1.2.2 (Jakobsson and Rosenberg, 2007). Barplots were finally generated with DISTRUCT v.1.1 (Rosenberg, 2004). Second, we used the spatial Bayesian clustering method implemented in TESS v.2.3.1 (Chen et al., 2007). In TESS, the spatial information considered is a neighbourhood network of the sample sites, obtained from a Dirichlet tessellation of their coordinates. As allowed in TESS, the network was modified in order to delete unrealistic neighbourhood relationships between individuals sampled in sites that are not directly connected by roads, but similar results were obtained considering the unmodified network (results not shown). We performed 20 independent runs for each K-value ranging from 2 to 20, using the admixture model CAR, a burn-in period of 10 000 sweeps followed by 30 000 sweeps and the interaction parameter set to 0.6. The number of genetic groups was inferred using the deltaK method applied to the deviance information criterion. We also used CLUMPP to check that a single mode was obtained in the results for each K.
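The deltaK criterion of Evanno et al. (2005) divides the absolute second-order change in the likelihood across K by the standard deviation of the likelihood among runs at that K. The Python sketch below follows one common way of computing it, using the mean ln P(D) across runs for the second difference; the input values are invented for illustration. For TESS, the same function could in principle be applied to per-run DIC values instead.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """deltaK for each K that has runs at both K - 1 and K + 1.

    lnp_by_k: dict mapping K -> list of ln P(D) values from independent runs.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / stdev(lnp_by_k[k])  # needs >= 2 runs, sd > 0
    return delta


# Invented ln P(D) values for two runs per K, for illustration only.
runs = {1: [-1000, -1002], 2: [-800, -802], 3: [-790, -794], 4: [-785, -791]}
dk = evanno_delta_k(runs)
print(max(dk, key=dk.get))  # 2
```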

Both STRUCTURE and TESS results may be biased by deviations from Hardy–Weinberg equilibrium. We therefore validated the clustering solution using Discriminant Analysis of Principal Components (DAPC), which is not based on a predefined population genetics model and is thus free from Hardy–Weinberg equilibrium assumptions (Jombart et al., 2010). DAPC was performed using the adegenet package (Jombart, 2008) in R. The consistency of the results was assessed through 10 independent DAPC runs.

Regular loss of genetic diversity along colonization routes is often expected because of the occurrence of successive bottlenecks during the expansion of the range of the colonizing species (Ramachandran et al., 2005). In the context of an invasion, geographic gradients of genetic diversity may thus provide insight into the source populations that were initially introduced. STRUCTURE and TESS analyses identified two main genetic groups (see the Results section). We tested the hypothesis, suggested by historical data, that the source populations of these two groups were initially introduced into the main colonial cities of Senegal located on the Atlantic coast (Dalecky et al., 2015). To this aim, we performed Spearman’s rank correlation analyses between genetic diversity estimates (ar, HS) and the longitude (which is closely related to distance from the coast) of the sampled sites for each genetic group.
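Spearman's rank correlation is Pearson's correlation computed on ranks, with tied values given their average rank. The stdlib Python sketch below computes rs and a two-sided permutation P-value; the permutation test is an illustrative choice, and the software used for the reported P-values may compute them differently (for example, from an exact or asymptotic distribution). Inputs would be, for instance, site longitudes and the corresponding ar values for one genetic group.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_test(x, y, n_perm=9999, seed=1):
    """Return (rs, two-sided permutation P-value)."""
    rx, ry = average_ranks(x), average_ranks(y)
    rs = pearson(rx, ry)
    rng = random.Random(seed)
    ry = ry[:]  # shuffle a copy of the y ranks under the null hypothesis
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ry)
        if abs(pearson(rx, ry)) >= abs(rs) - 1e-12:
            hits += 1
    return rs, (hits + 1) / (n_perm + 1)
```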

If each of the genetic groups of Senegalese house mice had an independent origin, we could expect a greater genetic diversity at sites of admixture. We defined the admixture rate as the mean proportion of membership (between 0 and 50%) of each site to the alternate genetic group given by STRUCTURE. We then applied Spearman’s rank correlation analysis to the entire data set to assess the relationship between such admixture rates and genetic diversity estimates (ar, HS).
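One way to compute the site-level admixture rate from a K=2 STRUCTURE output is sketched below in Python. It assumes that the "alternate" group of a site is the group to which its mice are, on average, less assigned, so that the rate is min(q̄, 1 − q̄), where q̄ is the mean membership of the site's mice in group 1; this reading of the definition and the input layout are assumptions. The resulting rates would then be correlated with ar or HS using Spearman's rank correlation.

```python
from collections import defaultdict
from statistics import mean


def site_admixture_rates(q_group1, sites):
    """Mean membership of each site in its minority group (0 to 0.5).

    q_group1: per-individual membership proportion in group 1 (K=2 run).
    sites: site code of each individual, in the same order.
    """
    by_site = defaultdict(list)
    for q, site in zip(q_group1, sites):
        by_site[site].append(q)
    rates = {}
    for site, qs in by_site.items():
        q_bar = mean(qs)
        rates[site] = min(q_bar, 1 - q_bar)  # membership in the alternate group
    return rates


print(site_admixture_rates([0.9, 0.8, 0.1, 0.3], ["A", "A", "B", "B"]))
# approximately {'A': 0.15, 'B': 0.2}
```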

Inference of dispersal from microsatellite data

The dispersal of offspring over limited distances from their parents results in an increase in genetic differentiation with geographic distance through a process known as isolation by distance (IBD; Rousset, 1997). We characterized the dispersal patterns of Senegalese house mice by conducting IBD analyses at two different spatial scales: (1) at a local scale, that is, within sites (as in Verdu et al., 2010), to estimate the spatial restriction of dispersal between houses within villages; and (2) at a larger scale, along presumed expansion roads, to evaluate the contribution of long-distance dispersal to the genetic structure.

We used two different inference methods for this purpose. First, for both the local and large scales, we used the regression method based on the expected linear relationship between genetic and geographic distances (Rousset, 1997, 2000). These analyses were run with GENEPOP, using the pairwise genetic differentiation estimator er calculated between individuals (Watts et al., 2007) and Euclidean geographic distances between individuals, untransformed when dispersal occurred principally in one dimension (along roads) and log-transformed when it occurred in two dimensions (within sites). The minimum distance between sites (3 km) was used as a threshold to exclude pairs of individuals from the same site in the analyses performed along roads, and pairs of individuals from different sites in the analyses performed within sites. Mantel tests with 10 000 permutations were performed to assess the correlation between matrices of genetic and geographic distances, with a home-made R script that modified the Mantel test to calculate rank correlation coefficients and to permute pairwise distances only within sites or only between individuals from different sites (script available upon request).
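The modified Mantel test described above can be sketched in Python as follows. The study's own R script is not reproduced here; this is an independent illustration under stated assumptions: the statistic is Spearman's correlation between genetic and geographic distances over the selected pairs, and the permutation shuffles genetic distances among the selected pairs, optionally only within strata (for example, within sites). The 3 km threshold from the text selects the pairs; for the within-site (two-dimensional) analysis, log-transformed geographic distances would be passed in.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_mantel(gen, geo, keep, stratum=None, n_perm=999, seed=1):
    """One-sided rank Mantel test for a positive genetic/geographic association.

    gen, geo: symmetric n x n matrices (lists of lists).
    keep(i, j): True if the pair enters the test.
    stratum(i, j): optional key; genetic distances are permuted only among
    pairs sharing the same key (e.g. the site of a within-site pair).
    """
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if keep(i, j)]
    gen_r = average_ranks([gen[i][j] for i, j in pairs])
    geo_r = average_ranks([geo[i][j] for i, j in pairs])
    observed = pearson(gen_r, geo_r)
    groups = {}
    for k, (i, j) in enumerate(pairs):
        groups.setdefault(stratum(i, j) if stratum else None, []).append(k)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = gen_r[:]
        for idx in groups.values():  # shuffle ranks within each stratum only
            vals = [gen_r[k] for k in idx]
            rng.shuffle(vals)
            for k, v in zip(idx, vals):
                perm[k] = v
        if pearson(perm, geo_r) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy road transect (km): keep only pairs at least 3 km apart (between sites).
pos = [0, 4, 8, 12, 16, 20]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[abs(a - b) * 0.01 for b in pos] for a in pos]
print(rank_mantel(gen, geo, keep=lambda i, j: geo[i][j] >= 3))
```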

Second, IBD was explored at the large scale by the maximum likelihood method implemented in MIGRAINE, which infers model parameters using importance sampling algorithms (de Iorio et al., 2005) extended to consider linear IBD as a model for population structure (Rousset and Leblois, 2007). A geometric distribution is assumed for dispersal and a K-allele model for mutation (Rousset and Leblois, 2007, 2012). MIGRAINE provides point estimates, 95% coverage confidence intervals (CIs) and two-dimensional parameter likelihood profiles for several parameters: the scaled local population size (θ=2 × Ngenes × μ, where Ngenes is the local population size expressed in number of genes and μ the mutation rate per locus per generation), the scaled emigration rate (number of emigrants per generation: γ=2 × Ngenes × m, where m is the total emigration rate per generation for a local population), the geometric dispersal distribution parameter (g) and neighbourhood size (Ns=2 × D × σ², where D is the density of individuals and σ² the mean squared parent–offspring dispersal distance). All MIGRAINE runs were performed under a linear model of IBD (that is, 1D IBD) on the 16 microsatellites with the following computing parameters: 1000 trees, 600 points and 2 iterations. We translated the parameters inferred from MIGRAINE into effective population size (Ngenes) using a mutation rate commonly used for microsatellites: 5 × 10⁻⁴ (Sun et al., 2012).
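The conversion from the scaled parameters follows directly from the definitions above: Ngenes = θ/(2μ) and m = γ/(2 × Ngenes). The short Python sketch below applies them with μ = 5 × 10⁻⁴; the θ and γ values in the example are hypothetical, and expressing the result as a number of diploid mice (Ngenes/2) is an assumption about how genes were translated into individuals.

```python
MU = 5e-4  # microsatellite mutation rate per locus per generation (from the text)


def genes_from_theta(theta, mu=MU):
    """Local population size in genes, from theta = 2 * Ngenes * mu."""
    return theta / (2 * mu)


def emigration_rate(gamma, n_genes):
    """Total emigration rate m, from gamma = 2 * Ngenes * m."""
    return gamma / (2 * n_genes)


theta, gamma = 0.25, 2.0  # hypothetical MIGRAINE estimates
n_genes = genes_from_theta(theta)
print(n_genes, n_genes / 2)  # ~250 genes, i.e. ~125 diploid mice (assumption)
print(emigration_rate(gamma, n_genes))  # ~0.004
```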

Inference about introduction scenarios from ABC on microsatellites

ABC analyses were performed on microsatellite data only, for which we had population samples (see Table 1). Given its small size and large geographical scale (that is, a subset of individuals from all sites sampled in Senegal), the sampling of mitochondrial DNA variation was clearly not appropriate for ABC analyses that assume Hardy–Weinberg population units. The common practice of pooling differentiated site samples may give misleading results in ABC analyses (Lombaert et al., 2014). Hence, ABC analyses were conducted on sites chosen to be representative of each genetic group identified by the clustering analyses, and known on the basis of rodent community data (collected from the nineteenth century to the present day: see references in Dalecky et al., 2015) to be in the most likely areas of introduction. We chose to test a small number of competing scenarios rather than an exhaustive list, to focus computational efforts on well-founded introduction hypotheses. The first and second scenarios involved two introduction events in Senegal, one in the north and the other further south, from two different unsampled ancestral populations (scenario 1, Figure 2a) or from a single unsampled ancestral population (scenario 2, Figure 2b). The third and fourth scenarios involved a single introduction event from a single unsampled ancestral population in a southern (scenario 3, Figure 2c) or in a northern (scenario 4, Figure 2d) coastal site, with a subsequent secondary introduction event from the first introduced population into a northern or southern coastal site, for scenarios 3 and 4, respectively.

Figure 2

Graphical representation of the four competing introduction scenarios for Mus musculus domesticus in Senegal compared by ABC. UA, unsampled ancestral population. Time 0 is the sampling date. The main historical events are represented on the timescale to the left of each scenario. Black, grey and white bars represent different stable effective population sizes in the ancestral populations and in Senegal, and thin lines represent bottleneck events characterized by their own effective number of founders and duration. All parameters and their associated prior distributions are described in Supplementary Table S3. (a) In scenario 1, there are two independent introduction events in Senegal from two unsampled ancestral populations UA1 and UA2 that diverged ta generations ago, one (tn generations ago) giving rise to the NORTH group, the other (ts generations ago) giving rise to the SOUTH group; tn<ta and ts<ta. Graphically, ts is represented as more recent than tn, but no assumption is actually made about the chronological order of these parameters. (b) In scenario 2, there are two independent introduction events in Senegal as in scenario 1, but the populations introduced are considered to originate from a single unsampled population UA. (c) In scenario 3, there is a single primary introduction event in the south of Senegal from a single unsampled population UA (ts generations ago), followed by a secondary introduction event further north tn generations ago; ts>tn. (d) Scenario 4 also involves a single primary introduction event from a single unsampled population UA, this time in the north of Senegal, followed by a secondary introduction event further south; tn>ts.

Significant genetic population substructure was observed locally within many sampled sites (Table 1). We therefore evaluated the potential effect of this local substructure on scenario choice, given that the ABC treatment assumes no local population substructure within the analysed samples. To this aim, we analysed different sets of simulated pseudo-observed data sets characterized by the absence or presence of genetic substructure within samples (Supplementary Appendix S1).

ABC analyses were performed with DIYABC v.2.1 (Cornuet et al., 2014). The prior distributions of the historical, demographic and mutational parameters are described in Supplementary Table S3 (prior distribution set 1, including only uniform priors). Wild house mice are generally thought to have a generation time of 3 months (Nachman and Searle, 1995). Priors for introduction and divergence times were thus defined within the last 2000 generations to encompass the period during which Europeans initially arrived in Senegal (fifteenth century: Sinou, 1993) within the possible values. A second set of prior distributions was used to evaluate the robustness of the ABC inferences to prior choice. It included (1) normal distributions with the same mean and bounds as in prior set 1 for demographic parameters; and (2) log-uniform distributions with the same bounds as in prior set 1 for mutation parameters (see Supplementary Table S3, prior distribution set 2).

We summarized the genetic information within and between populations using all single-sample and two-sample summary statistics available in DIYABC (that is, 16 summary statistics; see p. 16 in the DIYABC user manual, available from http://www1.montpellier.inra.fr/CBGP/diyabc/). In a preliminary study, we evaluated the confidence in scenario choice and the accuracy of parameter estimation under a given scenario for different sets of summary statistics, using DIYABC simulated pseudo-observed data sets (pods) drawn randomly from the prior distributions of both the scenario ID and the parameter values. Using all summary statistics discriminated better among the tested scenarios than a more or less arbitrary choice of a subset of statistics, without degrading the estimation of parameter values under a given scenario.

We simulated 10⁶ data sets per scenario, and the posterior probability of each competing scenario was estimated by a polychotomous logistic regression on the 1% of simulated data sets closest to the observed data set. We applied a linear discriminant analysis transformation to the 16 summary statistics before calculating the logistic regression (Estoup et al., 2012). We then estimated the posterior distributions of demographic parameters under the selected scenario by local linear regression on the 1% of simulated data sets closest to the observed data set (Cornuet et al., 2008). We used raw summary statistics (that is, not transformed by linear discriminant analysis) for this analysis (see, for example, Lombaert et al., 2014).
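The two-step estimation of parameter posteriors (retaining only the closest 1% of simulations, then applying a local linear regression adjustment) can be illustrated with the toy Python sketch below. For brevity it uses a single scalar summary statistic, Epanechnikov weights and a made-up model (a normal mean with a uniform prior); DIYABC works with the 16 statistics, standardized distances and its own parameter transformations, so this is not its implementation.

```python
import random


def abc_regression(observed, simulate, prior, n_sim=100_000, tol=0.01, seed=1):
    """Rejection ABC plus local linear regression, for ONE scalar statistic.

    Returns a list of (adjusted parameter value, weight) pairs.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        theta = prior(rng)
        sims.append((theta, simulate(theta, rng)))
    # Keep the tol fraction of simulations closest to the observed statistic.
    sims.sort(key=lambda ts: abs(ts[1] - observed))
    kept = sims[: max(2, int(tol * n_sim))]
    delta = abs(kept[-1][1] - observed) or 1e-12
    w = [1 - (abs(s - observed) / delta) ** 2 for _, s in kept]  # Epanechnikov
    x = [s - observed for _, s in kept]
    y = [t for t, _ in kept]
    # Weighted least-squares slope of parameter on (statistic - observed).
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    beta = sxy / sxx if sxx > 0 else 0.0
    # Project each retained parameter value onto the observed statistic.
    return [(t - beta * xi, wi) for t, xi, wi in zip(y, x, w)]


def prior(rng):
    return rng.uniform(0, 10)


def simulate(theta, rng):
    # Toy summary statistic: mean of 20 draws from N(theta, 1).
    return sum(rng.gauss(theta, 1) for _ in range(20)) / 20


post = abc_regression(4.0, simulate, prior, n_sim=20_000)
post_mean = sum(t * w for t, w in post) / sum(w for _, w in post)
print(round(post_mean, 2))  # close to 4
```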

We evaluated confidence in the choice of scenario and the accuracy of parameter estimation under a given scenario, using simulated pseudo-observed data sets (pods), for which the true scenario identity (ID) and parameter values are known. Pods were simulated from posterior distributions so as to focus on the neighbourhood of the observed data set, because error and accuracy indicators conditional on the observed data set are clearly more relevant than indicators blindly calculated over the whole prior data space. We used the new option proposed by DIYABC v.2.1 to compute posterior error rates for model choice and posterior accuracy indicators for parameter estimation from sets of 5000 pods (see DIYABC manual p. 5 and sections 3.5.2 and 3.5.5 for details).
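Conceptually, the posterior error rate for scenario choice is the proportion of pods, simulated from the posterior, for which the scenario with the highest estimated posterior probability is not the true one. A minimal Python sketch, with a hypothetical input layout (one dict of scenario probabilities per pod) and invented values:

```python
def posterior_error_rate(true_scenarios, posterior_probs):
    """Fraction of pods whose best-supported scenario differs from the true one.

    true_scenarios: true scenario ID of each pod.
    posterior_probs: for each pod, a dict scenario ID -> estimated probability.
    """
    wrong = sum(
        1
        for truth, probs in zip(true_scenarios, posterior_probs)
        if max(probs, key=probs.get) != truth
    )
    return wrong / len(true_scenarios)


pods_truth = [1, 2, 1]
pods_probs = [{1: 0.7, 2: 0.3}, {1: 0.6, 2: 0.4}, {1: 0.2, 2: 0.8}]
print(posterior_error_rate(pods_truth, pods_probs))  # 0.666...
```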

Finally, we evaluated a Bayesian equivalent of goodness of fit for the selected scenario using the model checking option of DIYABC. From the 10⁶ data sets simulated under the selected scenario, we obtained a posterior sample of 10⁴ values from the posterior distributions of parameters through a rejection step based on Euclidean distances and linear regression post-treatment (as previously described). We then simulated 10⁴ data sets and their corresponding summary statistics with parameter values drawn with replacement from this posterior sample. Finally, we ranked each observed summary statistic against the distribution of the same statistic in the simulated data sets. For the model fit to be considered good, the number of observed statistics falling in the tails of the distributions of simulated statistics (that is, statistics for which the proportion of simulated values smaller than the observed value is <5% or >95%) has to be low (that is, <10% of the 16 summary statistics used here as test statistics).
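The model-checking criterion can be written out explicitly. The Python sketch below takes the observed statistics and a list of posterior-predictive simulated statistic vectors, computes for each statistic the proportion of simulated values below the observed one, and flags statistics in the <5% or >95% tails. With 16 statistics, the <10% rule tolerates at most one flagged statistic. Input layout and function names are assumptions.

```python
def tail_statistics(observed_stats, simulated_stats, low=0.05, high=0.95):
    """Per-statistic P(simulated < observed) and indices of statistics in the tails.

    observed_stats: list of k observed summary statistics.
    simulated_stats: list of simulated data sets, each a list of k statistics.
    """
    props = []
    for j, obs in enumerate(observed_stats):
        below = sum(1 for sim in simulated_stats if sim[j] < obs)
        props.append(below / len(simulated_stats))
    tails = [j for j, p in enumerate(props) if p < low or p > high]
    return props, tails


def fit_is_good(observed_stats, simulated_stats, max_fraction=0.10):
    """True if fewer than max_fraction of the statistics fall in the tails."""
    _, tails = tail_statistics(observed_stats, simulated_stats)
    return len(tails) < max_fraction * len(observed_stats)
```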

Results

Mitochondrial sequence analysis

In the 119 D-loop sequences obtained, there were only 11 variable sites, defining 15 haplotypes (Supplementary Table S4; mean haplotype diversity h=0.51±0.04; mean nucleotide diversity π=0.003±0.0003). Two major haplotypes (H1 and H2) were found in 79 and 21 individuals, respectively (Figure 3a). The other haplotypes were separated from H1 or H2 by only a few mutational steps (only 1 for 12 of the 19 remaining individuals), except for the more distantly related haplotype H12 that was found in one mouse from Saint-Louis (SIN; Figure 3a). The observed distribution of D-loop haplotypes in Senegal followed no clear geographic pattern (Supplementary Figure S1). Under a HKY85 mutational model, Bayesian reconstruction showed that haplotypes H1, H2 (and the haplotypes derived from them) and H12 belonged to the haplogroups HG11 (or clade E), HG4 (or clade F) and HG1 (or clade C1), respectively, in the nomenclature defined by Bonhomme et al. (2011) (or by Jones et al., 2011) (Figure 3b).
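For reference, the two diversity indices reported above can be computed as follows: Nei's unbiased haplotype diversity h = n/(n − 1) × (1 − Σ pᵢ²), with pᵢ the haplotype frequencies, and nucleotide diversity π as the mean proportion of differing sites over all pairs of sequences. This Python sketch counts every mismatch (gaps and ambiguity codes are not treated specially), and the toy inputs are invented.

```python
from itertools import combinations


def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from haplotype counts."""
    n = sum(counts)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length


print(haplotype_diversity([2, 2]))  # 0.666...
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 0.333...
```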

Figure 3

Mitochondrial D-loop haplotypes (701 bp) in Senegalese Mus musculus domesticus. (a) Median joining network of the 15 D-loop M. m. domesticus haplotypes found in Senegal: white squares correspond to unobserved haplotypes, and blue, red and green symbols correspond to haplotypes from three haplogroups (HG11, HG4 and HG1, respectively) identified by Bonhomme et al. (2011). The positions of mutational steps are indicated by the numbers in italics, and symbol sizes are proportional to the number of house mice, as indicated in the legend to the right. (b) Phylogenetic tree for the 367 D-loop haplotype sequences found in M. m. domesticus. Haplotypes were identified in a data set containing the 119 sequences from this study, 1313 sequences from Bonhomme et al. (2011) and 361 sequences from other studies (Prager et al., 1996, 1998; Gündüz et al., 2000, 2001, 2005; Ihle et al., 2006; Searle et al., 2009a, 2009b; Jones et al., 2010; Linnenbrinck et al., 2013; Suzuki et al., 2013; Gabriel et al., 2015; Jones and Searle, 2015). The 15 labelled haplotypes (H1 to H15) are those found in Senegal. They belong to three haplogroups (in blue: HG11; in red: HG4; in green: HG1) identified by Bonhomme et al. (2011). Although haplogroups appear reasonably cohesive, they are not statistically supported in phylogenetic analyses, as can be expected from a recent expansion (Bonhomme et al., 2011).

Microsatellite genetic diversity and structure

Linkage disequilibrium was significant for 27 of the 4320 tests performed, and hence the 16 loci were considered to be genetically independent. Only three loci (D4Mit241, D11Mit236 and D16Mit8) were at Hardy–Weinberg equilibrium at all sites. All others displayed significant heterozygote deficiencies at most sites. Null alleles were unlikely to explain heterozygote deficiencies, because only a small number of null genotypes (0–5 per locus) were observed. Overall, positive FIS values were obtained at 21 sites (Table 1). Within sites, the median kinship coefficient ρ ranged from −0.048 to −0.001 (Table 1). Very high ρ values (ρ>0.5) were obtained for only a few pairs of individuals at some sites (Supplementary Figure S2), indicating the occurrence of some full siblings. Analysis of the restricted data set corresponding to house mice captured in different buildings only (540 individuals) yielded similar results for deviations from Hardy–Weinberg equilibrium, FIS and ρ values (results not shown), suggesting that buildings were not the relevant units for defining genetic subgroups within sites.

Allelic richness (ar) ranged from 3.2 to 6.1 alleles (mean 4.4±0.5) and HS from 0.48 to 0.74 (mean 0.61±0.05). Mean values of the M index (between 0.42 and 0.56) were all consistent with a bottleneck signal (<0.68; Garza and Williamson, 2001). Pairwise FST values (Supplementary Table S5) ranged from 0.05 to 0.34, with a global mean FST value of 0.19 (95% CI=0.17–0.21). Substantial genetic structure was observed even between sites that were geographically close together (Supplementary Table S5).

Spatial genetic structure was first characterized with STRUCTURE. The highest deltaK value was that for K=2 (Supplementary Figure S3a). At K≥4, there was no congruence between the 20 runs for each K. At K=2, sites along the northern road between SIN and AEL were largely assigned to a first group, whereas those along the central and coastal roads were largely assigned to a second group (Figure 1). House mice from the Ferlo road and from eastern sites (GAL, THM, MAT) had variable mixed inferred ancestry (Figure 1). At K=3, the genetic groups corresponding to the northern road between SIN and AEL, on the one hand, and to the central and coastal roads, on the other hand, remained mostly unchanged. The third group corresponded to individuals from the GOU site, from the Ferlo road (between KSD and DEN) and from the eastern sites of GAL, THM and MAT that were admixed at K=2 (Figure 1b).

Using TESS, the highest deltaK value was that for K=3 (Supplementary Figure S3b). Note that the deltaK value cannot be calculated for K=2, as it is not possible to run a TESS analysis for K=1. At K=2 and 3, the clustering patterns were identical among runs and similar to those obtained using STRUCTURE (Supplementary Figure S4). For K≥4, there was no congruence between the different runs for each K value, as observed with STRUCTURE.

The DAPC clustering pattern at K=2 was similar to those obtained with STRUCTURE and TESS (Supplementary Figure S4). Some differences concerned THM, MAT and DEN (eastern sites) and sites between KSD and BAR (along the Ferlo road), which had variable mixed inferred ancestry in STRUCTURE and TESS (Supplementary Figure S4). These inconsistencies may result from admixture effects that cannot be accounted for in DAPC.

In summary, clustering analyses identified two main genetic groups: the NORTH group, mostly located along the northern road between SIN and AEL, and the SOUTH group, mostly distributed along the central road (Figure 1). Other sites (along the coastal and Ferlo roads, and the eastern-most sites along the northern road) displayed variable levels of admixture between the two groups. There was a tendency for allelic richness ar to decrease with increasing longitude for sites along the northern road between SIN and AEL (Spearman’s rank correlation coefficient: rs=−0.51, P=0.09) and along the central road (rs=−0.56, P=0.06). No significant correlation was observed between longitude and HS for sites between SIN and AEL (P=0.59) or along the central road (P=0.89). No relationship was found between admixture rate and ar (P=0.90) or HS (P=0.65) calculated for all sites.

Two-dimensional IBD was significant within sites (Mantel test: P<0.0001; slope b=0.038, 95% CI=0.034–0.049). The slope of the IBD regression line provides a robust estimator of 1/(4πDσ²), the inverse of neighbourhood size (Rousset, 1997, 2000). From the inferred slope, we calculated that Dσ²=6.5 (5.0–7.4). Using a rough estimate of D=100 house mice per km² (based on the mean number of households occupied by house mice and the mean surface area of the sampled sites: data not shown), we obtained an estimate of σ=255 m.
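The last step of the within-site calculation converts the product Dσ² (reported as 6.5, with interval 5.0–7.4) into σ for an assumed density D. The sketch below reproduces only that step, using the paper's own rough density of 100 mice per km²; it assumes D is expressed per km², so that σ comes out in km.

```python
import math


def sigma_from_d_sigma2(d_sigma2, density_per_km2):
    """Return sigma (km) given the product D*sigma^2 and the density D (per km^2)."""
    # D * sigma^2 is dimensionless when D is per km^2 and sigma is in km.
    return math.sqrt(d_sigma2 / density_per_km2)


for label, ds2 in [("estimate", 6.5), ("lower", 5.0), ("upper", 7.4)]:
    print(label, round(sigma_from_d_sigma2(ds2, 100) * 1000), "m")
# estimate 255 m, lower 224 m, upper 272 m
```

Because σ scales with the square root of 1/D, an error in the rough density estimate has a dampened effect: a twofold overestimate of D changes σ by a factor of about 1.41, not 2.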

In contrast, linear IBD patterns were very weak and globally nonsignificant in between-site analyses along the northern road (Mantel test: P=1; slope b=9.5 × 10⁻⁸, 95% CI=5.2 × 10⁻⁸ to 1.5 × 10⁻⁷) and along the central road (Mantel test: P=1; slope b=2.0 × 10⁻⁸, 95% CI=−1.2 × 10⁻⁸ to 5.3 × 10⁻⁸). The σ values derived from these slopes (the inverses of the neighbourhood size estimates) were 16 km for the northern road and 35 km for the central road.
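Under the standard one-dimensional form of Rousset's (1997) result, the regression slope estimates 1/(4Dσ²), where D is a linear density. The linear density behind the 16 km and 35 km values is not stated in this section, so the sketch below does not try to reproduce them; it only checks that, for any common D, their ratio follows from the two slopes alone. The density argument of 1.0 is an arbitrary placeholder.

```python
import math


def sigma_1d(slope, linear_density):
    """sigma under 1D isolation by distance, where slope ~ 1 / (4 * D * sigma^2).

    slope is per unit distance; linear_density is individuals per the same unit.
    """
    return math.sqrt(1.0 / (4.0 * linear_density * slope))


# For a fixed (unknown) density D, sigma scales as 1/sqrt(slope), so the ratio
# of the two sigma values depends only on the ratio of the two slopes.
b_north, b_central = 9.5e-8, 2.0e-8
ratio = sigma_1d(b_central, 1.0) / sigma_1d(b_north, 1.0)
print(round(ratio, 2))  # 2.18, close to 35 km / 16 km
```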

Similar inferences emerged from MIGRAINE analyses of sites along the northern and central roads (Table 2). MIGRAINE inferred very high neighbourhood sizes (Ns), indicating weak IBD patterns and, therefore, frequent long-distance dispersal events. In addition, the island model (corresponding to g=1) was not rejected for either the northern or the central road, consistent with a lack of spatial restriction of dispersal. The numbers of mice per village, calculated from estimates of scaled population size (θ), were 126 (104–152) and 108 (86–128) for the northern and central roads, respectively, close to our rough estimate of 100 mice per village.

Table 2 Isolation-by-distance (IBD) parameters estimated using MIGRAINE for house mice sampled along their main dispersal axes in Senegal

ABC inferences about introduction scenarios

The introduction history of the NORTH and SOUTH genetic groups was studied using ABC. Six ABC analyses were run independently, each with a pair of sites corresponding to major colonial cities of the coast: one from the NORTH group (St Louis: SIN or SND) and one from the SOUTH group (Dakar: DAK; Rufisque: RUF; or Mbour: MBR) (Table 3).

Table 3 ABC model choice results for the introduction history of Mus musculus domesticus in Senegal

For all six sample pairs considered, scenario 4 consistently had the highest posterior probability (for example, P=0.89 for the SIN-RUF sample pair, 95% CI=0.888–0.897; Table 3). This scenario involves a primary introduction event on the northern part of the coast from a single unsampled population, followed by divergence due to a secondary introduction event from northern Senegal to a coastal site further south. The second-best scenario was scenario 3 (for example, P=0.066 for the SIN-RUF sample pair, 95% CI=0.062–0.069), which also involved a single introduction event, but on the southern part of the coast. The data provided weaker support for scenarios involving two independent introduction events (for example, scenario 1: P=0.002 (0.001–0.002); scenario 2: P=0.04 (0.036–0.042), for the SIN-RUF sample pair). Simulation-based analyses showed that the choice of the most likely of the four compared scenarios is robust to the level of local population substructure observed in the present study (Supplementary Appendix S1).
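The posterior probabilities above come from ABC model choice in DIYABC, which uses many summary statistics and also offers a logistic-regression estimate. The sketch below is not DIYABC's implementation; it only illustrates the basic rejection principle under a made-up toy setting: scenarios are drawn with equal prior probability, one summary statistic is simulated under each, and a scenario's posterior probability is its share among the simulations closest to the observed value. Models "A" and "B" and the observed value 1.8 are hypothetical.

```python
import random


def abc_model_choice(observed, simulators, n_sims=20000, accept_frac=0.01, seed=1):
    """Rejection ABC model choice on a single summary statistic.

    simulators maps a model name to a function rng -> simulated statistic.
    Models are drawn with equal prior probability; the posterior probability of
    each model is its share among the simulations closest to the observed value.
    """
    rng = random.Random(seed)
    names = list(simulators)
    sims = []
    for _ in range(n_sims):
        model = rng.choice(names)
        stat = simulators[model](rng)
        sims.append((abs(stat - observed), model))
    sims.sort()  # closest simulations first
    kept = sims[: max(1, int(n_sims * accept_frac))]
    return {m: sum(1 for _, k in kept if k == m) / len(kept) for m in names}


# Hypothetical toy models: the statistic is Gaussian with a model-specific mean.
toy = {
    "A": lambda rng: rng.gauss(0.0, 1.0),
    "B": lambda rng: rng.gauss(2.0, 1.0),
}
posterior = abc_model_choice(1.8, toy)
print(posterior)
```

With an observed value near the mean of model B, most accepted simulations come from B, so B receives the higher posterior probability.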

Posterior error rates are presented in Table 3 for the choice among the four scenarios considered individually, and for the choice between scenarios 1+2 (scenarios including two independent primary introduction events) and scenarios 3+4 (scenarios including a single primary introduction event). Posterior error rates were relatively low (around 10%) for the choice between scenarios 1+2 and 3+4, but substantially higher (around 30%) for the choice among the four scenarios considered individually. Thus, confidence in the choice among scenarios 1, 2, 3 and 4 in the vicinity of the observed data set is rather poor, whereas the simpler choice between histories involving a single primary introduction event and histories involving two independent primary introduction events has more statistical support.
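Grouping scenarios lowers error rates because confusions within a group (for example, scenario 3 mistaken for scenario 4) no longer count as errors. The sketch below illustrates this with hypothetical pseudo-observed data sets (pods) given as (true, chosen) scenario pairs, and also pools the SIN-RUF posterior probabilities reported above by group, following the scenario descriptions given earlier (scenarios 1 and 2 involve two independent introductions; scenarios 3 and 4 a single primary introduction). It is an illustration of the bookkeeping, not DIYABC's procedure.

```python
def _label(scenario, groups):
    """Map a scenario to its group label, or keep it as-is if no grouping is given."""
    return groups[scenario] if groups else scenario


def error_rate(pods, groups=None):
    """Fraction of pods assigned to the wrong scenario (or wrong group, if groups given).

    pods is a list of (true_scenario, chosen_scenario) pairs.
    """
    wrong = sum(1 for true, chosen in pods if _label(true, groups) != _label(chosen, groups))
    return wrong / len(pods)


def pool_posteriors(posteriors, groups):
    """Sum scenario posterior probabilities within each group."""
    pooled = {}
    for scenario, p in posteriors.items():
        pooled[groups[scenario]] = pooled.get(groups[scenario], 0.0) + p
    return pooled


GROUPS = {1: "two", 2: "two", 3: "single", 4: "single"}

# Hypothetical pods: most errors confuse scenarios within the same group.
pods = [(4, 4), (4, 3), (3, 4), (3, 3), (4, 4), (2, 2), (1, 2), (2, 4), (4, 4), (3, 3)]
print(error_rate(pods), error_rate(pods, GROUPS))  # 0.4 0.1

# SIN-RUF posterior probabilities as reported in the text.
sin_ruf = {1: 0.002, 2: 0.04, 3: 0.066, 4: 0.89}
print(pool_posteriors(sin_ruf, GROUPS))
```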

When the model checking option of DIYABC was applied with the selected scenario 4 and the associated parameter posterior distributions, none of the 16 summary statistics used as test quantities had an extreme tail-probability value (that is, 0.05<P<0.95 for all test quantities; Supplementary Table S6). The inferred scenario–posterior combination therefore provides a good fit to the observed data set. Accordingly, the projections of the simulated data sets onto the principal component axes for the tested scenario–posterior combination were relatively well grouped and centred on the target point corresponding to the observed data set (Supplementary Figure S5).
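The model check above compares each observed summary statistic with its distribution among data sets simulated from the posterior. The sketch below assumes the tail probability is the proportion of simulated values smaller than the observed one, so that values near 0 or 1 signal a poor fit; the statistic names and values in the usage are hypothetical.

```python
def tail_probability(simulated, observed):
    """Proportion of posterior-predictive simulations below the observed value."""
    return sum(1 for s in simulated if s < observed) / len(simulated)


def flag_poor_fit(sim_by_stat, obs_by_stat, low=0.05, high=0.95):
    """Return names of summary statistics whose tail probability lies outside (low, high)."""
    flagged = []
    for name, sims in sim_by_stat.items():
        p = tail_probability(sims, obs_by_stat[name])
        if not (low < p < high):
            flagged.append(name)
    return flagged


# Hypothetical statistics: s1 sits in the middle of its simulated distribution,
# s2 lies below every simulated value and is therefore flagged.
sims = {"s1": list(range(100)), "s2": list(range(100))}
print(flag_poor_fit(sims, {"s1": 50, "s2": -1}))  # ['s2']
```

In the analysis reported here, an empty list for all 16 test quantities corresponds to the statement that no tail probability fell outside 0.05–0.95.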

ABC inference about demographic and historical parameters

We inferred the posterior distributions of demographic parameters under scenario 4. The ABC analyses reported below concern the SIN (for NORTH) and RUF (for SOUTH) sites, but the other sample pairs gave similar results (data not shown). For most parameters, the estimated posterior distributions were not much more informative than the priors (Supplementary Tables S3 and S7 and Supplementary Figure S6 for prior set 1; data not shown for prior set 2). Consistent with this finding, the RMedAD values obtained from pods were similar to the baseline values computed from prior information only (that is, without genetic information) for all parameters, including the introduction times of the NORTH and SOUTH groups (Supplementary Table S8). Composite parameters were more precisely estimated (Supplementary Table S8), but these estimates remain difficult to interpret biologically. Finally, each introduction event was followed by a demographic bottleneck that was less intense for the primary introduction in the north (median bottleneck intensity tbn/Nbn=0.26) than for the secondary introduction in the south (tbs/Nbs=0.54; Supplementary Table S7 and Supplementary Figure S6).
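The comparison with a prior-only baseline can be made concrete. The sketch below assumes the usual definition of the relative median absolute deviation, the median over pods of |estimate − true| / true, and contrasts a hypothetical uninformative estimator (independent draws from the prior) with a hypothetical informative one. The uniform prior on 10–1000 and the ±10% noise are made-up values, not the priors of this study.

```python
import random
from statistics import median


def rmedad(estimates, true_values):
    """Relative median absolute deviation: median of |estimate - true| / true."""
    return median([abs(e - t) / t for e, t in zip(estimates, true_values)])


rng = random.Random(7)
# Hypothetical "true" parameter values for 500 pods, drawn from a uniform prior.
true = [rng.uniform(10, 1000) for _ in range(500)]

# Baseline: estimates that ignore the data, i.e. independent draws from the prior.
base = rmedad([rng.uniform(10, 1000) for _ in true], true)

# Hypothetical informative estimator: within +/-10% of the true value.
informative = rmedad([t * rng.uniform(0.9, 1.1) for t in true], true)
print(round(base, 2), round(informative, 2))
```

When genetic data add little information, the RMedAD of the ABC estimates stays close to the baseline value, which is the pattern reported here for most parameters.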

Discussion

In this study, we aimed to investigate the invasion history of M. m. domesticus in Senegal by characterizing its genetic structure with both mitochondrial sequences and microsatellite markers. We wanted (1) to evaluate whether the introduction history of the subspecies and its spatial demographic dynamics are consistent with human history in colonial and contemporary times and (2) to give insights into the evolutionary processes that may underlie the invasions of commensal rodents.

Introduction history

We found some evidence from D-loop data that a small group of mice was introduced into Senegal in a single main introduction event. Only two major haplotypes (H1 and H2) were found in Senegal, and mean haplotype diversity in Senegal (h=0.51) was substantially lower, and mean nucleotide diversity (π=0.003) lower or at the low end of the range, compared with those of house mice from Western Europe (mean h from 0.82 to 0.95; mean π from 0.002 to 0.008; see, for example, Rajabi-Maham et al., 2008; Searle et al., 2009b; Jones et al., 2011; Gabriel et al., 2015) or from areas invaded after multiple introductions (mean h from 0.66 to 0.91; mean π from 0.004 to 0.01; see, for example, Searle et al., 2009a; Gabriel et al., 2015). In addition, both major haplogroups were found at coastal sites, and no geographic pattern was observed in the distribution of haplotypes across Senegal. These features are consistent with the presence of ancestral polymorphism in a single initial introduction area, followed by spatial spread inland.
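The two diversity indices compared above can be computed from a list of haplotype labels and from aligned sequences. The sketch below uses Nei's unbiased haplotype diversity, h = n/(n−1) × (1 − Σ pᵢ²), and nucleotide diversity π as the mean proportion of differing sites over all pairs of sequences; it assumes aligned sequences of equal length without gaps, and the inputs in the checks are toy data rather than the Senegalese D-loop sequences.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [c / n for c in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site over all pairs of aligned sequences."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


print(haplotype_diversity(["H1", "H1", "H2", "H2"]))  # 2 haplotypes at equal frequency
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences over 3 pairs x 4 sites
```

The two indices capture different things: h depends only on how many haplotypes there are and how evenly they are represented, whereas π also reflects how divergent they are, so a sample dominated by two haplotypes can have low h but a π that depends on the number of sites separating them.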

Microsatellite data also suggested that there had been a single primary introduction event in Senegal. Consistent with the notion that one of the two main genetic groups spreading in Senegal originated from the other, we found no relationship between admixture levels and genetic diversity within sites. ABC analyses provided more statistical support for scenarios involving a single primary introduction event than for those involving two independent introduction events (Table 3). More specifically, the best scenario selected by ABC (scenario 4 in Figure 2), which involves a single primary introduction event in northern Senegal, was selected in each of the six ABC analyses carried out, despite substantial differentiation between the sites chosen as representative of each genetic group. This suggests that we can be confident in the selection of scenario 4, despite the high posterior error rates associated with this choice.

It remains a challenge to identify precisely the origin of the first house mouse population introduced into Senegal. Both microsatellite and historical data suggest that Saint Louis, the first colonial port to be developed in Senegal (Sinou, 1993) and a major colonial city involved in the trading of slaves and Arabic gum during the eighteenth century (Bonnardel, 1992), might have been the area of introduction. France was involved in the establishment of Saint Louis, and the British controlled the city for 80 years during the eighteenth century (Sinou, 1993). Unfortunately, the lack of precise information about introduction times provided by ABC makes it impossible to evaluate the consistency of these times with historical data. All the mitochondrial haplogroups found in Senegal are typical of Western Europe (Bonhomme et al., 2011). Both major mitochondrial haplotypes (H1 and H2) and their closely related haplotypes have been reported at relatively high frequencies (>10%) not only in Western France and Great Britain, but also in Germany (H1), Norway (H2) and Morocco (H1) (Bonhomme et al., 2011; Linnenbrinck et al., 2013; Supplementary Table S9). The unique haplotype H12 from haplogroup HG1 was found at high frequency (>20%) in Southern France and Portugal (Bonhomme et al., 2011). However, D-loop data for Europe are sparse and concern sites with no particular connection to colonial history. D-loop and microsatellite data from house mouse populations located close to major harbours historically involved in trade with Senegal (such as Nantes or Bordeaux in France, or Liverpool in Britain) could help to identify the precise Western European source of the mouse populations of Senegal.

At first glance, the known occurrence of H1 in Morocco (Bonhomme et al., 2011) might suggest another scenario, with colonization by a continental route from North-West Africa to Senegal. This scenario would, however, be unlikely to explain the primary distribution area of the house mouse in Senegal, which was shown to be restricted to coastal villages and towns (Dalecky et al., 2015). Indeed, several hundred kilometres separated Senegalese mouse populations from the nearest populations further north (Granjon and Duplantier, 2009), and trade between Senegal and North-West Africa historically took place not along the Atlantic coast but via inland sites (Miège, 1981).

Spatial expansion

Historical data and longitudinal surveys of commensal rodent communities in Senegal have suggested that the spread of house mice in Senegal is recent (twentieth century) and related to the development of road traffic (see Dalecky et al., 2015 and references therein). Indeed, mouse populations would first have become established in villages and towns on the coast, possibly because of the development of railway trade between St Louis and Dakar at the end of the nineteenth century (Bonnardel, 1992). Genetic admixture between the NORTH and SOUTH groups would have occurred in this area before expansion to the east with the development of asphalt roads inside the country.

In the context of biological invasions, spatial expansion is often linked to high levels of gene flow that may minimize population structure and IBD patterns (Marrs et al., 2008). It may also be characterized by sequential founder events, leading to strong genetic structure and a spatial decrease in allelic diversity along the colonization axis (Clegg et al., 2002). Founder events may strongly limit, or at least delay, the emergence of IBD patterns because allele frequencies change independently at each introduction. In Senegalese house mice, substantial genetic structure was observed in the analysis of microsatellites, even within the main genetic groups identified by STRUCTURE and TESS, indicating that founder events may have occurred repeatedly during the expansion process. Mean M values consistent with bottleneck signals, together with the decrease in allelic richness from the coast along the main expansion road of each genetic group, further suggest serial founder events during expansion (Ramachandran et al., 2005).
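The M values mentioned above are presumably the Garza and Williamson M ratio; that is an assumption, as M is not defined in this section. Under that reading, M is the number of alleles at a microsatellite locus divided by the allelic size range (in repeat units) plus one, and it tends to fall after a bottleneck because losing rare alleles leaves gaps in the size distribution. The sketch below assumes allele sizes in base pairs that differ by whole repeats; the allele sizes and locus names are hypothetical.

```python
from statistics import mean


def m_ratio(allele_sizes, repeat_length):
    """Garza-Williamson M (assumed definition): k / (size range in repeats + 1)."""
    k = len(set(allele_sizes))  # number of distinct alleles
    size_range = (max(allele_sizes) - min(allele_sizes)) / repeat_length
    return k / (size_range + 1)


def mean_m_ratio(loci):
    """Mean M over loci; loci maps a locus name to (allele sizes, repeat length)."""
    return mean(m_ratio(sizes, rep) for sizes, rep in loci.values())


# Hypothetical dinucleotide loci: L1 has every allele in its size range (M = 1),
# whereas L2 has lost the intermediate alleles (M = 0.5).
loci = {"L1": ([100, 102, 104, 106], 2), "L2": ([100, 106], 2)}
print(mean_m_ratio(loci))  # 0.75
```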

At the local geographic scale (that is, within sites), IBD was significant and associated with low estimates of neighbourhood size, reflecting the spatial limitation of dispersal. These results are consistent with the few estimates of home ranges reported to date for commensal house mice, of a few tens of metres (Pocock et al., 2005). However, IBD analyses between sites along the northern and central roads, which yielded large neighbourhood size estimates, clearly suggest that long-distance dispersal events occur over a larger spatial scale. Genetic signatures involving both local diffusion and long-distance dispersal are often observed in invasive species with a limited capacity for autonomous dispersal but with many opportunities for passive dispersal by humans (Marrs et al., 2008). This seems to be the case for the house mouse, which is generally thought to disperse actively over only short distances (Pocock et al., 2005). Anthropogenic dispersal probably occurs both between neighbouring villages and over large distances, as mice can take advantage of even small vehicles to disperse.

The values of σ estimated from the IBD regression analyses are compatible with the size of the catchment area of villages holding weekly rural markets in Senegal (about 10–20 km: Ninot, 2003), which may act as 'invasion hubs' from which mice spread to nearby villages (Dalecky et al., 2015). The occurrence of long-distance dispersal events in Senegal is highlighted by the assignment of eastern sites along the northern road to the SOUTH genetic group. These eastern sites are distinguished by being home to families of emigrants who send back enough money to pay for large amounts of goods to be brought in directly from Dakar (Bredeloup, 1997), creating opportunities for the long-distance transport of mice. Another example is provided by the genetic grouping of individuals from GOU, THM and MAT at K=3 in STRUCTURE. This grouping could be explained by the past transport of goods between these sites, before the construction of an asphalt road between Bakel and Kidira (KID) (Kayser and Tricart, 1957).

Evolutionary processes underlying invasions

Multiple introductions leading to genetic admixture in the introduced populations may play an important role in invasion success (Kolbe et al., 2004). We did not formally test the hypothesis that the first population of house mice introduced into Senegal was a pool of individuals from multiple differentiated European sites, as we wished to focus on a limited number of competing scenarios. The predominance of two mitochondrial haplogroups in Senegal (HG4 and HG11) suggests that two maternal lineages were introduced, but these two lineages may have originated from the same site in Western Europe. Little evidence of multiple introductions is generally found in house mouse populations from remote islands (see, for example, Gabriel et al., 2015). This is consistent with behavioural studies suggesting that, once established, house mouse populations are largely closed to immigration of conspecifics (Palanza et al., 1996). This may also account for the marked microsatellite genetic structure observed in Senegal, even between sites located close together.

A similar pattern involving a small number of successful introduction events was found for the tropical fire ant, which invaded the Old World as a result of Spanish colonial trade (Gotzek et al., 2015). These and other clearly successful invasions support the notion that multiple introductions are not necessarily the key events explaining the expansion of introduced populations (Dlugosch and Parker, 2008). As suggested by Dlugosch et al. (2015), further research is needed to identify the genetic basis of the adaptations allowing spread into new areas, even in the presence of close competitors.

Data archiving

DNA sequences: GenBank accession nos KY686322–KY686440. Microsatellite genotypes and final sequence assemblies for D-loop haplotypes: data available from the Dryad Digital Repository http://dx.doi.org/10.5061/dryad.n0n60