Introduction

Phlebotomus (Larroussius) ariasi Tonnoir is one of the two sandfly species (Diptera, Phlebotominae) that transmit the causative agent of zoonotic leishmaniasis, Leishmania infantum Nicolle (Kinetoplastida, Trypanosomatidae), in the Mediterranean bioclimates of Iberia and France (Ready, 2010). This report investigates whether past climate change has constrained the opportunities for its populations to spread northwards in France. In Italy, this sandfly's range includes only the cooler northwest (Maroli et al., 2008), where its low population densities suggest re-colonization from France. According to 20th-century reports (Rioux and Golvan, 1969), it is far more abundant and widespread in southern France, where the adults are active on hillsides in wooded rural regions during the dry Mediterranean summer. Therefore, P. ariasi provides a rare case study of the role of the Pyrenees as a barrier to the postglacial re-colonization of Europe (Hewitt, 1996; Schmitt, 2007) by a Mediterranean species.

Most reports on the four paradigms of postglacial range expansions in Europe (Taberlet et al., 1998; Hewitt, 1999) have focused on the role of the Pyrenees, either for maintaining hybrid (or suture) zones of temperate species or as refugia for montane species (Hewitt, 2004). Glacial refugia for temperate species have been identified north of the Pyrenees and Alps (Schmitt, 2007), which often formed barriers to dispersal from the three Mediterranean peninsulas (Iberia, Italy, Balkans), where most diversity was conserved because of warm ‘refugia within refugia’ (Gómez and Lunt, 2007). The presence of refugia in southern France might have given P. ariasi a springboard for postglacial re-colonization northwards but, alternatively, such refugial populations might have blocked (sensu Hewitt, 2004) the dispersal of Spanish populations possibly better adapted to northern environments.

On the basis of isoenzyme variation among populations, P. ariasi shows greater local geographical variation than P. (Larroussius) perniciosus Newstead (Pesson et al., 2004), the other regional vector of L. infantum, making it easier to investigate re-colonization events. Demographic analyses would be confounded by the presence of sibling species or by adaptive selection, and so we tested for both. We selected DNA markers developed for Larroussius species because they either identify and relate phylogenetic species (mitochondrial cytochrome b (cyt b) and nuclear elongation factor-1α (EF-1α); Esseghir et al., 2000) or can be sufficiently polymorphic for differentiating populations (cyt b, EF-1α, and the anonymous nuclear loci AAm20 and AAm24; Aransay et al., 2003). This report focuses on re-colonization in southwest France, and so P. ariasi was sampled with the objectives of testing for environmental barriers (a lowland corridor between the eastern Pyrenees and the upland Massif Central, France), leading-edge effects (Lot and Rhone valley, France), alternative immigration routes (via northwest or northeast Spain) and potential ancestral sources (Portugal, Morocco).

Materials and methods

Sandfly sampling

In all, 19 rural populations of adult P. ariasi were sampled (Figure 1; Supplementary Table A, Electronic Supplementary Information (ESI)) using either Centers for Disease Control miniature light traps (Sudia and Chamberland, 1962), set overnight in peri-domestic locations, or sticky traps (A4 papers soaked in castor oil), placed in roadside walls for four nights (Rioux and Golvan, 1969). Flies were stored in 80% ethanol at −20 °C or dry at −80 °C. Species were identified using the morphological characters of the head and genitalia (Gállego Berenguer et al., 1992) and a dissection method that minimizes the risk of carryover to PCR (Testa et al., 2002).

Figure 1

Digital elevation map of the western Mediterranean region, showing locations where 19 populations of P. ariasi were sampled for molecular characterization. Ellipses demarcate the labelled regions mentioned in the text. Additional information on locations is given in Supplementary Table A (ESI).

Molecular methods

Genomic DNA was extracted from each sandfly thorax and/or anterior abdomen, and PCR and sequencing were performed according to Testa et al. (2002) using the conditions first mentioned below unless stated otherwise.

PCR

Reactions of 25 μl (cyt b, EF-1α) or 20 μl (AAm20, AAm24) were prepared with 1.5 U GoTaq Flexi DNA polymerase and 1.5 mM MgCl2 (Promega UK Ltd, Southampton, Hampshire, UK).

Cyt b

A 796-base pair (bp) fragment was amplified following Parvizi et al. (2010), except that the annealing temperature was 51 °C.

EF-1α

An 856-bp fragment was targeted by the primers of Esseghir et al. (2000). Six allele-specific reverse primers were designed to discriminate genotypes using the PCR amplification of specific alleles (PASA) technique (Sommer et al., 1992) with 1 mM MgCl2. In each primer name, the number gives the position of the variable 3′-terminal nucleotide that conferred specificity and the letter gives that nucleotide; the optimized annealing temperature follows each sequence: EFRSM-817G 5′CTGAGCGGTAAAGTCAGAG (61 °C), EFRSM-709C 5′ATTGTCACAGGGAACGGCC (62 °C), EFRSM-643T 5′GAGATTGGCCGGGGCGAAT (64 °C), EFRSM-631G 5′GGCGAAAGTCACGACAGTG (62 °C), EFRSM-619C 5′GACAGTTCCTGGCTTCAGC (62 °C) and EFSMR-496C 5′CAGAATGGCGTCCAGAGCC (62 °C).
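The logic of PASA discrimination can be illustrated with a small in-silico sketch. It assumes a deliberately simplified model in which a primer extends only when its 3′-terminal base pairs with the template and internal mismatches do not exceed a chosen tolerance; the function names, the tolerance parameter and the toy sequences are hypothetical and are not the authors' primers or protocol. Reverse primers are compared with the reverse complement of the sense-strand allele.

```python
# Hypothetical in-silico model of PASA priming (not the authors' protocol):
# a primer is predicted to extend only if its 3'-terminal base pairs with the
# template and internal mismatches do not exceed a chosen tolerance.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def predicts_amplification(allele: str, primer: str, reverse: bool = False,
                           max_internal_mismatches: int = 0) -> bool:
    """Return True if the primer has a binding site on the allele whose
    3'-terminal base matches.

    allele  -- sense-strand sequence written 5'->3'
    primer  -- primer sequence written 5'->3'
    reverse -- True for a reverse primer, compared with the reverse
               complement of the allele
    """
    target = revcomp(allele) if reverse else allele
    k = len(primer)
    for start in range(len(target) - k + 1):
        window = target[start:start + k]
        if window[-1] != primer[-1]:
            continue  # 3'-terminal mismatch: no extension expected
        internal = sum(a != b for a, b in zip(window[:-1], primer[:-1]))
        if internal <= max_internal_mismatches:
            return True
    return False


# Toy alleles differing only at index 7; not real EF-1alpha sequence.
# Both primers below have their 3' end on that variable site.
allele_a = "ATGCCGTTAGCA"
allele_b = "ATGCCGTCAGCA"
print(predicts_amplification(allele_a, "TGCCGTT"))               # True
print(predicts_amplification(allele_b, "TGCCGTT"))               # False
print(predicts_amplification(allele_a, "TGCTA", reverse=True))   # True
print(predicts_amplification(allele_b, "TGCTA", reverse=True))   # False
```

Real PASA specificity also depends on annealing temperature and Mg2+ concentration, which is why the primers above were optimized individually; the sketch captures only the sequence-matching rule.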

Anonymous loci

Of the six microsatellite loci developed for P. perniciosus (Aransay et al., 2001, 2003), only two (AAm20, AAm24) could be amplified consistently from P. ariasi. Because their size polymorphism was insufficient, they were characterized by direct sequencing. Two PASA forward primers were designed for locus AAm20: AAm20F-33G 5′AGTTGAGGCTTGCGTATCCG (66 °C) and AAm20F-51C 5′CCCAGAGAGCGACGACTC (66 °C). Two PASA primers were designed for AAm24: AAM24R-151C 5′TATTCCCGCCCCACTTGGC (66 °C, 0.7 mM MgCl2) paired with forward primer AAm24F1, and AAM24F-79G 5′AGTTCAGCCGTCGCAGCAG (64 °C) paired with reverse primer AAm24R2.

Direct sequencing

Cycle sequencing of both strands used conserved or PASA primers and the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, Warrington, Cheshire, UK) in one-eighth-volume reactions. Sequences were read on a 3730 capillary sequencer (Applied Biosystems), manually edited in SEQUENCHER v4.6 for Macintosh (Gene Codes Corporation, Ann Arbor, MI, USA) and aligned on the basis of open reading frames (ORFs).

Algorithm for allele inference

Most alleles in heterozygous genotypes were deduced directly (one dimorphic site) or by PASA. However, it was uneconomical to use PASA to resolve genotypes present in only 1–3 specimens, so their alleles were inferred. Inference was based on the number of alternative alleles at a single locus (rⁿ, where r = 2 because all heterozygous sites were dimorphic, and n is the number of heterozygous sites) and followed these steps: (1) For each population, the allele and genotype frequencies were calculated from alleles read directly (homozygotes) or deduced. (2) These alleles were aligned with each ambiguous sequence to identify the allele pairs that could constitute it. (3) The candidate genotypes were then ranked by the likelihood that each formed the ambiguous sequence, based on the frequencies of their alleles in the population. Often, the selected genotype contained one allele with high local frequency (regional bias) and one unrecorded allele. (4) If alternative genotypes were equally likely, a TCS network was used to identify, by statistical parsimony, the least derived unrecorded allele.
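Steps (2) and (3) can be sketched in a few lines. The sketch assumes that each candidate genotype is scored as an expected heterozygote frequency, 2·f(x)·f(y), with a small hypothetical pseudo-frequency (eps) standing in for unrecorded alleles; the exact ranking rule used by the authors is not specified beyond "based on their alleles' frequencies", and the function name and toy frequencies are illustrative only. Step (4), the TCS tie-break, is not implemented.

```python
from itertools import product


def candidate_genotypes(ambiguous, allele_freqs, eps=1e-3):
    """Rank the allele pairs that could explain one ambiguous sequence.

    ambiguous    -- list of strings, one per site: a single base for a
                    homozygous site, two bases (e.g. "CT") for a dimorphic
                    heterozygous site
    allele_freqs -- dict: allele sequence -> frequency in the population
    eps          -- hypothetical pseudo-frequency for unrecorded alleles
    Returns [((allele1, allele2), score), ...], best first.
    """
    het = [i for i, bases in enumerate(ambiguous) if len(bases) == 2]
    if not het:
        allele = "".join(ambiguous)
        return [((allele, allele), 1.0)]
    scored = []
    # Fixing the phase of the first heterozygous site lists each unordered
    # pair once: 2**(n-1) genotypes built from the 2**n possible alleles.
    for rest in product((0, 1), repeat=len(het) - 1):
        phase = (0,) + rest
        a1 = [bases[0] for bases in ambiguous]
        a2 = [bases[0] for bases in ambiguous]
        for site, p in zip(het, phase):
            a1[site], a2[site] = ambiguous[site][p], ambiguous[site][1 - p]
        x, y = sorted(("".join(a1), "".join(a2)))
        # Score as an expected heterozygote frequency, 2 * f(x) * f(y).
        score = 2 * allele_freqs.get(x, eps) * allele_freqs.get(y, eps)
        scored.append(((x, y), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


freqs = {"ACGA": 0.6, "ATGA": 0.3}  # toy population allele frequencies
for pair, score in candidate_genotypes(["A", "CT", "G", "AG"], freqs):
    print(pair, round(score, 6))
# ('ACGA', 'ATGG') 0.0012
# ('ACGG', 'ATGA') 0.0006
```

As in step (3), the top-ranked genotype here pairs the locally common allele ACGA with one unrecorded allele.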

Phylogenetics

Phylogenies were reconstructed to assess the species status of P. ariasi and to identify appropriate outgroups for selection tests. Trees were constructed by Bayesian estimation using MRBAYES v3.1.2 (Ronquist and Huelsenbeck, 2003; http://cbsuapps.tc.cornell.edu/mrbayes.aspx). Nucleotide substitution models were selected using the Akaike information criterion in MRMODELTEST v2.3 (Nylander, 2004). Each analysis was run for 10 million generations with two parallel searches, each using one cold and three heated Markov chains. Trees were sampled from each chain every thousandth generation and the first 5000 samples were discarded as burn-in. All other parameters of the Markov chain Monte Carlo (MCMC) analysis were left at their defaults. Mixing and convergence of the two MCMC analyses onto a stationary distribution were assessed from the output of the sump command. TRACER v1.4.1 (Rambaut and Drummond, 2007) was used to plot the log likelihood values of the cold chain against generation, to determine the burn-in point preceding the stationary phase. The harmonic means of the post-burn-in likelihoods were then used to calculate Bayes factors for selecting the better of two models (Kass and Raftery, 1995). A probability value of 100 was applied when multiple taxa were constrained as the outgroup.
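The Bayes factor comparison can be sketched as follows, assuming 2 ln B12 is computed as twice the difference between the log harmonic-mean likelihoods of two models and interpreted on the Kass and Raftery (1995) scale. The harmonic mean is evaluated in log space to avoid underflow. The function names and ln-likelihood samples are hypothetical; the harmonic-mean estimator is also known to be sensitive to the few lowest-likelihood samples.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """ln of the harmonic mean of likelihoods, from sampled ln-likelihoods.

    HM = n / sum(exp(-l_i)), evaluated with a log-sum-exp shift so that
    large negative ln-likelihoods do not underflow.
    """
    neg = [-l for l in log_likelihoods]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(len(neg)) - log_sum


def two_ln_bayes_factor(lnl_model1, lnl_model2):
    """2 ln B12 from post-burn-in ln-likelihood samples of two models."""
    return 2.0 * (log_harmonic_mean(lnl_model1) - log_harmonic_mean(lnl_model2))


def kass_raftery_category(two_ln_b):
    """Evidence categories of Kass and Raftery (1995) for 2 ln B12."""
    if two_ln_b < 0:
        return "favours model 2"
    if two_ln_b < 2:
        return "not worth more than a bare mention"
    if two_ln_b < 6:
        return "positive"
    if two_ln_b < 10:
        return "strong"
    return "very strong"


# Hypothetical post-burn-in samples; model 2 is uniformly 6 ln units worse.
b = two_ln_bayes_factor([-5000.0, -5001.0, -5002.0], [-5006.0, -5007.0, -5008.0])
print(round(b, 3), kass_raftery_category(b))  # 12.0 very strong
```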

Maximum parsimony analyses (branch-and-bound search) were implemented in PAUP* v4.0b10 (Swofford, 2002) for the datasets used to reconstruct the best Bayesian topologies. Each locus was analysed unweighted and then partitioned by codon position, with the positions weighted 2:5:1. Search parameters were default. Statistical support for trees was obtained from 1000 bootstrap replicates.

Genealogical network

Haplotype/allele networks were constructed in TCS v1.21 for Macintosh (Clement et al., 2000) because, unlike bifurcating trees, networks can represent population-level phenomena such as the persistence of ancestral haplotypes alongside their descendants and reticulation. A 95% parsimony connection limit was set to test whether the haplotypes/alleles of P. ariasi, alone or with those of a putative sister species, formed a single network, as expected of a phylogenetic species (Hart and Sunday, 2007).
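The single-network criterion can be illustrated with a simplified sketch. It assumes that two haplotypes belong to the same network whenever a chain of pairwise differences, each no larger than the connection limit, links them; TCS derives that limit from the 95% parsimony probability and also builds the most parsimonious links, neither of which is reproduced here. The step limit and toy sequences are hypothetical inputs.

```python
def pairwise_differences(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def networks_within_limit(haplotypes, max_steps):
    """Split haplotypes into separate networks.

    Two haplotypes belong to the same network if a chain of pairwise
    differences, each <= max_steps, links them (union-find on a threshold
    graph). max_steps is supplied by the caller in this sketch.
    """
    names = list(haplotypes)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pairwise_differences(haplotypes[a], haplotypes[b]) <= max_steps:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


haps = {"h1": "AAAAAAAA", "h2": "AAAAAAAT", "h3": "AAAAAATT", "out": "TTTTTTAA"}
print(networks_within_limit(haps, max_steps=2))
# [['h1', 'h2', 'h3'], ['out']]
```

A divergent haplotype that falls outside the limit, like "out" here, forms its own network; this is the pattern used to exclude a putative sister species.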

Testing for reproductive isolation, panmixia and independent gene assortment

The null hypothesis of random allele associations was investigated by testing for adherence to exact Hardy–Weinberg equilibrium at a single locus (ARLEQUIN v3.11; Excoffier et al., 2005) and for linkage disequilibrium across multiple unlinked loci (GENEPOP v4.0; Raymond and Rousset, 1995), making no assumptions about evolutionary factors influencing gamete frequencies or phase. For estimating the standard error and the probability of rejecting the null hypothesis, Markov chain parameters included: dememorization=10 000, batches=10 000, iterations per batch=5000.
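The idea behind the exact Hardy–Weinberg test can be shown with a Monte Carlo sketch. ARLEQUIN explores genotype tables with a Markov chain (hence the dememorization and batch settings above); this sketch instead swaps in a simpler technique, random re-pairing of the observed alleles, which samples the same null distribution of genotype tables conditional on allele counts. The P-value is the proportion of sampled tables that are no more probable than the observed one under Levene's conditional probability. Function names, allele labels and counts are hypothetical.

```python
import math
import random
from collections import Counter


def log_prob_table(table):
    """ln P(genotype counts | allele counts) under Hardy-Weinberg
    (Levene's conditional probability). table: Counter of sorted tuples."""
    n = sum(table.values())
    allele_counts = Counter()
    hets = 0
    for (a, b), count in table.items():
        allele_counts[a] += count
        allele_counts[b] += count
        if a != b:
            hets += count
    logp = math.lgamma(n + 1) - math.lgamma(2 * n + 1) + hets * math.log(2)
    logp += sum(math.lgamma(k + 1) for k in allele_counts.values())
    logp -= sum(math.lgamma(c + 1) for c in table.values())
    return logp


def hwe_exact_test(genotypes, n_perm=1000, seed=1):
    """Monte Carlo exact HWE test: P = proportion of random allele
    re-pairings whose genotype table is no more probable than observed."""
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    obs_logp = log_prob_table(observed)
    alleles = [allele for g in genotypes for allele in g]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        table = Counter(tuple(sorted(alleles[i:i + 2]))
                        for i in range(0, len(alleles), 2))
        if log_prob_table(table) <= obs_logp + 1e-9:
            hits += 1
    return hits / n_perm


deficit = [("01", "01")] * 50 + [("03", "03")] * 50   # no heterozygotes
balanced = [("01", "01")] * 25 + [("01", "03")] * 50 + [("03", "03")] * 25
print(hwe_exact_test(deficit), hwe_exact_test(balanced))  # 0.0 1.0
```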

Testing for selection on molecular markers

The CODEML program of the PAML v4.2 package (Phylogenetic Analysis by Maximum Likelihood; Yang, 2007) was run to test whether the P. ariasi branch had an accelerated rate of amino acid substitution (ω = dN/dS > 1) relative to the background rate in a Phlebotomus phylogeny. CODEML estimated ω as the ratio of non-synonymous substitutions per non-synonymous site (dN) to synonymous substitutions per synonymous site (dS). The branch lengths of the input gene trees were first re-estimated under ML (model = 0, NSsites = 0; branch lengths in nucleotide substitutions per codon) and then used as initial values in further CODEML analyses. Alpha was fixed at a constant rate in the control file.

The MK population test for selection (McDonald and Kreitman, 1991) was implemented in DnaSP v4.90.1 (Rozas et al., 2003). A neutral model predicts that the proportions of non-synonymous to synonymous polymorphisms within a species are linearly related to the proportions of such divergences between it and an outgroup. The significance of departures from neutrality was estimated by a two-tailed Fisher's exact test. The direction and degree of selection was assessed using the neutrality index (Rand and Kann, 1996). Sensitivity depends on the choice of outgroup (Wayne and Simonsen, 1998), which was selected on the basis of dN and dS saturation levels estimated by the approximate per site model (Nei and Gojobori, 1986; Jukes-Cantor correction in DnaSP v4.90.1) and a substitution-based method in CODEML.
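A minimal sketch of the MK calculation follows, assuming the standard 2×2 layout of fixed (Dn, Ds) versus polymorphic (Pn, Ps) non-synonymous and synonymous counts, a two-tailed Fisher's exact test that sums all tables with the same margins that are no more probable than the observed one, and the neutrality index NI = (Pn/Ps)/(Dn/Ds). The function names and counts are hypothetical. An NI above 1 indicates an excess of non-synonymous polymorphism, often read as purifying selection against slightly deleterious mutations, and NI is undefined when Dn = 0, as for a gene with no fixed replacement differences.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def p_table(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(p_table(x) for x in range(lo, hi + 1)
            if p_table(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman table: fixed (Dn, Ds) versus polymorphic (Pn, Ps)
    counts. Returns (neutrality index, two-tailed Fisher P)."""
    p = fisher_exact_two_tailed(dn, ds, pn, ps)
    # NI = (Pn/Ps)/(Dn/Ds); undefined (nan) without fixed replacements.
    ni = (pn / ps) / (dn / ds) if dn and ds and ps else float("nan")
    return ni, p


print(round(fisher_exact_two_tailed(8, 2, 1, 5), 4))  # 0.035
print(mk_test(2, 10, 4, 10))   # NI = 2.0 with the Fisher P-value
print(mk_test(0, 12, 3, 20))   # NI is nan: no fixed non-synonymous changes
```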

Departures from the assumption of neutrality, or their demographic alternatives, were also evaluated within P. ariasi, using tests not requiring outgroups and based on mutation frequency—Tajima's D (Tajima, 1989; ARLEQUIN v3.11) and R2 (Ramos-Onsins and Rozas, 2002; DnaSP v4.90.1)—or haplotype distribution data, for Fu's Fs (Fu, 1997; ARLEQUIN v3.11). Statistical significance was estimated using 16 000 or 10 000 simulated replicates in ARLEQUIN and DnaSP, respectively. The minimum number of recombination events (Rm) was estimated in DnaSP v4.90.1.
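Tajima's D contrasts two estimators of θ: the mean number of pairwise differences (k) and Watterson's estimator based on the number of segregating sites (S/a1). The sketch below implements Tajima's (1989) standard formula, assuming aligned sequences of equal length in which any site with more than one base counts as segregating; the significance testing by coalescent simulation performed in ARLEQUIN is not reproduced, and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's (1989) D from aligned sequences of equal length.

    Compares k, the mean number of pairwise differences, with Watterson's
    S / a1. Returns nan when there are no segregating sites.
    """
    n = len(seqs)
    length = len(seqs[0])
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return float("nan")
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(p, q)) for p, q in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Ten toy sequences: one reference plus nine each carrying a private
# mutation (an excess of low-frequency variants, as after expansion) ...
singletons = ["A" * 9] + ["A" * i + "T" + "A" * (8 - i) for i in range(9)]
# ... versus two equally common haplotypes (intermediate-frequency variants).
split = ["AAAA"] * 5 + ["TTTT"] * 5
print(tajimas_d(singletons) < 0, tajimas_d(split) > 0)   # True True
```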

Descriptive population statistics

The number of segregating sites (S), the number of haplotypes (h), haplotype diversity (Hd) and nucleotide diversity (π) were estimated in ARLEQUIN v3.11, partly to infer population bottlenecks and expansions.
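These four statistics have simple definitions, sketched below for aligned haplotypes of equal length: S counts variable sites, h counts distinct haplotypes, Hd = n/(n − 1)·(1 − Σpᵢ²) where pᵢ are haplotype frequencies, and π is the mean number of pairwise differences per site. The function name and toy sequences are hypothetical, and ARLEQUIN's handling of gaps and missing data is not reproduced.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """S, h, haplotype diversity (Hd) and nucleotide diversity (pi)."""
    n = len(seqs)
    length = len(seqs[0])
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    pi = diffs / (len(pairs) * length)   # per-site mean pairwise difference
    return {"S": s, "h": len(counts), "Hd": hd, "pi": pi}


stats = diversity_stats(["AAAA", "AAAA", "AAAT", "AATT"])
print(stats["S"], stats["h"], round(stats["Hd"], 3), round(stats["pi"], 4))
# 2 3 0.833 0.2917
```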

Population structure

Pairwise FST

Estimates based on allele frequencies at each locus were obtained in FSTAT v2.9.3.2 (Goudet, 2002) using the estimator of Weir and Cockerham, which is unaffected by the sampling scheme, with 1000 permutations and a sequential Bonferroni correction to derive significance levels for multiple comparisons. ARLEQUIN v3.11 was used for haploid data. FST estimates are relatively independent of mutation rate and selection (Rousset, 1997) and thus reflect genetic drift and gene flow.

Analysis of molecular variance

Analysis of molecular variance in ARLEQUIN v3.11 was used to evaluate the amount of haplotype diversity (based on haplotype frequencies and pairwise genetic distances) correlated with different nested levels of hierarchical population subdivision. Non-parametric procedures (16 000 permutations) tested the significance of the Φ-statistics.

Isolation by distance (IBD)

Association was sought between genetic differentiation and the geographical proximity of population pairs (Rousset, 1997). FST values based on allele/haplotype frequencies were converted to FST/(1−FST) in GENEPOP v4.0, and geographical distances were measured from a digital map using the Distance and Azimuth Matrix v2.1 extension within ARCVIEW v3.2 (Jenness, 2005). Regression and the non-parametric significance of the association (Mantel test, 1000 permutations) were implemented within the ISOLDE sub-option of GENEPOP v4.0. No assumptions were made about the dispersal dimension.
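The IBD test can be sketched as a Mantel permutation test. The sketch assumes Pearson's correlation between linearized FST, FST/(1−FST), and untransformed geographical distance over all population pairs, with significance from permuting population labels of the genetic matrix; GENEPOP's ISOLDE options (for example log-transformed distances or rank correlation) are not reproduced. The function names and transect values are hypothetical, not data from this study.

```python
import random


def linearized_fst(fst):
    """Rousset's (1997) transformation FST / (1 - FST)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(gen, geo, n_perm=999, seed=1):
    """One-tailed Mantel test of a positive association between two
    symmetric matrices: population labels of `gen` are permuted and the
    Pearson correlation over upper-triangle pairs is recomputed."""
    m = len(gen)
    idx = [(i, j) for i in range(m) for j in range(i + 1, m)]
    x = [geo[i][j] for i, j in idx]
    r_obs = pearson([gen[i][j] for i, j in idx], x)
    rng = random.Random(seed)
    order = list(range(m))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        r = pearson([gen[order[i]][order[j]] for i, j in idx], x)
        if r >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (n_perm + 1)


# Hypothetical transect of eight populations; FST rises with distance (km).
km = [0, 40, 90, 150, 200, 260, 330, 400]
geo = [[abs(a - b) for b in km] for a in km]
gen = [[linearized_fst(0.0005 * abs(a - b)) for b in km] for a in km]
r, p = mantel_test(gen, geo)
print(round(r, 3), p < 0.05)
```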

Dating demographic events

MDIV analysis of cyt b haplogroups

The basic isolation-with-migration coalescent model in MDIV (Nielsen and Wakeley, 2001; http://cbsuapps.tc.cornell.edu/mdiv.aspx), implementing likelihood and Bayesian methods, was used to estimate the posterior distributions of θ (scaled parameter for nucleotide heterozygosity, 2Nfeμ), M (scaled migration rate), T (scaled gene divergence time, t/Nfe) and TMRCA (scaled time to the most recent common ancestor, or gene coalescence time, tMRCA/Nfe). A Markov chain length of 3 × 10⁶ was used with a burn-in of 10%. Model priors included nucleotide substitution according to the finite-sites HKY model, to account for multiple hits (Hasegawa et al., 1985). Preliminary values of Mmax and Tmax were varied to select the final optimal priors for each pairwise comparison. At least three replicate Markov chains with different random seeds were run with the optimal priors, to check the convergence and consistency of the parameter estimates. The estimators of T and TMRCA were converted to years for a haploid genome according to Nielsen and Wakeley (2001): t = Tθ/(2μ) and tMRCA = TMRCA·θ/(2μ), where μ is the mutation rate for the whole sequence per year per generation.

Mismatch distributions of cyt b haplogroups

These were calculated in ARLEQUIN v3.11, to test for sudden demographic expansion. Pairwise differences among haplotypes are unimodal for recent demographic expansions but multimodal for populations at demographic equilibrium (Rogers and Harpending, 1992). The raggedness index (Harpending, 1994) was used to quantify the smoothness of the observed distribution against the simulation, and statistical support was provided by 10 000 bootstrap replicates under the null model of expansion. Time since the beginning of an expansion event was estimated using the mode of an observed Poisson distribution expressed as the parameter τ=2μt, where μ is the mutation rate per generation for the DNA sequence, and t is the number of generations since the event's start.
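The mismatch distribution and Harpending's raggedness index can be computed directly from aligned haplotypes, as sketched below. The sketch assumes the raggedness definition r = Σ (xᵢ − xᵢ₋₁)² over i = 1 to d + 1, where xᵢ is the relative frequency of pairs differing at i sites, d is the largest observed difference and x_{d+1} = 0; smooth, unimodal distributions give low values. Fitting the sudden-expansion model and the bootstrap performed in ARLEQUIN are not reproduced, and the sequences are toy examples.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequencies of 0, 1, 2, ... pairwise differences."""
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts.get(k, 0) / len(diffs) for k in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's (1994) raggedness index: sum over i = 1..d+1 of
    (x_i - x_{i-1})**2, with x_{d+1} = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


dist = mismatch_distribution(["AA", "AT", "TT"])
print(dist, round(raggedness(dist), 4))
```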

Rates of cyt b sequence divergence

Dating used two rates of pairwise divergence of cyt b: 1% per million years upper limit (Esseghir et al., 2000) and 2.3% per million years lower limit (Brower, 1994). Two generation times for P. ariasi were used: one per annum (p.a.) upper limit and three p.a. lower limit (Ready and Croset, 1980).
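The conversions from scaled parameters to time can be sketched as below. The sketch assumes that a pairwise divergence rate accrues on two lineages, so the per-lineage rate per site is half of it, and that the rate for the whole fragment is that per-site rate multiplied by its length; it then inverts τ = 2μt for the mismatch analysis and applies t = Tθ/(2μ) for MDIV. The 714-bp length and the τ value are illustrative inputs only, and the function names are hypothetical.

```python
def per_lineage_rate_per_year(pairwise_rate_per_myr, length_bp):
    """Mutation rate for the whole fragment per lineage per year.

    Assumes the pairwise divergence rate (e.g. 0.01 for 1% per Myr) accrues
    on two lineages, so the per-lineage rate is half of it."""
    return pairwise_rate_per_myr / 2.0 / 1e6 * length_bp


def expansion_time(tau, mu_per_year, generations_per_year):
    """Invert tau = 2 * mu * t with mu per generation; returns
    (generations, years) since the start of the expansion."""
    mu_per_generation = mu_per_year / generations_per_year
    t_generations = tau / (2.0 * mu_per_generation)
    return t_generations, t_generations / generations_per_year


def mdiv_time(scaled_t, theta, mu):
    """Convert an MDIV estimate of T (or TMRCA) to time:
    t = T * theta / (2 * mu), in the time units of mu."""
    return scaled_t * theta / (2.0 * mu)


# Illustrative inputs only: a 714-bp fragment and a hypothetical tau of 3.0,
# crossed with the two divergence rates and two generation times in the text.
for rate in (0.01, 0.023):
    mu = per_lineage_rate_per_year(rate, 714)
    for gens in (1, 3):
        t_gen, t_years = expansion_time(3.0, mu, gens)
        print(rate, gens, round(t_gen), round(t_years))
```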

Results

Mitochondrial cyt b phylogenetics

The cyt b fragment was sequenced for 452 out of 464 specimens of P. ariasi from all 19 populations. Pairwise-distance analysis identified 94 haplotypes (Supplementary Table B (ESI)) defined by 86 polymorphic nucleotide sites. Haplotypes were characterized from other sandflies, including sympatric Phlebotomus (Transphlebotomus) mascittii (Supplementary Table A (ESI)). The uninterrupted ORF and the lack of heteroplasmy indicated the absence of pseudogenes (GenBank accessions HQ023281-HQ023380).

With single species of the subgenera Phlebotomus and Paraphlebotomus as the outgroup, the Bayesian phylogeny of new and GenBank haplotypes (714 bp, no indels) gave strong support (posterior probability = 0.7–1) for: the monophyly of all taxonomic species; subgenera Transphlebotomus and Adlerius being sister to subgenus Larroussius; and, within the latter, a branching order of the P. ariasi complex, the P. major complex and others (Figure 2). GenBank accession AF161196 came from a female that we refer to as P. (chadlii), because it is probably this species on the basis of its location and the large genetic distances between its cyt b haplotype and those of males of P. ariasi (Esseghir et al., 2000); only the males of the two species are morphologically distinct. The haplotype of P. (chadlii) was sister to a monophyletic pruned set of haplotypes of P. ariasi, which showed a well-supported primary bifurcation of a European haplogroup B from a macro-haplogroup A. The latter contained three well-supported European haplogroups (C, A, E), whose branching order was poorly supported, and one unsupported Moroccan haplogroup (F). This phylogeny was the statistically favoured model (Bayes factor > 6). There was greater statistical support for partitioning by the three codon positions independently, rather than not partitioning or separating first+second positions from the third, and for model selection by MRMODELTEST, rather than over-parameterizing with the GTR+I+G model. With or without weighting, the maximum parsimony analyses were concordant with the Bayesian model except for not resolving the P. perniciosus complex.

Figure 2

Bayesian phylogeny of the haplotypes of the 3’ end of cyt b (714 bp) from Phlebotomus species. Branches for subgenera, species complexes, some species, and the haplogroups of P. ariasi are labelled. Haplotypes obtained from GenBank contain the accession number in their code. Codes for species—papa: P. papatasi; cauc: P. caucasicus; masc: P. mascittii; brev: P. brevis; hale: P. halepensis; chad: P. (chadlii); aria: P. ariasi; negl: P. neglectus; majo: P. major; perf: P. perfiliewi; lang: P. langeroni; tobb: P. tobbi; pern: P. perniciosus; long: P. longicuspis; orie: P. orientalis. Cyt b was partitioned by codon positions, each with an independent substitution model selected by MRMODELTEST v2.3. Posterior probability values on branches (and to the right of haplogroups A, C, E) indicate the support for nodes (significant for posterior probability >0.7).

Nuclear gene phylogenetics

The EF-1α fragment was sequenced for 403 out of 424 specimens of P. ariasi from 18 populations (not ROQ). On the basis of 29 polymorphic nucleotide sites, 45 alleles (Supplementary Table C (ESI)) and 65 genotypes were scored (GenBank accessions HQ023381-HQ023425). Only four alleles and seven individuals’ genotypes (1.7%) had to be inferred. There were no non-synonymous mutations in a single intronless ORF. The new alleles aligned with the 720-bp alleles of Larroussius in GenBank (Esseghir et al., 2000) without introducing indels. The Bayesian phylogeny (Supplementary Figure A (ESI)) gave strong support for: the P. major complex as the outgroup; the monophyly of all taxonomic species or complexes; and a branching order of the P. ariasi complex, and others. There were no intraspecific lineages of P. ariasi. Maximum parsimony analyses gave a congruent tree.

The c. 146-bp fragment of anonymous locus AAm20 was sequenced for 396 out of 398 specimens of P. ariasi from 18 populations (not ROQ) and also for P. mascittii. Allele size variation was recorded for only 12 P. ariasi, and always within or flanking the microsatellite region identified in P. perniciosus (Aransay et al., 2001). BLAST searches found only GenBank sequence AJ303377 (P. perniciosus) with significant homology to the alleles of P. ariasi, and an interspecific alignment gave a 149-bp ORF shared by both species and P. mascittii. Bayesian phylogenetic reconstruction supported only the monophyly of P. ariasi (posterior probability = 0.82). For all population genetics analyses of P. ariasi, the 5′ indel region and 3′ missing data were deleted from the alignment, and the remaining 90 bp contained 13 polymorphic sites, from which 14 alleles (Supplementary Table C (ESI)) and 19 genotypes were scored (GenBank accessions HQ026001-HQ026018). Only four alleles and five genotypes had to be inferred for seven individuals (1.8%).

The 130-bp fragment of anonymous locus AAm24 was sequenced for all 398 specimens of P. ariasi from 18 populations (not ROQ). Allele size variation was not discovered. The aligned sequences contained five polymorphic sites, from which 11 alleles (Supplementary Table C (ESI)) and 21 genotypes were scored (GenBank accessions HQ025989-HQ026000). No alleles and only one individual's genotype (0.3%) had to be inferred. Manual gap insertion permitted the alignment of the alleles of P. ariasi with a new one of P. mascittii and that of P. perniciosus (GenBank accession AJ303378), producing a 175-bp ORF containing 13 contiguous amino acids that were 100% identical to part of the ‘Jumonji domain containing 1B’ of Nasonia vitripennis (GenBank accession LOC100120387), a transcription factor with homologues in mouse and Drosophila (Sasai et al., 2007). Phylogenetic reconstruction by maximum parsimony analysis provided strong support (80% bootstrap) for the monophyly of P. ariasi.

Genealogical networks and geographical variation

On the basis of a 95% parsimony connection limit, TCS rejected the inclusion of P. (chadlii) in the networks of cyt b (714 bp) and EF-1α (720 bp) of P. ariasi. All four loci supported the monophyly of P. ariasi and the absence of cryptic sibling species. TCS collapsed 452 P. ariasi into 92 cyt b haplotypes connected by 11 steps and identified the same haplogroups as the Bayesian phylogeny, with neighbouring haplogroups separated by three to seven mutations (Figure 3). Only haplogroup B had no reticulate loops with any other haplogroup.

Figure 3

Parsimony network (TCS v1.21) showing the genealogical relationships between the 92 cyt b haplotypes from 452 P. ariasi. Haplotypes are shown as shaded or open circles with sizes proportional to their frequency of occurrence, which is given if >5. The smallest filled circles denote missing haplotypes. The lettered haplogroups or sub-haplogroups (B) are followed by the code of their modal haplotype (CBNN) and their geographical distributions. All most parsimonious pathways are shown. C, central; E, eastern; NE, northeast; NW, northwest.

Geographical variation (Supplementary Tables B and C (ESI)) was mapped on the gene networks of P. ariasi. Each moderate-high frequency cyt b haplogroup had one to two modal haplotypes with a shallow radiation of less frequent haplotypes derived from them by one to four mutations (Figure 3), signalling recent expansion from a small or modest number of founders. The distribution of haplogroup F was disjunct: it was found only in Morocco, where all specimens contained it. Haplogroup A predominated in numbers (80.0% in Europe), number of haplotypes (54) and geographical range. Except in northwest Spain, its modal haplotype (CB25) predominated in each Iberian and French population (69% overall). Haplogroup B was less abundant (15.9% in Europe), absent from Portugal and Morocco, present at low-moderate frequency in northern Spain and the southern Massif Central (4.2–12.5%), but omnipresent at moderate frequency (11.1–48.1%) in the French Pyrenees. Haplogroup C was uncommon (3.4% in Europe) but widespread in Iberia and France, whilst haplogroup E was rare (0.7% in Europe) and found only in the Central Pyrenees and northeast Spain.

Each nuclear network (Supplementary Figure B (ESI)) showed geographical structure, with all alleles from Morocco and most from Portugal being associated with just one of the two modal alleles from Europe. The EF-1α network had numerous reticulate loops, but those of AAm20 and AAm24 were simpler (0–1 loops).

Near the north of its range, P. ariasi had populations lacking genetic diversity, consistent with their being at the leading edge. For cyt b, haplogroup A or its modal haplotype (CB25) was near fixation in the Massif Central with Lot, where haplogroups B and C occurred only near the Pyrenees. Each nuclear locus also showed low diversity in the Lot (one to three alleles, none private), and the distinction between the French Pyrenees and the northern Massif Central with Lot was observed for EF-1α (predominant alleles 01 and 03, respectively), AAm20 (allele 02 at frequencies of 22.7–35.4% and 28.3–61.4%, respectively) and AAm24 (alleles 08 and 09 found only in the Pyrenees).

The unique demographic history of P. ariasi in the French Eastern Pyrenees was indicated by the presence of two cyt b sub-haplogroups B with nearly disjunct distributions (Figure 3). Modal haplotype CB04 and its radiating haplotypes were restricted to the Eastern Pyrenees, east from the Ariège valley, where they were omnipresent at low-moderate frequency (6.3–31.8%) and constituted 61.4% of haplogroup B. In contrast, modal haplotype CB05 had a contiguous distribution only in the bordering populations, in the Central Pyrenees and northern Spain.

No reproductive isolation between populations defined by cyt b haplogroups

Two populations of P. ariasi in the Eastern Pyrenees (PAS, ARQ) contained sufficient flies with cyt b haplogroups A or B to test for such reproductive isolation. The population size for ARQ was increased by pooling 2 years’ collections, which was considered valid because each nuclear locus remained in Hardy–Weinberg equilibrium. No evidence of biological speciation was found (Supplementary Table D (ESI)). Observed genotype frequencies did not differ from those expected in a single randomly mating population in each location: no significant deviation from Hardy–Weinberg equilibrium was found (P>0.05). No linkage between haplotypes or alleles at different loci (cytonuclear and nuclear–nuclear) was found: pairwise comparisons of linkage disequilibrium showed no significant difference from the null hypothesis of independent haplotype/allele association between loci (P>0.05).

Effectively neutral evolution of loci: population expansions

Positive directional or balancing selection was not detected at any locus using the appropriate tests. CODEML and MK investigate polymorphism in codons relative to divergence in other species and, therefore, were used only to analyse the coding fragments of cyt b and EF-1α.

CODEML provided no support for positive directional or diversifying selection along the P. ariasi branch compared with the other branches in phylogenies of cyt b (Figure 2) and EF-1α (Supplementary Figure A (ESI)). Only cyt b showed a significant difference in ω (LRT: 2Δℓ = 4.4; χ², df = 1; P < 0.05) between the P. ariasi branch (0.0084) and the background branches (0.0202), indicating that cyt b is under greater purifying selection in P. ariasi (ω < 1). It is often assumed that mitochondrial genes are under strong purifying selection, but this should always be tested (Dowling et al., 2008).
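The P-value of a likelihood-ratio test with one degree of freedom can be checked with the identity P(χ²₁ > x) = erfc(√(x/2)), as in the sketch below. The function name is hypothetical; applied to the reported statistic (2Δℓ = 4.4) it returns a value below 0.05, consistent with the significance stated above, while the ω estimates themselves come from the CODEML model fits and are not recomputed here.

```python
import math


def lrt_pvalue_df1(two_delta_lnl):
    """Upper-tail P for a likelihood-ratio statistic 2*delta(lnL) referred
    to a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(two_delta_lnl / 2.0))


p = lrt_pvalue_df1(4.4)
print(round(p, 4), p < 0.05)
```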

Phylogenetic analysis by maximum likelihood has low power for detecting intra-specific selection (Anisimova et al., 2002). The more sensitive MK test showed no significant departures from neutral expectation for cyt b and EF-1α of P. ariasi (Supplementary Table E (ESI)). The associated neutrality index indicated that cyt b tended towards purifying selection (not significant), there being few fixed non-synonymous substitutions, but the index was uninformative for EF-1α because there were no fixed differences.

Tajima's D is more sensitive than the former tests for detecting recent selection, but significant results can also arise from demography (Schmidt and Pool, 2002). No significant D value was positive for P. ariasi, indicating that neither balancing selection nor a demographic process (population decrease, subdivision or recent bottlenecking) was maintaining haplotypes/alleles at intermediate frequencies. All significant D values were negative, indicating that directional selection (positive or purifying), a selective sweep/hitchhiking or an alternative demographic process (population expansion or older bottlenecking) had produced an excess of low-frequency haplotypes/alleles.

Negative values of Fu's Fs also arise from an excess of low-frequency haplotypes/alleles and, compared with D, are a more sensitive indicator of recent population expansions (or selective sweep/hitchhiking) (Ramirez-Soriano et al., 2008). For cyt b (Supplementary Table F (ESI)), 8 out of 14 French populations (haplogroup A), or 3 out of 14 (all haplogroups), had expansion signals based on significant negative values for both Fs and D. By these criteria, only two populations (one French) were expanding based on EF-1α, and none was expanding based on AAm20 or AAm24 (Supplementary Table G (ESI)).

With high recombination rates, however, Fs loses power or becomes significant in the absence of population expansion, and then the use of R2 with D can be more informative (Ramirez-Soriano et al., 2008). Considering the longer gene fragments, the estimated minimum number of recombination events (Rm) was low for cyt b (0 for 14 populations, and 1–3 for four populations only when considering all haplogroups), as expected for mitochondrial DNA, and for EF-1α (0 for nine populations, 1–2 for nine populations). Nevertheless, the significant results for R2 were compared with those for Fs and D. For cyt b (Supplementary Table F (ESI)), R2 failed to support the expansion indicated by Fs and D in 4 out of 10 populations (haplogroup A only), or 2 out of 5 (all haplogroups), and this could be explained by its lower power when recombination and S are low or expansions are not sudden (Ramirez-Soriano et al., 2008). Within the eastern Pyrenees, expansion signals were detected only for haplogroup A. The detection of expansion in one population by Fs alone and in two populations by R2 alone suggests that these statistics need further modelling.
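The minimum number of recombination events, Rm, can be sketched with the four-gamete test and the Hudson–Kaplan interval approach: every pair of biallelic sites showing all four allele combinations defines an interval that must contain at least one recombination, and Rm is the largest set of non-overlapping such intervals, found greedily by right endpoint. The sketch assumes biallelic sites and no missing data; function names and sequences are hypothetical, and DnaSP's implementation details are not reproduced.

```python
from itertools import combinations


def incompatible(col_i, col_j):
    """Four-gamete test for two biallelic sites: True if all four
    allele combinations occur among the sequences."""
    return len(set(zip(col_i, col_j))) == 4


def hudson_kaplan_rm(seqs):
    """Minimum number of recombination events (Hudson and Kaplan, 1985)."""
    length = len(seqs[0])
    columns = [[s[i] for s in seqs] for i in range(length)]
    sites = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    intervals = [(a, b) for a, b in combinations(sites, 2)
                 if incompatible(columns[a], columns[b])]
    # Greedy interval scheduling: intervals sharing only an endpoint are
    # disjoint, because each recombination lies between adjacent sites.
    rm, last_end = 0, -1
    for a, b in sorted(intervals, key=lambda iv: iv[1]):
        if a >= last_end:
            rm += 1
            last_end = b
    return rm


print(hudson_kaplan_rm(["AA", "AT", "TA", "TT"]))          # 1
print(hudson_kaplan_rm(["AAAA", "ATAT", "TATA", "TTTT"]))  # 3
print(hudson_kaplan_rm(["AAA", "AAT", "TTT"]))             # 0
```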

For the three nuclear loci, no population had a significant R2, although values approached significance for EF-1α in the two populations (Central Pyrenees, northeast Spain) with significant and negative values of Fs and D (Supplementary Table G (ESI)), making population expansion the most likely explanation. The three significant values given for EF-1α by Fs alone might have arisen from recombination not population expansion.

Descriptive population diversity of P. ariasi

Low cyt b diversity was associated with northern populations (Supplementary Table F (ESI)), likely to be leading edge: the populations from Lot (LNP, RME) and the Rhone valley (DRAz4) had significantly lower haplotype diversities (Hd: 0–0.363) than the others (0.541–0.825); and those from the Massif Central, Lot and Rhone valley had lower nucleotide diversities (π: 0.00011–0.00248) than the others (0.00263–0.00839). Populations from the eastern Pyrenees had the highest π values (0.00524–0.00839), because of the diversity of haplogroups (A–C). Population HP2 was an exception to the Pyrenean trend: low Hd (0.314) can be explained by its leading-edge location, on a plain distant from the preferred forested foothills.

EF-1α was the only nuclear gene (Supplementary Table G (ESI)) to show geographical variation in diversity, from highest in the south to lowest in the north: Morocco, Portugal and northwest Spain (Hd: 0.761–0.881; π: 0.00174–0.0271); northeast Spain, French Pyrenees and southern Massif Central (Hd: 0.427–0.659; π: 0.00060–0.00138); and northern Massif Central, Lot and Rhone valley (Hd: 0.212–0.552; π: 0.00027–0.00077).
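The two diversity measures quoted above can be illustrated with their standard estimators. Haplotype diversity is Hd = n/(n−1)(1 − Σp_i²), where p_i is the frequency of haplotype i among n sequences. Nucleotide diversity π is the mean number of pairwise differences per site. The sketch below assumes gap-free, equal-length sequences and uses a hypothetical toy alignment. It is not necessarily the software implementation behind the reported values.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean pairwise differences divided by sequence length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / length


toy = ["AAAA", "AAAA", "AAAT", "AACA"]  # hypothetical haplotypes
print(round(haplotype_diversity(toy), 3), nucleotide_diversity(toy))  # 0.833 0.25
```

A population fixed for one haplotype, as in a strongly bottlenecked leading-edge site, gives Hd = 0 and π = 0.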

Populations: genetic differentiation and IBD

Many demographic tests require individual populations to be panmictic. Fortunately, all populations of P. ariasi were found to be in Hardy–Weinberg equilibrium at each nuclear locus (Supplementary Table G (ESI)), except the possibly mixed population from northeast Spain at EF-1α.

Very great genetic differentiation between populations (Wright, 1978), caused by drift and limited gene flow, was signalled by very high FST values (>0.25, P<0.05) for cyt b haplogroup A (northwest Spain versus most populations except northeast Spain; Lot versus nearby Massif Central), EF-1α (northern Massif Central or Lot versus most except northwest Spain), AAm20 (Portugal versus all populations; northwest Spain versus 7 out of 13 locations), and AAm24 (Portugal versus all populations except northwest Spain; northwest Spain versus 9 out of 14 French populations; Lot versus many populations). Thus, populations from the north Massif Central and Lot showed a leading-edge effect, with FST values often being significantly high in comparisons with the French Pyrenees and northeast Spain.
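The differentiation thresholds above refer to FST. The sketch below shows one locus and two populations using a simple frequency-based estimator, FST = (HT − HS)/HT. Here HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled allele frequencies. It also computes the linearized distance FST/(1−FST) used in the IBD analyses. This Nei-type estimator was chosen for brevity. It is not necessarily the estimator behind the reported values, and the allele labels and counts are hypothetical.

```python
from collections import Counter


def allele_freqs(alleles):
    n = len(alleles)
    return {a: c / n for a, c in Counter(alleles).items()}


def expected_heterozygosity(freqs):
    return 1.0 - sum(p * p for p in freqs.values())


def fst_two_pops(pop1, pop2):
    """Nei-type FST = (HT - HS) / HT for one locus (illustrative estimator)."""
    f1, f2 = allele_freqs(pop1), allele_freqs(pop2)
    hs = (expected_heterozygosity(f1) + expected_heterozygosity(f2)) / 2
    pooled = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in set(f1) | set(f2)}
    ht = expected_heterozygosity(pooled)
    return 0.0 if ht == 0 else (ht - hs) / ht


def linearized(fst):
    """FST/(1 - FST); infinite when the populations share no alleles."""
    return float("inf") if fst >= 1 else fst / (1 - fst)


pop_a = ["a"] * 8 + ["b"] * 2  # hypothetical allele samples
pop_b = ["a"] * 2 + ["b"] * 8
f = fst_two_pops(pop_a, pop_b)
print(round(f, 2), round(linearized(f), 4))  # 0.36 0.5625
```

In this toy case the frequencies are reversed between populations, and the resulting FST exceeds the 0.25 threshold for very great differentiation.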

Hierarchical analysis of molecular variance of all cyt b haplotypes supported three regional population subdivisions (Table 1). Among-region variance was significant (P<0.05) between the eastern Pyrenees, the ‘Massif Central’ (including Lot and Rhone valley) and Central Pyrenees/northeast Spain. The a priori regional groupings were supported by the homogeneity of within-region variances, except for the comparison between the eastern Pyrenees and the ‘Massif Central’. This can be explained by gene flow between the eastern Pyrenees and the southwest Massif Central, because excluding the latter eliminated significant within-region variance.

Table 1 Hierarchical analysis of molecular variance tests of cytochrome b haplotype frequencies for four different a priori regional divisions of the spatial populations of P. ariasi

For cyt b, IBD was not supported by a Mantel test fitting FST/(1−FST) against geographical distance (P>0.05), using European haplogroup A only to eliminate the effects of multiple haplogroup histories. Notably, genetic differentiation did not increase with distance from Portugal. In contrast, IBD was supported by Mantel tests (P<0.05) for each nuclear locus when flies associated with all cyt b haplogroups from all countries were included (Figure 4). Some inter-regional comparisons did not support the overall result. For cyt b haplogroup A or EF-1α, there was a wide range of pairwise genetic distances between the Pyrenees and the ‘Massif Central’. For AAm24, genetic distances were disproportionately high between the Lot populations and most populations outside France.
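The Mantel procedure can be sketched as a permutation test. The correlation between linearized genetic distances, FST/(1−FST), and geographical distances is compared with the correlations obtained after jointly permuting the rows and columns of the genetic matrix. The pure-Python version below assumes symmetric matrices and uses Pearson's r with a one-sided P value for a positive association. It runs on hypothetical distances, and the permutation count and seed are arbitrary choices. Dedicated population-genetics software may differ in details.

```python
import random
from itertools import combinations


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Off-diagonal entries in a fixed pair order, after relabelling rows/columns by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel(genetic, geographic, permutations=999, seed=1):
    """One-sided Mantel test of a positive association (illustrative sketch)."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of the genetic matrix
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical sites along a transect (km) with FST rising with distance
positions = [0, 10, 30, 60, 100, 150]
geo_km = [[abs(p - q) for q in positions] for p in positions]
fst = [[0.001 * d / (1 + 0.001 * d) for d in row] for row in geo_km]
lin = [[f / (1 - f) for f in row] for row in fst]  # FST/(1-FST)
r, p = mantel(lin, geo_km)
print(round(r, 3), p < 0.05)  # 1.0 True
```

Permuting population labels, rather than individual matrix cells, preserves the non-independence of pairwise distances that share a population. This is the reason for using a Mantel test instead of an ordinary regression P value.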

Figure 4

Plots and regression of genetic distance (FST/(1−FST)) on geographical distance (km) between pairs of P. ariasi populations, for nuclear loci: (a) EF-1α, (b) AAm20, and (c) AAm24. Symbols of pairwise population comparisons denote: ▪, Morocco versus all others; ▪▪▪, Portugal versus all others except Morocco (that is, France, Spain); +, NW (northwest) Spain versus all others except Morocco and Portugal (that is, France, northeast Spain); , northeast Spain versus all others except Morocco, Portugal and northwest Spain (that is, France); •, France only. Ringed data points indicate from left to right: Lot (LNP and RME) with northeast Spain, Portugal and Morocco for EF-1α; Lot (LNP and RME) with northwest Spain, Portugal and Morocco for AAm24.

Geographical distances were initially calculated as straight-line measurements. Within the ‘Massif Central’, IBD was supported for cyt b, and each nuclear locus showed positive but statistically non-significant associations with distance (16–244 km). However, for no locus was there support for IBD within either the French Pyrenees (2.85–202.57 km) or a region containing the eastern Pyrenees and the ‘Massif Central’. IBD was also tested using alternative dispersal pathways, but there was no greater statistical support for migrations from Spain to France through the western or eastern coastal foothills (800 m a.s.l.) compared with straight-line crossings of the Pyrenees.

Multiple genetic divergences and population expansions of P. ariasi in the Pleistocene

MDIV analyses were used to estimate gene coalescence (tMRCA) and divergence times (t) between all pairs of cyt b haplogroups (Table 2). On the basis of two mutation rates (μ=2.3% or 1%) and a generation time of 1 year, all divergence times (1 254 492–376 757 years ago) were dated to the second half of the Pleistocene epoch (2 588 000–10 000 years ago). The estimates would be reduced by two-thirds for three generations p.a. but, even using μ=2.3%, all divergences would still have been well before the last interglacial (Eemian, 125 000–110 000 years ago) (Gibbard and van Kolfschoten, 2004).
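The conversion from scaled to calendar time can be made explicit. The sketch below assumes the conversion commonly applied to MDIV output, t = Tθ/(2μ). Here T is the scaled divergence time, θ is the scaled mutation parameter and μ is the mutation rate per locus per generation (per-site rate × sequence length). Following the text, generations are then converted to years by dividing by the number of generations per year. The inputs in the first part are hypothetical, not the values in Table 2. The last lines only repeat the text's rescaling of its reported range for three generations p.a.

```python
def divergence_generations(T, theta, mu_site_per_generation, n_sites):
    """Assumed MDIV-style conversion t = T * theta / (2 * mu), mu per locus per generation."""
    mu_locus = mu_site_per_generation * n_sites
    return T * theta / (2.0 * mu_locus)


def generations_to_years(generations, generations_per_year=1.0):
    return generations / generations_per_year


# Hypothetical inputs, not the values in Table 2
t_gen = divergence_generations(T=1.0, theta=2.0, mu_site_per_generation=1e-8, n_sites=100)
print(round(generations_to_years(t_gen)))       # 1000000 (one generation p.a.)
print(round(generations_to_years(t_gen, 3.0)))  # 333333 (three generations p.a.)

# Rescaling the reported range (one generation p.a.) to three generations p.a.
for years in (1_254_492, 376_757):
    print(round(years / 3))  # prints 418164, then 125586
```

Because the rates are applied per generation, tripling the number of generations per year divides every date by three.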

Table 2 Isolation with migration coalescence model in MDIV: to estimate gene coalescence and divergence times for pairs of cytochrome b haplogroups found in all populations of P. ariasi, using two mutation rates (μ)

Population expansions were investigated by mismatch distribution analyses (Supplementary Figure C (ESI)). Cyt b haplogroups A and F showed unimodal distributions of pairwise nucleotide differences among haplotypes of individual flies. Along with haplogroup C, their distributions did not differ significantly from those expected under the null hypothesis of expansion (raggedness index P>0.05). On the basis of the τ statistic, the 95% confidence intervals for the estimated dates of expansion overlapped for populations defined by haplogroup (Table 3). All regional groupings of populations defined by haplogroup A (eastern Pyrenees, Central Pyrenees plus northeast Spain, ‘Massif Central’) showed unimodal mismatch distributions consistent with expansion (raggedness index P>0.05). This suggests that they underwent haplogroup expansions shortly before or during the last glacial period (c. 110 000–12 000 years ago). Treating haplogroup B as a single population gave a unimodal mismatch distribution, but the hypothesis of expansion was rejected (raggedness index P<0.05). Expansion was supported for sub-haplogroup B CB04 (420 168–25 452 years ago), which occurred only in the eastern Pyrenees, but not for sub-haplogroup B CB05 or for its haplotypes from the eastern Pyrenees.
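The mismatch quantities above can be illustrated in three steps. First, the distribution of pairwise differences among sequences is computed. Second, Harpending's raggedness index is r = Σ(x_i − x_{i−1})², summed over i = 1 to d+1 with x_{d+1} = 0, where x_i is the frequency of pairs differing at i sites and d is the largest observed difference. Third, the expansion parameter τ is converted to time through τ = 2ut, where u is the mutation rate per sequence per generation. This sketch uses a toy alignment and hypothetical rates. P values for r would require coalescent simulations of a sudden expansion, which are not shown.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Frequencies of pairwise differences 0..d among all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    counts = Counter(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return [counts.get(i, 0) / len(pairs) for i in range(max(counts) + 1)]


def raggedness(freqs):
    """Harpending's raggedness: sum of squared successive differences, with x_{d+1} = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


def expansion_time_generations(tau, mu_site_per_generation, n_sites):
    """Solve tau = 2 * u * t for t, with u = per-site rate * sequence length."""
    return tau / (2.0 * mu_site_per_generation * n_sites)


toy = ["AAAA", "AAAA", "AAAT", "AACA"]  # hypothetical haplotypes
freqs = mismatch_distribution(toy)
print([round(x, 3) for x in freqs])                   # [0.167, 0.667, 0.167]
print(round(raggedness(freqs), 4))                    # 0.5278
print(round(expansion_time_generations(3.0, 1e-8, 1000)))  # 150000
```

A smooth, unimodal distribution gives a low r, whereas the ragged, multimodal distributions typical of stable populations give higher values. Dividing the generation count by the number of generations per year converts it to years, as in the MDIV datings.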

Table 3 Mismatch distribution statistics for P. ariasi cytochrome b haplogroups (A–F) and sub-haplogroups (B_CB0N)

Discussion

On the basis of mitochondrial cyt b and three nuclear loci, P. ariasi was shown to be a single phylogenetic species, and there was no reproductive isolation between sympatric European populations defined by its two most abundant and divergent cyt b haplogroups (A and B). In addition, no locus was shown to be under strong directional or balancing selection and, therefore, it was appropriate to apply demographic tests to all populations of this sandfly using all loci.

Pleistocene genetic divergences and population expansions of P. ariasi

The MDIV datings placed all the divergences of the cyt b haplogroups in the Pleistocene, 2 588 000–10 000 years ago (Gibbard and van Kolfschoten, 2004), long after the final opening of the Gibraltar Straits, 5.5–4.9 m.y.a. (Steininger and Rögl, 1984). Therefore, any vicariant evolution was caused by other barriers, probably related to the climate oscillations that intensified in the early Pleistocene (Hewitt, 2004). Only more widespread sampling might reveal the origin of the Pleistocene ‘Eve’ of P. ariasi and whether haplogroup F is restricted to the Atlas region, where it was found. However, the current distribution of haplogroups is only possible if at least one made an intercontinental crossing. On the basis of Bayesian phylogeny, cyt b haplogroup B is a marker for a phylogenetic species (Figure 2). There is similar evidence for the speciation of other taxa in Spain during the Pleistocene and Holocene, including water beetles (Ribera and Vogler, 2004). However, many such species might be only phylogeographical variants associated with refugia, unless reproductive isolation or strong divergent selection (Hewitt, 2004) has been demonstrated.

Each P. ariasi contained just one cyt b haplotype, and in most European flies this belonged to one of the two most divergent haplogroups, A (80.0%) and B (15.9%). These haplogroups did not correspond directly to geographical populations, indicating a distinct and separate history for each. Therefore, we followed O’Loughlin et al. (2007) by constructing separate mismatch distributions for each haplogroup, to infer whether there had been expansions experienced by each lineage rather than the mixed-ancestry geographical populations. Our analyses showed that cyt b lineages had experienced expansions much later than their divergences, consistent with population contractions during glacial periods. The expansion of haplogroup A was the more recent (Table 3) and, despite the large confidence intervals for the dating obtained by using two extreme rates of change of cyt b, it is likely to have occurred in the late Pleistocene before the last glacial maximum, 23–18 k.y.a. (Hewitt, 2004). This is consistent with the expansion signal detected in many French populations by comparing Fu's Fs with Tajima's D and R2 statistics. On the basis of current predominance of haplogroup A in Lot and the northern Massif Central, where the species is unlikely to have survived the last glacial maximum, there must have been a post-glacial spatial expansion northwards in the Holocene. Immigration into France was probably from northeast Spain, because of the regional similarities at all loci and the broad littoral region that has often been warmer than northwest Spain (Delmas et al., 2008), but IBD analyses did not demonstrate this.

Refugial populations of P. ariasi north of the Pyrenees?

Cyt b haplogroups A–C were found both north and south of the Pyrenees, indicating that P. ariasi has not undergone any long-term bottleneck in France. The nuclear loci indicate shallow IBD, except sometimes for northern leading-edge populations (Massif Central, Lot and Rhone valley). This is also consistent with the paradigm for some temperate species (Hewitt, 2004) of a spread northwards out of Iberia in the Holocene. However, we found cyt b sub-haplogroup B CB04 to be omnipresent in the French eastern Pyrenees but absent from all other populations in France, Spain and Portugal, and its latest expansion date (420 168–25 452 years ago for one generation p.a.) preceded the last glacial maximum. Its current endemicity can be explained by a range spread from Spain into France, either in the Holocene or before the last glacial period, 110–12 k.y.a. (Gibbard and van Kolfschoten, 2004). The Holocene scenario requires sub-haplogroup B CB04 to have arrived in a phalanx-like mass displacement (see next section) of all haplogroups (A–C). In contrast, the Pleistocene scenario requires it to be a marker for a population that survived north of the Pyrenees during one or two glacial periods, before its refuge was invaded by inter-glacial dispersers with haplogroup A. The evidence supports the latter. First, the combination of Fs, D and R2 statistics detected expansion signals only for haplogroup A in populations within the putative refuge. Second, mismatch analysis demonstrated that sub-haplogroup B CB04 underwent an expansion event separate from that of sub-haplogroup B CB05, which is now widespread on both sides of the Pyrenees. However, a higher density of sampling is needed to confirm the absence of sub-haplogroup B CB04 in northern Spain.

The ability of P. ariasi to survive north of the Pyrenees during glacial periods can only be inferred from its current bioclimate envelope. Its hibernating larvae can survive for weeks at 2–7 °C (Ready and Croset, 1980). In addition, the oaks characteristic of its favoured humid and sub-humid Mediterranean bioclimates (Rioux et al., 1984) left pollen traces of their survival in southern France during the last glacial period (Beaudouin et al., 2007). However, these oaks flourish in colder (deciduous Quercus pubescens) and drier (evergreen Q. ilex) climes than P. ariasi (Jalut et al., 2009), and the interpretation of the pollen record is questionable (Calvet, 2004). The snow line on the northern face of the eastern Pyrenees is now higher (2700–2800 m a.s.l.) than it was at the last glacial maximum (1400–1500 m a.s.l.; Calvet, 2004). At that time, the altitudinal limit of P. ariasi could have dropped from the current c. 1500 m a.s.l. (Rioux and Golvan, 1969) to near sea level. The saddlebacked bushcricket is a temperate species also proposed to have had glacial refugia in the French eastern Pyrenees (Spooner and Ritchie, 2006), in the valleys where P. ariasi was sampled.

Recent post-glacial re-colonizations of P. ariasi not blocked by refugial populations north of the Pyrenees

Hewitt (1996, 1999) highlighted the effects of different modes of dispersal on re-colonization. Our study indicates that P. ariasi has often dispersed as a phalanx, that is, on a broad front, typified by IBD and mixing of cyt b haplogroups. Long-range pioneer (leptokurtic) dispersal often produces genetically homogeneous populations. This was not typical of P. ariasi and would not be expected, given its flight range of only 0.1–2 km (Killick-Kendrick et al., 1984). Phalanx-like dispersal is the least likely to have been blocked by small refugial or relict leading-edge populations that had survived glacial periods. This fits the finding that cyt b haplogroup A predominates over haplogroups B and C in the eastern Pyrenees. The re-colonization of P. ariasi could have kept pace with that of its woodland habitat, which spread at a rate of 50+ m p.a. (Hewitt, 1999) and, for Q. ilex, produced an Iberia–Italy hybrid zone in the Rhone valley (Lumaret et al., 2002).

Monopolization currently blocking the northward spread of Pyrenean P. ariasi and potentially of leishmaniasis

There was a step change in FST values between the eastern Pyrenees and the Massif Central for EF-1α and cyt b, and this regionality was supported by the analysis of molecular variance of cyt b. We conclude that cyt b haplogroup A is a marker for the most recent population expansion of P. ariasi. This expansion probably started south of the Pyrenees (c. 128 294–36 869 years ago), given the high frequency of this haplogroup and its modal haplotype (CB25) in northern Portugal and Spain. Haplogroup A predominates north of the Pyrenees, to the exclusion of all other haplogroups in bottlenecked, leading-edge populations in and near the Massif Central. Stepping-stone dispersal across a lowland corridor from the eastern Pyrenees to the Massif Central might currently be blocked by the monopolization (Loeuille and Leibold, 2008) of the latter by sandfly populations. These populations are characterized by cyt b haplogroup A and by relatively large, discontinuous genetic distances for both cyt b and EF-1α. Such blocking would hinder the spread of zoonotic leishmaniasis to northern France if the Massif Central populations were found to be relatively poor dispersers or vectors. Re-forestation of the lowland corridor between the two southern uplands might increase gene flow and alter population characteristics. These working hypotheses should be tested by using more polymorphic markers (such as true microsatellites) to estimate migration rates.

Indeed, leishmaniasis foci in the Massif Central are distinctive. They are characterized by low diversity of regional L. infantum strains, high disease prevalences in domestic dogs (the reservoir), frequent cutaneous lesions but low prevalences of visceral leishmaniasis in humans, and a preponderance of P. ariasi (Pratlong et al., 2004). The population differentiation of P. ariasi is unlikely to match that of P. perniciosus, because this alternative vector peaks at lower altitudes (Rioux et al., 1984) and has two cyt b lineages (Iberia, Italy–northern Africa) mixing in France (Perrotey et al., 2005).