Introduction

In southwest Europe and northwest Africa, the bloodsucking females of Phlebotomus (Larroussius) perniciosus Newstead, 1911 (Diptera: Psychodidae) are the most widespread sandfly vectors of Leishmania infantum (Kinetoplastida: Trypanosomatidae), the parasitic protozoan that causes most visceral and cutaneous leishmaniasis in humans and canine reservoir hosts in the region (Rioux and Lanotte, 1990). We report here the use of polymorphic microsatellite DNA loci, isolated by Aransay et al (2001), to investigate the population structure of P. perniciosus in Spain. Such studies of insect vectors of disease are potentially important for planning intervention strategies, as they can help predict the dispersal not only of vectorial traits associated with specific populations but also of genetically modified organisms and the genes they carry. The value of this approach has been demonstrated for the Anopheles gambiae complex, which contains most of the major vectors of human malaria in sub-Saharan Africa (Lehmann et al, 1996, 2000; Donnelly and Townson, 2000).

Important changes in the distributions of ‘tropical diseases’ like leishmaniasis (Cross and Hyams, 1996) can be expected to result from climate warming, which in Europe could shift the boundary of the Mediterranean subregion northwards. P. perniciosus is restricted to the western Palaearctic region, where it is found in Portugal, Spain, France (north to Paris), southern Switzerland, the northwest Balkans, Italy, Malta, Libya, Tunisia, Algeria and Morocco (Seccombe et al, 1993). It is more abundant at lower altitudes in the Mediterranean subregion, and in northwest Africa the subhumid and semiarid bioclimate zones are preferred (Rioux et al, 1984). This subtropical distribution makes it a novel insect model for investigating the effects of climate warming in Europe, studies of which have relied on temperate insects (Hewitt, 1996, 1999).

The current population structure of P. perniciosus is consistent with it having suffered a range contraction during the last Ice Age, in the Pleistocene epoch, when its European populations could have survived only in southern Spain and Italy. This conclusion comes from molecular evolution studies of mitochondrial cytochrome b gene sequences (Cyt b) (Esseghir et al, 1997, 2000) and from population genetics based mostly on three polymorphic isoenzyme loci (Benabdennbi, 1998). Two main lineages were discovered: the typical lineage, extending from Morocco, through Tunisia and Malta (the type locality) to Italy; and an Iberian lineage, within which two northeastern populations were slightly differentiated at one isoenzyme locus (Benabdennbi, 1998; Mahamdallie et al, 2003). Lack of polymorphism in these markers, however, meant that population structure within the two lineages was poorly resolved.

Microsatellites are often highly polymorphic (Loxdale and Lushai, 1998) and, therefore, should be better able to resolve the current population structure of the Iberian lineage of P. perniciosus and its postglacial dispersal during the last 8000–12 000 years. Commonly used classes of microsatellites can occur in low copy numbers in sandflies (Day and Ready, 1999), and so an enrichment protocol had been modified to help isolate two classes of trinucleotide repeats (AGC and AGG) from P. perniciosus (Aransay et al, 2001). Five polymorphic trinucleotide microsatellite loci were used in the present study, one of which was not described by Aransay et al (2001).

Materials and methods

Sample collection

A total of 21 population samples were collected from 16 localities (Table 1, Figure 1) within four Mediterranean bioclimate zones in Spain (Anon, 1965; Morillas-Marquez et al, 1983). Sandflies were captured overnight in CDC (Centers for Disease Control) miniature light traps, placed peridomestically inside or near stables and houses. They were stored either dry at −80°C or in 96% ethanol at −20°C.

Table 1 Descriptions of the populations analysed, the samples and their collection details
Figure 1

Distributions of P. perniciosus populations and their microsatellite alleles in Spain. (a) Map of Spain showing the locations of the populations analysed. Bioclimates (Anon, 1965; Morillas-Marquez et al, 1983): light grey for temperate, medium grey for supramediterranean, dark grey for mesomediterranean and black for thermomediterranean. The dotted net covers the distribution of P. perniciosus (Houin, 1965; Gil Collado et al, 1989). (b) Variation in allele frequencies at five microsatellite loci in the northeastern and southern regions. The base pair (bp) numbers refer to the sizes of the more common alleles.

DNA extraction

The heads and genitalia of individual sandflies were dissected and slide-mounted in Berlese fluid, prior to morphological identification (Martinez-Ortega and Conesa-Gallego, 1987). Total genomic DNA was extracted from the thorax and anterior abdomen of each specimen (Ready et al, 1991), before being resuspended in 1 × TE buffer (10 mM Tris pH 8.0, 1 mM EDTA pH 8.0) and stored at −20°C.

PCR amplification of microsatellite loci and genotype scoring

Microsatellites were previously isolated from sandflies collected in Turre (Almería province), a putative Pleistocene Ice Age refuge in southern Spain. Locus-specific primers were designed based on the sequences flanking the microsatellites, and PCR conditions were established for four of the five loci used in the present study – AAm13, AAm20 and AAm82 with AGG-class repeats and AAm24 with AGC-class repeats (Aransay et al, 2001). A new AGC-class locus is reported here for the first time. AAm10 contained seven CAG repeats in the original clone (GenBank accession no. AJ516955), and its alleles were amplified with the primers (5′–3′) CACTCCCGGCACTGCTCAC and 6FAM-GCAACTGAAGCTGGAGCAGC (6FAM being the fluorescent marker dye) at annealing temperatures of 57°C (first stage) and 55°C (second stage).

Microsatellite alleles were amplified by PCR from single-sandfly DNA extracts and aliquots of the products were quantified on 2% agarose gels to estimate dilution factors for the fluorescent dye labels (Aransay et al, 2001). The amplified allelic DNA fragments were sized using an automated ABI377 sequencer in the PE-GeneScan mode, by multiplexing all five loci of each sandfly in a single lane of a polyacrylamide gel. Genotyper software (PE Applied Biosystems) was used to score the alleles.

PCR amplification and sequencing of a fragment of mitochondrial cytochrome b (Cyt b)

A 488-base pair (bp) fragment of Cyt b was amplified with primers CB1-SE (5′-TATGTACTACCCTGAGGACAAATATC-3′) and CB3-R3A (5′-GCTATTACTCC(T/C)CCTAACTT(A/G)TT-3′). Cycle sequencing was carried out on both strands with CB3-R3A and a modified sense-strand primer, CB1 (5′-TATGTACTACCATGAGGACAAATATC-3′), using an ABI PRISM Big Dye™ Terminator Cycle Sequencing Ready Reaction Kit v. 2.0 and ABI373/377 sequencing systems (ABI, PE Applied Biosystems). The primers and conditions were described by Esseghir (1998) and Esseghir et al (2000). It should be noted that the latter failed to mention that CB1-SE, not CB1, was used for PCR amplification.
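The bracketed positions in CB3-R3A denote alternative bases, so the primer is a mixture of sequences. As a minimal sketch, assuming the (X/Y) notation used above and a hypothetical helper name, expand_degenerate, the sequences it represents can be listed as follows:

```python
import re
from itertools import product

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a primer written with (X/Y) alternatives."""
    # Each token is either a bracketed set of alternatives or a single fixed base.
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", primer)
    choices = [alts.split("/") if alts else [base] for alts, base in tokens]
    # One output sequence per combination of alternatives.
    return ["".join(combo) for combo in product(*choices)]

# CB3-R3A as printed in the text (5'-3').
cb3_r3a = "GCTATTACTCC(T/C)CCTAACTT(A/G)TT"
variants = expand_degenerate(cb3_r3a)
for seq in variants:
    print(seq)
```

With two positions of two alternatives each, the primer stands for four 23-base oligonucleotides.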

Data analysis

Genotype distributions were tested for departure from Hardy–Weinberg equilibrium at each locus in each population, using Arlequin version 2.000 (Schneider et al, 2000), which incorporates a modified version of the Markov-chain random walk algorithm described by Guo and Thomson (1992).

FSTAT version 2.9.3.1 (Goudet, 2001) was used to calculate allele frequencies at each locus, to estimate the pairwise FST statistic with a weighted analysis of variance according to Weir and Cockerham (1984), and to test the statistical significance of the pairwise FST values after standard Bonferroni corrections.
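For readers unfamiliar with FST, the idea can be illustrated with the simpler heterozygosity-based definition, FST = (HT − HS)/HT, for two equally weighted populations. This sketch is not the Weir and Cockerham (1984) estimator used by FSTAT, which works from variance components and corrects for sample size; the function names and the allele frequencies below are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), from a dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def simple_fst(pop_a, pop_b):
    """Heterozygosity-based FST for two populations given equal weight (illustration only)."""
    alleles = set(pop_a) | set(pop_b)
    # Allele frequencies in the pooled (total) population.
    mean = {a: (pop_a.get(a, 0.0) + pop_b.get(a, 0.0)) / 2 for a in alleles}
    h_t = heterozygosity(mean)                                   # total
    h_s = (heterozygosity(pop_a) + heterozygosity(pop_b)) / 2    # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical frequencies for a two-allele locus in two populations.
south = {181: 0.2, 190: 0.8}
northeast = {181: 0.9, 190: 0.1}
print(round(simple_fst(south, northeast), 3))
```

Identical frequencies give a value of zero, and populations fixed for different alleles give a value of one.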

Geographical distances between sample sites were calculated using the measuring tool in Atlas Mundial Microsoft® Encarta® 99 (Anon, 1998), so that isolation by distance could be examined by a Mantel test (with 1000 permutations) using Genepop (version 3.3) (Raymond and Rousset, 1995). Genepop (version 3.3) was also used to test for linkage disequilibrium and to calculate the Rho statistic (RhoST) as described by Michalakis and Excoffier (1996).
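The logic of a Mantel test can be sketched as follows: the correlation between the two distance matrices is compared with the correlations obtained after randomly relabelling the populations of one matrix. This is a minimal illustration rather than a reproduction of Genepop; the use of a Pearson correlation, the one-sided P value and the function names are assumptions.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel_test(gen, geo, permutations=1000, seed=0):
    """Return (observed r, one-sided P for a greater correlation) for two square matrices."""
    n = len(gen)
    pairs = list(combinations(range(n), 2))          # upper triangle only
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)                           # relabel the populations
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted together because the pairwise values within a matrix are not independent of one another.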

Cyt b sequences were aligned and analysed in Sequencher™ 3.0 (ABI, PE Applied Biosystems).

Results

Allelic variation at five microsatellite loci

Alleles at the five microsatellite loci were successfully amplified from all 384 P. perniciosus (187 males and 197 females) obtained from 21 samples (16 localities) in Spain (Table 1, Figure 1). No locus failed to amplify in any single sandfly DNA extract. The number of sandflies available and characterized per sample ranged from 1 to 38, and so some samples had to be pooled for population genetics analysis. This was based on 13 populations, four of which were pools. Pooled populations were only made with samples collected on different dates (1–20 days apart) but from the same location (MAD-Boa, RIO-Tud, BAR-San), or on different dates (1–3 days apart) from nearby locations (RIO-NW). This was justified by the homogeneity of samples within each of the two geographical regions (Figure 1), explained in the next section.

The total number of alleles per locus ranged from 4 to 9. Allele frequencies for all loci in each population are shown in Table 2. Sex-specific alleles were not observed for any of the loci. Locus AAm24 was neither informative for analysing population structure nor significantly polymorphic: the frequency of its most common allele (208 bp) was greater than 0.95 except in the MUR-Sis population (0.943). However, locus AAm13 was informative, even though the frequency of its most common allele (178 bp) was 0.96 overall. In contrast, the loci AAm10, AAm20 and AAm82 were polymorphic by definition in each and every population – the frequency of the most common allele at each being <0.95 (Wright, 1978) – as well as being informative for analysing population structure. Southern populations (5/8), not northeastern populations, had private alleles at four loci (not AAm82): mean frequency 0.033 (range: 0.013–0.053).
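The 0.95 criterion used above can be written as a one-line check. The sketch below applies it to the two frequencies quoted in the text; the function name is hypothetical.

```python
def is_polymorphic(most_common_freq, threshold=0.95):
    """A locus counts as polymorphic when its most common allele is below the threshold."""
    return most_common_freq < threshold

# Frequencies quoted in the text.
aam24_mur_sis = 0.943   # allele 208 at AAm24 in the MUR-Sis population
aam13_overall = 0.96    # allele 178 at AAm13 across all populations

print(is_polymorphic(aam24_mur_sis), is_polymorphic(aam13_overall))
```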

Table 2 The allele frequencies at each microsatellite locus recorded by population. N=population size

The ranges of observed and expected heterozygosity values were 0.026–0.750 and 0.052–0.683, respectively (Table 3). Deviations from Hardy–Weinberg (HW) expectations were only significant (P<0.05) for locus AAm82, in the two populations RIO-NW (P=0.027) and BAR-San (P=0.005), undoubtedly because these were the most heterogeneous of the pooled populations in terms of locations and dates, respectively.
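Observed and expected heterozygosity at one locus in one population can be computed from the scored genotypes as sketched below. The genotypes are hypothetical, and the small-sample correction 2N/(2N−1) applied to the expected value is an assumption about how such values are commonly reported, not a statement of what Arlequin printed here.

```python
from collections import Counter

def heterozygosities(genotypes):
    """Return (HO, HE) for a list of (allele1, allele2) genotypes at one locus."""
    n = len(genotypes)
    # HO: proportion of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n                                   # gene copies sampled
    sum_sq = sum((c / copies) ** 2 for c in counts.values())
    # HE: expected under Hardy-Weinberg, with the assumed small-sample correction.
    h_exp = (copies / (copies - 1)) * (1.0 - sum_sq)
    return h_obs, h_exp

# Hypothetical genotypes (allele sizes in bp) for four sandflies.
sample = [(103, 106), (106, 106), (103, 106), (106, 106)]
h_obs, h_exp = heterozygosities(sample)
print(h_obs, round(h_exp, 3))
```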

Table 3 Observed (HO) and expected (HE) heterozygosities at each microsatellite locus. P=significance of the deviation from Hardy–Weinberg expectations; N = population size

No significant linkage disequilibrium was detected between pairs of loci in any population, except for AAm10/AAm13 in population GRA-Tre (P=0.021) and for AAm20/AAm82 in RIO-NW (P=0.027).

Population structure indicated by the allelic variation at the microsatellite loci

The genetic differences between pairs of populations, quantified with the FST statistic, ranged from 0.0006 to 0.4104, and clearly indicated the existence of two regional groups of populations (Table 4, Figure 2). Thus, FST estimates were very low (0.0006–0.0156) within a group containing all eight populations from seven southern locations (HUE-Rio, GRA-Alf, GRA-Tor-01 (1999), GRA-Tor-02 (2000), GRA-Tre, ALM-Tur, MUR-Sis and MAD-Boa), and they were also very low (0.0017–0.0414) within a second group consisting of all five populations from eight northeastern locations (RIO-NW, RIO-Rob, RIO-Tud, TAR-Tor, BAR-San). In comparison, the FST values for inter-regional pairs of populations were very high (0.2357–0.4104). Within each region, FST values were not significantly greater than could be expected by chance (P>0.05, after standard Bonferroni corrections), but all FST values between regions were significant at the 1% level (Table 4). Most pairwise FST values were greater when locus AAm13 was removed from the analysis (in consideration of the absence of polymorphism in all northeastern populations), but the statistical significance of the results remained the same.
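The scale of the multiple-testing correction can be illustrated with the population counts given above. The sketch assumes that the standard Bonferroni correction was applied across all pairwise comparisons among the 13 populations; the number of tests is not stated in the text.

```python
from math import comb

n_south, n_northeast = 8, 5

within_south = comb(n_south, 2)           # pairs of southern populations
within_northeast = comb(n_northeast, 2)   # pairs of northeastern populations
between_regions = n_south * n_northeast   # inter-regional pairs
n_pairs = comb(n_south + n_northeast, 2)  # all pairwise comparisons

# Standard Bonferroni: divide the nominal level by the number of tests (assumed).
alpha_adjusted = 0.05 / n_pairs

print(within_south, within_northeast, between_regions, n_pairs)
print(f"{alpha_adjusted:.6f}")
```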

Table 4 Matrix of genetic differentiation and geographical distances between pairs of populations. Below diagonal: pairwise FST statistics; above diagonal: geographical distances (km)
Figure 2

Variation of genetic differentiation between populations with geographical distance. Pairwise FST estimates are plotted against geographical distances (km) between pairs of populations within the southern region (Black circles. Spearman Rank correlation coefficient to fit FST to distance: a=−0.002525, b=−0.000004; P=0.732 for greater correlation and P=0.268 for lesser correlation, using Mantel test with 1000 permutations), within the northeastern region (Grey circles. a=0.022355, b=−0.000025; P=0.763 for greater correlation and P=0.270 for lesser correlation), and between regions (open circles).

RhoST estimates (data not shown) were larger than the FST ones for most of the comparisons within the southern group, but smaller for most of the comparisons within the northeastern group. For inter-regional comparisons, RhoST values were higher than FST estimates.

The existence of these two regional groups of populations was also evident from an inspection of allele frequency differences at loci AAm13, AAm10, AAm20 and AAm82 (Table 2, Figure 1). At locus AAm13, alleles 175, 181 and 184 did not occur in any of the northeastern populations, which were homozygous for allele 178. For locus AAm10, the frequencies of the two common alleles showed nonoverlapping ranges in the southern and northeastern groups of populations, respectively: 0.141–0.200 and 0.385–0.652 for allele 103; and 0.771–0.859 and 0.348–0.615 for allele 106. Similar nonoverlapping regional variation was observed for locus AAm20 (0.143–0.276 and 0.833–0.923 for allele 181; and 0.686–0.839 and 0.077–0.167 for allele 190) and locus AAm82 (0.000–0.113 and 0.278–0.385 for allele 141; and 0.661–0.781 and 0.288–0.538 for allele 153).

There was a total of 29 alleles at all five loci. All occurred in the southern group of populations, but only 16 were found in the northeastern group (Table 2). This reduced allele diversity in the northeastern group was more notable for loci AAm10 (two alleles found out of a total of four), AAm13 (one out of four) and AAm20 (three out of nine). Less differentiated regionally were loci AAm82 and AAm24 (each with five alleles found in the northeastern group, out of a total of six). Furthermore, locus AAm24 was strictly monomorphic for its most common allele (208) in some populations, both in the south (HUE-Rio, GRA-Alf) and in the northeast (RIO-NW and RIO-Tud).
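The allele counts quoted in this paragraph can be checked against the stated totals with a short tally. The figures are taken from the text; only the variable names are introduced here.

```python
# locus: (alleles found in the northeastern group, total alleles at the locus)
allele_counts = {
    "AAm10": (2, 4),
    "AAm13": (1, 4),
    "AAm20": (3, 9),
    "AAm82": (5, 6),
    "AAm24": (5, 6),
}

total_alleles = sum(total for _, total in allele_counts.values())
northeastern_alleles = sum(found for found, _ in allele_counts.values())

print(total_alleles, northeastern_alleles)
```

The per-locus counts sum to the 29 alleles overall and the 16 alleles recorded in the northeastern group.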

Pairwise FST estimates were plotted against geographical distances (Figure 2, Table 4) to illustrate the absence of any support for isolation by distance. When assessed by Mantel tests, there was no significant positive or negative correlation between pairwise FST estimates and geographical distance, either for the total data or within each of the southern and northeastern regions, where there was very low genetic differentiation (FST<0.05) compared with the very high inter-regional genetic differentiation (FST>0.23). The same result was obtained when FST/(1−FST) was substituted for FST and either of these measures of genetic differentiation (Rousset, 1997) was correlated with the natural logarithm of distance.
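The transformation mentioned in the last sentence can be sketched as follows, with each pair of populations converted to a point for regression or a Mantel test. The function name is hypothetical, and the example pair uses made-up values rather than entries from Table 4.

```python
import math

def rousset_point(fst, distance_km):
    """Return (ln distance, FST/(1 - FST)) for one pair of populations."""
    return math.log(distance_km), fst / (1.0 - fst)

# Hypothetical pair: FST = 0.20 at a separation of 400 km.
x, y = rousset_point(0.20, 400.0)
print(round(x, 3), round(y, 3))
```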

Identity of the mitochondrial Cyt b lineage in northeastern populations

New Cyt b sequences were obtained for a total of 24 specimens from the northeastern region: six (three males and three females) from RIO-NW, three (all females) from RIO-Rob, six (three males and three females) from RIO-Tud, three (all females) from TAR-Tor and six (three males and three females) from BAR-San. Additionally, sequences were obtained for the first time from specimens (three males and three females) of the most northerly population (MAD-Boa) of the southern region.

All Cyt b sequences belonged to the Iberian lineage according to the fixed nucleotide differences between it and the typical lineage found in northwest Africa, Malta and Italy (Esseghir et al, 2000). In the alignment of the nucleotide (nt) sequences of the Iberian and typical lineages (GenBank accession numbers AF161205 and AF161204, respectively), the bases at the diagnostic positions were: nt 9-G (not A), nt 54-A (not G), nt 81-G (not A), nt 138-T (not C), nt 375-C (not T) and nt 420-T (not C).
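The lineage assignment described above amounts to reading six alignment positions. The sketch below assumes that positions are numbered from 1 at the start of the alignment, that the first base listed at each position is the Iberian state and the bracketed base the typical state, and that the input sequence is already aligned; the function name is hypothetical.

```python
# position (1-based, assumed): (Iberian base, typical base), as listed in the text
DIAGNOSTIC = {
    9: ("G", "A"),
    54: ("A", "G"),
    81: ("G", "A"),
    138: ("T", "C"),
    375: ("C", "T"),
    420: ("T", "C"),
}

def classify_lineage(aligned_seq):
    """Classify an aligned Cyt b sequence by its bases at the diagnostic positions."""
    iberian = typical = 0
    for pos, (iberian_base, typical_base) in DIAGNOSTIC.items():
        base = aligned_seq[pos - 1].upper()   # convert to 0-based index
        if base == iberian_base:
            iberian += 1
        elif base == typical_base:
            typical += 1
    if iberian == len(DIAGNOSTIC):
        return "Iberian"
    if typical == len(DIAGNOSTIC):
        return "typical"
    return "ambiguous"   # mixed or unexpected bases need manual inspection
```

A sequence is assigned to a lineage only when all six positions agree, which keeps partial matches visible rather than forcing a call.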

Discussion

Allelic variation and evolution at the five microsatellite loci

Allelic variation at four out of the five trinucleotide microsatellite loci proved to be informative for investigating the population structure of P. perniciosus in Spain. Only locus AAm24 was found to be neither informative nor polymorphic by definition (ie the overall frequency of its common allele was >0.95). In this sense, locus AAm13 was also not polymorphic, but it did help to define the two regional groups of populations. The small amount of actual polymorphism at these two loci could have resulted from the presence of null alleles, which failed to amplify because of at least one difference in a flanking region sequence. Unlike AAm13, the other three loci (AAm10, AAm20, AAm82) were not only informative about population structure but also showed no evidence of a null allele effect. In all unpooled and large populations (N=19+), the genotype frequencies at these loci conformed to HW expectations, and no significant association between loci was detected by linkage disequilibrium analysis. This indicated that their alleles were selectively neutral and being inherited in an independent and Mendelian fashion.

The number of alleles at each trinucleotide locus (4–9) in P. perniciosus was intermediate between the number (3–15) reported for An. gambiae (Lehmann et al, 1996) and that (2–6) reported for Drosophila melanogaster (Schug et al, 1998). This variation could reflect the different histories of the populations of these species, as well as differences in the evolutionary rates of different microsatellite classes and loci. Dinucleotide microsatellite loci may be more polymorphic, eg An. gambiae had 12–14 alleles per locus in sub-Saharan Africa (Lehmann et al, 1996) and its sibling species, An. arabiensis had 11–19 alleles per locus in East Africa (Donnelly and Townson, 2000).

FST values were lower than RhoST estimates for most of the comparisons between populations of P. perniciosus, which is expected from the stepwise mutation model (SMM) of microsatellite evolution, rather than the infinite alleles model (Jarne and Lagoda, 1996). However, as other parameters influence the behaviour of both statistics, this result cannot be taken as proof of the SMM.

Evidence for two distinct regional populations of P. perniciosus in Spain

The results provide strong evidence for population subdivision of P. perniciosus in Spain. Four out of the five microsatellite loci showed significant variability between, but not within, two regional groups of populations. One regional group included all eight populations from southern Spain, and the other comprised all five populations from the northeast. Pairwise FST estimates within each regional population were less than 0.05, which according to Wright (1978) indicates little genetic differentiation. In contrast, most inter-regional pairwise FST values were greater than 0.25 (two were 0.23–0.24), which indicates very great genetic differentiation. Some of the intra-regional FST (and RhoST) values were negative, indicating that the variation was sometimes higher within populations than between them. This suggests that the populations within regions were panmictic, which is supported by the absence of evidence for isolation by distance (Figure 2).
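The qualitative scale applied here can be written as a simple lookup. The text states only the two outer categories (below 0.05 and above 0.25); the intermediate boundaries at 0.15 and 0.25 follow the commonly cited form of Wright's guideline and are an assumption, as is the function name.

```python
def wright_category(fst):
    """Map an FST value to a qualitative level of genetic differentiation."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"      # intermediate band, assumed
    if fst <= 0.25:
        return "great"         # intermediate band, assumed
    return "very great"

# Extreme values reported in the Results.
for value in (0.0006, 0.0414, 0.2357, 0.4104):
    print(value, wright_category(value))
```

On this scale the two inter-regional values of 0.23–0.24 fall just below the "very great" band, as the paragraph above notes.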

Origins and continued isolation of the two regional populations

There was reduced allele diversity in the northeastern region, where only 16 out of the 29 alleles were recorded, compared with the southern region (all 29 alleles found). By analogy with An. gambiae (Lehmann et al, 2000), the lack of unique alleles and the higher FST than RhoST values among northeastern populations of P. perniciosus both suggest that pure drift was the main differentiating process for the northeastern region of Spain. This is consistent with the hypothesis of historical migration from a southern Ice Age refuge, with the northeastern population arising as a northern peripheral isolate in which founder effects and bottleneck events led to the loss of rare alleles and significant changes in allele frequencies, as argued for temperate European insects (Hewitt, 1996, 1999). The two Spanish regional populations were shown to share the mitochondrial Cyt b lineage previously reported as characteristic of the Iberian race of P. perniciosus (Esseghir et al, 1997; Mahamdallie et al, 2003), and so they are likely to have become isolated in the postglacial Holocene epoch (starting 12 000 years ago), long after the divergence of the Iberian and typical races, which was dated well before the end of the Pleistocene by the Cyt b molecular clock (Esseghir et al, 2000).

The absence of any significant correlation between genetic differentiation and geographical distance suggests that the two regional populations still remain isolated, in spite of the species' continuous distribution (Figure 1). It is unknown whether the barriers are pre-/postmating or environmental. Maternally inherited Wolbachia can cause reproductive incompatibility between populations of some Diptera, and a strain has been isolated from laboratory-reared P. perniciosus (Ono et al, 2001). However, there is no direct evidence for genetic barriers between populations of P. perniciosus.

So far, the northeastern group of populations of P. perniciosus has been found only in the valley of the River Ebro and the adjacent Mediterranean coastal plain. Much of this region is isolated from the south by a mountain range (Sistema Ibérico), and the Ebro valley has some distinctive climatic, floristic and faunistic characteristics (Montserrat and Montserrat, 1988; Ribera and Blasco-Zumeta, 1998). However, we sampled P. perniciosus from a range of bioclimates, altitudes and habitats in both the northeastern and southern regions (Table 1, Figure 1) without detecting any obvious associations between allele frequencies and environment.

Epidemiological significance of the regional populations

The low rates of inter-regional gene flow in P. perniciosus suggest that the population structure of this vector might constrain the dispersal of genes and traits of epidemiological importance. Reduced hybrid fitness in a ‘tension zone’ (Arnold, 1997) between the two regional populations could prevent the dispersal of each through their neighbour's range, as discussed for other European insects by Hewitt (1996, 1999).

There are two likely explanations for the low FST values between pairs of populations within each region: either some populations have become isolated, but there has been no significant change in allele frequencies in any population since the relatively recent postglacial dispersal (no drift, no natural selection, no evolution of new alleles); or no population has become permanently isolated, and there is continuing high gene flow between contiguous populations over distances up to 500 km. The latter seems more likely, although it could be difficult to reconcile with the supposedly ‘weak flight capabilities’ of sandflies (Cárdenas et al, 2001). Weak flight has been used to help explain the limited gene flow (estimated from alloenzyme data) between Colombian populations of Lutzomyia longipalpis (Morrison et al, 1993) or Lutzomyia shannoni (Cárdenas et al, 2001; Mukhopadhyay et al, 2001). However, the present results are consistent with the knowledge that adults of Phlebotomus (Larroussius) species can travel up to 2.2 km in as little as 3 days during the Mediterranean summer (Killick-Kendrick et al, 1984). Our results suggest that leading-edge populations of P. perniciosus should be able to disperse relatively rapidly into northwest Europe, if climate warming provides suitable environmental conditions.