Introduction

A major question in evolutionary biology is whether interspecific hybridisation is a significant route for the transfer of genetic adaptations (Barton, 2001). Most studies of introgression have focused on plant systems (Martinsen et al, 2001). Where animal studies have been performed they have been in tractable systems such as hybrid zones (Evans et al, 2001), where species are morphologically distinguishable (Beaumont et al, 2001) or where exotic species have been introduced (Goodman et al, 1999). In these studies, the organisms have evolved separately for a long period and introgressed alleles can be readily identified due to mutational differences or differing allele distributions (Goodman et al, 1999; Roques et al, 2001).

The Anopheles gambiae complex of mosquito species forms the most important insect disease vector system. Of the seven recognised species within the complex, A. gambiae s.s. and A. arabiensis are the most abundant and widespread, occurring in sympatry over most of their distributions. In sub-Saharan Africa they are the primary malaria vectors, the main vectors of Bancroftian filariasis in rural areas and have localised importance in arbovirus transmission. In these and many other disease vector complexes, it is difficult to identify introgressed alleles, possibly because the vectors have only recently speciated and there has been insufficient time to accumulate mutations (Coluzzi, 1982; Powell et al, 1999; Walton et al, 2000). Hybrids occur naturally in the wild, in some regions at frequencies of up to 2 per 1000 (White et al, 1972; Tripet et al, 2001) but the evolutionary significance of these hybrids is unclear. It is not known if genes are actually introgressed into parent species by backcrossing of fertile F1 hybrid females (males are sterile). In laboratory experiments polymorphic chromosomal inversions can be transferred between species (della Torre et al, 1997). Some workers have postulated that certain chromosomal inversions, that are associated with aridity tolerance, may have introgressed from A. arabiensis into A. gambiae and have enabled this species to spread into novel habitats (Powell et al, 1999).

Phylogenetic approaches to resolving taxa relationships in these species have been hampered by high intraspecific and low interspecific variation (Besansky et al, 1994; Krzywinski et al, 2001). Previous studies of partial sequences of mitochondrial DNA have shown that large numbers of haplotypes are shared between A. gambiae, A. arabiensis and A. bwambae (Besansky et al, 1997; Thelwell et al, 2000; Donnelly et al, 2001), but whether this is a result of introgression or retention of ancestral polymorphisms is unresolved (Besansky et al, 1997; Thelwell et al, 2000). In this study, we examine patterns of intra- and interspecific differentiation in ways that allow us to distinguish between contemporary introgression and retention of ancestral mitochondrial sequences. We propose that if introgression is ongoing, interspecific FST values will be lower in sympatric than in allopatric comparisons. Furthermore, this approach can reveal locale-specific introgression and can determine whether introgression occurs in a unidirectional or bidirectional manner.

The data in this study also serve as additional markers to compare with the large number of microsatellite-based studies that have investigated population structuring in these organisms on micro- and macrogeographic scales (Lehmann et al, 1997, 2003; Kamau et al, 1998; Lanzaro et al, 1998; Simard et al, 1999; Donnelly and Townson, 2000; Wondji et al, 2002).

Materials and methods

Mosquitoes were collected from 25 sites throughout sub-Saharan Africa (Figure 1 and Table 1). Most locations have been described in detail previously (Besansky et al, 1997; Lehmann et al, 1997, 1998, 2003; Donnelly and Townson, 2000; Pinto et al, 2002). Species identification, collection location and sample size are given in Table 1. Adult mosquitoes were obtained from houses within a village by resting and pyrethrum knockdown collections. A sample consisted of specimens collected in an area with a radius of less than 1 km. Previous studies revealed no population subdivision within and among adjacent villages separated by 10–50 km, and showed that mosquitoes within a house represent a random sample of the population (Petrarca and Beier, 1992; Besansky et al, 1997; Lehmann et al, 1997; Donnelly et al, 1999). To maximise the power of interspecific analyses, specimens were pooled from the three regions where A. gambiae and A. arabiensis occur in sympatry: Kenya, Senegal and Malawi.

Figure 1

Location of sample sites. Species ranges are marked for A. arabiensis (stippled) and for A. gambiae (grey). More detailed information for sampling locations 12–19 can be found in Besansky et al (1997). The key to locations is indicated by the location superscripts in Table 1.

Table 1 Geographic distribution and details of ND5 haplotype for each sample

Recent work suggests that there are two forms of uncertain taxonomic status within A. gambiae, which may be characterised by their rDNA molecular type (Favia et al, 1997; Wondji et al, 2002). All the samples used in this study are the S rDNA type except samples from Senegal and Ghana, which are from populations that only exhibit the M form (Lehmann et al, 2003) and the sample from Sao Tome and Principe which is also M form (Pinto et al, 2003). There is no evidence for reproductively isolated forms within A. arabiensis. After DNA extraction (Lehmann et al, 1997; Donnelly et al, 1999) and PCR-based species identification (Scott et al, 1993) individuals were sequenced over a 650 bp stretch of the ND5 region of the mitochondrial genome following the protocols of Besansky et al (1997). DNA sequencing was performed on ABI 377 or ABI 3100 machines (Perkin-Elmer) using fluorescent labelling technology and standard analytical protocols. Sequences were unambiguously aligned using the Clustal W option in Sequence Navigator (Version 1.0.1; ABI Systems, CA, USA).

FST-based methods

We estimated intra- and interspecific differentiation between samples using FST (Hudson et al, 1992). A weighted means approach was used to account for differences in sample size. Significance of FST values was estimated using a permutation program written in the SAS language with 2000 replications (SAS Institute, 1990). Pairwise estimates of FST were used as distance measures to generate a neighbour-joining (NJ) tree using Mega V2.1 (Kumar et al, 2001).
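The estimator and the permutation test described above can be sketched in a few lines. This is a minimal illustration, not the SAS program used in the study: it assumes the sequence-based form FST = 1 - Hw/Hb of Hudson et al (1992), where Hw and Hb are the mean numbers of pairwise differences within and between samples, and it assumes that the "weighted means" step weights each within-sample diversity by its sample size. The function names and the toy sequences are hypothetical.

```python
import random
from itertools import combinations, product


def n_diff(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences within one sample (needs at least 2 sequences)."""
    pairs = list(combinations(pop, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """FST = 1 - Hw/Hb; Hw is weighted by sample size (an assumption)."""
    n1, n2 = len(pop1), len(pop2)
    hw = (n1 * mean_within(pop1) + n2 * mean_within(pop2)) / (n1 + n2)
    hb = sum(n_diff(a, b) for a, b in product(pop1, pop2)) / (n1 * n2)
    # If all sequences are identical there is no differentiation to measure.
    return 1.0 - hw / hb if hb > 0 else 0.0


def permutation_p(pop1, pop2, n_perm=2000, seed=0):
    """Share of random reassignments of individuals giving FST >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy aligned sequences, not data from this study.
pop_a = ["AAAA", "AAAA", "AAAT"]
pop_b = ["TTTT", "TTTT", "TTTA"]
fst, p_value = permutation_p(pop_a, pop_b)
print(round(fst, 4), p_value)
```

The default of 2000 replications mirrors the number reported above. Negative estimates can arise when within-sample diversity exceeds between-sample diversity and are usually read as no differentiation.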

Phylogeography

A haplotype tree was constructed using the statistical parsimony algorithm of Templeton et al (1992). The TCS 1.13 program of Clement et al (2000) was used to estimate the haplotype tree.

Results

Complete and unambiguous sequence of a 650 bp section of the ND5 region of the mitochondrial genome was obtained for a total of 331 specimens (Table 1). In addition to newly generated sequences of 52 A. gambiae and 61 A. arabiensis specimens, we used data from 68 A. gambiae and 50 A. arabiensis published by Besansky et al (1997), 45 A. gambiae published by Lehmann et al (1997) and 55 A. arabiensis for which summary statistics were published previously (Donnelly et al, 2001). Reference sequences have been deposited in GenBank (accession numbers AY312090-AY312142). No nuclear/nonfunctional copies of the gene were present, as evidenced by the very low pairwise divergence, the absence of stop codons, the predominance of synonymous substitutions and the unambiguous electropherograms.

Intraspecific estimates of differentiation

A. arabiensis

Pairwise estimates of differentiation as inferred from FST statistics were generally low (Figure 2a; mean±SE=0.098±0.027; Data matrix available from authors). Strong evidence for structuring was observed between the sample from the island of Reunion and continental populations. Only a single haplotype was present in the Reunion sample (n=6) probably reflecting a lower effective population size (Ne) on the island. Therefore, as the lower Ne will complicate interpretation of differentiation this sample was excluded from the analysis. Remaining pairwise FST values were used to construct an NJ tree (Figure 2a). Three clusters were observed: Sudan and Senegal (cluster 1); Western Kenya and Ethiopia (cluster 2) and Malawi (cluster 3).

Figure 2

FST distance-based trees for A. arabiensis (a) and A. gambiae (b) recovered from partial 650 bp ND5 mitochondrial DNA sequences. The trees were constructed using an NJ method based upon pairwise interpopulation values of FST. Branch lengths (see scale bar) are proportional to the mean pairwise FST values. For the A. arabiensis data there were no significant pairwise comparisons of FST within each of the clusters (P>0.05). When samples from different clusters were compared, FST values were significantly different (P<0.05) for 30% (clusters 1 and 3), 40% (clusters 1 and 2) or 50% (clusters 2 and 3) of pairwise comparisons. For the A. gambiae data, when samples from different clusters were compared, FST values were significantly different (P<0.05) for 56% (clusters 1 and 3), 62% (clusters 1 and 2) or 50% (clusters 2 and 3) of pairwise comparisons.

A. gambiae

The NJ tree suggested a primary division between samples from West Africa/western Kenya (clusters 1 and 2) and samples from Tanzania and Malawi (cluster 3) (Figure 2b). The genetic division corresponds with the topographic division of populations by the Rift Valley complex. Ghanaian and Senegalese populations grouped closely (cluster 1), while Nigerian specimens grouped more closely with populations from Western Kenya (cluster 2). These groupings run contrary to the geographical distance. Notably, cluster 1 comprises M form specimens whereas S form specimens are found in clusters 2 and 3. There was only one significant pairwise comparison of FST within the clusters (cluster 3, Thyolo-Wathrego) but over half of the pairwise comparisons between clusters were significant (Figure 2). The Wathrego sample and the Escarpment sample are both from Western Kenya, and their grouping with Tanzanian and Malawian samples from east of the Rift Valley complex is likely to be a result of the lower sample size of these Kenyan samples (Lehmann et al, 2000). Pairwise values of FST were high for comparisons involving Wathrego and Escarpment but resampling tests were nonsignificant (data matrix available from authors).

Haplotype networks

For these data haplotypes separated by up to 11 mutational steps have a probability of 0.95 of being connected in a parsimonious manner. The statistical parsimony algorithm produced networks for both species within the 11-step limit of parsimony but there were a number of ambiguities within the tree, apparently due to homoplasy within the mitochondrial sequences. These ambiguities were resolved following the suggestions of Templeton et al (1992) and Crandall and Templeton (1993) (Figures 3 and 4). In general this resulted in the haplotype being placed externally and linked to one of the high-frequency internal alleles. The high levels of homoplasy observed in our data are common in Anopheles (Walton et al, 2000). At present the evidence suggests that the homoplasy reflects mutational hot spots rather than recombination. There was no evidence for heteroplasmy in our electropherograms and of the 72 polymorphic sites within the data set there were 10 sites with three variants and one site with four variants suggesting large variation in mutation rates across the locus.
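The tally of polymorphic sites by number of variants reported above is straightforward to reproduce from an alignment. The following is a small sketch, assuming equal-length aligned sequences held as strings; the function name and the toy alignment are hypothetical and are not the study data.

```python
from collections import Counter


def variants_per_site(alignment):
    """Count polymorphic sites, keyed by the number of distinct bases seen.

    `alignment` is a list of equal-length aligned sequences. Monomorphic
    sites are skipped, so the values sum to the number of polymorphic sites.
    """
    counts = Counter()
    for column in zip(*alignment):
        n_variants = len(set(column))
        if n_variants > 1:
            counts[n_variants] += 1
    return dict(counts)


# Toy alignment: site 2 carries three variants and site 4 carries four.
toy = ["ACGT", "ACGA", "ATGC", "AGGG"]
print(variants_per_site(toy))
```

Sites with three or four variants in a closely related set of sequences imply repeated mutation at the same position, which is the pattern interpreted above as evidence of mutational hot spots.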

Figure 3

Statistical parsimony network for A. arabiensis data set. Haplotype node area is proportional to the number of specimens contained. Links between nodes are all single mutational steps regardless of length. Shaded nodes are haplotypes that are found in both species.

Figure 4

Statistical parsimony network for A. gambiae data set. Haplotype node area is proportional to the number of specimens contained. Links between nodes are all single mutational steps regardless of length. Shaded nodes are haplotypes that are found in both species.

Introgression and ancestral retention

There were no fixed nucleotide differences between the species and 17 out of 113 haplotypes were shared between species (Table 1). Mean FST (±95% CI) for sympatric (n=3) and allopatric (n=6) comparisons were 0.070 (±0.104) and 0.121 (±0.068), respectively (Table 2) (P>0.05). While estimates of FST involving A. gambiae from Malawi were nonsignificant (P>0.05) the two allopatric comparisons involving A. arabiensis from the same sample locations were highly significant (P<0.001) (Table 2).
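With only three sympatric and six allopatric estimates, the prediction that sympatric FST values should be lower can be checked by exact enumeration. The sketch below is one possible way to do this and is not necessarily the test behind the P value quoted above: it assumes a one-sided permutation test on the difference in means, enumerating all 84 ways of labelling three of the nine comparisons as sympatric. The function name and the input values are hypothetical, not the entries of Table 2.

```python
from itertools import combinations


def exact_perm_test_lower(sympatric, allopatric):
    """One-sided exact permutation test that the sympatric mean is lower.

    Returns the observed difference in means (sympatric - allopatric) and
    the share of relabellings with a difference at least as negative.
    """
    pooled = list(sympatric) + list(allopatric)
    k = len(sympatric)
    total = sum(pooled)

    def mean_diff(sym_sum):
        return sym_sum / k - (total - sym_sum) / (len(pooled) - k)

    observed = mean_diff(sum(sympatric))
    n_as_extreme = 0
    n_total = 0
    for idx in combinations(range(len(pooled)), k):
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if mean_diff(sum(pooled[i] for i in idx)) <= observed + 1e-12:
            n_as_extreme += 1
    return observed, n_as_extreme / n_total


# Hypothetical FST values for illustration only.
sym = [0.01, 0.02, 0.03]
allo = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15]
diff, p_value = exact_perm_test_lower(sym, allo)
print(round(diff, 3), round(p_value, 4))
```

With these group sizes the smallest attainable P value is 1/84 (about 0.012), so a test of this kind has limited power, consistent with the wide confidence intervals reported above.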

Table 2 Interspecific pairwise estimates of FST

Of the shared haplotypes only two (Nos. 2 and 33, Table 1) were common to A. arabiensis and A. gambiae M and S forms, and were internal to both species haplotype trees (Figures 3 and 4). The remaining shared haplotypes were found predominantly in A. arabiensis and S form comparisons (n=11) rather than in A. arabiensis and M form comparisons (n=2). The shared haplotypes are found throughout the species distribution of A. gambiae, although there is an apparent clustering of shared haplotypes in certain populations: Kyela, Tanzania; Mkali, Malawi; Kisian, Kenya and Dienga, Gabon. Five shared haplotypes were found in samples of A. arabiensis from Ethiopia, Sudan and Reunion (Table 1). A. gambiae is absent from the samples collected from these countries and therefore the presence of these haplotypes must reflect noncontemporary processes such as incomplete lineage sorting between the two species or historical introgression events. As would be predicted if ancestral polymorphisms were retained, the majority of shared haplotypes in these populations (four of five; Figure 3 and Table 1) were internal to the network. Similarly, in the single species collection from Gabon, where A. arabiensis is absent, there were also shared haplotypes, again internal to the haplotype network (Figure 4 and Table 1) and reflecting a noncontemporary process. In A. gambiae, except for two populations, the proportion and frequency of haplotypes that were common to both species were between 22–50 and 17–50%, respectively. However, in samples from Tanzania and Malawi these figures rose to 75–78 and 67–68%, respectively.
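The two summary figures quoted for each population can be told apart with a short calculation. The sketch below assumes that "proportion" means the share of distinct haplotypes in a sample that also occur in the other species, and that "frequency" means the share of individuals carrying such haplotypes; this reading, the function name and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def shared_haplotype_summary(sample, other_species_haplotypes):
    """Return (proportion, frequency) of shared haplotypes in one sample.

    `sample` lists one haplotype label per individual;
    `other_species_haplotypes` is the set of labels seen in the other species.
    """
    counts = Counter(sample)
    shared = [h for h in counts if h in other_species_haplotypes]
    proportion = len(shared) / len(counts)            # over distinct haplotypes
    frequency = sum(counts[h] for h in shared) / len(sample)  # over individuals
    return proportion, frequency


# Toy sample of six individuals carrying four distinct haplotypes.
sample = ["h2", "h2", "h2", "h33", "h7", "h9"]
other = {"h2", "h33"}
print(shared_haplotype_summary(sample, other))
```

The two measures diverge when shared haplotypes are also the common ones: here half of the distinct haplotypes are shared, but they account for two-thirds of the individuals.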

Discussion

These data provide a corollary of the results of microsatellite-based macrogeographic studies of differentiation in both species (Lehmann et al, 1997, 2003; Kamau et al, 1998; Lanzaro et al, 1998; Simard et al, 1999; Donnelly and Townson, 2000). In general, as in microsatellite-based studies the levels of population differentiation were lower in A. arabiensis (Figure 2) (Donnelly and Townson, 2000; Lehmann et al, 2003) and the only large estimates of FST involved an island population which is likely to have experienced founding effects/genetic bottleneck (Simard et al, 1999; Donnelly et al, 2002). Surprisingly, samples of A. arabiensis from Senegal and Sudan were grouped in a cluster, which runs contrary to geographic distance. Similar patterns have been observed in A. gambiae and were thought to reflect similarities in ecological zones and the absence of topographic barriers to gene flow such as those that may isolate Sudanese samples from those to the south (Lehmann et al, 2003).

In A. gambiae there was a clear distinction between populations of M and S rDNA forms and within the S-form comparisons between samples from the West and East of the Rift Valley complex. These data are in accordance with the studies of Wondji et al (2002) and Lehmann et al (2003) that provided evidence for a degree of genetic isolation between M and S forms. The failure of other studies to detect differences between M and S forms reflects how recent the disruption of gene flow must have been (Gentile et al, 2001) or conversely that ongoing gene flow may be homogenising allele arrays in both forms, outside certain regions of the genome. This study utilised haplotype frequency-based approaches, which are more sensitive to more recent separation events since they are not reliant upon the accumulation of infrequent mutation events.

Introgression and ancestral retention

Even where contemporary introgression can be discounted, because one of the species is absent from a locale, large numbers of haplotypes are shared between species. Whether these shared sequences are true ancestral retentions or traces of recent introgression events cannot be determined, which cautions against definitive statements. These data suggest that for closely related taxa even very extensive mitochondrial DNA data sets may have insufficient power to discriminate conclusively between the competing hypotheses of ancestral retention and contemporary introgression in certain populations.

Mitochondrial introgression is the most parsimonious explanation for the similarity in haplotype arrays between some sympatric populations of A. gambiae and A. arabiensis but introgression is apparently not a ubiquitous phenomenon. If introgression were occurring between all sympatric populations of A. gambiae and A. arabiensis then we would expect interspecific estimates of FST to be significantly lower in sympatric than in allopatric comparisons. This was not observed in these data. However, interspecific FST analyses involving A. gambiae from Malawi were all nonsignificant whereas those interspecific comparisons involving Malawian A. arabiensis and A. gambiae from Kenya and Senegal were both highly significant (Table 2). This suggests that, since the haplotype distributions in Malawian A. gambiae are similar to all three A. arabiensis distributions, introgression in Malawi is likely to be a unidirectional process from A. arabiensis into A. gambiae. However, it should be noted that sample size was lowest in the sample of A. gambiae from Malawi and that mtDNA-based phylogenies using colonised specimens of A. gambiae and A. arabiensis suggested that introgression may have occurred in the opposite direction (Caccone et al, 1996). However, these two species have a far wider species distribution than other members of the complex. This is likely to result in a higher effective population size and therefore, since genetic drift will be lower, there may well be a greater retention of ancestral haplotypes in these species than in other members of the complex, despite the possible closer phylogenetic proximity of different species pairs. There were insufficient data to apply similar tests to samples from Tanzania, but a large number and proportion of haplotypes were shared between sympatric populations in Tanzania, suggesting that introgression may be occurring in this region as well. A. arabiensis occurs at much higher frequencies than A. gambiae in the study sites in Tanzania and Malawi (Charlwood et al, 2000; Spiers et al, 2002) and differing species' density is thought to be one of the major determinants of introgression (Avise and Saunders, 1984). Whether introgression is frequent enough to play a role in the differentiation within A. gambiae S form that is observed either side of the Rift Valley remains to be investigated.

An analysis of the relative position of shared haplotypes on the species networks, based on the assumption that on average the haplotypes found interior to the haplotype network should be older than those found at the tips, was also suggestive of ongoing introgression. If there is contemporary introgression both old and more recently derived haplotypes are equally likely to cross the species barrier and therefore shared haplotypes will not preferentially occur at internal nodes. An exact test based upon the location (internal/external) of shared haplotypes in each species tree also showed no evidence for significant differences (P>0.05). This approach, although far from conclusive for our data given the number of ambiguities within the species trees, may be a powerful way of detecting introgression in those species with more robust and deeper networks.
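The test on node position can be illustrated with a 2 x 2 table of haplotypes classified as shared or species-specific and as internal or external to the network. The sketch below assumes a two-sided Fisher's exact test, which may differ from the exact test actually applied, and computes it from the hypergeometric distribution using only the standard library. The function name and the counts are hypothetical; they are not taken from Figures 3 and 4.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: shared / species-specific haplotypes.
    Columns: internal / external nodes.
    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x shared haplotypes at internal nodes.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Relative tolerance so that exact ties with the observed table count.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 3 shared internal, 1 shared external,
# 1 species-specific internal, 3 species-specific external.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

A nonsignificant result is what the ongoing-introgression scenario predicts, but with few shared haplotypes and ambiguous node placement it is equally what low power would produce, which is why the result above is described as far from conclusive.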

Recent studies have demonstrated the importance of unidirectional introgression events in evolution associated with environmental changes (Grant and Grant, 2002). In the A. gambiae complex the interspecific transfer of DNA we observed at mitochondrial loci is likely to occur at nuclear loci (Besansky et al, 2003) but it is unknown if this process is important for the acquisition of selectively advantageous genes. However, the evidence for introgression in natural populations and, in particular, from A. arabiensis into A. gambiae, lends credence to the hypothesis of Powell et al (1999) that some selectively advantageous genes may have moved from A. arabiensis into A. gambiae. Furthermore, while A. gambiae has been thought to be undergoing incipient speciation in certain locations, the converse may actually be occurring, and the M and S forms and A. arabiensis may be converging to form a hybrid swarm. This is of particular concern given widespread insecticide resistance and possibilities of transgenic release in these highly pernicious malaria vectors.