Introduction

Due largely to its highly anthropophilic nature, Anopheles gambiae is the most important vector of malaria in sub-Saharan Africa. An understanding of the genetic structure of vector populations is required for the proposed release of transgenics (mosquitoes genetically altered with genes that confer malaria refractoriness) (Collins & Besansky, 1994; Curtis, 1994; Ashburner et al., 1998). Moreover, gene flow is a useful indicator of direction and rates of dispersal among populations (Slatkin, 1987) and may be useful in designing and evaluating mosquito control strategies (reviewed in Coluzzi et al., 1979; Tabachnick & Black, 1995).

Studies of the genetic structure of A. gambiae have reached conflicting conclusions. Analyses of mitochondrial DNA (mtDNA) sequences (Besansky et al., 1997), allozymes (Lehmann et al., 1996) and microsatellite markers (Lehmann et al., 1996; Kamau et al., 1999) suggest extensive gene flow across Africa in A. gambiae collected at least 5000 km apart. In contrast, a complex differentiation of paracentric inversions on chromosome II exists in West Africa, such that five chromosomal forms are recognized: Savanna, Forest, Bamako, Mopti and Bissau (Bryan et al., 1982; Coluzzi et al., 1985; Toure et al., 1998). The Savanna form occurs throughout much of sub-Saharan Africa and is therefore the most widespread form. The Forest form refers to forest-breeding A. gambiae, whereas Bissau occurs only in The Gambia and Senegal. Bamako occurs in Mali and northern Guinea, and Mopti is found in Mali, Ivory Coast, Guinea and Burkina Faso. In parts of Mali where the Mopti, Savanna and Bamako forms exist in sympatry, hybrids are found at low frequencies (Bamako × Savanna, Mopti × Savanna) or not at all (Mopti × Bamako) (Bryan et al., 1982; Coluzzi et al., 1985; Toure et al., 1998). Recent studies (Black & Lanzaro, 2001; della Torre et al., 2001; Favia et al., 2001; Gentile et al., 2001; Mukabayire et al., 2001) showed that ribosomal DNA spacers (intergenic spacer and internal transcribed spacers) distinguish two major groups of A. gambiae across Africa, the S and M molecular forms. These studies suggest at least partial barriers to gene flow between the two forms in the Ivory Coast and other West African countries to the north and west, whereas introgression occurs between them in the Benin Republic and countries to the east. Nigeria is located to the east of the Benin Republic and therefore introgression is very likely to occur between our samples.

There are few studies of the population genetics of A. gambiae in Nigeria. Coluzzi et al. (1979, 1985) demonstrated a north to south cline in the frequency of some inversions on chromosome II in A. gambiae across the ecological zones of Nigeria, where arid savanna zones in the north gradually turn into humid forest zones in the south. (To avoid confusion with the chromosomal forms above, we use the terms 'savanna zones' and 'forest zones' to refer to ecological zones.) The most significant differences in the frequencies of inversions were between the northern and southern extremes of the country. Thus, A. gambiae in Nigeria consists of two chromosomal forms – the Savanna form in the north and the Forest form in the south – between which there appears to be normal intergradation (Coluzzi et al., 1979, 1985).

The magnitude of gene flow across ecological zones is unclear from Coluzzi et al. (1979). Chromosome inversions are probably poor indicators of gene flow because they are not selectively neutral. The distribution of inversions across Nigeria suggests that gene flow is restricted by geographical distance, that is, isolation by distance, because the largest disparities in inversion frequencies were between the extremes of the country (Coluzzi et al., 1979). Alternatively, assuming the island model (Wright, 1931), i.e. each geographical locale exchanges migrants with each of the other localities at an equal rate, the correlation of some inversions, but not others, with ecological zones (Coluzzi et al., 1979, 1985) suggests that selection counters extensive gene flow across the country. Thus, parts of the genome that are located within inversions, especially on chromosome II, might be expected to measure higher levels of differentiation than those that are located outside inversions (Lanzaro et al., 1998; Black & Lanzaro, 2001). If both isolation by distance and selection restrict gene flow, loci within inversions should detect higher levels of differentiation than those outside inversions, but loci located outside inversions should measure significant differentiation that increases with geographical distance.

The objective of this study was to determine the extent of gene flow across the ecological zones of Nigeria in A. gambiae sampled from eight localities (Fig. 1) spanning a distance of 90–833 km. Genetic differentiation was estimated by employing 10 microsatellite loci, short blocks of tandemly repeated simple DNA sequences of length 1–5 bp (reviewed in Bruford & Wayne, 1993). To measure the effect of selection on genetic structure, five of the loci employed are located within inversions, whereas the other five loci occur outside inversions.

Fig. 1

Sampling sites of Anopheles gambiae across Nigeria. Numbers in parentheses indicate number of pools sampled per locality.

Materials and methods

Study area

As one travels from north to south across Nigeria, mean annual rainfall increases, the number of dry season months decreases, and the vegetation becomes taller and more dense (Davies, 1977). The country is therefore divided into seven major ecological zones (Fig. 1) (Coluzzi et al., 1979), such that arid savanna zones in northern Nigeria gradually turn into humid forest zones in the south. Mean annual rainfall ranges from less than 500 mm in the Sudan and Sahel savanna zones to more than 2500 mm in the forests, so that mosquito breeding is seasonal in the north, but is subcontinuous in the south (Coluzzi et al., 1979).

Mosquito collection

Mosquitoes were collected from eight localities across Nigeria (Fig. 1) during the rainy season, between June and August 1997 and between May and June 1999. Samples from Kwenev, Okigwe and Lagos were collected in 1999, whereas the other localities (Fig. 1) were sampled in 1997. Giwa, Lafia, Bida and Benin were each sampled over 2 or more days, whereas all other localities were each sampled in 1 day.

In each locality, larvae of all available instars, or pupae, or both were collected from shallow, temporary, sunlit pools within a radius of approximately 1 km. Samples were collected from at least 11 pools per locality (Fig. 1), but sample size per pool was not determined and collections from all pools in each locality were mixed. At least 200 larvae were collected in each locality and larvae were reared on location in paper cups to the adult stage because adults were more easily distinguished as A. gambiae sensu lato using the keys of Gillies & Coetzee (1987). Rearing success was greater than 95%, and emerging adults and occasional dead larvae or pupae were fixed in 95% ethanol. Specimens were then hand-carried to the University of Vermont. Due to logistic difficulties, samples were not collected from the Sahel savanna and Mangrove forest (Fig. 1).

Species identification

Samples for identification to species were selected by random draw and DNA was then extracted from whole specimens according to Collins et al. (1987). Because A. gambiae and A. arabiensis occur in sympatry in many localities across Nigeria (Coluzzi et al., 1979; D. Y. Onyabe & J. E. Conn, unpublished data), each specimen was identified to species by the polymerase chain reaction (PCR) method of Scott et al. (1993). Only samples identified as A. gambiae were included in this study.

Microsatellite genotyping

Ten microsatellite loci (Table 1) were selected from the published genomic map of A. gambiae (Zheng et al., 1996). Location of loci in the genome (Table 1) was determined from genomic (Zheng et al., 1996) and cytogenetic maps (Coluzzi & Sabatini, 1967). At least 39 specimens per locality (Table 2) were genotyped at each locus using the primers of Zheng et al. (1996) by PCR amplification in a PTC-100 thermal cycler (MJ Research, Waltham, MA). The forward primer of each pair was labelled with a fluorescent dye (6FAM, HEX or TET; Applied Biosystems, Foster City, CA). The reaction mixture contained approximately 1/100th of the genomic DNA of each specimen, 10× PCR buffer, 0.2 mM of each dNTP, 10 pmol of each primer, 1.5 mM MgCl2, and 0.5 units of DNA polymerase (Amersham Pharmacia Biotech, Piscataway, NJ). DNA amplification was conducted through 30 cycles as follows: initial denaturation for 5 min at 94°C, primer annealing for 30 s at 55°C, and extension for 20 s at 72°C. In subsequent cycles, denaturation was for 20 s at 94°C, whereas the final extension step was for 5 min. Each PCR reaction (1.5 μL aliquot) was prepared for electrophoresis by mixing with 2.4 μL formamide, 0.5 μL blue dextran and 0.6 μL GeneScan-350 (TAMRA) size standard (Applied Biosystems, Foster City, CA). Electrophoresis was performed using 1.3 μL of the final mixture in a 4.5% acrylamide gel on an ABI Prism 377 DNA sequencer (Applied Biosystems). Gels were analysed and project and sample files were generated using ABI GENESCAN software, version 3.1 (Applied Biosystems). Subsequently, allele sizes were scored using ABI GENOTYPER software, version 2.0 (Applied Biosystems).

Table 1 Microsatellite loci studied in Anopheles gambiae
Table 2 Genetic variability in Anopheles gambiae from eight localities across Nigeria (a) Sokoto, Giwa, Lafia and Kwenev (b) Bida, Okigwe, Benin and Lagos

Statistical analyses

All analyses, except where noted, were performed using ARLEQUIN software version 2.000 (Schneider et al., 2000). The genetic diversity indices, observed and expected heterozygosity (HO and HE, respectively), and number and frequency of alleles were calculated per locus for each locality. The HE data were first tested for normality by the method of Shapiro & Wilk (1965) and non-normally distributed data were then square-root arcsine-transformed to better approximate a normal distribution (Zar, 1996). The significance of differences among localities in number of alleles per locus and in HE was tested in separate nested analyses of variance (ANOVA) performed in JMP software, version 3.1 (SAS Institute Inc., Cary, NC). The factors in the ANOVA were: (1) ecological zone; (2) localities nested within ecological zones; (3) location of loci with respect to inversions; (4) loci nested within or outside inversions; and (5) interaction between ecological zones and location of loci with respect to inversions.
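The arcsine square-root transformation mentioned above can be sketched as follows. This is a minimal illustration, not the procedure as run in JMP: the function name `arcsine_sqrt` is hypothetical, and the two input values are simply the endpoints of the reported range of mean HE, used here as placeholders.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


# Placeholder inputs: the endpoints of the reported range of mean HE.
he_values = [0.74, 0.79]
transformed = [arcsine_sqrt(h) for h in he_values]
```

The transform stretches values near 0 and 1, which is why it is commonly applied to proportions such as heterozygosity before an ANOVA.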

Each locus was tested separately for departures from Hardy–Weinberg equilibrium (HWE) using the Markov chain algorithm of Guo & Thompson (1992). The number of steps in the Markov chain was 100 000 and the number of dememorization steps was 10 000. The proportion of observed heterozygote deficiencies (D) and the frequencies of null alleles (r) that caused the deficiencies were estimated by the method of Chakraborty et al. (1992). The following expressions were used: D = (HE – HO)/HE and r = (HE – HO)/(HE + HO).
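As a worked illustration of the two expressions, the sketch below takes D as (HE – HO)/HE and r as (HE – HO)/(HE + HO). The function names and the input heterozygosities are hypothetical and are not values from Table 2.

```python
def heterozygote_deficit(h_exp, h_obs):
    """Proportional heterozygote deficit, D = (HE - HO) / HE."""
    return (h_exp - h_obs) / h_exp


def null_allele_frequency(h_exp, h_obs):
    """Estimated null allele frequency, r = (HE - HO) / (HE + HO)."""
    return (h_exp - h_obs) / (h_exp + h_obs)


# Hypothetical locus: expected heterozygosity 0.80, observed 0.60.
d = heterozygote_deficit(0.80, 0.60)    # about 0.25
r = null_allele_frequency(0.80, 0.60)   # about 0.143
```

Both quantities are zero when HO equals HE and increase as the observed heterozygosity falls below expectation.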

The significance of pairwise linkage disequilibrium was determined by the exact test, which also employed the Markov chain described above. Significance levels were adjusted using the sequential Bonferroni method to account for multiple comparisons (Holm, 1979) in tests of HWE and linkage disequilibrium.
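The sequential Bonferroni adjustment can be sketched as the step-down procedure of Holm (1979): P values are sorted in ascending order and the k-th smallest (counting from zero) is compared with alpha/(m − k), stopping at the first non-significant test. The function name `holm_reject` and the example P values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if it remains significant
    after the sequential Bonferroni (Holm) step-down procedure."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger P values are also non-significant
    return reject


# Hypothetical P values from four tests.
flags = holm_reject([0.01, 0.04, 0.03, 0.005])
```

In this example the two smallest P values pass their thresholds (0.0125 and about 0.0167), whereas 0.03 exceeds 0.025, so the procedure stops there.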

Two indices of genetic differentiation were estimated between the eight localities as follows: (1) FST (Weir & Cockerham, 1984); and (2) RST (Slatkin, 1995). FST was calculated based on the absolute frequencies of alleles, whereas RST was estimated from the sum of squared number of repeat differences (Slatkin, 1995). The number of repeats for alleles at each locus was determined by comparison with the sequence of one allele from each locus or with published data (Zheng et al., 1996). The two indices assume different mutation models: FST assumes the infinite alleles model (IAM), whereas RST assumes that alleles evolve according to the stepwise mutation model (SMM), that there are no constraints on allele size and that the mutation rate is similar across alleles (Slatkin, 1995). The significance of FST and RST was determined by permuting genotypes between localities. The number of migrants per population per generation (Nm) between localities was estimated from pairwise FST and RST (Slatkin, 1995).
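The conversion from a differentiation index to Nm is not spelled out in the text. The sketch below assumes the widely used island-model relation for diploid nuclear loci, Nm = (1/FST − 1)/4, applied in the same way to RST. This formula and the function name `nm_from_fst` are assumptions for illustration.

```python
import math


def nm_from_fst(fst):
    """Migrants per generation under the assumed island-model
    relation Nm = (1/FST - 1) / 4 for diploid nuclear loci."""
    if fst <= 0.0:
        return math.inf  # no measurable differentiation
    return (1.0 / fst - 1.0) / 4.0


# An index of 0.100 corresponds to Nm = 2.25 under this relation.
nm = nm_from_fst(0.100)
```

Because Nm is a steeply decreasing function of the index when the index is small, modest changes in low FST or RST values translate into large changes in the estimated number of migrants.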

Isolation by distance, the correlation between genetic and geographical distance, was tested by the regression of FST/(1 – FST) on the natural logarithm (ln) of straight-line geographical distance (Rousset, 1997). A similar regression was performed using RST instead of FST. Straight-line geographical distances were determined by using Microsoft Encarta 97 World Atlas.
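The regression can be sketched as an ordinary least-squares fit of FST/(1 – FST) on ln(distance). This is a minimal illustration only: the function name `ibd_regression` and the input values are hypothetical, and the sketch returns the slope, intercept and R2 without a significance test.

```python
import math


def ibd_regression(fst_values, distances_km):
    """Least-squares regression of FST/(1 - FST) on ln(distance).

    Returns (slope, intercept, r_squared)."""
    xs = [math.log(d) for d in distances_km]
    ys = [f / (1.0 - f) for f in fst_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical pairwise values, not taken from Tables 3 and 4.
slope, intercept, r2 = ibd_regression(
    [0.005, 0.020, 0.050, 0.080], [90.0, 250.0, 500.0, 833.0]
)
```

Pairwise comparisons share localities and are therefore not independent, so the P value for such a regression is often obtained by a permutation (Mantel-type) procedure, which this sketch does not attempt.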

Results

Distribution of alleles

North to south clines in allele frequencies occurred at three loci, AG2H26, AG2H79 and AG2H637 (Fig. 2a,b,c), all of which are located within inversions on chromosome II. At the other seven loci, the modal allele was the same (AG2H175, AG2H523, AG3H88 and AG3H577) or nearly so (AG3H249, AG3H128 and AGXH99) across localities (data not shown).

Fig. 2

Distribution and frequencies of alleles at three microsatellite loci in Anopheles gambiae across Nigeria. Notice the different scales on the y axis between zones in each figure. Fig. 2(a), AG2H26; (b), AG2H79; (c), AG2H637.

The range of mean number of alleles per locus per locality was 10.9–12.7 (Table 2). There was no significant difference in number of alleles per locus among localities within ecological zones (F3,59=0.791, P=0.503) or among ecological zones (F4,59=1.012, P=0.408).

Mean HE per locus per locality ranged from 0.74 to 0.79 (Table 2) and it did not differ significantly among localities within ecological zones (F3,59=0.728, P=0.538) or among ecological zones (F4,59=0.705, P=0.591).

Hardy–Weinberg equilibrium and linkage disequilibrium

A comparison of HE and HO is shown in Table 2. Thirteen of 70 tests showed significant departures (P < 0.05) from HWE after sequential Bonferroni correction. Two loci, AG2H523 and AG3H88, accounted for 10 of the 13 significant deficits of heterozygotes. Heterozygote deficiency at microsatellite loci is usually attributed to null alleles, which are caused by mutations in the primer-binding site. The estimated frequency of null alleles (r) is shown in Table 2. At least one specimen per locality failed to yield PCR products at AG2H523 and AG3H88 after two or three attempts. Yet, the same specimens were successfully amplified at other loci. This strongly suggests the existence of null alleles at the two loci above.

Eight of 360 tests for linkage disequilibrium were significant (P < 0.05) after sequential Bonferroni correction. None of the tests were significant in Giwa and Bida, whereas all other localities had at least one significant pair. None of the eight significant pairs occurred in more than one locality; physical linkage is thus an unlikely explanation for the significant results, but the Wahlund effect or assortative mating cannot be ruled out.

Genetic differentiation

Over all loci, genetic differentiation was significant (P < 0.05), especially in comparisons between localities in the savanna vs. forest zones – range of FST 0.028–0.087 and RST 0.014–0.100 (Tables 3 and 4). In contrast, estimates of genetic differentiation were usually lower within both forest and savanna zones (Tables 3 and 4). In comparisons between localities in the savanna zones, FST values ranged from 0.000 to 0.048 and none of the RST values was significant (Tables 3 and 4).

Table 3 Genetic distance, FST, and Nm (in parentheses) between Anopheles gambiae across Nigeria. Values below the diagonal were calculated over 10 loci and those above the diagonal were calculated over seven loci
Table 4 Genetic distance, RST, and Nm (in parentheses) between Anopheles gambiae across Nigeria. Values below the diagonal were calculated over 10 loci and those above the diagonal were calculated over seven loci

Single-locus comparisons between all localities revealed that genetic distances were generally highest and Nm lowest at three loci: AG2H26, AG2H79 and AG2H637 (data not shown). When these three loci were removed from the data-set, genetic distances calculated over the seven remaining loci were reduced to very low or insignificant levels even between the localities in forest and savanna zones (Tables 3 and 4). The range of FST values was 0.001–0.024 and all RST values were no longer significant (P > 0.05) in any comparison. These observations indicate that the three loci above were responsible for nearly all of the genetic differentiation.

Isolation by distance

Tests of isolation by distance gave seemingly equivocal results. The test based on FST/(1 – FST) was highly significant (R2=0.51, P=0.0001) (Fig. 3a), whereas the RST-based test was not significant (R2=0.002, P=0.798) (Fig. 3b). Following Bossart & Prowell (1998), the FST-based test was further analysed to determine if the significant result was due to the influence of one or a few localities. The analysis was performed by removing, in a stepwise procedure, all the data associated with each locality or ecological zone and then repeating the test. The test was no longer significant when both Benin and Lagos were removed from the data-set (R2=0.29, P=0.11). Moreover, the analysis performed on data derived from FST calculated over seven loci (excluding AG2H26, AG2H79 and AG2H637) was not significant (R2=0.02, P=0.440) (Fig. 3a).

Fig. 3

Isolation by distance in Anopheles gambiae across Nigeria based on (a) FST calculated over 10 loci (bold line) and seven loci (dotted line); and (b) RST calculated over 10 loci.

Discussion

Three loci, AG2H26, AG2H79 and AG2H637, were responsible for nearly all of the genetic differentiation in this study. AG2H26 and AG2H637 are located within inversions 2Rb and 2La, respectively; like the microsatellite loci, the frequencies of these inversions vary clinally from north to south in Nigeria (Coluzzi et al., 1979, 1985). AG2H79 is located within inversion 2Ra (Coluzzi & Sabatini, 1967), the frequency of which varies clinally across Nigeria in A. arabiensis, but not in A. gambiae (Coluzzi et al., 1979). Removal of the three loci from the data-set resulted in low or insignificant estimates of differentiation even between localities in savanna and forest zones. The preceding observation suggests that gene flow is extensive across the country but that selection on genes located within some inversions on chromosome II counters the homogenizing effects of gene flow. It is likely that the three microsatellite loci above merely hitch-hike on nearby genes that are under selective pressure. It is unlikely that entire inversions are the units of selection because AG2H637 and AG2H523 are both located within inversion 2La, yet the modal allele at AG2H523 was the same across localities. Local selection, which probably results in adaptation to the ecological zones (Coluzzi et al., 1979), can result in differentiation by reducing survival and fecundity in immigrants. If, for example, an immigrant does not carry a particular inversion, it may experience reduced survival and reproduction. The extensive genetic exchange measured by parts of the genome that are located outside inversions suggests that migrants survive and reproduce. Furthermore, there is probably recombination among regions outside inversions, such that inversion heterozygous offspring give rise to a mixture of gametes. However, only zygotes that possess the inversion survive and reproduce. The present study is in agreement with Lanzaro et al. (1998), who concluded that selection on genes located on chromosome II, but not on the other chromosomes, is responsible for genetic differentiation between the Bamako and Mopti chromosomal forms in Mali.

Removal of AG2H26, AG2H79 and AG2H637 from the data-set significantly reduced the genetic distances in some comparisons within ecological zones (Benin × Lagos in the rain forest and Bida × Lafia in the southern Guinea savanna). These observations suggest that selection is not simply zone-dependent. The correlation of some inversions with indoor vs. outdoor biting and resting behaviour (Coluzzi et al., 1979) and with higher infection rates with malaria parasites (Petrarca & Beier, 1992) suggests multiple roles for selection.

Over all loci, genetic differentiation was low among localities within each of the savanna and forest zones, although further sampling within the forest zone is required. The largest genetic distances were measured between localities in the savanna and forest zones, so that Nm ranged from 2.2 to 16.4 (from both FST and RST). This level of gene flow exceeds the threshold (Nm < 1) below which substantial differentiation by genetic drift may accrue (Slatkin, 1987). The findings above are consistent with chromosome inversion data from Nigeria (Coluzzi et al., 1979, 1985): A. gambiae samples from forest zones (Forest chromosomal form) are virtually uniform for the standard arrangement on chromosome II (but see della Torre et al., 2001), whereas samples from savanna zones (Savanna chromosomal form) show a diversity of floating inversions on the same chromosome. However, forest and savanna samples do intergrade. The high levels of gene flow over all loci in this study among savanna localities (Nm at least 4.9) are also consistent with the observation of extensive gene flow across Africa within the Savanna chromosomal form (Lehmann et al., 1996).

Although tests of isolation by distance gave seemingly equivocal results, geographical distance does not appear to limit gene flow in A. gambiae across Nigeria (Fig. 3). If, in addition to selection, isolation by distance limits gene flow, parts of the genome that are not under selective pressure would be expected to measure significant levels of differentiation that increase with geographical distance. The data show instead that differentiation estimated over seven of 10 loci was low or insignificant and was not correlated with geographical distance.

It is not clear what strategy will be employed for releasing transgenic mosquitoes. Assuming a transposable element is found that is capable of germ line transformation (reviewed in Curtis, 1994; Ashburner et al., 1998), the current study suggests that its spread throughout the A. gambiae genome will be rapid as long as the insertion event is not biased towards parts of the genome that are located within inversions. Similarly, if a stable transformed mosquito (reviewed in Curtis, 1994; Ashburner et al., 1998) is released, for example, the present study suggests that the spread of the transgene will be rapid, provided that it is located outside an inversion.