Introduction

The genetic relationship between the two most important malaria vectors in the Anopheles gambiae complex, An. gambiae and An. arabiensis, has been examined on the basis of inversion karyotypes (Coluzzi et al., 1985), allozymes (Miles, 1978; Collins et al., 1988) and ribosomal DNA (Collins et al., 1987). Although fixed inversions may be used to differentiate these species reliably, other methods have yielded results that are consistent with a relatively recent divergence of the two lineages. In the present study, we conducted a population genetic analysis using short tandem repeats (STRs) to compare sympatric An. gambiae and An. arabiensis populations from coastal Kenya.

STRs or microsatellites are relatively short (<100 bp) tracts of tandemly repeated DNA with repeat lengths of 6 bp or less (Beckman & Weber, 1992). Their exceptionally high mutation rates make them informative for working out relationships among closely related species as well as among subpopulations of a single species (Bowcock et al., 1994). In addition, they are co-dominant and relatively easy to score. A survey of STR loci in An. gambiae (Lanzaro et al., 1995) demonstrated that such loci are superior to allozymes for population genetic studies, because they revealed greater variability. Similar conclusions were drawn by Estoup et al. (1995), who observed a high level of divergence between populations within lineages of the honey bee Apis mellifera using microsatellites but virtually none using allozymes. In contrast, differentiation between An. gambiae populations from Kenya and Senegal measured by STRs was not higher than that measured by allozymes (Lehmann et al., 1996).

In the present study, genetic variability is compared in An. gambiae and An. arabiensis, with two specific goals: first, to determine the overall level of genetic divergence between these species, as measured using this novel category of genetic marker; and, secondly, to determine whether the location of loci in the mosquito genome, i.e. chromosomal location or location within paracentric inversions, affects the level of differentiation measured. Pairs of chromosomes may carry different chromosomal inversion arrangements, referred to as standard or inverted forms. Paracentric inversions are said to be fixed in a species if only one of the arrangements is found in that species and polymorphic if both arrangements may be found. We suspect that markers with greater power to distinguish these closely related species may be more useful for future studies of intraspecific differentiation. Such markers are less likely to have constraints on allele size (Deka et al., 1994).

Materials and methods

Specimen collection and microsatellite analysis

Mosquitoes were collected using the pyrethrum spray collection method on 16–17 February 1996 from Paziani village, 20 km south of Malindi town (3°10′S; 40°7′E) in Kilifi District, coastal Kenya. Sixteen An. gambiae and 16 An. arabiensis specimens were collected over a period of 3 days. Genomic DNA was extracted from mosquito abdomens by the alcohol precipitation method described by Collins et al. (1987), and each specimen was identified as either An. gambiae or An. arabiensis by the method of Scott et al. (1993). Microsatellite amplification was performed using standard polymerase chain reaction (PCR) in 8-μL reaction volumes run in a Perkin-Elmer 9600 Cetus Gene Amp thermocycler with primers obtained from Zheng et al. (1996). The thermal cycling conditions were an initial hold at 94°C for 5 min, followed by 30 cycles of 94°C for 15 s, 55°C for 20 s and 72°C for 30 s and a final extension at 72°C for 5 min. Analysis of the PCR product was performed using the high-resolution horizontal polyacrylamide gel electrophoresis technique described by Budowle & Allen (1991). Bands were visualized after silver staining (Cairns & Murray, 1994). A total of 30 microsatellite loci were analysed (Table 1).

Table 1 Location of loci and FST/RST values for comparisons between Anopheles gambiae and An. arabiensis

Data analysis

Conformance to Hardy–Weinberg expectations was tested by exact goodness-of-fit tests using GENEPOP 1.2 (Raymond & Rousset, 1995). The sequential Bonferroni procedure (Holm, 1979) and the binomial probability test were used to evaluate the overall significance of the values obtained. F-statistics were calculated using the method of Weir & Cockerham (1984), whereas RST was calculated according to Slatkin (1995). Differences between FST/RST values for different loci combinations were tested using the Wilcoxon rank sum test and the median test. Migration indices (Nm) were calculated from FST values by rearranging the formula of Wright (1969), FST=1/(4Nm+1), and from RST by substituting RST for FST in the same formula. A measure of genetic distance between the two populations was obtained using Nei's unbiased genetic distance (Nei, 1978).
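The rearrangement of Wright's formula can be made explicit with a short sketch. The function names below are illustrative only and are not taken from GENEPOP or any other package; the handling of non-positive estimates is an assumption made here for convenience.

```python
import math


def fst_from_nm(nm: float) -> float:
    """Wright's island-model relation: FST = 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst: float) -> float:
    """Rearranged form: Nm = (1/FST - 1) / 4.

    The same function can be applied to RST by passing RST in place of FST.
    Assumption of this sketch: estimates of zero or below (possible with
    sampling error) are treated as no detectable differentiation and
    returned as infinity.
    """
    if fst <= 0.0:
        return math.inf
    return (1.0 / fst - 1.0) / 4.0
```

Under this relation an FST of 0.2 corresponds to Nm=1, so FST values above 0.2 imply Nm<1 and values below 0.2 imply Nm>1.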

Results

For An. gambiae, all 30 loci were amplified for most of the 16 specimens. Fourteen loci were amplified for all 16 specimens, eight loci were amplified in 15 out of 16 specimens and the remaining eight loci were amplified in at least 10 specimens. In contrast, 25 out of 30 loci were amplified for An. arabiensis in most specimens, whereas five loci (37, 1002, 503, 19 and 815) could not be amplified satisfactorily even after two further attempts.

Allele frequency distribution

Loci were considered polymorphic if they had more than one allele present. Conversely, a monomorphic locus was one that had only one allele. Anopheles arabiensis was polymorphic at 92% of the loci that were amplified and An. gambiae at 93.33% of all loci. Locus 9c was monomorphic in both species, whereas loci 127 and 91 were monomorphic in An. arabiensis and An. gambiae, respectively.

Allele frequencies were calculated from genotypes scored for each of the 32 mosquitoes studied. Allele frequency distribution between the two sibling species varied from locus to locus, with some loci having similar distributions and others very different patterns (Fig. 1).

Fig. 1 Allele frequency distribution in the Anopheles gambiae and An. arabiensis populations studied. Alleles increase in size from allele ‘A’ to allele ‘I’. The actual size of the alleles was not determined. DNA from the specimens can be made available to interested individuals.

Conformance to Hardy–Weinberg equilibrium

Excluding five loci that could not be amplified and two that were monomorphic in An. arabiensis, 23 out of 30 loci were analysed for conformance to Hardy–Weinberg equilibrium. For An. gambiae, two loci were monomorphic, so 28 loci were analysed for the Hardy–Weinberg test as described above. Both the sequential Bonferroni test and the binomial test showed significant deficiencies in heterozygotes in both species. Such deviations indicate violation of one or more assumptions of Hardy–Weinberg.
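The two table-wide checks named above can be sketched as follows. This is a minimal illustration, assuming the step-down form of the sequential Bonferroni procedure and an upper-tail binomial probability for the number of individually significant tests; the function names are hypothetical, and the per-locus P-values themselves would come from the exact tests in GENEPOP.

```python
from math import comb


def sequential_bonferroni(p_values, alpha=0.05):
    """Step-down sequential Bonferroni: flag each test as significant or not.

    The smallest P-value is compared with alpha/m, the next with
    alpha/(m-1), and so on; testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # all larger P-values are also declared non-significant
    return significant


def binomial_tail(k, n, alpha=0.05):
    """Probability of k or more significant tests out of n by chance alone,
    assuming independent tests each with type I error rate alpha."""
    return sum(comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
               for i in range(k, n + 1))
```

A small value of `binomial_tail(k, n)` indicates that the number of loci deviating from Hardy–Weinberg expectations is greater than expected by chance.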

Deficiency of heterozygotes may be caused by the presence of null alleles. Whether the effect of null alleles skewed the distribution of alleles in these two populations to any significant extent was tested by correlating the FIS, which is a measure of the reduction in heterozygosity, with the number of specimens that could not be amplified. Lack of PCR product was assumed to be caused by the presence of either one null allele in the homozygous form or two different null alleles at a given locus. No significant correlation was found (r=0.29; d.f.=26; P=0.1361), suggesting lack of sufficient evidence to attribute the observed heterozygote deficiencies to the presence of null alleles.
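The significance of a correlation coefficient of this kind is conventionally assessed with a t-test on r. The sketch below assumes that convention (the text does not state which test was used) and evaluates the t distribution by numerical integration so that only the standard library is needed; the function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def correlation_p_value(r, df, steps=2000):
    """Return (t, two-tailed P) for a correlation r with df degrees of freedom.

    t = |r| * sqrt(df / (1 - r^2)); the area under the density between 0
    and t is obtained with Simpson's rule (steps must be even).
    Not defined for |r| = 1.
    """
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return t, 2 * (0.5 - area)
```

Applied to r=0.29 with d.f.=26, this gives a t value of about 1.55 and a two-tailed P in the region of 0.13, which is in line with the reported value once rounding of r is allowed for.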

Genetic variability and differentiation

Genetic variability and differentiation measures were calculated based on the 25 loci that were amplified in both species. Locus 9C was monomorphic in both species and was therefore excluded from analyses of differentiation. Genetic variability measures for these loci in each species are presented in Table 2.

Table 2 Genetic variability at the 25 loci studied in Anopheles arabiensis and An. gambiae (standard errors in parentheses)

Locus-specific FST/RST values varied greatly, but there was general agreement between the two indices (Table 1). The mean FST and RST values were higher than their corresponding median values (FST mean=0.249, median=0.136; RST mean=0.197, median=0.153), indicating that, although most values were low, the high values were considerably higher, with a significant influence on the mean. To avoid this bias, median values rather than mean values were analysed. The median FST and RST values were partitioned in two ways: first by the location of loci on the three chromosomes and then by the location inside or outside chromosomal inversions. Among loci used were those located on the X chromosome, which has different fixed inversions in the two species (Coluzzi et al., 1979), and those within inversions 2La, 2Rd and 3Ra. Inversion 2La has been found to be polymorphic in An. gambiae and fixed in An. arabiensis from coastal and western Kenya and from West Africa, whereas inversion 3Ra was polymorphic in An. arabiensis and fixed in An. gambiae from western Kenya and West Africa (Coluzzi et al., 1979; Mosha & Petrarca, 1983; Petrarca & Beier, 1992). Although the 2Rd inversion has not been documented in coastal or western Kenya, it has been shown to be polymorphic in both species in West Africa (Coluzzi et al., 1979). Loci located within chromosomal inversions were first analysed without distinguishing between the fixed and polymorphic inversions and then with this distinction (Table 3). An inversion was considered fixed only if both species were fixed for the inversion and polymorphic if at least one of the species was polymorphic. We tested whether differences between loci located on different chromosomes were significant by using the Wilcoxon rank sum test and the median test. Differences between chromosomes were not significant (based on FST: Wilcoxon's test P=0.1497, median test P=0.0784; based on RST: Wilcoxon's test P=0.1405, median test P=0.2343).
However, significant results were obtained for both tests when inversions were considered (P<0.05). Significant differences were observed when loci were compared in three groups — those within fixed inversions, those within polymorphic inversions and those outside inversions — and also when loci were compared in two groups, i.e. those within inversions vs. those outside inversions. Results remained significant irrespective of whether RST or FST values were used. Whether loci on the X chromosome (which contains the fixed inversions) were significantly different from those on chromosomes II and III (which contain some polymorphic inversions) was then examined. Differences were significant with the median test (P=0.0279; Wilcoxon's test P=0.0527). These results suggest that, although the chromosomal location of loci in itself does not affect the level of differentiation measured between these two species, chromosomal inversions are important.
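For the two-group comparisons (for example, loci within inversions vs. loci outside inversions), the Wilcoxon rank sum test can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually run: it uses the normal approximation with a tie correction and no continuity correction, whereas the software used for the analysis may have computed exact probabilities. Function names are hypothetical, and the three-group comparison would need a different test.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank.

    Returns the ranks (in the original order) and the size of each tie group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Return (W, two-tailed P), where W is the rank sum of sample x."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_w == 0:
        return w, 1.0  # all observations identical
    z = (w - mean_w) / math.sqrt(var_w)
    return w, math.erfc(abs(z) / math.sqrt(2))
```

Here `x` and `y` would hold the locus-specific FST (or RST) values for the two groups of loci being compared.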

Table 3 Median FST/RST values between the two species of Anopheles for the different loci

When median Nm values were compared, only the Nm value for loci outside the inversion region was greater than 1 (median Nm for loci outside inversion regions based on FST=1.5 and based on RST=2.3). This suggests a significant level of exchange of genes, because genetic differentiation results from genetic drift only if Nm<1 but not if Nm>1 (Slatkin, 1987). Genetic drift would be expected to have a stronger effect on loci located on the X chromosome because of the smaller Ne of these loci (3/4) associated with a single copy of this chromosome in males. We tested whether the differences in levels of differentiation are attributable to inversions and not to the smaller Ne of X-linked loci by adjusting Nm values for loci on the X chromosome upwards by a factor of 1/4. When loci within inversions were compared with those outside inversions after this adjustment, differences were significant for RST with both the Wilcoxon and median tests and for FST with the Wilcoxon test. It is therefore concluded that differences in gene flow are not attributable to differences in Ne.
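The adjustment for X-linked loci can be written out as a small sketch. One reading of "upwards by a factor of 1/4" is assumed here, namely that each X-linked Nm value is increased by one quarter of its value (multiplied by 1.25); a correction of 4/3, the reciprocal of the 3/4 reduction in Ne, would be an alternative reading. The increase is therefore left as a parameter, and the function names are illustrative.

```python
from statistics import median


def adjust_x_linked_nm(nm, on_x, increase=0.25):
    """Raise Nm for an X-linked locus by the given fraction (assumed 1/4).

    Autosomal loci are returned unchanged.
    """
    return nm * (1.0 + increase) if on_x else nm


def median_adjusted_nm(loci, increase=0.25):
    """Median Nm over (nm, on_x) pairs after the X-linked adjustment."""
    return median(adjust_x_linked_nm(nm, on_x, increase) for nm, on_x in loci)
```

The adjusted values for a group of loci can then be compared with the Nm=1 threshold or carried into the rank-based comparisons in place of the unadjusted values.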

Discussion

Anopheles gambiae and An. arabiensis are two of the six morphologically similar members of the An. gambiae complex. The use of microsatellite loci across species or sibling species depends on the conservation of priming sites within flanking sequences to enable amplification and the maintenance of repeated arrays long enough to promote polymorphism (Weber et al., 1990). STRs have previously been shown to be conserved in closely related mammalian species. Bowcock et al. (1994), for example, found that many human microsatellites are also present in the great apes. Even in more divergent families, such as primates and rodents, persistence of STR arrays has been reported (Stallings et al., 1991). But the proportion of loci developed for a certain species that can amplify in another decreases rapidly with increasing evolutionary divergence (Irwin et al., 1991). In the present study, 83.3% of the loci that were amplified in An. gambiae (25 of 30) were also amplified in An. arabiensis, providing a strong indication that the two species are closely related. The very similar percentage of polymorphic loci (93.3% in An. gambiae and 92% in An. arabiensis) further supports this notion.

Allele frequencies at several loci did not conform to Hardy–Weinberg expectations. The number of loci that did not conform was greater than that expected by chance alone, indicating violation of some Hardy–Weinberg equilibrium assumptions. Lack of conformance was associated with heterozygote deficiencies, indicating excess homozygosity. Such excess homozygosity could be caused by the Wahlund effect, i.e. sampling individuals from different demes. As mosquitoes were collected from the same village during the same time period, this seems unlikely. Lehmann et al. (1996) have suggested that the minimum distance associated with a deme is more than 50 km in diameter. Another possible cause of excess homozygosity is the presence of null alleles (Chakraborty et al., 1992). Although FIS values were not significantly correlated with the number of specimens that did not amplify, we believe that null alleles may have contributed to heterozygote deficiency, and our test did not detect this because of low statistical power associated with small sample sizes.

Genetic distance between any two species or sibling species is a measure of the genetic differences that have accumulated in the two groups since their divergence. Nei's unbiased genetic distance between An. gambiae and An. arabiensis in this study was 0.202. This value is similar to that published for the two sibling species using allozymes (Nei's D=0.15; Cianchi et al., 1983) and is consistent with published values for sibling species in other insect groups, which often have Nei genetic distances between 0.142 and 0.437 (Brussard et al., 1985).
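For reference, the unbiased distance of Nei (1978) can be computed from allele frequencies as sketched below. The data layout (one dictionary of allele frequencies per locus) and the use of a single sample size per population are assumptions of this sketch; in practice the number of specimens scored differed among loci, and the function name is illustrative.

```python
import math


def nei_unbiased_distance(freqs_x, freqs_y, n_x, n_y):
    """Nei's (1978) unbiased genetic distance between populations X and Y.

    freqs_x, freqs_y: lists with one dict per locus, mapping allele -> frequency.
    n_x, n_y: number of diploid individuals sampled from each population.
    Monomorphic loci are included in the averages.
    """
    n_loci = len(freqs_x)
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        sum_x2 = sum(p * p for p in fx.values())
        sum_y2 = sum(q * q for q in fy.values())
        # Unbiased estimates of within-population gene identity
        jx += (2 * n_x * sum_x2 - 1) / (2 * n_x - 1)
        jy += (2 * n_y * sum_y2 - 1) / (2 * n_y - 1)
        # Between-population gene identity needs no correction
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))
```

Because of the sample-size correction, the estimate can be slightly negative when the two populations have nearly identical allele frequencies; such values are usually interpreted as zero.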

The low estimate of Nm between the two species based on all loci studied signifies extremely low or no interchange of genes. Indeed, in wild sympatric populations of An. gambiae complex mosquitoes, interbreeding rarely occurs, with between 0.1% and 0.2% hybrids being found in nature (White, 1971). This is thought to be caused by the existence of some form of physiological or behavioural barrier preventing cross-mating. Laboratory crossings between the two species produce sterile hybrid males (Davidson et al., 1967), suggesting partial reproductive isolation.

In enzyme electrophoresis studies, Gillespie (1991) found that certain groups of loci are highly polymorphic, whereas others are almost always monomorphic. It therefore follows that using loci from one group may reflect the genome poorly and may give a less accurate level of differentiation. Our data concur with this notion because FST/RST values obtained in the present study span a very wide range (FST 0.00–0.87; RST 0.00–0.73), indicating that some microsatellite loci are better at differentiating between these two species than others. Our results suggest that loci within chromosomal inversions have greater power to distinguish between these species than loci outside the inversion regions. Chromosomal polymorphism caused by paracentric inversions is common in An. gambiae and An. arabiensis (Coluzzi et al., 1979). Because chromosomal inversions consist of gene sequences protected from recombination, they may preserve certain gene associations. It is thought that such genes may be subject to different evolutionary processes, leading to the development of isolation between populations carrying different arrangements. The present study supports this line of thought, because higher levels of differentiation were observed for loci within inversions compared with loci outside the inversions. We found no evidence of gene flow between the two species based on loci within inversions, suggesting that the two species may not be very closely related. Such an inference was made by Coluzzi et al. (1979) based on chromosomal inversion studies. However, recent results based on DNA sequence analysis indicate that these two species are much closer and suggest that gene flow does occur between them (Besansky et al., 1994).

Loci outside inversion regions suggested a significant level of gene flow between the two species. The observed gene flow may be that which occurred between the two species in recent evolutionary times because of common descent, or continuing gene flow/introgression may play an important part in the evolution of these species. Such possibilities may be used to explain the similar and even identical paracentric inversions between these species (Coluzzi et al., 1979).

Our data suggest that the location of loci within paracentric inversions influences the level of differentiation they can measure, at least for interspecific studies of An. gambiae and An. arabiensis. As no karyotype information on the specimens studied was available, we were unable to tie our conclusion to particular inversions. Nevertheless, these results have serious implications for the selection of loci for genetic differentiation studies of An. gambiae populations. Loci within inversion regions may give different, even conflicting, results from loci outside inversion regions.