Introduction

Globally, there are approximately 273 million malaria cases annually, and more than 2.1 billion people are at risk of malaria (WHO/TDR, 2004). An estimated 960 000 cases of malaria are reported in the Americas, and approximately 41% occur in Brazil (99% in the Amazon region) (PAHO, 2002). One of the factors determining the degree of malaria endemicity in a susceptible geographic region is the species of mosquito vector present. Vector species and population differences within species influence biting times, feeding and resting sites, and anthropophily (Lounibos and Conn, 2000), and these behaviors determine human-mosquito contact. The factors that affect the vector's capacity to transmit the Plasmodium parasite vary with species and population, and include mosquito abundance, infection rate, anthropophily, and longevity (Foster and Walker, 2002).

In the Neotropics, relative abundances of important malaria vectors have changed temporally, such as Anopheles albitarsis in southeastern Brazil (Forattini et al, 1993), Anopheles marajoara in northern Amazonian Brazil (Conn et al, 2002; Lehr, 2003) and Anopheles darlingi in western Amazonian Brazil (Soares Gil et al, 2003) and in the city of Belém (Póvoa et al, 2003). Anopheles albitarsis has emerged in southeastern Brazil possibly due to the development of irrigated land (Forattini et al, 1993). The recent resurgence of A. darlingi in places such as Iquitos, Peru, has been linked to increased malaria cases (Aramburu Guarda et al, 1999; Schoeler et al, 2003). This resurgence is proposed to be a result of human migration and land use changes, which often result in invasion of its primary breeding sites along warm lowland rivers and subsequent increased abundance (Charlwood, 1996; Conn et al, 1999). Such anthropogenic changes highlight important aspects of targeted malaria control: the possibility of altering locally important vectors, and complicated interactions with human populations (Conn et al, 2002; Póvoa et al, 2003).

Anopheles darlingi is the most important malaria vector in the Amazon Basin (Deane, 1947, 1988). In addition to high rates of infection, A. darlingi is also a good malaria vector owing to its anthropophilic (Charlwood and Alecrim, 1989) and endophagic (feeds indoors) behavior (Lourenço-de-Oliveira et al, 1989); although recently it is thought to have evolved to be more exophilic (rests outdoors) as a result of prolonged residual insecticide use in many regions (Charlwood, 1996; Soares Gil et al, 2003). Anopheles darlingi has an extensive distribution from southern Mexico to southern Brazil (Forattini, 1962). Although there are no documented barriers to gene flow for A. darlingi, rDNA data that detected a fixed insertion/deletion in samples from Belize, but not in those from South America (Manguin et al, 1999), are suggestive. Anopheles albimanus and A. pseudopunctipennis, with distributions similar to A. darlingi, are differentiated between Central and South America (De Merida et al, 1995, 1999; Manguin et al, 1995), possibly due to vicariance (Krzywinski and Besansky, 2003) or barriers to gene flow in Costa Rica and Panama (Molina-Cruz et al, 2004).

Even though A. darlingi is considered a single species (reviewed in Manguin et al, 1999; Lounibos and Conn, 2000), it does exhibit heterogeneity in body size (Charlwood, 1996) and in some genetic markers such as polytene chromosomes (Kreutzer et al, 1972), mtDNA (Freitas-Sibajev et al, 1995; Conn et al, 1999) and rDNA ITS2 sequences (Malafronte et al, 1999). The observed heterogeneity can affect important behavioral determinants of vector efficiency, such as endophily (rests indoors) and possibly dispersal ability (range expansion) (Lounibos and Conn, 2000; Fairley et al, 2002). Within-species differentiation can also affect the efficacy of control techniques; for example, variation in biting times can affect the usefulness of personal protection measures (Zimmerman and Voorham, 1997).

Owing to their relationship to human populations, many anopheline mosquito species are likely to violate mutation drift equilibrium (MDE) assumptions (Donnelly et al, 2001). One result of this is inaccurate estimates of gene flow, which can confound potential choices of target populations for the introduction of refractory genes. Avise (2000) suggests that closely related species in the same geographic region should have similar demographic histories due to their phylogenetic relatedness. In the northeastern Amazon, Lehr (2003) demonstrated that A. marajoara, a close relative of A. darlingi (Sallum et al, 2000), has undergone a recent population expansion, and is therefore not at MDE.

In the present study, we analyze the population structure of A. darlingi from 19 localities throughout Central and South America, including Brazil, Peru, Belize, Colombia, French Guiana, and Guatemala. Using sequences of the mitochondrial cytochrome oxidase subunit I (COI) gene, we test the hypothesis that A. darlingi will have a similar demographic history to A. marajoara in the northeastern Amazon (Avise, 2000) and determine whether there is a division in the gene pool between Central and South America, as proposed by Krzywinski and Besansky (2003).

Materials and methods

Mosquito collections

Adult A. darlingi were collected outdoors between 19:00 and 21:00 h by human landing catches and identified morphologically using the key of Deane et al (1946). The human landing catch protocol was reviewed and approved by the Institutional Review Board of the New York State Department of Health and by the Biosafety Committee of the Instituto Evandro Chagas, Belém, Brazil. Table 1 lists the 19 localities, the longitude and latitude of each site, and the number of mosquitoes sequenced. All mosquitoes were maintained in 95% ethanol at −80°C until use. DNA from each specimen used herein has been retained as a frozen voucher at −80°C in the Conn Laboratory.

Table 1 Anopheles darlingi collection localities

DNA extraction and sequencing

DNA was isolated from the head, thorax, or legs using the DNeasy tissue kit, following standard DNeasy Tissue Handbook protocol for isolation of total DNA from animal tissues (Qiagen, CA, USA). A 1300 bp fragment of the COI gene was amplified using the forward primer UEA3 and the reverse primer UEA10 (Lunt et al, 1996). Each individual PCR reaction was performed using a Ready-To-Go-PCR bead (Amersham Pharmacia/Biotech, NJ, USA) and run on a PTC-200 thermal cycler (BioRad, Inc.). The PCR products were cleaned with CentriSpin 40 columns (Princeton Separations, NJ, USA), and sent to the Wadsworth Center Molecular Genetics Core for sequencing. The forward and reverse sequences were aligned using Sequencher 3.0 (Gene Codes Corp, MI, USA), grouped together by site and trimmed in PAUP, version 4.0 (Swofford, 2003), creating a 978 bp fragment of the COI gene. Unique haplotypes were determined using MacClade, version 3.0 (Maddison and Maddison, 1997); identical sequences were considered to be a single haplotype.
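The haplotype-calling rule described above (identical sequences treated as a single haplotype) can be illustrated with a minimal Python sketch. It is not the MacClade procedure used in the study: the function name `collapse_haplotypes`, the labels `H1`, `H2` and so on, and the toy sequences are hypothetical, and the input is assumed to be already aligned and trimmed to equal length.

```python
from collections import defaultdict


def collapse_haplotypes(sequences):
    """Group specimens whose aligned sequences are identical.

    sequences: dict mapping a specimen ID to its aligned, trimmed sequence.
    Returns a dict mapping a hypothetical label (H1, H2, ...) to a dict
    holding the sequence and the sorted list of specimen IDs carrying it.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to equal length")

    groups = defaultdict(list)
    for specimen_id, seq in sequences.items():
        # Case is normalized so that 'atta' and 'ATTA' count as identical.
        groups[seq.upper()].append(specimen_id)

    # Most frequent haplotype first; ties broken by sequence for stable labels.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return {
        f"H{rank}": {"sequence": seq, "specimens": sorted(ids)}
        for rank, (seq, ids) in enumerate(ordered, start=1)
    }


# Toy data (not from the study): three specimens, two distinct sequences.
toy = {"s1": "ATTA", "s2": "atta", "s3": "ATTG"}
haplotypes = collapse_haplotypes(toy)
```

In this toy case the two identical sequences collapse into `H1` and the remaining one becomes `H2`; counting the specimens per label gives the haplotype frequencies of the kind summarized in Table 2.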

Phylogenetic relatedness

The number of mutational steps necessary to link any two haplotypes with 95% confidence level was determined in ParsProb 1.1 (Posada et al, 2000). A minimum spanning network of the A. darlingi haplotypes was created using TCS 1.12 (Clement et al, 2000). Phylogenetic relationships among the haplotypes were estimated with PAUP using maximum likelihood and maximum parsimony (Swofford, 2003), and with MrBayes using a Bayesian approach (Rannala and Yang, 1996; Mau and Newton, 1997; Mau et al, 1999). Anopheles albimanus was used as an outgroup (Sallum et al, 2000). Pairwise population estimates of FST were computed using Arlequin 2.01 (Schneider et al, 2000), and the FST values were used as distance measures to create a neighbor-joining (NJ) tree using MEGA, version 2.1 (Kumar et al, 2001).

Historical demography

The haplotype and nucleotide diversities were computed in Arlequin 2.01 (Schneider et al, 2000). Nei's GST was calculated to estimate the population differentiation based on differences in allele frequencies; Nei's NM was used to estimate gene flow based on GST (Nei, 1973). The tests of Tajima (1989) and Fu and Li (1993) were used to test the hypothesis that all mutations are selectively neutral (Kimura, 1983). Tajima's DT (1989) is based on the difference between the number of segregating sites and the average number of nucleotide differences. The D and F tests, proposed by Fu and Li (1993), are based on molecular polymorphism data. Fu's FS test (1997) and Strobeck's S statistic (1987) assess the haplotype structure based on the haplotype frequency distribution, and were used as additional tests of neutrality. These analyses were calculated using DnaSP, version 3 (Rozas and Rozas, 1999). The mismatch distribution (a frequency distribution of the observed number of pairwise sequence differences) was computed to distinguish between a smooth unimodal distribution and a multimodal, or ragged, distribution (Slatkin and Hudson, 1991; Rogers and Harpending, 1992; Rogers, 1995). The raggedness (r) statistic was calculated to quantify the smoothness of the mismatch distribution (Harpending et al, 1993). The Mantel analysis was used to test the null hypothesis that geographic and genetic distance are independent, using pairwise matrices of geographic and genetic distances (the latter estimated by FST). The mismatch distribution and Mantel test (Mantel, 1967) were calculated in Arlequin 2.01 (Schneider et al, 2000), and the raggedness test in DnaSP, version 3 (Rozas and Rozas, 1999). Significance of the Mantel test was determined with 1000 permutations.
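As an illustration of the diversity measures and of Tajima's DT described above, the following Python sketch computes haplotype diversity, the average number of nucleotide differences (k), nucleotide diversity, the number of segregating sites (S), and Tajima's D from a set of aligned sequences. It uses the standard textbook formulas and is not the Arlequin or DnaSP implementation used in the study; all function names and the toy alignment are hypothetical, and gaps or ambiguous bases are not handled.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences (k) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one state (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajima_d(seqs):
    """Tajima's (1989) D from k and S; undefined when S is 0."""
    n = len(seqs)
    k = mean_pairwise_differences(seqs)
    s = segregating_sites(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (not from the study): four sequences of four sites.
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
h = haplotype_diversity(toy)
k = mean_pairwise_differences(toy)
pi = k / len(toy[0])          # nucleotide diversity per site
S = segregating_sites(toy)
D = tajima_d(toy)
```

A negative D reflects an excess of low-frequency variants relative to neutral expectation and a positive D an excess of intermediate-frequency variants; assessing significance requires the critical values or simulations provided by dedicated software, which this sketch does not attempt.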

Results

Genetic variation

Of the 36 unique haplotypes detected, seven were shared (A, B, C, J, T, W, and X); the remainder were unique to a single geographic location (Figure 1). All the sequences were A–T rich (combined frequency of 72.05%), which is expected within Insecta. Sixty-nine transitions and seven transversions were identified, and there were four nonsynonymous mutations. A mutation at position 45 resulted in a methionine present in haplotype V (A. darlingi in ARA) in place of a valine in all other haplotypes. Another southern Amazonian mutation resulted in an isoleucine in place of a valine in haplotype W (A. darlingi in ARA, BEL, and CAP). A nonsilent mutation at position 193 produced a threonine in haplotype N (A. darlingi in NEC) where an alanine is present in all other haplotypes; and a mutation in haplotype Q (A. darlingi in IQ) resulted in an alanine in place of a threonine. There were no nonfunctional genes (ie, pseudogenes) as shown by the absence of stop codons, the prevalence of synonymous substitutions, low pairwise divergence and clear electropherograms. The two most common haplotypes were T (n=17) in Belize and Guatemala, and B (n=15) in northeastern Amazonian Brazil (Table 2). Localities south of the Amazon region in Brazil have the highest haplotype diversities (ITB, MOJ, PEX, and DOU), and localities just northeast and southeast of the Amazon have the most shared haplotypes within the populations (TAR, LI, ANT, TRP; TAI, BEL, ARA, PEB, and CAP) (Figure 1; Table 2). Populations in close geographic proximity have the greatest number of shared haplotypes, and populations that are farther apart do not share haplotypes. Interestingly, localities directly northeast (TAR, LI, ANT, and TRP) and southeast of the Amazon (TAI, BEL, ARA, PEB, MOJ, and CAP) do not share haplotypes, and localities northeast of the Amazon (TAR, LI, ANT, and TRP) only have two unique haplotypes (Table 2).
Haplotypes in Peru and Colombia are not shared with any other population, which could be due to their geographic separation from other localities sampled and/or potential geographic or climatic barriers. In general, these data would seem to support the isolation by distance model (Wright, 1951). Belize and Guatemala had low diversity, with only four haplotypes identified from 21 mosquitoes analyzed. Haplotype X is shared among the greatest number of populations in Brazil, followed by B and C (Table 2). Sequences for A. darlingi and A. albimanus used in this study have been deposited in GenBank, accession numbers DQ298209 to DQ298244.

Figure 1

Geographic distribution of unique haplotypes in Central and South America. The letters correspond to the haplotype(s) observed for each locality. The circled numbers correspond to the locality in Table 1, positioned on the map according to longitude and latitude of the site. Underlined bold letters depict shared haplotypes, and plain letters depict unique haplotypes.

Table 2 Summary of haplotypes and diversity measures

Phylogeographic relatedness

The minimum spanning network illustrates the mutational relationship of the A. darlingi haplotypes (Figure 2). All haplotypes differed by fewer than 13 mutational steps, so they could be connected parsimoniously. The Central American haplotypes are separated by seven mutational steps from the Colombian haplotypes (M and N), which are separated by an additional seven mutational steps from a Brazilian haplotype, DA. Many southern Amazonian Brazil haplotypes (X, C, W, V, Y, K, I) differed by only one or two mutational steps, suggestive of a demographic expansion (Slatkin and Hudson, 1991; Fu, 1997). Haplotype X was the most common interior haplotype, so it is most likely the oldest haplotype (Castelloe and Templeton, 1994). The majority of haplotypes were tip alleles, which are considered to be more recently derived and geographically restricted (Crandall and Templeton, 1993; Castelloe and Templeton, 1994).

Figure 2

Parsimony network of the 36 haplotypes. The letters correspond to the haplotypes observed for the 19 localities in Table 2. The solid circles represent a single mutational event.

The phylogenetic relationships among the haplotypes estimated by the maximum-likelihood, maximum-parsimony, and Bayesian analyses were all very poorly resolved and not informative because the sequence variation contains insufficient phylogenetic signal (data not shown).

The pairwise FST estimates of differentiation ranged from 0 to 1, and were used to create the NJ tree. Two primary clusters were found: (I) South America, and (II) Central America plus NW Colombia (Figure 3). There appears to be a secondary division within cluster I between the southern Amazon and southern South America (IA) and the northern Amazon (IB). Within clusters I and II, 52.5 and 55% of the pairwise FST comparisons were significant, respectively, whereas all FST comparisons between clusters were significant (FST values shown in Table 3) (Donnelly et al, 2004). Additional support for the primary division is found from samples of the conserved nuclear white gene that were cloned and sequenced for over 200 samples of A. darlingi from most of the same localities as in the current study (Mirabello and Conn, unpublished).

Figure 3

FST distance-based neighbor-joining tree for Anopheles darlingi from COI mtDNA sequences. Branch lengths are proportional to the mean pairwise FST values (see scale bar). Within each cluster, 52.5% (cluster I) and 55% (cluster II) of the pairwise FST comparisons were significant (P<0.05). All pairwise FST estimates were significant (P<0.05) when samples from the two clusters were compared.

Table 3 Pairwise FST values between each locality

Historical demography

Nucleotide and haplotype diversities were used as measures of genetic diversity of A. darlingi. Haplotypes B and T had the greatest haplotype frequency (Table 2). ITB, CAP, and PEX had the highest haplotype diversities. ANT, TAI, and CAY had no detectable diversity measures because only a single haplotype was detected in each. Both of the diversity measures were high in ITB, DOU, and IQ. The NJ cluster with the greatest haplotype diversity was IA, and IB had the highest nucleotide diversity (also having the greatest average number of nucleotide differences). Cluster II had low haplotype and nucleotide diversity measures.

Nei's GST and NM (1973) were used to examine the pairwise genetic differentiation and gene flow, respectively, between the population clusters (IA, IB, and II) (Table 4). The highest level of genetic differentiation (GST=0.3109) was detected between Central America (II) and northern Amazon (IB) (Table 4). The estimates of gene flow (NM) were moderate among South America populations and between Amazonian and southern South America and Central America. Only the gene flow estimate between Central America plus NW Colombia (II) and northern Amazon (IB) was below 1. Population comparisons with NM values less than 1 are considered to have no gene flow (Nei, 1973, 1975).
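How a gene flow estimate follows from GST can be shown with a short sketch. The source does not state which conversion was applied, so the island-model relation Nm = (1/GST − 1)/c is assumed here, with c=4 the form commonly used for diploid nuclear loci and c=2 the form commonly used for haploid, maternally inherited markers. The function name `nm_from_gst` and the parameter `c` are hypothetical.

```python
def nm_from_gst(gst, c=4):
    """Island-model estimate of Nm from GST: (1/GST - 1) / c.

    c is an assumption, not a value given in the source:
    4 is the usual diploid form, 2 the usual haploid (mtDNA) form.
    """
    if not 0 < gst <= 1:
        raise ValueError("GST must be in the interval (0, 1]")
    return (1 / gst - 1) / c


# GST between clusters II and IB, as reported in the text.
nm_c4 = nm_from_gst(0.3109, c=4)   # roughly 0.55
nm_c2 = nm_from_gst(0.3109, c=2)   # roughly 1.11
```

For GST=0.3109, only the c=4 form returns a value below 1, which agrees with the statement in the text; this is an inference from the arithmetic rather than a documented software setting.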

Table 4 Pairwise differentiation (GST, below the diagonal), and gene flow between population clusters (NM, above diagonal)

Tajima's (1989) DT and Fu and Li's (1993) F and D neutrality tests found that cluster I and IA have significant negative D and F values, and nonsignificant negative DT values (Table 5). The results allow rejection of the neutral model in these regions, as a result of two possible factors: (1) a relatively recent population expansion, which can raise the number of low-frequency variants, and (2) natural selection. The neutral model is also rejected in cluster IB, where the DT, D, and F values were all significantly positive (cluster II also had positive values, but nonsignificant), which suggests possible balancing selection or population subdivision (Table 5).

Table 5 Summary statistics for polymorphisms of Anopheles darlingi

Fu's FS test (1997) and Strobeck's S statistic (1987) determined that both cluster I and IA have significantly negative FS values and positive S of 0.995 and 1.000, respectively, indicating possible population expansion. Fu (1997) states that FS is the most powerful test for detecting population expansion and genetic hitchhiking, followed by Tajima's DT, and Fu and Li's D and F tests (Fu and Li, 1993). Cluster IB has significantly positive FS and a low S statistic of 0.0001, indicating possible background selection. Cluster II also had a positive FS and low S-value, but they were not statistically significant.

Through a graphical illustration of the mismatch distribution it is possible to determine if there is a smooth unimodal distribution following the Poisson distribution characteristic of a recent bottleneck or population expansion, or a multimodal distribution indicating a population at MDE. In contrast to the previous demographic history analyses for A. darlingi, the mismatch distribution did not demonstrate the expected unimodal distribution for all the localities together or for either primary or secondary clusters. The observed mismatch distribution for cluster I differed significantly from the simulated distribution of a demographic expansion model (Figure 4). The distribution for all the localities throughout Central and South America showed a multimodal distribution typical of populations at MDE.
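To make the mismatch distribution and its raggedness concrete, the sketch below tabulates the relative frequency of pairwise differences for a set of aligned sequences and computes Harpending's raggedness index as the sum of squared differences between adjacent frequency classes. It is an illustration only, not the Arlequin or DnaSP routine used in the study; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of sequence pairs differing at 0, 1, 2, ... sites."""
    diffs = [
        sum(1 for x, y in zip(a, b) if x != y)
        for a, b in combinations(seqs, 2)
    ]
    counts = Counter(diffs)
    total = len(diffs)
    return [counts.get(i, 0) / total for i in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's r: sum over i = 1..d+1 of (x_i - x_(i-1))^2."""
    padded = list(freqs) + [0.0]  # class d+1 has frequency zero
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Toy alignment (not from the study).
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
freqs = mismatch_distribution(toy)   # classes 0, 1 and 2 differences
r = raggedness(freqs)
```

A smooth, unimodal distribution yields a small r, whereas a multimodal distribution with alternating high and low classes yields a larger one. Testing the observed distribution against an expansion model, as in Figure 4, additionally requires simulation and is outside the scope of this sketch.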

Figure 4

Mismatch distribution of all the localities in South America (cluster I). The bar graph is the observed mismatch distribution for South America. The line graph is the expected distribution of a recently expanded population. The error bars are 95% confidence interval estimates of the observed values.

The raggedness statistics for clusters I and IA are both very small (0.0174 and 0.0105, respectively), suggesting a population expansion. Cluster IB has a large r-value (0.7395), indicating a population at equilibrium. Time since the population expansion can be estimated from t=τ/2u, where u is the mutation rate per generation for the whole sequence, that is, the per-site rate μ multiplied by the number of sites analyzed (Slatkin and Hudson, 1991). The Drosophila mutation rate of 10⁻⁸/site/year (Powell et al, 1986) and 10 generations/year (Walton et al, 2000) were used in the calculation. The estimate of τ, from the raggedness calculation, is 4.951 for cluster I, and 3.176 for cluster IA. Therefore, the time to expansion for A. darlingi in South America is approximately 253 119 years ago (95% CI, 85 554–402 419), and in Amazonian and southern South America is approximately 162 372 years ago (95% CI, 54 882–258 144). Both expansion times are during the late Pleistocene.
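The arithmetic behind these expansion times can be checked with a short sketch. It assumes that the per-site mutation rate is scaled to the whole 978 bp fragment before τ is divided by twice the rate, and that generations are converted back to years at 10 generations per year. Under these assumptions the reported point estimates are reproduced, but the scaling is an inference from the numbers, and the function name and parameter names are hypothetical.

```python
def expansion_time_years(tau, seq_length_bp, rate_per_site_per_year,
                         generations_per_year):
    """Time since expansion in years from t = tau / (2u).

    u is the per-generation mutation rate for the whole sequence
    (assumed: per-site rate times sequence length).
    """
    rate_per_site_per_generation = rate_per_site_per_year / generations_per_year
    u = rate_per_site_per_generation * seq_length_bp
    generations = tau / (2 * u)
    return generations / generations_per_year


# Values taken from the text: tau for clusters I and IA, 978 bp fragment,
# 1e-8 substitutions/site/year, 10 generations/year.
t_cluster_i = expansion_time_years(4.951, 978, 1e-8, 10)    # about 253 119
t_cluster_ia = expansion_time_years(3.176, 978, 1e-8, 10)   # about 162 372
```

Because the generation time enters once when the rate is converted to a per-generation value and once when generations are converted back to years, it cancels, and the estimate in years depends only on τ, the fragment length, and the annual per-site rate.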

A Mantel analysis was conducted to test for isolation by distance, with independence of the geographic and genetic distances between populations as the null hypothesis. The correlation was not significant (R²=0.011, P=0.489).
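The logic of the Mantel permutation test can be sketched as follows: the correlation between the two distance matrices is computed, then the rows and columns of one matrix are jointly shuffled many times to build a null distribution. This is a minimal pure-Python illustration, not the Arlequin routine used in the study. The function names, the one-sided P-value convention (hits+1)/(permutations+1), the fixed seed, and the toy matrices are assumptions made for the example.

```python
import random
from itertools import combinations
from math import sqrt


def upper_triangle(matrix, order):
    """Entries above the diagonal after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(geo, gen, permutations=1000, seed=0):
    """Return (observed r, one-sided permutation P-value)."""
    identity = list(range(len(geo)))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(geo_vec, upper_triangle(gen, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute localities in the genetic matrix
        if pearson(geo_vec, upper_triangle(gen, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy data (not from the study): five localities on a line, with genetic
# distance exactly proportional to geographic distance.
positions = [0, 1, 3, 7, 15]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r_obs, p_value = mantel_test(geo, gen)
```

In the toy case the correlation is perfect and the P-value is small; a result like the one reported above (P=0.489) would instead correspond to roughly half of the permuted correlations being at least as large as the observed one.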

Discussion

In a review of evolutionary studies Donnelly et al (2002) claim that the genetic population structure of primary malaria vectors is shallow with a weak effect of distance on differentiation, and this has been supported by studies of the major African vectors A. gambiae (Lehmann et al, 1996) and A. arabiensis (Donnelly and Townson, 2000). Our analyses detected considerable population structure, and isolation by distance was detected by Conn et al (1999) for A. darlingi in South America. A similar pattern was also found in neotropical vectors A. aquasalis (Fairley et al, 2002) and A. albimanus (De Merida et al, 1999), although in a more recent study in the latter species the effect of distance on differentiation was weak (Molina-Cruz et al, 2004).

In our study of A. darlingi, the inference of range expansion was well supported but isolation by distance was not. Isolation by distance and the low nucleotide diversity observed in Central American populations of A. albimanus and low nucleotide diversity in A. darlingi could be due to small effective migration rates and effective population size, and/or genetic drift (De Merida et al, 1999; Molina-Cruz et al, 2004). The nucleotide diversity estimates for A. gambiae are also low, which is consistent with a large panmictic population (Besansky et al, 1997; Lehmann et al, 1998). These differences may be due to lack of isolation by distance in the African Anopheles, as compared to the Neotropical Anopheles, and their high dispersal ability, large effective population size, and/or recent range expansion (corresponding to the human population expansion during the arrival of agriculture in West Africa) (Lounibos and Conn, 2000; Donnelly et al, 2001, 2002).

The NM estimates between Central America and the northern Amazon indicated little or no recurrent gene flow (NM<1). Contemporary gene flow estimates are important in malaria vectors because they are used to predict the spread of genes, essential information for effective introduction of transgenes for Plasmodium resistance (Cohuet et al, 2005; Tripet et al, 2005). No gene flow combined with high genetic differentiation may be the result of vicariance or obstruction by natural barriers (ie, topography or habitat that a migrating individual cannot cross) existing between Central America and northern Amazon. Anopheles albimanus data suggest that a single barrier exists within Central America, which may be the mountain range that crosses Costa Rica and Western Panama (Molina-Cruz et al, 2004). The phylogeographic break between Central American plus NW Colombia and South American A. darlingi is consistent with studies of other Neotropical taxa, such as Neotropical butterflies (Brower, 1994), toads (Slade and Moritz, 1998), bats (Hoffmann and Baker, 2003) and trees (Dick et al, 2003). However, Manguin et al (1999) observed substantial gene flow among populations of A. darlingi throughout its geographic range using four nonmitochondrial markers (isozyme, RAPD-PCR, ITS2, and morphology). These conflicting results may be due to the differences in mitochondrial and nuclear DNA inheritance and evolutionary histories, which can affect estimates of gene flow (Presa et al, 2002). Mitochondrial DNA evolves relatively rapidly, is maternally inherited, and has an extremely low level of recombination as compared to the nuclear genome (Avise, 2000).

Manguin et al (1999) found that A. darlingi from Belize differed from all the South American populations by a three-base deletion in the rDNA ITS2 marker. There is no evidence of clinal variation, and the range of A. darlingi in Central America is narrow (Manguin et al, 1999). Therefore, Manguin et al (1999) suggested the differentiation is either due to a recent introduction event that may have been caused by humans, or an extrinsic factor. One interpretation of our data is a possible introduction event from Colombian A. darlingi populations into Central America. More ancestral and diverse haplotypes were observed in Amazonian and southern Brazil populations, so they are likely older, and Colombian haplotypes are, of the South American samples examined in this study, most closely related to the Central American haplotypes.

In general, older populations have a higher diversity than younger populations (Kambhampati and Rai, 1991; Molina-Cruz et al, 2004). In our study, the older populations with the highest diversity are ITB, CAP and PEX, and cluster IA (Amazonian and southern South America). According to Avise (2000), high haplotype diversity relative to the nucleotide diversity suggests a population bottleneck followed by a rapid population expansion and buildup of mutations; nearly all the populations (except TAI and PEB) of A. darlingi in cluster IA, and all the localities together fit this criterion.

We hypothesized that the Amazonian populations of A. darlingi had undergone a population expansion similar to A. marajoara in northern Brazil (Lehr, 2003), based on regional patterns determined for many organisms reviewed in Avise (2000). The large proportion of shared haplotypes and lack of unique haplotypes in the Amazon region in Brazil, and most of the demographic history analyses, support an expansion in Amazonian and southern South America populations of A. darlingi. However, mismatch analysis shows a multimodal pattern characteristic of populations at MDE. Harpending (1994) states that a population expansion that is too recent, for example at the end of the Pleistocene epoch, would not result in a smooth unimodal mismatch distribution. MDE is thought to be achieved when the effective population size (Ne) has stayed constant for 2Ne–4Ne generations (Nei and Li, 1976). Anopheles darlingi population size probably fluctuates due to its dependence on specific climatic and environmental conditions, which have changed dramatically over the last millennia (Prance, 1982; Absy et al, 1991; Cavalli-Sforza et al, 1994; Nicholson, 1995; Donnelly et al, 2001). The estimated expansion time for A. marajoara was much more recent (36 400 years ago in the eastern Amazon; Lehr, 2003) as compared to A. darlingi (estimated time to expansion is >100 000 years). This difference may reflect the cycles of Amazonian savannah contraction and re-expansion, and thus habitat availability owing to the differences in habitat preference between A. marajoara (breeding sites associated with savannah and agricultural habitat; Conn et al, 2002; Lehr, 2003) and A. darlingi (primary breeding sites along warm lowland rivers; Rozendaal, 1990; Roberts et al, 2002).

If A. darlingi, like other primary vectors (eg, A. gambiae), has only recently become a human pest, it would be expected to show a recent population expansion. However, the historical human population expansion in the Amazon region is predicted to have occurred between 10 000 (Roosevelt et al, 1991) and 2500 years BP (Willis et al, 2004), and a rapid decline in indigenous human population size occurred after colonial contact in the 18th–19th centuries (Willis et al, 2004). Brazilian government settlement policies, still in effect today, have caused nonindigenous human populations in many areas of the Amazon in Brazil to expand significantly during the 20th century (Cruz Marques, 1987; Alecrim, 1992; Schmink and Wood, 1992). Our estimates support a very different history for A. darlingi in South America compared with A. gambiae in sub-Saharan Africa, despite both being primary vectors. Anopheles darlingi, being a more opportunistic feeder (Charlwood, 1996), is not nearly as dependent on human blood meals as A. gambiae (White, 1974; Coluzzi et al, 1979; Besansky et al, 2004). Therefore, its populations are more likely to fluctuate based on climatic conditions (that would influence breeding site availability) compared with A. gambiae's noteworthy dependence on human population densities (Coluzzi, 1982; Donnelly et al, 2001).