Species colonizing new areas disjunct from their original habitat may be subject to novel selection pressures, and exhibit adaptive genetic changes. However, if colonization occurs through a small number of founders, the genetic composition of the colonized population may differ from that of the original population simply due to genetic drift. Any selection will affect particular regions of the genome, whereas demography, migration and the mating system will affect the whole genome in the same way (Cavalli-Sforza, 1966). Although genetic drift will affect all loci in the genome, the consequences for different loci may be different because of stochastic effects. Thus, the problem still is to disentangle the effects of drift due to a small founding population from those of selection after colonization. A neutral microsatellite may show apparent selective effects if it is in linkage disequilibrium (LD) with other loci subject to selection. When populations are exposed to different environments, diversifying selection at some loci can lead to increased levels of differentiation for those loci relative to neutral expectations, whereas spatially uniform or balancing selection affecting particular loci can lead to decreased differentiation relative to neutral expectations.

In recent years, loci affected by selection have been identified using multilocus simulation tests. These methods, which use summary statistics to identify ‘outlier’ loci (those affected by selection or selective sweeps), may be divided into two classes of interlocus comparisons: (1) those based on levels of differentiation between populations, and (2) those based on comparisons of diversity within populations (Storz, 2005). In addition, multiple regression methods may be used to test for associations between allele frequencies (and heterozygosities) and spatial and climatic variables. Although associations with spatial variables could be due to drift and/or migration, any associations with climatic variables after correction for geographic location suggest effects of differential selection in different populations (Endler, 1986).

Here we apply these methods to investigate possible adaptive differences among Drosophila buzzatii populations in Australia. D. buzzatii is a cactophilic species, feeding and breeding in the rotting tissues of Opuntia cactus (prickly pear) species, and is restricted to this niche in Australia (Barker and Mulley, 1976). The Chaco region of Argentina is the presumed center of origin of D. buzzatii (Vilela et al., 1980), and its introduction to Australia was apparently from this region in the period 1931–1936 (Barker, 1982; Barker et al., 1985), during the program for the biological control of prickly pear (Mann, 1970). Cactoblastis cactorum (Berg) (Lepidoptera: Pyralidae), which was introduced from Argentina in 1925 as a potential control agent, proved extremely effective, with the larvae destroying plant tissue, leading to rot and death of the plants. By the time of the inadvertent introduction of D. buzzatii, C. cactorum had been released extensively throughout the Opuntia infestation (Dodd, 1940). By 1931, the density of C. cactorum was estimated at 2.5 × 10⁷ larvae per hectare, over the thousands of square kilometers of dense prickly pear (Dodd, 1940). Thus, D. buzzatii then had extremely large areas of suitable habitat, probably spread rapidly to all the Opuntia-infested area, and must have had an enormous population size (Barker, 1982; Sokal et al., 1987). By 1940, as the Opuntia was controlled and its distribution reduced more or less to the habitat islands found today, D. buzzatii also contracted to the spatially isolated populations that still exist. Colonization of this extensive territory in Australia is likely to have been accompanied by genetic adaptation to the new environments.

Our objectives, using 15 microsatellite loci, were to assess genetic variation and population structure, to determine the size of the Australian founding population, and to determine if any of the microsatellite loci mark genome regions showing selective effects. As such effects may depend on loci located within inversions, which are known to vary in frequency among Australian populations (Knibb and Barker, 1988), we mapped the microsatellites used in this study.

Materials and methods

Flies were collected at nine localities in eastern Australia (Table 1; Figure 1) in March–April 2002, by netting from banana baits, and stored in absolute alcohol.

Table 1 Geographical coordinates for the Australian populations (listed from north to south) and genetic variability of each population (s.e. in parentheses)
Figure 1

Map of eastern Australia showing collection localities, and the distribution of the main Opuntia infestation in 1920 (hatched area).

DNA extraction and microsatellite analysis

DNA extraction and genotyping methods were as given by Frydenberg et al. (2002). We used the 10 microsatellites identified by Frydenberg et al. (2002) (annealing temperature in parentheses): Db052 (50 °C), Db087 (45 °C), Db122 (55 °C), Db142 (55 °C), Db223 (60 °C), Db225 (58 °C), Db290 (45 °C), Db411 (60 °C), Db493 (60 °C) and Db681 (40 °C), together with 5 additional ones (Table 2): Db003 (60 °C), Db013 (60 °C), Db034 (60 °C), Db090 (52 °C) and Db109 (60 °C). For some analyses, we included data for two populations in Argentina (Otamendi and Catamarca), using the results for the first 10 microsatellites from Frydenberg et al. (2002), together with the 5 additional ones genotyped for this study.

Table 2 Properties of the five new microsatellites

In situ localization of the microsatellite clones

Most of the microsatellite clones were in situ localized to the polytene chromosomes of D. buzzatii. DNA from clones Db052, Db090, Db122, Db142, Db223, Db225, Db290, Db411 and Db493 was labeled using the PCR DIG labeling mix (Roche, Basel, Switzerland) and M13 universal forward and reverse primers. For microsatellite clones Db087 and Db681, similarity searches against the D. mojavensis genome sequence database (as of 6 December 2004) were carried out using their sequences as a query. The sequence with the highest similarity for each of the two microsatellites was used to design primers and to amplify fragments of 1.5 kb (5′-GCTACATGTGGTACCATAAC-3′; 5′-ATAACATCATGTGGGGTCTG-3′) and 2 kb (5′-GTGGAACATGGAATTGCTAG-3′; 5′-CACGTGGACATCAAGACTGA-3′) including the microsatellite sequences of clones Db681 and Db087, respectively. DNA was amplified using genomic DNA of D. mojavensis (as a control) and D. buzzatii, and the latter PCR product was used as a probe. Fluorescent in situ hybridization was carried out as in González et al. (2000), with detection using anti-digoxigenin antibodies labeled with fluorescein-5-isothiocyanate. Chromosomal localization of the hybridization signals (Figure 2) was determined using the cytological map of D. buzzatii (González et al., 2005). The sequences of microsatellites Db003, Db013, Db034 and Db109 were searched with BLAST against the D. mojavensis genome sequence, and localized using data on gene localization in D. buzzatii (Ranz et al., 2003).

Figure 2

Localization of the microsatellite clones (except Db003 in chromosome 5) in the D. buzzatii polytene chromosomes, with inversion positions also marked. Db003 is located between the genes Obp56g and Toll-7 in scaffold 6496, which corresponds to D. buzzatii chromosome 5. Further localization is not possible, as few markers are available. Microsatellites Db013, Db034 and Db109 are located in scaffold 6540, which corresponds to D. buzzatii chromosome 2. Db013 is in the chromosomal segment delimited by bands A2f-C4h between genes jar and Atpalpha. Db034 is located in the chromosomal segment delimited by bands F1h-F2a between genes amon and fkh. Finally, Db109 is located in the chromosomal segment delimited by bands B2c-B4d between genes ry and Act87E.

Allelic frequency, heterozygosity and linkage disequilibrium

Genotype and allele frequencies were estimated using GENEPOP version 3.4 (Raymond and Rousset, 1995; GENEPOP data file for all 11 populations available as Supplementary Table S1), and alleles per locus and observed and expected heterozygosity (gene diversity) were estimated using GENECLASS2 (Piry et al., 2004). Tests for deviations from Hardy–Weinberg equilibrium were carried out using the exact tests of GENEPOP (default values for the Markov chain method). Significance levels for each test were determined by applying the sequential Bonferroni procedure (Hochberg, 1988; Lessios, 1992) to the probability estimates calculated by GENEPOP, over loci within each population. The number of alleles per locus, and observed and expected heterozygosity, were compared among populations with the Kruskal–Wallis nonparametric test (Sokal and Rohlf, 1981). Pair-wise differences among populations for expected heterozygosity were tested using a t-test on arcsine-transformed values (Archie, 1985), with the sequential Bonferroni procedure applied over the set of eight comparisons for each population.
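The sequential Bonferroni correction can be illustrated with a minimal sketch. It assumes the step-down (Holm) form of the procedure; the text cites Hochberg (1988) and Lessios (1992) without stating which variant was applied, so this choice, the function name and the example P-values are all hypothetical.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Flag significant tests with a step-down (Holm) sequential Bonferroni.

    Assumption: step-down variant; the variant used in the study is not stated.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # once one test fails, all larger P-values also fail
    return significant


# Hypothetical P-values for four loci within one population.
example_p = [0.001, 0.020, 0.030, 0.400]
example_flags = sequential_bonferroni(example_p)
print(example_flags)
```

With these made-up values only the first test remains significant, because 0.020 exceeds 0.05/3.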

Pair-wise linkage disequilibria (D′) between loci within each population were estimated using PowerMarker version 3.25 (Liu and Muse, 2005). Statistical significance was evaluated using the exact tests implemented in PowerMarker, with P-values obtained using both permutation and the Markov chain Monte Carlo approaches. Both approaches gave similar results, and only the former are presented. P-values were not adjusted for multiple comparisons. In addition to mean D′ estimates, we use for each population the percentage of locus pairs that had significant (P<0.05) D′ values. As 5% of pair-wise LD are expected by chance to be significant, higher percentages indicate more LD than would be expected (Schug et al., 2007).

Population differentiation

As one locus in one population showed a significant deviation from Hardy–Weinberg equilibrium, both genotypic and genic differentiation among populations were tested using GENEPOP, for overall and pair-wise differentiation (default values for the Markov chain method). The sequential Bonferroni procedure was applied over population pairs for the latter in determining significance levels. F-statistics (Weir and Cockerham, 1984) and their significance were determined using FSTAT version 2.9.3 (Goudet, 2001), not assuming Hardy–Weinberg equilibrium and with 5000 iterations, and the sequential Bonferroni procedure was applied over loci to determine significance levels. Isolation by distance was tested using estimates of FST (Weir and Cockerham, 1984) for each pair of populations, with pair-wise FST/(1−FST) values regressed on log (geographic distance) between each pair of populations (Rousset, 1997), and the significance of the association was determined using Mantel's (1967) permutation test.
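The isolation-by-distance test can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the software used in the study: it uses a one-sided Mantel test on the Pearson correlation, and the function names and the four-population example matrices are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_values(matrix, order):
    """Upper-triangle values of a symmetric matrix after relabeling populations."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(fst, dist_km, n_perm=999, seed=1):
    """Correlate FST/(1-FST) with log(distance); permute population labels."""
    n = len(fst)
    labels = list(range(n))
    # Rousset (1997) transformation of pair-wise FST.
    genetic = [[0.0 if i == j else fst[i][j] / (1.0 - fst[i][j])
                for j in range(n)] for i in range(n)]
    geographic = [[0.0 if i == j else math.log(dist_km[i][j])
                   for j in range(n)] for i in range(n)]
    x = pairwise_values(geographic, labels)
    r_obs = pearson(x, pairwise_values(genetic, labels))
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        if pearson(x, pairwise_values(genetic, perm)) >= r_obs - 1e-12:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (n_perm + 1)


# Hypothetical pair-wise FST and distances (km) for four populations.
example_fst = [[0.00, 0.01, 0.03, 0.09],
               [0.01, 0.00, 0.02, 0.08],
               [0.03, 0.02, 0.00, 0.05],
               [0.09, 0.08, 0.05, 0.00]]
example_dist_km = [[0, 100, 400, 1500],
                   [100, 0, 300, 1400],
                   [400, 300, 0, 1100],
                   [1500, 1400, 1100, 0]]
r_obs, p_value = mantel_test(example_fst, example_dist_km)
print(r_obs, p_value)
```

Permuting population labels, rather than individual matrix cells, preserves the dependence among pair-wise values that share a population, which is the reason for using a Mantel test here rather than an ordinary regression test.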

Population history: bottlenecks

On its introduction to Australia, D. buzzatii is presumed to have been subject to a bottleneck (Barker et al., 1985; Halliburton and Barker, 1993), followed by a rapid and massive expansion in population size. Subsequently, populations were reduced to a patchy distribution, but with substantial variation in the number of plants (and rotting tissue) at each locality. The number of founders of the colonized Australian population was estimated using the maximum likelihood method of Ramirez et al. (2006). This method computes the likelihood (L) that a particular combination of alleles is observed in the colonized population as a function of the number of founder gametes (2N), given the allelic frequencies in the source population. Then 2N is estimated as the value that maximizes the likelihood function (L). The variance of the estimate is calculated as the inverse of the amount of information, which is equal to the negative of the second derivative of ln L with respect to 2N, evaluated at the maximum.
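The logic of a founder-number likelihood can be sketched in simplified form. This is not the likelihood of Ramirez et al. (2006): as a loudly stated assumption, each source allele is treated independently, with probability 1 − (1 − p)^2N of being carried by at least one of 2N founder gametes. Function names and allele frequencies are hypothetical.

```python
import math


def log_likelihood(n_gametes, present_freqs, absent_freqs):
    """ln L of the presence/absence pattern for a given number of founder gametes.

    Assumption: alleles are treated as independent (a simplification).
    """
    ll = 0.0
    for p in present_freqs:   # source alleles found in the colonized population
        ll += math.log(1.0 - (1.0 - p) ** n_gametes)
    for p in absent_freqs:    # source alleles missing from the colonized population
        ll += n_gametes * math.log(1.0 - p)
    return ll


def estimate_founder_gametes(present_freqs, absent_freqs, max_gametes=500):
    """Grid-search maximum likelihood estimate of 2N and an approximate s.d."""
    candidates = range(2, max_gametes + 1)
    best = max(candidates,
               key=lambda g: log_likelihood(g, present_freqs, absent_freqs))
    # Finite-difference second derivative of ln L at the maximum.
    curvature = (log_likelihood(best + 1, present_freqs, absent_freqs)
                 - 2.0 * log_likelihood(best, present_freqs, absent_freqs)
                 + log_likelihood(best - 1, present_freqs, absent_freqs))
    # Variance = inverse of the information = -1 / (second derivative).
    sd = math.sqrt(-1.0 / curvature) if curvature < 0 else float("nan")
    return best, sd


# Hypothetical source-population allele frequencies at one locus.
example_present = [0.5, 0.3, 0.1]
example_absent = [0.05, 0.03, 0.02]
estimate, sd = estimate_founder_gametes(example_present, example_absent)
print(estimate, sd)
```

Rare source alleles that survived the founding event push the estimate upwards, whereas alleles that were lost pull it downwards, which is why the choice of source sample has a large effect on the estimate.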

The BOTTLENECK (Cornuet and Luikart, 1996) and M (Garza and Williamson, 2001) programs were used to determine if any signal of past bottlenecks could be detected. The BOTTLENECK analysis tests whether there is excess heterozygosity as compared with the heterozygosity expected from the observed number of alleles at each locus, assuming mutation–drift equilibrium. Bottlenecked populations are predicted to show an excess of heterozygosity, as the number of alleles is more severely affected than heterozygosity by a bottleneck in population size. Expected values were determined using the two-phase mutation model (Piry et al., 1999), with model options 80% single-step mutations, a variance among multiple steps of 12, and 5000 iterations. The probability of significant heterozygosity excess was determined using Wilcoxon's signed-rank test.

M is defined as the ratio of the total number of alleles (k) to the overall range in allele size (r). With genetic drift in populations reduced in size, the loss of any allele will reduce k, but only the loss of the largest or smallest allele will reduce r. Thus, M is expected to be smaller in recently reduced populations than in equilibrium populations. The ratio M is estimated for each locus, averaged over loci, and then statistically tested by comparing the estimate to a critical value (Mc), estimated for a specific mutation model using 10 000 replicates. The parameters of the model used here were fraction of mutations larger than single step=0.2, mean size of non-single-step mutations=3.5 and θ=4Neμ (where Ne=effective population size at equilibrium before the bottleneck and μ=mutation rate) equal to 2 (assuming Ne=5000, μ=10⁻⁴ or Ne=50 000, μ=10⁻⁵). The value assumed for θ is consistent with the mean expected heterozygosity (He) of the Argentine populations. Although Piry et al. (1999) recommend 95% single-step mutations, and Garza and Williamson (2001) recommend 90%, we used 80% as 10 of the 15 microsatellites are interrupted or compound. Although there is uncertainty about the appropriate percentage of single-step mutations, our test is conservative as higher percentages and also smaller values of θ (lower Ne of the pre-bottleneck Argentine population) increase the value of Mc (Garza and Williamson, 2001).
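A minimal sketch of the M ratio and of the θ values quoted above follows. It assumes that r is counted in repeat units as (largest − smallest)/repeat length + 1; the allele sizes, repeat length and function names are hypothetical.

```python
def m_ratio(allele_sizes_bp, repeat_unit_bp):
    """M = k / r for one locus.

    Assumption: r is the allele-size range in repeat units, plus one.
    """
    sizes = sorted(set(allele_sizes_bp))
    k = len(sizes)                                      # number of alleles
    r = (sizes[-1] - sizes[0]) // repeat_unit_bp + 1    # range in repeat units
    return k / r


def theta(ne, mu):
    """theta = 4 * Ne * mu."""
    return 4 * ne * mu


# Hypothetical dinucleotide locus with alleles of 100, 102, 104 and 110 bp.
before = m_ratio([100, 102, 104, 110], 2)      # k = 4, r = 6
lost_interior = m_ratio([100, 102, 110], 2)    # k = 3, r = 6: M falls
lost_extreme = m_ratio([100, 102, 104], 2)     # k = 3, r = 3: M does not fall
print(before, lost_interior, lost_extreme)

# Both parameter combinations given in the text correspond to theta = 2.
print(theta(5000, 1e-4), theta(50000, 1e-5))
```

The example shows why M declines after a bottleneck: losing an interior allele lowers k without changing r, whereas losing an extreme allele shrinks both.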

Geographical variation in genetic diversity and allele frequency: Australian populations

For each locus, effects of geographical location and climate on expected heterozygosity and allele frequencies were analyzed. In Australia, D. buzzatii has been collected at 97 localities, and for each of these, climatic variables were estimated using the BIOCLIM program of the ANUCLIM 5.1 package (Houlder et al., 2000). With the position of a locality described by latitude, longitude and elevation, all 35 climatic variables that can be produced by BIOCLIM were estimated for each locality (Barker et al., 2005). Principal component analysis applied to these data for each locality (SAS Institute, 1985) provided a summary of the climatic environment for each. The first four principal components accounted for 93% of the variation, and the scores for the nine localities sampled here were considered for use. But with only nine localities, degrees of freedom for multiple regression analyses were limiting. Following preliminary testing, the model used included latitude, longitude and the first two principal components (which accounted for 75% of the climatic variation), and analyses were carried out using the statistical package R (R Development Core Team, 2003). Again with only nine localities, finding a suitable model is problematical. Box-Cox plots were computed for each dependent variable to check possible transformations, but none was found to be appropriate. Normal QQ plots of residuals were used to check the assumption of normality for the fitted models. For the analysis of allele frequencies at each locus, only the two alleles that were at highest frequency over all localities were used. Thus, a significant effect for only one of these alleles means a reciprocal effect for the pooled remaining alleles. All tests for individual terms were adjusted for other terms in the model (type II tests).

Multilocus simulation tests for selection

Two multilocus selection tests were used, referred to as the Schlötterer and Beaumont tests. The Schlötterer tests (Schlötterer, 2002; Kauer et al., 2003) use test statistics based on the ratio of observed variances in repeat number (of microsatellites) (ln RV test) or expected heterozygosity (ln RH test) in two groups of populations. The rationale for the tests is that positive selection at a locus will lead to a reduction in variability at the selected locus and flanking regions, so that a microsatellite locus linked to a selected locus is expected to have reduced variability compared to neutral expectations (Schlötterer, 2002). We compared the colonized populations in Australia with two populations in Argentina, using the expected heterozygosity (ln RH test) only, as Kauer et al. (2003) show the ln RH test to have higher power to detect selected loci than the ratio of observed variances in repeat number, due to smaller variance of the former, and sensitivity of the latter to non-stepwise mutations. Further, heterozygosity is expected to return to its expected equilibrium more rapidly than variance in allele size (Kimmel et al., 1998), so that ln RH should have more power to detect selective sweeps that have occurred in the relatively recent past, such as during the colonization of Australia. The Beaumont test aims to identify outlier loci (low or high levels of genetic differentiation) by comparing observed FST to a null distribution, conditional on heterozygosity, generated by the coalescent simulation model of Beaumont and Nichols (1996). We used a 100-island model of population structure and assumed the stepwise mutation model, with 50 000 paired values of FST and heterozygosity generated using the program FDIST2. Results are visualized by plotting FST vs heterozygosity for each locus, together with the 0.025, 0.50 and 0.975 quantiles of the null distribution of FST.
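The ln RH statistic for a single locus can be sketched as below, using the heterozygosity-based estimator of θ under the stepwise mutation model described by Kauer et al. (2003). Which group goes in the numerator is an assumption made here for illustration, and the heterozygosity values are hypothetical.

```python
import math


def theta_from_heterozygosity(h):
    """Stepwise-mutation estimate of theta from expected heterozygosity (0 < h < 1)."""
    return 0.5 * ((1.0 / (1.0 - h)) ** 2 - 1.0)


def ln_rh(h_group_a, h_group_b):
    """ln of the ratio of theta estimates for two groups of populations.

    Assumption: group A is the numerator; the orientation is a convention.
    """
    return math.log(theta_from_heterozygosity(h_group_a)
                    / theta_from_heterozygosity(h_group_b))


# Hypothetical mean expected heterozygosities for one locus.
print(ln_rh(0.5, 0.5))   # equal variability gives ln RH = 0
print(ln_rh(0.8, 0.5))   # more variable numerator group gives ln RH > 0
```

A locus whose ln RH lies far from the bulk of the other loci is a candidate for selection, because demographic events are expected to shift all loci in a similar way.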

Results

Gene diversity and linkage disequilibrium

All loci were polymorphic in all populations (allele frequencies are available in Supplementary Table S2). Some populations showed markedly different allele frequencies for some loci, with the overall most frequent allele not the most frequent in one or more populations for 7 of the 15 loci, and 31 of the alleles were unique to one population. Of these, 11 were unique to Mulambin Beach, with 4 unique of the 17 alleles at Db290.

The mean numbers of alleles per locus (Table 1) were significantly different among populations (P<0.01), but not significantly different among the three southern populations, nor among the six northern populations. The three southern populations generally had significantly fewer alleles. Observed heterozygosities for each population were not significantly different, but expected heterozygosity (gene diversity) differed significantly among populations (P<0.05). Tests for pair-wise differences in expected heterozygosity were not significant among the three southern populations, nor among the six northern populations. After Bonferroni correction, pair-wise differences were significant for Isla Gorge and Grandchester with Bulla. Only Db109 in Baradine showed significant (P<0.05) deviation from Hardy–Weinberg equilibrium (F=0.136) after Bonferroni correction.

Linkage disequilibrium is present in the Australian populations (Table 3), with mean D′=0.260, but significantly higher (P<0.05) in the northern populations (0.274) than in the southern ones (0.232). Five populations had more than 5% significant pair-wise comparisons, whereas 58 (55.2%) of the 105 pair-wise D′ were not significant in any population. A maximum of three populations had significant LD for any one locus pair and 36 locus pairs were significant in only one population. Considering all pair-wise tests (105 × 9 populations=945), 59 (6.24%) were significant, of which 31 were between loci on the same chromosome, 21 between loci on different chromosomes and 7 involving the unlocated Db003 (Table 3). Of the 23 significant tests for loci on chromosome 2, 14 involved the two loci (Db034 and Db052) that are located within the inversion segment. Similarly, for loci on different chromosomes, three of the eleven for chromosomes 2 and 4, and two of the six for chromosomes 2 and 5 involve these two loci. Mean D′ is significantly higher (P<0.05) for the Argentine populations (0.462), but neither population had more than 5% significant D′ values. As in the Australian populations, a high proportion (7 out of 8, 87.5%) of the significant LD involve loci on different chromosomes (Table 3).

Table 3 LD analyses

Population differentiation

Analysis of F-statistics (Table 4) showed highly significant population differentiation (FST) for all loci and overall, whereas no FIS estimates were significantly different from zero. Genic and genotypic differentiation among all populations also were highly significant (P<0.001) for each locus and overall.

Table 4 F-statistics analysis, with significance determined by permutation tests in the FSTAT program

All pair-wise FST estimates were significant, except for TAM/ISG. Mean pair-wise FST estimates were lower for the northern populations (0.016±0.010) than for the southern (0.094±0.023), and means for the northern vs the Argentinian populations (0.130±0.012) also were lower than for the southern vs Argentine (0.211±0.030). The regression of FST/(1−FST) on log (geographic distance) for the Australian populations (b=0.019) was significant (P<0.01), indicating isolation by distance, but some populations that are widely separated were not different (for example, MUB/GER) whereas others that are geographically closer were very different (for example, MAL/BUL) (Figure 3).

Figure 3

Isolation by distance—relationship of population differentiation (FST) to geographic distance for the nine Australian populations.

Estimation of the number of founders and bottleneck tests

The allele frequency distributions of the overall Australian and Argentinian samples (Figure 4) show the reduced variability of the former as compared with the latter, consistent with the postulated founder effect at colonization. Here we estimate the number of founders using the method of Ramirez et al. (2006). The Australian sample (all nine populations, 540 genomes) included 174 alleles, whereas the source sample (two Argentinian populations, 80 genomes) included 187 alleles. The number of alleles shared between the two samples was 134, with a further 40 exclusive to Australia and 53 exclusive to Argentina. Assuming the Argentinian sample as representative of the source population, the estimated number of founder gametes (2N±s.d.) is 32.4±3.6. But this neglects the 40 alleles exclusive to Australia—implicitly assuming that they were produced by mutation after colonization. Alternatively, we consider that these 40 alleles are present in Argentina, but not included in our sample of only 80 genomes. This is more appropriate, given that the sample size is much larger for Australia than for Argentina (540 vs 80). Thus, taking the combined Australia+Argentina sample of 227 alleles as a better representation of the source population, the estimated number of founder gametes is 74.8±8.3. Given the small size of the Argentinian sample, alleles additional to those detected are likely to exist there, so that the number of founder gametes would be less than our estimate. Thus, this analysis indicates a moderate bottleneck of about 30–40 individual founders on colonization of Australia.

Figure 4

Allele frequency distributions for the combined Australian populations (black), and for the combined Argentinian populations (white). Allele size=base pairs. Some rare alleles are omitted for Db142 and Db290 to clarify presentation.

The three southern Australian populations showed lower genetic variability than the six northern populations (Table 1), and of the 174 alleles in the total Australian sample, 168 were present in the northern populations and 112 in the southern. The number of alleles shared between the two was 106, with 62 exclusive to the north and 6 to the south. This suggests a secondary colonization of the south from the north, supporting the expectation from historical records that show that C. cactorum (and thus habitat for D. buzzatii) was deliberately spread only through central and northern New South Wales and Queensland (Dodd, 1940). Assuming the northern population as the source, the estimated number of founder gametes is 43.4±6.0. But again this assumes that the six alleles exclusive to the southern populations arose after colonization. Thus, taking the total Australian sample of 174 alleles as representative of the source population, the estimated number of founder gametes of the southern population is 61.7±8.6, indicating a secondary bottleneck size of about 25–35 founding individuals.

BOTTLENECK analysis (Table 5) showed heterozygosity deficiency for all of the Australian populations. With fewer than 20 loci, the Wilcoxon test is the most powerful (Piry et al., 1999), indicating significant heterozygosity deficiencies for MUB, GRD and TAM. In contrast to the BOTTLENECK results, the M analyses (Table 5) indicate significant bottlenecks for all of the Australian populations except ISG and GRD, with strongest evidence for the three southern populations (GER, MAL and BUL). Both BOTTLENECK and M methods assume no selection, but it is not known how the apparent selection at some loci (see below) would affect these results.

Table 5 Tests for past bottlenecks in population size using the BOTTLENECK and M programs: probability values for tests in the BOTTLENECK analysis, and average values of the M ratio for each population and critical values (Mc).

Geographical variation in genetic diversity and allele frequency

Expected heterozygosity (He) decreased with increasing latitude (P<0.01). In the multiple regression analyses, all four variables were significant for expected heterozygosity at the Db090, Db142 and Db681 loci (Table 6). The residuals from the models at these three loci showed no significant departure from normality. Allele frequencies at Db090, Db142, Db223, Db411, Db493 and Db681 showed significant effects generally for all four variables (Table 6). Residuals from the models for Db090, Db142 and Db411 did not depart from normality, but those at the other three loci did show significant departures. The significance levels for these three loci may be unreliable and should be interpreted cautiously. In all cases, the effect of longitude was of opposite sign to the effects of latitude and the two principal components.

Table 6 Multiple regression analyses that were significant for effects of geographical location (latitude and longitude) and climatic variables (PC1 and PC2) on expected heterozygosity and allele frequencies at each locus

Multilocus simulation tests for selection

To apply the ln RH test, we averaged the heterozygosities (H′) for the Australian and Argentinian populations for each locus, and used these averages to compute ln RH for each locus. The distribution of ln RH values under the assumption of neutrality is expected to be Gaussian (Kauer et al., 2003). Thus, to determine significance of the ln RH values, the mean and standard deviation of the empirical ln RH distribution were used, with loci significant if their ln RH value falls outside the 95% limits (±1.96 s.d.). Of the 15 loci, 1 (Db493) was significant (ln RH=1.864, P=0.018; Table 7). With only 15 loci, outliers may occur by chance, but accepting that the other 14 loci are neutral, and testing ln RH for Db493 against the normal distribution defined by these 14, the probability for Db493 is 0.004. Further, ln RH values and their distributions were computed for each locus for all 18 pair-wise comparisons of Australian and Argentinian populations. Although not an independent test, Db493 again was significant (ln RH=1.766, P=0.022; Table 7). Further, 7 of the 18 comparisons for Db493 had ln RH values outside the 95% limits. The probability of observing seven significant tests for a given locus, calculated from the binomial distribution, is P=1.5 × 10⁻⁵. We note that Db493 showed more variation in Australian than in Argentinian populations, that is, not a selective sweep, but increased differentiation of the Australian populations.
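The binomial probability quoted for Db493 can be checked with a short calculation. The sketch assumes that each of the 18 comparisons falls outside the 95% limits with probability 0.05 under neutrality and that the reported value is the upper tail (seven or more significant comparisons); neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
import math


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


# Seven or more of 18 comparisons outside the 95% limits by chance alone.
tail_probability = binomial_upper_tail(18, 7, 0.05)
print(f"{tail_probability:.2e}")
```

Under these assumptions the tail probability is close to 1.5 × 10⁻⁵. As noted in the text, the 18 comparisons share populations and are not independent, so the value is indicative only.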

Table 7 Results of the ln RH tests for selection

The Beaumont test was run separately for the nine Australian populations, and for these plus the two Argentinian populations. In both cases, Db087 showed significantly less variation than expected, falling below the 0.025 quantile limit (Figure 5 for the nine Australian populations). For the two sets of populations, the test statistics and probability values were (−3.432, 0.0004) and (−6.711, 0.0000). With only 15 points, one might be expected to be outside the 95% limits. However, for both sets, Db087 is well below the 0.005 quantile. Thus, there is strong evidence that Db087 is affected by forces that are different from those affecting the remaining loci, namely balancing selection that keeps alleles at similar frequencies in each of the populations.

Figure 5

FST values estimated for 15 microsatellite loci in nine Australian populations plotted against heterozygosity, with 0.975, 0.50 and 0.025 quantiles of the conditional distribution, estimated from an island model with expected FST=0.047.

Discussion

Our primary aims were to assess the apparent bottleneck at colonization of Australia and to determine if any of the microsatellite loci marked genome regions that were either subject to selection or affected by selective sweeps. The studied Australian populations, which span 14.5° of latitude and range from the coast to 345 km inland, cover much of the range of D. buzzatii in Australia.

Demographic history of D. buzzatii in Australia

Previous comparisons of Australian populations with original ones in Argentina support a bottleneck at the time of introduction, with the former showing reduced variability for allozyme (Barker et al., 1985), mitochondrial DNA (Halliburton and Barker, 1993), microsatellite (Frydenberg et al., 2002; Table 1) and other molecular (Piccinali et al., 2007) polymorphisms. Distributions of the overall Australian and Argentinian allele frequencies at 15 microsatellite loci (Figure 4) show not only the reduced variability of the former, but also dramatic shifts at many of the loci—clear signals of a founding bottleneck. Using the method of Ramirez et al. (2006), which makes no explicit assumption other than that sample sizes are adequate, we estimate that this was a moderate bottleneck of about 30–40 founders on colonization of Australia.

The BOTTLENECK and M ratio analyses gave different signals regarding past bottlenecks. BOTTLENECK failed to detect any evidence of a past bottleneck in any Australian population. However, this test is effective only for identifying populations that have recently experienced a severe reduction in population size, that is, <4 Ne generations ago, where Ne is the bottleneck effective size (Piry et al., 1999). As the colonization bottleneck in Australia occurred some 600–700 generations ago (assuming about 10 generations per year since introduction in the period 1931–1936), and the bottleneck effective size was about 30–40 (the estimated number of founders), the failure to detect a past bottleneck is not unexpected. In fact, this test showed all populations to have a heterozygosity deficiency (significant for three of the six northern populations), suggesting that they are not at mutation–drift equilibrium, but that there has been a recent expansion in population size or a recent influx of rare alleles from genetically distinct immigrants. Given the relatively low (albeit significant) genetic differentiation among these populations, the latter possibility is most unlikely. However, the former is very compatible with the prediction (Barker, 1982; Sokal et al., 1987) of a population explosion of D. buzzatii immediately following its introduction in the 1930s.
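The timing argument can be made explicit with a small worked calculation. It takes the 2002 collection year from the Materials and methods, the assumed 10 generations per year, and a bottleneck effective size of 30–40; the function names are hypothetical.

```python
def generations_elapsed(intro_year, sample_year, generations_per_year=10):
    """Approximate number of generations between introduction and sampling."""
    return (sample_year - intro_year) * generations_per_year


def detection_window(bottleneck_ne):
    """Upper limit (in generations) of the period in which BOTTLENECK is effective."""
    return 4 * bottleneck_ne


# Introduction in 1931-1936, flies collected in 2002.
elapsed = (generations_elapsed(1936, 2002), generations_elapsed(1931, 2002))
window = (detection_window(30), detection_window(40))
print(elapsed)   # (660, 710) generations since colonization
print(window)    # (120, 160) generations during which a signal is expected
```

On these figures the elapsed time is roughly consistent with the 600–700 generations given in the text, and it exceeds the 4 Ne window several-fold, which is why the absence of a heterozygosity excess is not surprising.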

Garza and Williamson (2001) argued that their M ratio should retain information about past demographic history for longer than methods such as BOTTLENECK, and our results appear to validate this. In addition, Garza and Williamson (2001) show that the M ratio test is suited to the demographic scenario expected for D. buzzatii, that is, a severe, single-generation (or just a small number of generations) reduction in population size, followed by a rapid recovery. The M ratio test gave clear evidence of a past bottleneck for all except two of the Australian populations, with strongest evidence for the three southern populations.
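For readers unfamiliar with the statistic, the M ratio for a microsatellite locus is the number of alleles observed divided by the number of allelic states spanned by the size range. The sketch below is illustrative only: it uses the common convention that the range is (largest − smallest)/repeat length + 1, and the allele sizes are hypothetical values chosen for the example, not data from this study.

```python
# Sketch of the M ratio (number of alleles / range in allele size) for one locus.
# Function name and example allele sizes are hypothetical.

def m_ratio(allele_sizes_bp, repeat_unit_bp):
    """Return k / r, where k is the number of distinct alleles and
    r is the number of possible repeat states between the smallest
    and largest allele (inclusive)."""
    sizes = sorted(set(allele_sizes_bp))
    k = len(sizes)
    r = (sizes[-1] - sizes[0]) // repeat_unit_bp + 1
    return k / r

# A locus with every intermediate state present has M = 1.
full = m_ratio([100, 102, 104, 106, 108], repeat_unit_bp=2)

# Drift during a bottleneck tends to remove alleles faster than it
# shrinks the size range, which lowers M.
gapped = m_ratio([100, 102, 108], repeat_unit_bp=2)

print(full, gapped)
```

Because lost intermediate alleles are regained only slowly by mutation, a reduced M can persist long after heterozygosity-based signals have faded, which is the property emphasized above.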

Population differentiation was highly significant for each of the 15 loci and overall (Table 4). Pair-wise estimates show that this differentiation is much higher for the three southern populations as compared with the other six, and high among these three. The three southern localities are outside the predicted core distribution range for D. buzzatii, as determined by climatic suitability (Barker et al., 2005), and the pair-wise FST estimates, the estimated number of founders and the reduced mean number of alleles per locus (Table 1) all suggest a secondary bottleneck for the southern populations.
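As a reminder of what a differentiation estimate measures, the sketch below computes Nei's GST, (HT − HS)/HT, from allele frequencies. This is a simplified stand-in and an assumption on our part: it is not necessarily the estimator used for Table 4, it ignores sample-size corrections, and the frequencies are hypothetical.

```python
# Sketch: Nei's GST from per-population allele frequencies at one locus.
# This is an illustrative simplification, not the estimator used in the paper.

def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def gst(pop_freqs):
    """pop_freqs: list of allele-frequency lists, one per population,
    all listing the same alleles in the same order."""
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # Mean within-population heterozygosity.
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # Heterozygosity of the pooled (unweighted mean) allele frequencies.
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    ht = expected_heterozygosity(mean_freqs)
    return 0.0 if ht == 0 else (ht - hs) / ht

identical = gst([[0.5, 0.5], [0.5, 0.5]])   # no differentiation
diverged = gst([[0.8, 0.2], [0.2, 0.8]])    # hypothetical diverged pair
fixed = gst([[1.0, 0.0], [0.0, 1.0]])       # alternative alleles fixed

print(identical, diverged, fixed)
```

A secondary bottleneck shifts allele frequencies and lowers within-population heterozygosity, both of which raise pair-wise values of this kind.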

The demographic history of D. buzzatii in Australia can be summarized as a bottleneck at colonization by founders from Argentina, rapid spread and expansion to an enormous population size over the area of the Opuntia infestation in Queensland and New South Wales, and then contraction to spatially isolated populations. Further colonizations within Australia with moderate secondary bottlenecks gave rise to the southern populations.

Tests for selection

Given the evidence for a bottleneck at colonization, we need to ensure that any evidence suggesting selection is not simply a function of the bottleneck. That is, can we distinguish effects of the founding bottleneck from later local adaptation to different environments after colonization? Clearly, the founding bottleneck changed allele frequencies in Australia as compared with the source population in Argentina (Figure 4). The populations studied here all derive from one original very large population, and the significant genetic differentiation among them may have been due to genetic drift. The significant isolation by distance and latitudinal clinal patterns might be consistent with drift, but the significant associations for some loci with climatic variables after correcting for geographical location suggest selective differentiation, that is, local adaptation after colonization. On the other hand, as latitude and longitude were significant after controlling for climate effects (PC1 and PC2), geographical variation in factors other than climate apparently also affects allele frequencies. As the climate variables, summarized as PC1 and PC2, refer to long-term averages, other factors may include extremes that occur sporadically, for example, stressful conditions, such as drought or heat waves (episodic selection, Gillespie, 1991), and climatic effects summarized in other principal components.

Although the frequency of the standard second chromosome arrangement decreases with increasing latitude in Australia (Knibb et al., 1987; Knibb and Barker, 1988) and in Argentina (Hasson et al., 1995), indicating selective differentiation of populations for inversion frequencies, only Db034 and Db052 are included within an inversion (Figure 2), and these loci showed no significant effects in any selection analysis. Although Db142 is near the proximal breakpoint of inversion 4s, this inversion has not been found in Australian populations (Knibb et al., 1987).

Although we had expected that some LD generated by the bottleneck at colonization may have been maintained, particularly for chromosome 2 loci within or near the inversion breakpoints, the finding of extensive LD between loci on different chromosomes was not predicted. Between unlinked loci, such LD is expected to be halved in each generation, and effectively eliminated in about six generations. Thus, it could not have been generated by the small founding population, and subsequently maintained for 600–700 generations, unless due to epistatic selection. But if so, more consistency over populations in the expression of LD would be expected. Alternatively, the observed LD between loci on different chromosomes may be a transitory function of drift due to small Ne in the few generations before sampling. Again, however, more consistency would be expected, in this case over locus pairs within populations, with populations differing in the extent of LD. The BAR population, with 15.24% of significant pair-wise comparisons, may fit this explanation, as rots were very sparse at the time of collection, and much more collection effort was necessary at this site. Further, sampling of related individuals can lead to LD, a possibility for D. buzzatii as individual rots are founded by about 10 flies or fewer (Prout and Barker, 1989). Although a weak test, comparisons of observed and expected heterozygosities do not support this explanation.
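The expected decay of LD between unlinked loci follows from the standard deterministic recursion D_t = D_0 (1 − c)^t, where c is the recombination fraction (c = 0.5 for loci on different chromosomes). The sketch below illustrates that recursion only; it assumes random mating and no drift or selection, and the function names are hypothetical.

```python
import math

# Sketch: deterministic decay of linkage disequilibrium, D_t = D_0 * (1 - c)^t.
# Assumes random mating, no selection and no drift.

def ld_remaining(generations: int, recombination_fraction: float = 0.5) -> float:
    """Fraction of the initial LD remaining after the given number of generations."""
    return (1.0 - recombination_fraction) ** generations

def generations_to_fraction(target: float, recombination_fraction: float = 0.5) -> int:
    """Smallest whole number of generations after which LD is at or below target."""
    if not 0.0 < recombination_fraction <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    return math.ceil(math.log(target) / math.log(1.0 - recombination_fraction))

after_six = ld_remaining(6)            # unlinked loci, six generations
after_600 = ld_remaining(600)          # unlinked loci, time since colonization
to_two_percent = generations_to_fraction(0.02)

print(after_six, after_600, to_two_percent)
```

After six generations less than 2% of the initial disequilibrium remains for unlinked loci, and after 600 generations the remainder is negligible, so founder-generated LD between chromosomes cannot persist without some maintaining force.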

Given this, we must question whether the apparent selection detected for seven loci also could be due to drift. For the loci where apparent selection was detected in the regression analyses, drift cannot be excluded, and further evidence will be necessary to demonstrate a causal relationship with any climatic variables. However, two of these loci (Db411 and Db493) are on chromosome 2. Of the 23 significant pair-wise LD among chromosome 2 loci, 14 were between one of the two loci within the inversion (Db034 and Db052) and one of the other five loci, suggesting selection on the inversions (standard vs 2j; 2jz3 is rare in Australian populations; Knibb et al., 1987), in addition to effects of drift. This inversion selection might then account for the apparent selection found for Db411 and Db493, and the significant ln RH test for Db493. However, the similar frequencies of the four common alleles of Db493 (Figure 4), but with each allele at highest frequency in a different population (Supplementary Table S2), are unlikely to have been generated by selection on the inversions or by drift. Db493 apparently is exhibiting local adaptation to the novel environments in Australia, such as different species of Opuntia host plants from those in Argentina and a broad range of climatic environments. Certainly, drift does not account for all genetic differentiation among these populations, as local adaptation has been shown for traits relevant for thermal adaptation (Sarup et al., 2006).

The Beaumont test identified another locus (Db087), which showed a different signature of selection: balancing selection keeping alleles at similar frequencies in different populations, including those in Argentina. This test assumes that the loci are unlinked or loosely linked. Although Db087 showed significant LD with other loci on chromosome 4—with Db142 in one population and with Db681 in two populations—these loci are far distant from Db087 (Figure 2), and the significant LD seem unlikely to bias the test. As with the LD between loci on different chromosomes, these LD on chromosome 4 may result from drift. But again, more consistency might be expected over locus pairs within populations. Apparently Db087 is marking selection over a substantial proportion of chromosome 4.

Our finding that different approaches identify different loci is not surprising. Balancing selection at Db087 could not be detected by the other two approaches. Loci showing geographically varying selection (in the multiple regression analyses) will not be detected by the Schlötterer ln RH test unless, like Db493, they also show more variation in the derived (Australian) than in the ancestral populations. Such loci also may well not be detected by the Beaumont test (showing higher than expected FST for a given heterozygosity, but not significantly so).

Linkage disequilibrium, expressed as the percentage of D′ values that were significant, is higher in the Australian populations than in the Argentine CAT population, which is likely representative of the source population(s) of the Australian founders. Thus, there is good evidence for LD generation in Australia by the bottleneck at colonization and/or by subsequent drift or selection. The one problematical case relates to Db223 and Db225, which are contiguous in chromosomal localization, but show significant LD in only one population (BAR), and Db223 allele frequencies show significant location–climate associations whereas Db225 does not. However, of the loci that show significant associations with location–climate effects, those for Db223 are weakest, and regressions including PC1 or PC2 with latitude and longitude showed significant effects of longitude for both Db223 and Db225. Selective effects at these juxtaposed loci must be considered tentative.
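The D′ statistic referred to in this paragraph scales the raw disequilibrium coefficient D by its maximum possible value given the allele frequencies. The sketch below shows the calculation for the simplest case of two biallelic loci; this is a deliberate simplification (microsatellites are multiallelic, and multiallelic D′ is a weighted combination over allele pairs), and the frequencies are hypothetical.

```python
# Sketch: normalized linkage disequilibrium D' for two biallelic loci.
# p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
# p_ab: frequency of the A-B haplotype. All example values are hypothetical.

def d_prime(p_a: float, p_b: float, p_ab: float) -> float:
    """Signed D' = D / D_max, where D = p_ab - p_a * p_b."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no information on LD
    return d / d_max

complete = d_prime(0.5, 0.5, 0.5)     # A always occurs with B
independent = d_prime(0.5, 0.5, 0.25) # haplotype frequency equals the product
partial = d_prime(0.6, 0.4, 0.3)      # intermediate association

print(complete, independent, partial)
```

Because D′ is scaled by allele frequencies, it allows comparison across locus pairs and populations that differ in allelic composition, which is what the percentage of significant D′ values summarizes.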

In addition to drift effects, directional selection in the Australian populations would further promote LD (Maynard Smith and Haigh, 1974; Thomson, 1977; Kaplan et al., 1989). Significant linkage disequilibria between loci that are not closely linked have been found in other species that have been subject to bottlenecks and/or strong selection (Ledig et al., 1999; Yan et al., 1999; Kohn et al., 2000; Sharbel et al., 2000; McRae et al., 2002; Sinervo and Clobert, 2003; Hansson et al., 2004). Although LD is expected to decay at a rate dependent on the recombination fraction, and to be maintained only for very closely linked loci, these results show that linkage disequilibria may be far more common in natural populations than is generally assumed, and the loci apparently affected by selection may well be marking selection in large genome regions including many loci that are not necessarily closely linked.

A more complete understanding of the effects observed here will be possible only when sampled individuals are assayed for both inversion type and microsatellite genotype, so that LD between inversions and microsatellite loci can be estimated, and when both the colonized populations and the endemic populations of Argentina are studied.

Haddrill et al. (2005) compared noncoding DNA polymorphism at 10 X-linked loci in three African and two non-African populations of D. melanogaster. Although several features of the data rejected the neutral model, it was concluded that simple bottleneck models were sufficient to account for most, if not all, polymorphism features. However, the demographic history of D. melanogaster is not well known, and this conclusion depends on assumptions made regarding the times of colonization from Africa. D. buzzatii would provide a useful complementary model to D. melanogaster for studies of demography and selection, given the known and much shorter time since colonization, and well-defined demographic history.