Introduction

The link between environmental heterogeneity and habitat-based genetic divergence has been a central theme of evolutionary biology ever since the famous work on the finches of the Galapagos (Darwin, 1859). Studying the habitat-competent loci that facilitate this link can help to understand the genetic underpinnings of adaptation, and can also provide insights into the mechanism of ecological speciation (Storz, 2005). Habitat-competent loci are particularly open to investigation in hybridising taxa that are diverging via adaptation to contrasting habitats. Habitat-based divergence in such taxa is expected to be more rapid at loci involved in habitat adaptation than at loci with the same fitness effect across contrasting environments. Under such a two-speed process, habitat-competent loci can be accessed using multilocus surveys of between-habitat genetic divergence, provided that this divergence is known to be a product of the extant balance between habitat-based divergent natural selection and gene flow (Kreitman and Akashi, 1995; Hedrick, 1998; Storz, 2005).

Population genomic surveys are increasingly common in host races, that is, reproductively compatible but genetically diverged parasitic taxa associated with different hosts (Diehl and Bush, 1984; Nosil et al., 2009). Despite the absence of any significant barriers to gene flow, host races can co-exist in sympatry or parapatry without merging with each other; their genetic integrity is therefore often assumed to be a product of selection-gene flow balance (Dres and Mallet, 2002). Direct evidence of such balance is scarce, but can be obtained using gene flow perturbation techniques, for example, by testing for an increase in between-race divergence in response to experimental reduction of between-race hybridisation. A positive result in such a test is expected only if between-race divergence is a product of selection-gene flow balance (Riechert, 1993; Nosil, 2009).

This earlier method of gene flow perturbation can be limited by low level of within-population variance at host-competent traits and loci. As the local variance in fitness is depleted by local adaptation, selection response to gene flow reduction becomes weaker and harder to detect, potentially leading to false negatives (Lexer et al., 2003). In this study, we present, for the first time, an alternative method of between-race gene flow perturbation in which the variance in fitness and the detectability of selection response is increased by hybridisation. Under this new approach, the existing racial divergence is initially obliterated using multiple generations of hybridisation during the stage referred to as ‘fusion’. During the following ‘fission’ stage, hybrids invade and adapt to a bi-modal environment composed of a mixture of parental hosts. In different fission treatments, this process occurs under different levels of between-host gene flow. At the end of this fusion–fission procedure, the evidence of selection-gene flow balance is obtained by testing for host-based genetic split of the originally uniform population of fusion hybrids, and by comparing the level of fission divergence in different gene flow treatments. If the balance is present, we expect that, (i) host-based divergence obliterated by hybridisation during the fusion stage re-evolves during the following fission stage despite continuous gene flow; (ii) the depth of this ecologically induced genetic split is inversely proportional to the level of gene flow; and (iii) the pattern of the re-evolved divergence is incompatible with stochastic factors such as random genetic drift.

We have studied Aphidius ervi Haliday (Hymenoptera: Braconidae), a parasitoid wasp with free-living adults and parasitic larvae feeding on internal tissues of aphids. A. ervi is a tractable model for experimental studies of host-based natural selection because main factors of this selection in this parasitic species are confined within the host aphid—host plant complex that can be easily created and maintained in the laboratory (Henry et al., 2008). Multi-generation selection experiments with A. ervi are also facilitated by its short (10 days) generation cycle and lack of obligatory diapause. Traditionally, A. ervi has been viewed as a generalist parasitoid attacking a range of aphid hosts (Henry et al., 2005). However, there is mounting evidence that A. ervi is a complex genetic mosaic of more or less distinct host races (Stary, 1974, 1983a; Pennacchio and Tremblay, 1986; Henry et al., 2008). In our study, we use three such races: pea race (P), cereal race (C) and nettle race (N) associated, respectively, with Pea aphid Acyrthosiphon pisum (Harris) on broad beans Vicia faba L., Cereal aphid Sitobion avenae (Fab.) on barley Hordeum vulgare L., and Nettle aphid Microlophium carnosum (Buckton) on Common Nettle Urtica dioica L. P and N wasps show strong differences at a number of traits (Stary, 1983a; Pennacchio and Tremblay, 1986) and molecular marker loci (Atanassova et al., 1998), and are sometimes considered to be separate species (Pennacchio and Tremblay, 1986). By comparison, morphological, behavioural and molecular divergence between P and C wasps is less pronounced (Stary, 1983a; Pungerl, 1986; Daza-Bustamante et al., 2002), suggesting an earlier stage of host race formation (Cameron et al., 1984).

P, N and C wasps freely interbreed in the laboratory and are commonly found in sympatry or close parapatry in the wild where between-host migration and gene flow can occur (Stary, 1983b). It is reasonable to hypothesise, therefore, that racial diversity in A. ervi is maintained by a balance between host-based natural selection and gene flow. To obtain direct evidence of this balance, we conducted a laboratory fusion–fission gene flow perturbation study in order to test the following three null hypotheses: host race fusion is irreversible, and the racial divergence cannot be recovered under the fission conditions of our experiment (H01); any fission divergence that does occur during the experiment is not negatively correlated with gene flow (H02); this divergence is likely to be explained by random genetic drift (H03).

Our study has rejected all three hypotheses. In two independent experiments, one involving P and N races and another based on P and C pair, the host-based genetic structure obliterated during the fusion stage has re-evolved in response to the environmental bimodality of the fission environment. We show that the level of the re-evolved divergence as well as the rate of linkage disequilibrium (LD) at affected loci are both higher under restricted compared with unrestricted gene flow, as expected under selection-gene flow balance. We also demonstrate, in four separate tests, that the observed patterns cannot be explained by random genetic drift. We therefore conclude that divergence observed at the end of the fission stage is a specific product of host-based selection operating in the face of gene flow, and that loci affected by this divergence can be valid targets for a more detailed study of host adaptation genetics and ecological speciation in A. ervi.

Methods

Insect stocks

Parental populations used for fusion hybridisation in P × N and P × C experiments were represented by P, N and C inbred racial stocks, which were developed using pea, nettle and cereal race parasitoids harvested from pea aphids on broad bean, nettle aphids on nettles and cereal aphids on barley in Hertfordshire (51°48′N, 0°23′W, P and N races) and Dorset (50°55′N, 1°55′W, C race), UK (the reasons for using inbred stocks rather than natural host race populations are explained at the end of this section). The host of each parasitoid in our study was known because harvesting was done by collecting mummies (mature parasitoid pupae encased within hardened skins of consumed aphid hosts and attached to the surface of host plants) rather than by capturing free-flying adults. Wild parasitoids were used to start large (>100 individuals) insectary populations, which were maintained on their respective natural hosts for 15 (N and C race) and 35 (P race) generations; the racial stocks were isofemale lines derived from these insectary cultures. We used 10 generations of single-pair sib mating to create P stocks, but only three generations for N and C stocks because the latter two races were relatively intolerant of inbreeding. To ensure that P and non-P (N or C) stocks were developmentally synchronised for P × N and P × C mating, a five-generation-old inbred P population was used to derive a series of sister stocks drifting independently for the next five generations of sib mating. Hybridisation was then conducted between a non-P stock and a developmentally matching P stock. All parasitoid cultures were maintained in watered and ventilated 0.5 m³ plexiglas microcosms containing host plants with large (thousands of individuals) populations of aphid hosts.
All work has been conducted under conditions supporting continuous parthenogenetic reproduction of aphids (18 °C daytime, 16 °C night time, 60–70% RH, 16:8 h photoperiod), using native host aphid—host plant complexes composed of a standard set of aphid clones and plant cultivars.

The experiment

Our study consisted of two parallel fusion–fission experiments, in each of which a panmictic population of F4 fusion hybrids, either P × N or P × C, invaded a two-host fission environment composed of a mixture of either P and N or P and C hosts. Each experiment had two fission treatments, one with restricted and another with unrestricted gene flow between parasitoid populations on contrasting hosts (Supplementary Figure S1). Each of the four generations of fusion consisted of 50 matings between individuals from the P and the non-P host (25 P ♀ × non-P♂ + 25 non-P ♀ × P♂), followed by randomised host assignment of egg-laying females, parasitisation of hosts and harvesting of progeny mummies. Fusion hybrids were used as an input for the restricted and unrestricted gene flow treatments, as well as to set up P × N and P × C control hybridisation lines, which were maintained for the duration of the experiment using the fusion protocol. Both gene flow treatments in both experiments were thrice replicated, resulting in a total of 12 fission replicates. Each restricted gene flow replicate was set up by harvesting 30 virgin fusion females (15 from each of the two contrasting hosts) and the same numbers of different-host virgin fusion males, and by mating them in such a way that 28 females were paired with males from their own host, whereas two females hybridised with males from the contrasting host. Mated females were then split into two groups, each containing 14 homonymously mated females harvested from one of the two hosts and one hybridised female from the contrasting host. The groups were placed inside two microcosms each containing one of the two contrasting hosts, five pots per microcosm, in such a way that the native host of the 14 homonymously mated females matched the host present in their microcosm.
Unrestricted gene flow replicates were created and maintained in the same manner, except that the 30 pairs of virgin mates, 15 pairs from each host, were placed inside a microcosm containing a mix of two contrasting hosts, five pots each, within which mating and egg laying occurred in accordance with individual mate and host choice preferences. Restricted gene flow treatments were thus broadly similar to the situation of parapatry with 6.7% hybridisation preceded by migration, and were designated ParaP × N and ParaP × C. In contrast, unrestricted gene flow treatments were similar to host-based divergence in close sympatry and were designated SymP × N and SymP × C. Hybridisation probability in the latter case was not preset, but was bound to be higher than in the restricted gene flow treatment and was likely, at least initially, to be close to 50%.
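The 6.7% figure for the restricted gene flow treatment follows directly from the mating design (2 hybridised females out of 30 per replicate, or 1 out of 15 per microcosm). The arithmetic can be sketched as below; the function name is illustrative only and is not part of the original protocol.

```python
from fractions import Fraction


def hybridisation_rate(females_total: int, females_cross_host: int) -> Fraction:
    """Fraction of mated females that were paired with a male from the contrasting host."""
    return Fraction(females_cross_host, females_total)


# Whole replicate: 30 mated females, 2 of them hybridised.
replicate = hybridisation_rate(30, 2)
# Single microcosm: 14 same-host matings plus 1 hybridised female.
microcosm = hybridisation_rate(15, 1)

print(f"per replicate: {float(replicate):.1%}, per microcosm: {float(microcosm):.1%}")
```

Both views of the design give the same rate, which is why a single value describes the treatment.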

Molecular marker analysis and hypotheses testing

For genetic marker analysis we used amplified fragment length polymorphism (AFLP), a bi-allelic dominant marker system (Vos et al., 1995). Using AFLP in diploid organisms usually leads to data loss because heterozygotes and dominant homozygotes are both expressed as the ‘+’ (band presence) phenotype. However, A. ervi haplodiploidy enabled us to bypass this problem by analysing only haploid males. AFLP analysis was carried out using a previously described multiplex protocol (Trybush et al., 2006). Inbred stocks, control fusion lines and gene flow treatments of the fission stage were analysed using a total of 45 combinations of AFLP primers (Supplementary Table S1). Genetic structure among replicates within gene flow treatments was initially tested using six of these primer combinations. The remaining 39 primer combinations were analysed in all replicates of the treatment if significant same-host between-replicate structure was found, or in one of the three replicates if no structure was detected. Fragment electrophoresis, detection and peak scoring were performed using a 3730 DNA analyser and Genotyper 4 software from Applied Biosystems by Life Technologies Corporation (Carlsbad, CA, USA), with subsequent manual checking of all bands. We used only polymorphic bands with standard deviation of fragment size and signal intensity not exceeding, respectively, 5 and 10% of the mean. In addition to this quality control measure, loci were taken into analysis only if 1:1 segregation of ‘+’ and ‘−’ alleles among male progenies of a single virgin F1 hybrid female was observed. AFLP loci that had the same primer sequence and fragment molecular weight were considered to be homologous. The probability of AFLP homoplasy is 8 × 10⁻³ for comparisons between diploid natural populations (Emelianov et al., 2004), but in our case, wherein all populations within the experiment descended from the same pair of inbred racial stocks, this value should be lower.

The irreversibility of fusion and the divergence-gene flow independence hypotheses (H01 and H02, respectively) were tested by estimating the level of between-host fission divergence in different gene flow treatments. These two hypotheses were rejected if genetic split between different-host fission populations was detected, and if restricted gene flow populations achieved higher level of divergence than populations connected by unrestricted gene flow. Fst estimates of divergence for these tests were obtained using Analysis of Molecular Variance as implemented in ARLEQUIN 3.01 (Schneider et al., 2006). The significance of Fst estimates was tested using the same software package by permuting male haplotypes at least 10 100 times between populations. Genetic partitioning of inbred stocks, control fusion lines and population samples from fission microcosms was performed by Principal Components Analysis using GenStat 11 (VSN_International, 2008).
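The logic of permuting male haplotypes between populations can be illustrated with a minimal sketch. It is not the AMOVA estimator implemented in ARLEQUIN: as a stated simplification, it uses a Nei-style Fst = (Ht − Hs)/Ht summed over loci, assumes equal sample sizes, and encodes each haploid male as a tuple of 0/1 band scores. All function names are hypothetical.

```python
import random


def allele_freq(pop, locus):
    """Frequency of the '+' (band present) allele at one locus."""
    return sum(h[locus] for h in pop) / len(pop)


def fst(pop_a, pop_b):
    """Simplified Fst = (Ht - Hs) / Ht over loci (assumes equal sample sizes)."""
    hs_sum = ht_sum = 0.0
    for locus in range(len(pop_a[0])):
        pa, pb = allele_freq(pop_a, locus), allele_freq(pop_b, locus)
        p_bar = (pa + pb) / 2
        hs_sum += (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        ht_sum += 2 * p_bar * (1 - p_bar)
    return 0.0 if ht_sum == 0 else (ht_sum - hs_sum) / ht_sum


def permutation_p(pop_a, pop_b, n_perm=1000, seed=0):
    """Shuffle haplotypes between populations; return observed Fst and its P value."""
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    # Add-one correction so that P is never reported as exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A between-host split is called significant when few random reassignments of haplotypes reach the observed value.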

Rejection of H01 and H02 was likely to be indicative of selection-gene flow balance. However, the sum of these two rejections was not sufficient to rule out the possibility of random genetic drift. To test for the role of drift in causing fission divergence (H03), we conducted the following four analyses.

Drift test 1

Within individual treatments, hybridising different-host, same-replicate populations are unlikely to drift faster than isolated same-host different-replicate populations. Therefore, drift was rejected in all comparisons where the estimate of between-host Fst exceeded the same-host value.

Drift test 2

In our study, for any two pairs of populations analysed with the same set of polymorphic markers, we will have certain numbers of loci diverged in both pairs (a), non-diverged in both pairs (b), diverged in the first but not in the second pair (c) and diverged in the second but not the first pair (d). The probability P of the corresponding 2 × 2 contingency table occurring by chance is expected to be greater than 0.05 if the observed divergence is a product of random genetic drift. However, when divergence is driven by selection targeting a limited group of loci, the fraction of loci appearing on ‘diverged’ list in both population pairs should increase above random expectations, leading to P<0.05 and to rejection of drift. To perform this test, we conducted the locus-by-locus Fst analysis (Schneider et al., 2006) for every possible pair of different host populations within each treatment, and made all possible pairwise comparisons between the pairs. In a thrice-replicated treatment this resulted in nine population pairs forming 36 pairwise combinations. For each such combination of population pairs we calculated the parameters a, b, c, d and determined the value of P using two-tailed Fisher exact test as implemented in GenStat (VSN_International, 2008). This analysis was supplemented by testing in each of the two experiments whether loci affected by divergence in the unrestricted gene flow treatment also diverged under restricted gene flow, and whether this observed pattern was due to chance. For this part of the test, we used the same protocol as in the within-treatment analysis, but this time pairwise combinations were formed between population pairs from different gene flow treatments.
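The co-divergence test described above can be sketched as follows. This is an illustration under stated assumptions rather than the GenStat procedure used in the study: the two-tailed P value sums the probabilities of all tables that are no more likely than the observed one, the 2 × 2 table is laid out with rows for the first population pair and columns for the second, and the population labels and function names are hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_tailed(x11, x12, x21, x22):
    """Two-tailed Fisher exact P for the 2 x 2 table [[x11, x12], [x21, x22]]."""
    row1, row2, col1 = x11 + x12, x21 + x22, x11 + x21
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(x11)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at least as extreme (no more probable) than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def codivergence_counts(diverged_1, diverged_2, all_loci):
    """Counts a, b, c, d as defined in the text (diverged sets must be subsets of all_loci)."""
    a = len(diverged_1 & diverged_2)   # diverged in both pairs
    c = len(diverged_1 - diverged_2)   # diverged in the first pair only
    d = len(diverged_2 - diverged_1)   # diverged in the second pair only
    b = len(all_loci) - a - c - d      # non-diverged in both pairs
    return a, b, c, d


def codivergence_p(a, b, c, d):
    """Rows: pair 1 diverged / not; columns: pair 2 diverged / not."""
    return fisher_exact_two_tailed(a, c, d, b)


# Three replicates give 3 x 3 = 9 different-host population pairs and 36 pairwise combinations.
p_pops, non_p_pops = ["P1", "P2", "P3"], ["N1", "N2", "N3"]
pairs = [(p, n) for p in p_pops for n in non_p_pops]
pair_combinations = list(combinations(pairs, 2))
```

In use, `codivergence_counts` would be applied to the sets of significantly diverged loci from two population pairs, and the resulting counts passed to `codivergence_p`.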

Drift test 3

Between-host migration and hybridisation are likely to generate LD which peaks in F1 and then decays at a rate largely governed by the rate of recombination (Li and Nei, 1974; Barton and Gale, 1993; Kruuk et al., 1999). If host-associated demes of a hybrid population diverge via drift, the mean rate of LD decay among diverged and non-diverged loci is likely to be the same. This is likely to hold true despite the fact that recombination rate often varies among genomic regions (Butlin, 2005), assuming that drift occurs randomly across the genome. In contrast, if divergence is driven by selection, the rate of LD decay is likely to be lower for diverged compared with non-diverged loci because of a positive feedback between LD and selection (Kruuk et al., 1999). As a result, drift is rejected if the average pairwise LD among diverged markers at the end of fission is significantly higher than LD among non-diverged loci. There are two important caveats to this test. First, in the restricted gene flow treatment, LD among diverged loci can be inflated not only by selection but also by the divergence itself (Barton and Gale, 1993), as well as by the heterozygote deficit generated in this treatment by the imposed 6.7% host-based mating assortativity (Kruuk et al., 1999). Therefore, drift can be safely rejected only when the difference in LD between diverged and non-diverged loci under restricted gene flow is accompanied by a similar pattern under unrestricted gene flow, where no host-based mating assortativity was imposed. Second, the sensitivity of this test depends on the strength of selection and on the linkage pattern of host-competent loci. If selection is strong and/or if diverged host-competent loci are clustering within a region of reduced recombination (Feder et al., 2003), the difference between the rate of LD decay at diverged and non-diverged loci is likely to be greater than under weak selection and/or diffused genomic architecture.
This means that unambiguous interpretation is likely when the test result is positive, that is, in cases where a significant difference between LD at diverged and non-diverged loci is detected, whereas true and false negatives can be harder to distinguish. To carry out drift test 3, we computed standardised r² LD coefficients for all possible pairs of diverged loci in each of the four treatments, separately for each host-associated population within the treatment. Because only haploid males were used, this analysis was carried out using the exact test of LD (Schneider et al., 2006). To control for the effect of genetic polymorphism on LD estimates, r² was calculated using only polymorphic loci with the frequency of the least common allele no less than 0.2. Non-diverged loci were analysed in the same manner, after which diverged and non-diverged averages were compared. The distribution of the r² coefficient is unlikely to be normal (P<0.001 in the Shapiro–Wilk test of normality in preliminary analyses of our data); therefore, confidence interval calculation for individual averages and significance testing between diverged and non-diverged r² were performed, respectively, by bootstrapping the mean (MathSoft Inc., 1999) and by using the non-parametric Mann–Whitney U procedure (VSN_International, 2008).
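The r² calculation for haploid data, the minor-allele filter and the bootstrap of the mean can be sketched as below. This is a minimal illustration and not the software used in the study; it assumes that each male is a tuple of 0/1 band scores, and it treats pairwise r² values as independent when resampling, which is a simplification. Function names are hypothetical, and the Mann–Whitney comparison is not shown.

```python
import random
from itertools import combinations


def allele_frequency(haplotypes, locus):
    """Frequency of the '+' allele at one locus among haploid males."""
    return sum(h[locus] for h in haplotypes) / len(haplotypes)


def r_squared(haplotypes, i, j):
    """r^2 = D^2 / (p_i (1 - p_i) p_j (1 - p_j)); gametic phase is known in haploids."""
    p_i = allele_frequency(haplotypes, i)
    p_j = allele_frequency(haplotypes, j)
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / len(haplotypes)
    d = p_ij - p_i * p_j
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def pairwise_r2(haplotypes, loci, min_maf=0.2):
    """r^2 for all pairs of loci whose least common allele has frequency >= min_maf."""
    usable = []
    for locus in loci:
        f = allele_frequency(haplotypes, locus)
        if min(f, 1 - f) >= min_maf:
            usable.append(locus)
    return [r_squared(haplotypes, i, j) for i, j in combinations(usable, 2)]


def bootstrap_mean_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits for the mean of a list of values."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

The same functions would be applied separately to the diverged and the non-diverged sets of loci in each host-associated population before the two averages are compared.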

Drift test 4

In our study, any locus found to be diverged between different-host inbred stocks or fission populations was designated as a P or a non-P (N or C) marker if the ‘+’ allele at this locus had high frequency on P or non-P host, respectively. There is a certain probability that a locus with a particular (P or non-P) marker status in inbred stocks re-evolves the same status during the fission stage. Under drift, the frequency of this event should not exceed random expectations. However, if fission is adaptive and markers diverge by hitchhiking on host-competent genes, this frequency should increase above the chance value, leading to rejection of drift. To conduct drift test 4, we first determined the marker status for each locus affected by fission divergence, by calculating allele–host association index. Allele–host association index is a difference between the ‘+’ allele frequency on the non-P host and the same allele's frequency on the P host. This index thus had negative value for all P markers and positive value for all N or C markers. Allele–host association index was calculated for each locus twice: in microcosm populations at the end of fission and in inbred racial stocks prior to fusion. Loci with the value of the index equal 0.5 could not be used in this analysis and were excluded. We obtained loci counts in four classes: fission P markers that had P marker status in the racial stocks before the fusion; fission P markers with pre-fusion non-P status; fission non-P markers with pre-fusion non-P status; and fission non-P markers with pre-fusion P status. Resulting contingency 2 × 2 table was tested in each experiment for significance of association using two-tailed Fisher exact test (VSN_International, 2008).
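The marker-status bookkeeping of drift test 4 can be sketched as follows. The index is taken directly from the definition above ('+' allele frequency on the non-P host minus its frequency on the P host). As an assumption of this sketch, the exclusion rule stated in the text is not reproduced; only loci whose index is exactly zero are skipped, because they have no direction. Names and data layout (dictionaries keyed by locus) are hypothetical.

```python
def association_index(freq_non_p, freq_p):
    """Allele-host association index: negative for P markers, positive for non-P markers."""
    return freq_non_p - freq_p


def marker_status(index):
    """Classify a locus by the sign of its index; None if the index is zero."""
    if index < 0:
        return "P"
    if index > 0:
        return "non-P"
    return None


def concordance_counts(stock_index, fission_index):
    """Count loci in the four (fission status, pre-fusion status) classes."""
    counts = {
        ("P", "P"): 0,
        ("P", "non-P"): 0,
        ("non-P", "non-P"): 0,
        ("non-P", "P"): 0,
    }
    for locus, fission_value in fission_index.items():
        fission_status = marker_status(fission_value)
        stock_status = marker_status(stock_index[locus])
        if fission_status is None or stock_status is None:
            continue  # assumption of this sketch: undirected loci are skipped
        counts[(fission_status, stock_status)] += 1
    return counts
```

The four counts form the 2 × 2 table that is then tested for association with a Fisher exact test.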

Using inbred racial stocks

Inbred stocks were used primarily to facilitate drift testing, because using natural host race populations could make this testing in our experiment problematic. The main reason for this is that the pattern of host-based divergence in natural races is unlikely to conform to neutrality even in the absence of selection (Beaumont, 2005; Storz, 2005). Because natural ‘outlier’ loci with elevated between-race divergence are likely to have elevated variance in hybrids, and because the magnitude of random per-generation deviation δ² of the allelic frequency q in a population of size N is directly proportional to the level of this variance (Li, 1976),

\[\delta^{2} = \frac{q(1-q)}{2N} \qquad (1)\]

it is likely that a deviation from neutrality could occur during fission even if drift was the main cause of fission divergence. We bypassed this problem by fusing inbred racial stocks whose divergence pattern was likely to be neutral. The neutrality of the between-stock divergence pattern is expected, because inbred stocks are products of generations of strong drift and should differ not only at host-competent loci but also at a large number of random host-neutral loci. To test whether this abundance of host-neutral drift-generated divergence in our inbred stocks has ‘swamped’ any neutrality aberrations caused by the non-neutral divergence pattern in natural races, we compared the observed variance of Fst between P and non-P stocks with the Lewontin–Krakauer (Lewontin and Krakauer, 1973) expectation δexp for two populations:

\[\delta_{\mathrm{exp}} = \frac{k\,\overline{Fst}^{\,2}}{n-1} = k\,\overline{Fst}^{\,2} \quad (n = 2) \qquad (2)\]

Although caution is urged when the Lewontin–Krakauer approach is used in tests for selection (Baer, 1999; Beaumont, 2005), it can be applied with confidence in our case because we test for any type of deviation from random expectations regardless of this deviation's nature. Parameter k in Equation 2 is a constant specific to the underlying distribution of allele frequencies among diverging stocks, with k≤2 expected at loci diverging predominantly by drift. The value of δexp for each pair of populations was thus obtained for k=2 by calculating mean \(\overline{Fst}\) averaged over loci polymorphic in at least one of the two populations. The value of δexp was then compared with the observed variance of Fst, δobs, for the same loci. This is a categorical test: either our hypothesis about the absence of deviation from neutrality was not rejected (that is, δobs≤δexp) or it was.
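The comparison of observed and expected Fst variance can be sketched as below. The sketch assumes the standard Lewontin–Krakauer form, expected variance = k × (mean Fst)² / (n − 1), with n = 2 populations and k = 2, and it uses the sample variance of per-locus Fst as δobs; both choices, and the function names, are assumptions of this illustration.

```python
from statistics import mean, variance


def lk_expected_variance(fst_values, k=2.0, n_pops=2):
    """Lewontin-Krakauer expected variance of Fst: k * mean(Fst)^2 / (n_pops - 1)."""
    f_bar = mean(fst_values)
    return k * f_bar ** 2 / (n_pops - 1)


def neutrality_not_rejected(fst_values, k=2.0, n_pops=2):
    """Categorical test: True when the observed variance does not exceed the expectation."""
    observed = variance(fst_values)  # sample variance across loci (assumption)
    expected = lk_expected_variance(fst_values, k=k, n_pops=n_pops)
    return observed <= expected, observed, expected
```

For example, `neutrality_not_rejected(per_locus_fst)` would be called with the per-locus Fst estimates for one pair of stocks, restricted to loci polymorphic in at least one of the two populations.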

Results

AFLP analysis of all populations involved in this study produced a total of 879 markers polymorphic within this global data set (Supplementary Table S2). Inbred racial stocks showed genetic variance loss, which was similar between sister P stocks but was greater in P compared with non-P stocks (Supplementary Figure S3). This pattern is expected, and reflects the underlying differences between P and non-P stocks in the number of generations of sib mating. Apart from the loss of within-stock variance, sib mating was expected to cause large amounts of drift, resulting in random allelic fixation across loci. In line with this expectation, the bulk of total variance in P × N and P × C racial stock pairs was found to be due to the variance between stocks (76.02 and 77.93%, respectively, P<10⁻³ in both cases), with correspondingly high (74.75 and 72.37%) fractions of polymorphic loci affected by significant divergence (Supplementary Table S3). Such strong and widespread divergence of random loci was likely to ‘swamp’ any deviations from neutrality in the pattern of natural between-race divergence, even despite the fact that P, N and C stocks were maintained on different hosts. Indeed, Lewontin–Krakauer analysis of divergence between P and N stocks has shown that the observed Fst variance δobs in this pair of stocks was well below the value of δexp expected under the critical k=2 (Equation (2)): δobs/δexp =0.121/1.155. Divergence between P and C stocks showed a similar pattern: δobs/δexp =0.097/1.214. That these patterns are indeed indicative of neutrality is supported by the fact that a similar pattern (δobs/δexp=0.123/1.092) was observed in sister P stocks, which were maintained on the same host and which could have diverged only due to random drift.

Fusion of racial stocks was followed by fission. After the sixth generation, we observed a decline of P × N populations in the unrestricted gene flow treatment (Supplementary Figure S4), after which the experiment was terminated. Analysis of molecular variance of control fusion populations revealed no host-based structure (P=0.98 and 0.61 in P × N and P × C controls, respectively, Supplementary Table S3). In contrast, an initial analysis of all fission replicates using a total of 143 polymorphic markers generated by a subset of 6 out of the total of 45 primer combinations (Supplementary Tables S1, S2) has shown that a significant host-based genetic split occurred in both experiments despite continuing gene flow, rejecting the irreversibility of fusion hypothesis H01 (Supplementary Figure S2 and Supplementary Table S4). Comparisons between gene flow treatments revealed that divergence increased as gene flow decreased, leading to rejection of the divergence-gene flow independence hypothesis H02. This pattern was highly significant in P × N microcosms. In particular, Fst averaged over all ParaP × N replicates was significantly higher than the SymP × N average (Supplementary Table S4). Similarly, significant divergence was detected in all nine ParaP × N between-host pairwise population comparisons, but only in five out of nine in the unrestricted gene flow treatment (Fisher exact P=0.041). In contrast, the negative interaction between divergence and gene flow in the P × C experiment was relatively weak and non-significant, both overall (Supplementary Table S4) and on the individual replicate level (three out of four versus two out of nine comparisons, Fisher exact P=0.161). Drift test 1 rejected the drift hypothesis H03 in both experiments by demonstrating that divergence occurred predominantly between hybridising populations on different hosts within replicates rather than between isolated different-replicate populations on the same host (Supplementary Figure S2 and Supplementary Table S4).
The drift hypothesis was also rejected in drift test 2, showing that host-based divergence in different replicates and in different treatments affected same loci more often than can be explained by chance alone. Although this pattern was observed in both experiments, it was more robust among P × N compared with P × C replicates. In particular, the number of loci co-diverged in different replicates within treatments significantly exceeded random expectations in 31 out of 36 different-host pairwise population comparisons in ParaP × N, as well as in 5 out of 36, 1 out of 6 and 1 out of 36 between-host comparisons respectively in SymP × N, ParaP × C and SymP × C (Supplementary Table S5a). Similarly, significant co-divergence between treatments was observed in 3 out of 36 pairwise tests in P × N experiments but was absent in P × C microcosms (Supplementary Table S5b). The initial analysis of fission data thus provided tests for H01 and H02, as well as partial testing of H03. Drift tests 3 and 4 required larger number of informative markers than could be generated by the six primer combinations. These tests were thus conducted after analysing the remaining 39 primer combinations, as described below.

Because there was no significant same-host between-replicate structure in the initial six primer combinations data (Supplementary Table S4), the extended analysis of fission populations was completed using four randomly chosen replicates, one per treatment. This work revealed a picture broadly similar to the initial analysis. Significant host-based genetic split was detected under restricted gene flow, rejecting the fusion irreversibility hypothesis H01. This is illustrated by Figure 1 showing, in both experiments, the transition from the clear initial divergence of inbred stocks, through the hybrid swarm-like genetic uniformity of control fusion lines, to the recovery of host-based divergence in fission populations. Apart from the clear partitioning of different-host fission populations, Principal Components Analysis (PCA) detected a within-host structure involving two distinct categories of host-associated haplotypes in each parapatric microcosm. These were represented by locally ‘extreme’ haplotypes strongly partitioned between hosts, and intermediate haplotypes with weaker between-host differences. This pattern is compatible with interaction of host-based selection and between-host hybridisation, and seemed to be stronger in P × N experiment, wherein the partitioning between and within host was clearer compared with P × C microcosms. The tentative PCA evidence of selection-gene flow interaction is in agreement with the observation that host-based fission divergence increased as gene flow decreased (Figure 2; Supplementary Table S3). Both types of evidence are thus corroborating the initial rejection of the divergence-gene flow independence hypothesis H02. Similar corroboration was also obtained in this part of our work for the drift test 2 rejection of the drift hypothesis H03 in P × N microcosms, whereas in P × C experiment the number of loci co-diverged between gene flow treatments did not exceed random expectations (Supplementary Table S5c). 
The LD analysis uncovered two main patterns. Gametic associations observed under restricted gene flow were significantly stronger than those in unrestricted gene flow populations in all P × N comparisons and in three out of four P × C comparisons (Figure 3; Supplementary Table S6), thus lending further support to rejection of the divergence-gene flow independence hypothesis H02. On the other hand, LD-based drift test 3 rejected the drift hypothesis H03 in the P × N experiment but returned an ambiguous result in P × C microcosms. In P × N, average pairwise LD was higher among diverged compared with non-diverged loci both under restricted and unrestricted gene flow, whereas in the P × C experiment no significant difference was observed (Figure 3; Supplementary Table S6). The contrast between P × N and P × C data was also observed in drift test 4, which asks whether fission restores the ancestral configuration of host–allele associations disrupted by fusion. This test was significantly positive in the P × N experiment but only marginally so in the P × C experiment (Fisher exact P<0.001 and P=0.042, respectively).

Figure 1

Principal Components Analysis partitioning of parental inbred stocks, fusion, and restricted gene flow fission populations using all (a) or only diverged (b) loci, 45 primer combinations data. Individual males harvested from different hosts are represented by filled (P host) and empty (non-P hosts) small (inbred stocks) and large (fusion and fission populations) circles.

Figure 2

Fission divergence under restricted and unrestricted gene flow conditions, 45 primer combinations data. Bars show 95% bootstrap support limits (20 000 permutations).

Figure 3

LD in host-associated populations among loci that were or were not affected by fission divergence under restricted or unrestricted gene flow conditions in P × N and P × C environments. Bars show 95% bootstrap support limits (10 000 permutations).

Discussion

Using gene flow perturbation and hybridisation is not new to studies of natural selection. Gene flow perturbation has been used in the desert spider Agelenopsis aperta (Riechert, 1993) and in Timema walking sticks (Nosil, 2009), where artificially imposed between-race migration barriers resulted in an increase of between-race phenotypic divergence. On the other hand, experimental hybridisation has been commonly used to create variation for the traits that differentiate habitat-associated taxa in order to study selection on these traits (Lexer et al., 2003). In this study, we combined these two approaches to create a novel ‘fusion–fission’ method of gene flow manipulation in order to study the interaction between gene flow and selection in generating and maintaining host race diversity in A. ervi parasitoids. In contrast to the earlier work, we first obliterated the existing host-based divergence by multiple generations of hybridisation, and then subjected the resulting fusion population to ecologically driven fission under two contrasting gene flow regimes. By showing that racial divergence rapidly re-evolves in response to environmental bimodality despite gene flow, and by showing that the rate of this host-based fission is inversely related to the level of gene flow, we demonstrate, in two replicated experiments, that selection-gene flow balance is the likely maintenance mechanism of racial diversity in A. ervi, and that host-based divergent natural selection can create this diversity in the first place.

The conclusion about the adaptive nature of the observed genetic split is reinforced by our results indicating that this split could not have arisen by chance. The drift hypothesis is strongly rejected in our study by a series of tests showing that (i) different-host same-replicate hybrid colonies connected by gene flow diverged faster than isolated same-host populations from different replicates (Supplementary Figure S2 and Supplementary Table S4); (ii) the similarity between loci diverged in different replicates or different treatments is greater than expected by chance (Supplementary Table S5); (iii) the divergence is non-random in that the ancestral configuration of host-allele associations, initially disrupted by fusion, was restored by fission; and (iv) LD among loci affected by fission divergence is more inflated, in relation to LD at non-diverged loci, than could be explained by chance (Figure 3; Supplementary Table S6). The LD evidence was reinforced by the fact that it was observed both under restricted and unrestricted gene flow, ruling out any significant role of non-adaptive sources of LD. In addition to these findings, the rejection of drift was further supported by the pattern of Fst variance. In contrast to inbred racial stocks, in which divergence was mostly due to random drift, and where the observed Fst variance did not exceed Lewontin–Krakauer expectations, our fission populations showed a reverse pattern compatible with strong divergence at a small group of loci and lack of it elsewhere in the genome (δobs/δexp = 2.025×10⁻²/7.716×10⁻³ and 2.519×10⁻²/1.165×10⁻² in P × N and P × C restricted gene flow treatments; 2.172×10⁻³/1.480×10⁻⁴ and 1.813×10⁻³/1.450×10⁻⁶ in P × N and P × C unrestricted gene flow treatments). In principle, these deviations from neutrality can be a reflection of similar aberrations in parental races, but the use of inbred racial stocks excluded this possibility. 
Another non-adaptive source of deviation from Lewontin–Krakauer neutrality, the variance of type and rate of mutation among loci and genomic regions (Storz, 2005), is also ruled out, because our experiment would need to last thousands of generations for any detectable effect of this variance to occur.
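The Fst variance comparison can be sketched as follows, assuming the classical Lewontin–Krakauer expectation, in which the variance of Fst among neutral loci is at most k·F̄²/(n − 1), with k = 2 and n the number of populations compared. The function name and the per-locus Fst values are illustrative placeholders rather than our data, and the exact estimators used for δobs and δexp may differ from this sketch.

```python
from statistics import mean, variance


def lewontin_krakauer(fst_values, n_populations, k=2.0):
    """Return (observed variance, expected variance, ratio) of per-locus Fst.

    Assumes the Lewontin-Krakauer neutral bound: k * mean(Fst)^2 / (n - 1).
    """
    f_bar = mean(fst_values)
    observed = variance(fst_values)  # sample variance across loci
    expected = k * f_bar ** 2 / (n_populations - 1)
    return observed, expected, observed / expected


# Hypothetical per-locus Fst values (NOT from the study): most loci show
# little divergence, a few show strong divergence.
hypothetical_fst = [0.01, 0.00, 0.02, 0.01, 0.00, 0.45, 0.02, 0.38, 0.01, 0.00]
obs, exp, ratio = lewontin_krakauer(hypothetical_fst, n_populations=2)
print(f"observed = {obs:.4g}, expected = {exp:.4g}, ratio = {ratio:.3g}")
```

A ratio above 1 means that Fst is more heterogeneous among loci than the neutral bound allows, which is the pattern expected when a small group of loci diverges strongly while the rest of the genome does not.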

The host-based genetic split reported here is intriguing not only because it evolved from the point of no divergence, but also because it occurred in the face of continuous hybridisation, ranging from 6.7% (restricted gene flow treatment) up to 50% (unrestricted gene flow treatment). That selection-gene flow balance was indeed present is evident from the fact that the between-host split was accompanied by within-host structure indicative of active hybridisation (Figure 1). This balance was also indicated by the negative relationship between the level of divergence and the level of hybridisation, as well as by a similar negative relationship between the level of gene flow and the strength of LD (Figures 2 and 3; Supplementary Tables S3, S4, S6). The antagonism between selection and gene flow is further suggested by the consistently reduced mummy count under unrestricted gene flow (Supplementary Figure S4), that is, where hybridisation and migration load was expected to be higher than in parapatry (Bolnick and Nosil, 2007). In addition to these four lines of evidence, selection-gene flow interaction is indicated by the pattern of genetic polymorphism in different gene flow treatments. Divergent selection functions to deplete local genetic variation at affected adaptation genes and associated markers (Barton, 2000), whereas gene flow should function to counter this trend, with the level of genetic polymorphism determined by the balance between the two forces (Spieth, 1979; Ross and Keller, 1995). Therefore, if divergence in our experiment was a product of selection-gene flow balance, and if the strength of selection imposed by the host was similar in different gene flow treatments within an experiment, diverged loci at the end of fission should be less polymorphic than non-diverged loci, with this effect being more pronounced under restricted compared with unrestricted gene flow. As Table 1 shows, this expectation is borne out by observation.

Table 1 Average frequencies of the least common allele at diverged and non-diverged polymorphic loci under restricted and unrestricted gene flow conditions

Another surprising aspect of fission divergence was its speed. Such a rapid response may be accounted for by the sheer strength of host-based selection. For example, selection estimates of s=0.77 and 0.84 were obtained for dominant and recessive colour type alleles in the Timema study (Nosil, 2009), and could be at least as high in our case given the extremely intimate relationships between parasitic wasp larvae and their hosts (Doutt et al., 1976). The fact that locally deleterious recessive alleles should be strongly exposed to selection in A. ervi haploid males could be another contributing factor, although the role of this factor in adaptive evolution is not straightforward (Carriere, 2003). Adaptive divergence could also be accelerated by hybridisation-induced inflation of genetic variation and LD. The increased variation was likely to provide improved opportunities for selection, whereas the elevated LD was likely to have a role because of a positive feedback loop between LD and selection (Kruuk et al., 1999). Additional momentum to local adaptation and adaptive divergence could have been given by the phenomenon of induced host fidelity known to exist in A. ervi and other parasitoid hymenopterans (Vet and Dicke, 1992). Female A. ervi often show learnt host preference after experiencing contact with the host (Guerrieri et al., 1997; Daza-Bustamante et al., 2002; Powell et al., 2003; Takemoto et al., 2009). Such host-induced behaviour is bound to provide an adaptive ‘foothold’ to a population colonising a novel host, and could be partially responsible for the previously observed rapid phenotypic divergence of heritable virulence in A. ervi (Cameron et al., 1984; Henry et al., 2008), as well as for the rapid host-based differentiation in our study. 
Some of the listed factors are common in the wild, whereas others, especially the inflated LD and variation, are not, making it hard to speculate about the likelihood of a similarly rapid host-based divergence in the wild. However, our results are in line with gene flow manipulation data on natural populations, in which a phenotypic divergence response to gene flow reduction was apparent after just one generation of isolation (Riechert, 1993; Nosil, 2009). This may suggest that the rates of sympatric or parapatric genetic fission observed here are common rather than exceptional in the wild.
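How quickly strong host-based selection can split an initially uniform population, and how gene flow moderates the split, can be illustrated with a deliberately simple deterministic model: one haploid locus, two hosts, and symmetric migration followed by selection in each generation. The migration fractions 0.067 and 0.5 mirror the hybridisation levels of the two treatments, and s = 0.77 is borrowed from the Timema estimate cited above; treating the hybridisation level as a symmetric migration fraction, the single-locus fitness scheme and the number of generations are simplifying assumptions, not features of our experiment.

```python
def fission_divergence(s, m, generations, p0=0.5):
    """Allele-frequency difference between two hosts after selection.

    s  : selection coefficient against the locally maladapted allele
    m  : fraction of each host-associated population drawn from the other host
    p0 : starting frequency of allele A on both hosts (uniform fusion hybrids)
    """
    p1 = p2 = p0
    for _ in range(generations):
        # Migration: exchange a fraction m between the two hosts.
        q1 = (1 - m) * p1 + m * p2
        q2 = (1 - m) * p2 + m * p1
        # Haploid selection: A is favoured on host 1 and disfavoured on host 2.
        p1 = q1 / (q1 + (1 - q1) * (1 - s))
        p2 = q2 * (1 - s) / (q2 * (1 - s) + (1 - q2))
    return p1 - p2


# Hypothetical run length; the actual number of fission generations is not assumed here.
restricted = fission_divergence(s=0.77, m=0.067, generations=5)
unrestricted = fission_divergence(s=0.77, m=0.5, generations=5)
print(f"restricted: {restricted:.3f}, unrestricted: {unrestricted:.3f}")
```

Under these assumptions, divergence measured after selection is substantial within a few generations in both regimes and is larger when migration is lower, in qualitative agreement with the inverse relationship between gene flow and fission divergence.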

One more observation warrants discussion. Genetic split in response to environmental bimodality appeared to be more pronounced in P × N compared with P × C microcosms (Figures 1 and 3; Supplementary Figure S2 and Supplementary Tables S3–S5). Interestingly, this pattern matches the one observed in the wild: divergence between P- and C-associated populations in our experiment and divergence between natural P and C races is relatively weak compared with bean- and nettle-associated populations here and in the wild (Pennacchio and Tremblay, 1986; Pungerl, 1986; Atanassova et al., 1998). One possible reason for the disparity between P × N and P × C is a difference in the level of divergent selection. Some tentative evidence of such a difference is provided by the analysis of interaction between selection and drift. The rate of drift is governed by variance (Equation (1)); therefore, a correlation between the level of fission divergence and the level of pre-fusion polymorphism is likely if selection is weak. Under strong selection, the link between divergence and polymorphism is likely to be disrupted because the probability of divergence at individual loci in this case should be biased in favour of loci involved in host adaptation. Therefore, if selection was stronger in P × N compared with P × C microcosms, and assuming similar levels of drift, we expect a significant association between divergence and polymorphism in the P × C but not the P × N experiment, which seems to be the case (Table 2). On the other hand, the relatively high level of P × N fission divergence may be due to the existence in this racial pair of intrinsic genomic conditions promoting creation and maintenance of locally adaptive genotypic clusters in the face of gene flow via reduced recombination among adaptive loci (Noor et al., 2001; Feder et al., 2003; Butlin, 2005). Our data seem to be pointing in this direction. 
Strong fission divergence in the P × N experiment was accompanied by elevated resistance to LD decay (Figure 3) and robust clear-cut partitioning of host-associated haplotypes (Figure 1). In contrast, weaker fission divergence between P- and C-associated populations was coupled with evidence of rapid LD decay and fuzzy PCA partitioning, both compatible with elevated recombination. To test whether the rate of adaptive gene flow-moderated divergence in our experiment was indeed affected by genomic architecture of adaptation, we are currently conducting comparative linkage mapping of host-based divergence and introgression that took place in the two fusion–fission experiments reported here.
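The LD contrasts discussed above rest on pairwise gametic associations between marker loci. Because males are haploid, the gametic phase can be read directly from their marker scores. The sketch below computes the disequilibrium coefficient D and the squared correlation r² for one pair of loci scored as 0/1; the choice of r² as the LD measure, the function name and the example scores are assumptions made for illustration and are not taken from our analysis.

```python
def pairwise_ld(locus_a, locus_b):
    """Return (D, r2) for two loci scored 0/1 in the same haploid individuals."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either locus is monomorphic.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical marker scores for eight haploid males (NOT from the study).
scores_locus_1 = [1, 1, 1, 0, 0, 0, 1, 0]
scores_locus_2 = [1, 1, 0, 0, 0, 0, 1, 1]
d_value, r2_value = pairwise_ld(scores_locus_1, scores_locus_2)
print(f"D = {d_value:.3f}, r2 = {r2_value:.3f}")
```

Averaging such pairwise values separately over diverged and non-diverged loci gives the kind of contrast summarised in Figure 3.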

Table 2 Interaction between the level of polymorphism at loci in inbred racial stocks before the fusion and divergence of these loci during restricted gene flow fission