Introduction

Allopatric speciation is traditionally assumed to be the predominant mode of speciation (Mayr, 1963), but the evidence for sympatric speciation has become increasingly strong in the last decade (for reviews see: Via, 2001; Dres and Mallet, 2002). The debate today is not whether sympatric speciation is possible; rather it is a question of its relative importance. The number of well-documented cases is still small (Dres and Mallet, 2002) and further case studies are required to evaluate the processes causing divergence (Jiggins and Mallet, 2000; Schluter, 2001; Via, 2001). In fact, these underlying mechanisms rather than the geographical modes per se are the point of controversy. In sympatric speciation models, gene flow becomes restricted due to disruptive selection (eg Kawecki, 1997; Dieckmann and Doebeli, 1999; Kondrashov and Kondrashov, 1999) whereas, in allopatric speciation models, reproductive isolation is a passive consequence of different selection regimes and/or genetic drift (Turelli et al, 2001; Presgraves et al, 2003). However, the geographical mode of speciation does not necessarily imply a particular mechanism. Sympatric populations might also differentiate due to differential gene flow (Stam, 1983). In phytophagous insects, for example, allochronic isolation due to differences in host-plant phenology can lead to restricted gene flow in sympatry even in the absence of selection (Butlin, 1990). Furthermore, speciation can involve a combination of sympatric and allopatric conditions. For instance, sympatric host-race formation in Rhagoletis pomonella is likely to have been facilitated by genetic polymorphisms that evolved during an allopatric stage (Feder et al, 2003). Allopatric speciation in turn might be completed in sympatry via reinforcement processes, that is, selection promoting prezygotic isolation after secondary contact (eg Howard, 1993; Noor, 1999). During the speciation process, therefore, different geographic settings might often be involved, with several mechanisms contributing to differentiation.

In the present paper, we address the contribution geography plays in driving genetic divergence in the tephritid fly Tephritis conura and, based on geographic distributions, the relative importance of genetic drift and selection in the diversification process. Tephritis conura infests at least nine thistle species of the genus Cirsium (Cardueae) (Zwölfer, 1988; Romstöck-Völkl, 1997). These univoltine flies mate and oviposit exclusively on host plants in late spring, synchronised with flower head phenology (Romstöck and Arnold, 1987). Larvae feed on the seeds until pupating and adults emerge in July/August, hibernating at sites that are not yet known. Seitz and Komma (1984) found that flies emerging from Cirsium oleraceum (L.) SCOP. (cabbage thistle) differed from flies emerging from Cirsium heterophyllum (L.) HILL. (melancholy thistle) in frequencies at the allozyme locus hexokinase, indicating the existence of host races (we will henceforth refer to flies emerging from C. oleraceum as oleraceum flies and those from C. heterophyllum as heterophyllum flies). Mitochondrial DNA sequence analysis suggests recent diversification. Both host races share the most common mtDNA haplotype, which is central in a star-like genealogy (Diegisser et al, 2006).

The study area of Seitz and Komma (1984) was situated in the Bavarian Fichtelgebirge where C. oleraceum and C. heterophyllum occur parapatrically. Outside the Fichtelgebirge the host plants are largely allopatrically distributed, but there also exist areas of true sympatry, for example in southern Scandinavia (Figure 1).

Figure 1

Sampling scheme and host-plant distribution. Numbers refer to sample sites (details given in Appendix A). Pale-grey shaded=allopatric C. oleraceum distribution; dark-grey shaded=allopatric C. heterophyllum distribution; diagonally hatched=sympatric region; vertically hatched=parapatric region.

The combination of a young system and multiple geographical distributions makes T. conura a promising target for studies to evaluate the contribution that geography makes in driving reproductive isolation. In this study, we analyse genetic divergence of oleraceum and heterophyllum flies from sympatric, parapatric and allopatric regions using allozyme electrophoresis. Basically, we expect to find one of three possible patterns, which differ in their implications for the speciation process. Firstly, host races might differ more in allopatry than in sympatry/parapatry. Differentiation among host races in contact areas might decrease due to oviposition into the ‘wrong’ host plant and/or hybrid matings. Such a pattern would indicate that the original diversification among heterophyllum and oleraceum flies was a passive consequence of either genetic drift or selection in isolation but that diversification is unfinished. The reduced differentiation in contact areas (compared to that in allopatric regions) would provide information about the degree of ongoing gene flow and the stability of the system in sympatry.

Under the second scenario, differentiation among host races would be higher in contact areas than in allopatry. This pattern would be consistent with selection against hybridisation and/or oviposition into the alternative host. In an experiment, we forced oleraceum and heterophyllum females to oviposit into flower heads of C. oleraceum. The heterophyllum females produced significantly fewer progeny than the oleraceum flies, suggesting that selection against wrong host plant choice exists (T Diegisser, unpublished data). As a consequence, selection should strengthen traits that reduce maladaptive events in contact zones, for example host choice and/or mating preferences. If allozyme loci are linked to such traits, they should exhibit higher differentiation in contact areas than in allopatric regions.

Finally, the third conceivable pattern is a uniform level of host-related differentiation in all geographic settings. In that case, the pattern could be interpreted in two ways. One possibility (scenario 3a) is that both the process of ongoing gene flow (scenario 1) and selection against maladaptive hybridisation/wrong host choice (scenario 2) occur in contact areas but they counterbalance each other, thereby producing differentiation similar to that in allopatric regions. Alternatively, gene flow might be so strongly restricted that allele frequencies are not significantly affected by hybridisation events (if these occur at all), hence suggesting geographically stable, reproductively isolated host races (scenario 3b).

Materials and methods

Host-plant distribution and sampling

C. oleraceum occurs throughout a large part of Europe (Figure 1), from eastern France to Russia. Its southern limit of distribution runs from northern Italy to the northern Balkans; in the North it can be found up to southern Scandinavia and the West Siberian Lowlands. C. heterophyllum is distributed from Scotland, Scandinavia and the Baltic States to Siberia, occurring sympatrically with C. oleraceum in southern Sweden and parts of Russia. In addition, C. heterophyllum can be found in mountain ranges of Europe, for example the Alps, Carpathians, Ore Mountains and the Bavarian Fichtelgebirge (Figure 1). In these regions, the host plants occur parapatrically: C. heterophyllum at high altitudes, C. oleraceum at lower altitudes.

In the present study, a total of 40 T. conura populations from all geographic host-plant settings were analysed (19 C. heterophyllum sites; 21 C. oleraceum sites). We sampled flies from southern Sweden, where host plants occur in true sympatry and often syntopically (Figure 1; site codes 9–13=heterophyllum populations, 14–19=oleraceum populations). Parapatric sites were sampled in the Bavarian Fichtelgebirge (site codes 29–34=oleraceum populations, 35–40=heterophyllum populations). Here, three sites per host plant were included from the area of contact where both host plants occurred syntopically (sites 32–34 and 35–37, respectively). Flies from allopatric regions were sampled in transects to allow testing for geographic variation. The C. heterophyllum transect (sites 1–8) was located in Middle/North Sweden, whereas the C. oleraceum transect (sites 20–28) was laid through Germany (Figure 1). The minimum distance between parapatric/allopatric sites was 180 km, between sympatric/allopatric sites 250 km. Details about sample localities are given in Appendix A.

Sampling took place in summer 2002 and 2003. At each site, flower heads from several plants were collected and placed in darkened plastic bottles with a hole in the lid, which was covered with a glass bottle. Emerging flies were attracted by the light and trapped in the bottle. Adults were collected on a daily basis and stored at −80°C until genetic analyses were performed.

Allozyme electrophoresis

Allozyme analysis was conducted using cellulose acetate electrophoresis (Hebert and Beaton, 1989). Whenever possible we analysed 24 individuals per site (12 males and 12 females), resulting in a total of 950 flies. Thirteen loci stained consistently in all individuals: aconitate hydratase (Acon, EC 4.2.1.3), arginine kinase (Ark, EC 2.7.3.3), fumarate hydratase (Fum, EC 4.2.1.2), glyceraldehyde-3-phosphate dehydrogenase (G3pdh, EC 1.2.1.12), glutamate-oxalacetate transferase (Got, EC 2.6.1.1), β-hydroxybutyrate dehydrogenase (Hbdh, EC 1.1.1.30), hexokinase (Hex, EC 2.7.1.1), isocitrate dehydrogenase (Idh, EC 1.1.1.42), 6-phosphogluconate dehydrogenase (6Pgdh, EC 1.1.1.44), peptidase A (Gly-Leu) (PepA, EC 3.4.11.11), peptidase D (Phe-Pro) (PepD, EC 3.4.13.9), phosphoglucomutase (Pgm, EC 5.4.2.2) and trehalase (Tre, EC 3.2.1.28). All enzymes were run at 250 V for 30 min using the buffer systems Tris-Citrate pH 8.2 (Richardson et al, 1986) for G3pdh, Hex, Idh, 6Pgdh and Tre, and Tris-Glycine pH 8.5 (Hebert and Beaton, 1989) for Acon, Ark, Fum, Got, Hbdh, PepA, PepD and Pgm. The most common allele at each locus was given the arbitrary score 100; the other alleles were named according to their relative electrophoretic mobility. As the most common allele at the Hbdh locus stained exactly at the point of application, we assigned the score 100 to the next most frequent allele.

Data analysis

Allele frequencies were calculated for each population with G-stat (Siegismund, 1993), and used to build a maximum likelihood phenogram with the PHYLIP (Felsenstein, 1993) subprogram CONTML (parameter settings ‘global rearrangements’ and ‘randomise input order of species’). This phenogram visualises the genetic relationships among populations, thereby providing evidence for the pattern of differentiation: under scenario 1, we expect heterophyllum and oleraceum populations from contact areas to be less separated from each other than those of allopatric regions, while the opposite should apply to scenario 2. If populations in the phenogram cluster by host-plant affiliation and independently of geography, this points to scenario 3. For quantification of population differentiation we applied hierarchical F-statistics, using the method of Weir and Cockerham (1984) as implemented in Arlequin 2.0 (Schneider et al, 2000). The total interpopulation variance, FST, was partitioned into variance that can be assigned to host affiliation (FHT) and host-independent variance (FSH). This quantifies the extent to which the observed population structure is caused by host-race differentiation. The absolute (FHT) and relative (FHT/FST) degree of host-related differentiation was calculated for all populations and for each geographic setting separately. We used genetic identities (Nei, 1972) to estimate whether host-race differentiation is lower (or higher) among heterophyllum and oleraceum populations in contact (sympatry/parapatry) than among heterophyllum and oleraceum populations that are geographically isolated from each other.
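The partition of FST into a host-related and a host-independent part can be illustrated with a minimal Python sketch. It assumes the standard nested relationship between variance components and fixation indices; the function name, the argument names and the numerical values are hypothetical and are taken neither from Arlequin nor from the data of this study.

```python
def hierarchical_f(var_hosts, var_pops_within_hosts, var_within_pops):
    """Return (F_ST, F_HT, F_SH) from nested variance components.

    var_hosts              -- variance among host races
    var_pops_within_hosts  -- variance among populations within host races
    var_within_pops        -- variance within populations
    (all names are illustrative assumptions)
    """
    within_host = var_pops_within_hosts + var_within_pops
    total = var_hosts + within_host
    if total <= 0 or within_host <= 0:
        raise ValueError("variance components must sum to a positive value")
    f_st = (var_hosts + var_pops_within_hosts) / total  # all interpopulation variance
    f_ht = var_hosts / total                            # host-related part
    f_sh = var_pops_within_hosts / within_host          # host-independent part
    return f_st, f_ht, f_sh


# Hypothetical variance components, chosen for illustration only.
f_st, f_ht, f_sh = hierarchical_f(0.46, 0.04, 0.50)
host_share = f_ht / f_st  # proportion of interpopulation variance due to host
print(f"F_ST={f_st:.3f} F_HT={f_ht:.3f} F_SH={f_sh:.3f} F_HT/F_ST={host_share:.2%}")
```

Under these assumptions the three indices satisfy (1 − FST) = (1 − FSH)(1 − FHT), and the ratio FHT/FST equals the share of the interpopulation variance that lies between host races.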

To exclude the possibility that geographic separation of populations may bias the genetic identity between allopatric vs parapatric/sympatric host-race populations, tests for isolation by distance (IBD) were performed in the two allopatric transects by comparing pairwise FST estimates between populations with their geographic distances (Slatkin, 1993). Significant IBD within transects may indicate that the level of host-related differentiation depends on the geographic locality of a given population. In contrast, in the absence of IBD, populations can be pooled within each geographic setting to create an average between-host-race population identity estimate. Significance for IBD was tested with a Mantel test implemented in Genepop (Raymond and Rousset, 2000).
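The logic of a Mantel test can be sketched as follows. This is a generic permutation test on the Pearson correlation between two distance matrices, written with the Python standard library only; it is not the procedure implemented in Genepop, and the function names, the number of permutations and the example matrices are assumptions made for illustration.

```python
import random
from itertools import combinations


def _pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("distances must not all be identical")
    return sxy / (sxx * syy) ** 0.5


def mantel_test(genetic, geographic, n_perm=999, seed=0):
    """One-sided Mantel test; both inputs are symmetric square matrices."""
    n = len(genetic)
    pairs = list(combinations(range(n), 2))  # each site pair is one contrast
    geo = [geographic[i][j] for i, j in pairs]
    observed = _pearson([genetic[i][j] for i, j in pairs], geo)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel sites, i.e. permute rows and columns together
        permuted = [genetic[order[i]][order[j]] for i, j in pairs]
        if _pearson(permuted, geo) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical transect: five sites on a line, genetic distance rising with km.
positions = [0, 1, 2, 3, 4]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[0.01 * abs(a - b) for b in positions] for a in positions]
r, p = mantel_test(genetic, geographic)
print(f"r={r:.3f}, one-sided P={p:.3f}")
```

Because every pair of sites contributes one contrast, a transect of eight sites yields 28 contrasts and a transect of nine sites yields 36.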

Although the above analyses might allow conclusions about the likelihood of scenarios 1–3, they cannot discriminate between scenario 3a (ongoing gene flow counterbalanced by selection against hybridisation/wrong host-plant choice) and scenario 3b (cessation of gene flow independent of the geographic setting). To discriminate between 3a and 3b, we applied tests for linkage disequilibria. If gene flow in contact areas is ongoing (ie hybridisation and/or oviposition into the wrong host plant), this should result in linkage disequilibria at those loci showing strong host-related differentiation. Linkage disequilibrium was tested with exact tests supplied in GDA (Lewis and Zaykin, 2001), pooling sympatric/parapatric populations of each host plant for stronger statistical support (only alleles with frequencies ≥0.05 were included).
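The comparison underlying such tests can be illustrated with a short Python sketch that tabulates observed two-locus genotype counts against the counts expected if genotypes at the two loci were combined independently. It is not the exact test of GDA; the function name and the genotype data are hypothetical and serve only to show how an excess of double heterozygotes would appear.

```python
from collections import Counter


def two_locus_counts(genotypes):
    """Observed and expected two-locus genotype counts.

    genotypes -- one (genotype_at_locus_A, genotype_at_locus_B) tuple per fly.
    Expected counts assume independence of the two loci and are the product
    of the single-locus genotype counts divided by the sample size.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    count_a = Counter(a for a, _ in genotypes)
    count_b = Counter(b for _, b in genotypes)
    observed = Counter(genotypes)
    expected = {
        (a, b): count_a[a] * count_b[b] / n
        for a in count_a
        for b in count_b
    }
    return observed, expected


# Hypothetical sample of ten flies (not data from this study).
sample = (
    [("107/107", "92/92")] * 6
    + [("100/107", "92/100")] * 3
    + [("100/107", "92/92")] * 1
)
observed, expected = two_locus_counts(sample)
for genotype in sorted(expected):
    print(genotype, observed[genotype], round(expected[genotype], 2))
```

In this invented sample the double heterozygote is observed three times but expected only 1.2 times, which is the kind of excess that a test for linkage disequilibrium evaluates formally.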

Results

Nine of the 13 analysed loci were polymorphic at the 95% level (Acon, Got, Hbdh, Hex, 6Pgdh, PepA, PepD, Pgm, Tre; allele frequency tables are available on request). Five loci (Acon, Hex, 6Pgdh, PepA, PepD) revealed (highly) significant allele frequency differences among heterophyllum and oleraceum populations (Table 1). In particular, hexokinase (Hex) and peptidase D (PepD) showed extremely divergent allele frequencies: the most common heterophyllum alleles Hex_100 and PepD_100 were rarely found in oleraceum populations, where the alleles Hex_107 and PepD_92 were predominant. By contrast, the allele frequency differences among the different geographic settings within the same host were relatively small. Only two alleles, and in heterophyllum flies only, differed significantly in frequency among geographic settings (6Pgdh_100 and PepA_100; ANOVA: F=7.87, P=0.004 and F=5.82, P=0.013, respectively).

Table 1 Significant allele frequency differences between oleraceum (olera) and heterophyllum (hetero) populations over all sites and for each geographic setting (t-tests)

The degree of host-related allele frequency difference was quantified by hierarchical F-statistics. Over all loci, 91.97% of the interpopulation variance was assigned to host-plant affiliation (FHT/FST). Only five loci contributed to significant FHT-values (Table 2). Particularly important for population structure were the loci Hex and PepD (FST=0.707 and FST=0.735, respectively) where host affiliation explained over 98% of the interpopulation variance. Host-associated differentiation is reflected in the maximum likelihood phenogram (Figure 2). Populations clustered exclusively by host-plant affiliation, whereas the geographic origin of populations was of little importance. The only visible geographical effect in the phenogram was that the Bavarian heterophyllum populations (site codes 35–40) were separated from most of the Swedish heterophyllum populations.

Table 2 Hierarchical F-statistics over all loci and for single loci
Figure 2

Phenogram of population relatedness. The phenogram builds two clearly distinct groups due to host-plant affiliation (heterophyllum populations are shaded). Site numbers refer to Figure 1.

IBD tests along the Swedish heterophyllum and the German oleraceum transects revealed no significant correlations between genetic and geographic distances (28 contrasts P=0.884 and 36 contrasts P=0.212, respectively). Hence, geographic separation of populations within allopatric areas did not affect the comparisons of genetic identity between allopatric and parapatric/sympatric populations.

To assess whether and/or how geographic settings affect the degree of host-related differentiation we compared absolute F-values and FHT/FST-ratios of allopatric, sympatric and parapatric populations. Over all loci, there was virtually no difference between sympatric and allopatric regions, either in the amount of differentiation (FST) or in the percentage of host-affiliated variance (FHT/FST) (Table 3). However, the latter was clearly higher in the parapatric Bavarian populations (98.44%), while FST-values were similar. Analysing the loci separately revealed that this held for almost all loci that showed significant FHT-values (Table 3). In addition to hierarchical F-statistics, we used genetic identities (Nei, 1972) as a measure of host-related population differentiation, which could be tested for effects of geographic setting. Figure 3a shows the mean genetic identities between heterophyllum and oleraceum sites of different geographic settings; for example, the Bavarian heterophyllum populations had a similar mean genetic identity to parapatric and allopatric oleraceum populations (Ī=0.847 and Ī=0.845, respectively). The mean genetic identities between sympatric/parapatric populations and populations of the same region affiliated to the alternative host did not differ significantly from those between sympatric/parapatric populations and allopatric populations of the alternative host, that is, host-related differentiation was not higher in sympatry/parapatry than in allopatry (Ī=0.851 vs Ī=0.854; Z39=0.725, P=0.47). Corresponding tests were also performed for single loci that had shown significant FHT-values. There were no significant differences at 6Pgdh (Ī=0.966 vs Ī=0.976; Z39=0.930, P=0.35), PepA (Ī=0.912 vs Ī=0.893; Z39=−1.683, P=0.09), PepD (Ī=0.126 vs Ī=0.102; Z39=1.703, P=0.09) and Got (Ī=0.970 vs Ī=0.975; Z39=1.477, P=0.14). However, the host-related differentiation at Hex was significantly higher in sympatry/parapatry than between sympatric/parapatric and allopatric regions (Ī=0.242 vs Ī=0.307; Z39=−2.647, P=0.008). This difference was found in the parapatric Bavarian region as well as the sympatric Swedish region (Figure 3b).
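For readers unfamiliar with the identity measure, the following Python sketch computes Nei's (1972) genetic identity from allele frequencies, using the usual definition I = Jxy/√(Jx·Jy) with the J-terms averaged over loci. The data layout, the function name and the allele frequencies are hypothetical and do not reproduce the values reported here.

```python
import math


def nei_identity(pop_x, pop_y):
    """Nei's (1972) genetic identity between two populations.

    Each population is a list of loci scored in the same order; each locus
    is a dict mapping an allele name to its frequency (summing to 1).
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("populations must be scored at the same loci")
    n_loci = len(pop_x)
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        alleles = set(locus_x) | set(locus_y)
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    # Average each term over loci before forming the ratio.
    return (jxy / n_loci) / math.sqrt((jx / n_loci) * (jy / n_loci))


# Hypothetical frequencies: one strongly divergent locus, one fixed locus.
population_a = [{"100": 0.9, "107": 0.1}, {"100": 1.0}]
population_b = [{"100": 0.1, "107": 0.9}, {"100": 1.0}]
identity = nei_identity(population_a, population_b)
print(f"I = {identity:.3f}")
```

An identity of 1 indicates identical allele frequencies at all loci, whereas values near 0 indicate that the two populations share almost no alleles, as is approached at single loci such as Hex and PepD.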

Table 3 Interbiotype variance estimates for different geographic settings
Figure 3

Mean genetic identities between heterophyllum and oleraceum flies of different geographic regions. Heterophyllum and oleraceum populations are represented by black and open circles, respectively. At the locus Hex, for example, the Bavarian oleraceum populations have a mean genetic identity of 0.229 with Bavarian heterophyllum populations, but of 0.290 with heterophyllum populations from the Swedish transect. (a) all loci; (b) Hexokinase; (c) Peptidase D.

In addition to the level of differentiation between geographic settings we assessed linkage disequilibria between significant host-race loci in contact areas. Linkage disequilibrium at these loci might be interpreted in terms of oviposition into wrong host plants and/or hybridisation among host races. After Bonferroni correction, no locus combination revealed significant deviations from expected multigenotype distributions. However, for oleraceum flies from contact areas the uncorrected P-value of the locus combination Hex/PepD was significant (P=0.01). By contrast, there was no evidence for linkage disequilibrium between Hex and PepD in allopatric oleraceum populations (P=0.82), nor in any geographic setting in heterophyllum flies (contact area: P=1.00; allopatric region: P=0.70). The reason for this pattern can be inferred from Table 4: while there is virtually no difference between observed and expected multilocus genotypes in heterophyllum flies, there is a significant excess of the multilocus genotype Hex_100/107-PepD_92/100 (ie F1 equivalents) in the oleraceum flies from contact areas.

Table 4 Observed and expected (in parentheses) multigenotype distributions at the host-race characteristic loci Hex and PepD in contact areas (shown are only alleles with frequencies ≥0.05)

Discussion

Allozyme differentiation among heterophyllum and oleraceum populations was very strong, suggesting the existence of host races or even sibling species. Populations clustered exclusively by host-plant affiliation. The results correspond largely to scenario 3 (see Introduction) with geographically uniform estimates of differentiation between host races. This holds true for both host races in sympatric, parapatric and allopatric regions. These findings indicate that gene flow between heterophyllum and oleraceum flies is strongly restricted and that the system was established before the present geographic distribution arose. Hence, geography in its present state plays, at most, only a fine-tuning role in diversification (see below).

Unfinished lineage sorting, in which both host races share a central haplotype, suggests a rapid diversification process in T. conura (Diegisser et al, 2006). Was the primary force promoting allozyme divergence among host races genetic drift (in allopatry) or some kind of divergent selection (independent of geography)? If genetic drift was a major cause of population differentiation, all loci should be affected to a similar extent (McPheron et al, 1988). Thus, uniform differences at all analysed loci would indicate that populations have become differentiated by genetic drift (Itami et al, 1998). By contrast, if the differences among loci are highly heterogeneous, this would suggest that selection is acting on those loci which are highly differentiated (Slatkin, 1987). Heterophyllum and oleraceum populations revealed significant FHT values at five of 13 loci, and only two of these 13 loci were highly differentiated, making the degree of host-related differentiation highly heterogeneous. Of the five host-related loci, the interpopulation variances at three loci (Got, 6Pgdh and PepA) were low (F<0.036) and only significant in some but not all geographic settings. By contrast, the two loci Hexokinase and Peptidase D showed highly significant FHT values in all geographic regions and, what was particularly remarkable, host-related differentiation at these loci was about 20 times higher than at any other locus. This pattern clearly suggests that selection is acting on Hex and PepD, or that the allozyme loci themselves are selectively neutral but linked to loci experiencing selection.

Feder et al (1993, 1997) showed that allele frequencies previously found to differ between R. pomonella host races correlated with eclosion time. The latter affects host choice directly and therefore contributes to prezygotic isolation between host races. Analogous conditions exist in T. conura. In Bavaria, C. heterophyllum provides flower heads suitable for oviposition about 3 weeks earlier than C. oleraceum, and Komma (1990) found evidence that the spring reappearance of flies on their respective host is synchronised with host plant phenology. Theoretically, differentiated allozyme loci could be linked to loci determining timing of oviposition. Alternatively, there could be linkage to loci affecting host choice. In experiments, T. conura showed significant host preferences (Romstöck-Völkl, 1997). This can be explained in part by genetically determined colour preferences (Komma, 1990), but olfactory traits might in general also be important for host-plant recognition (Linn et al, 2003). Each of the above-mentioned traits facilitates prezygotic isolation because assortative mating arises as a by-product (Dres and Mallet, 2002). However, differentiated allozymes could also be linked to a second class of loci, namely those affecting performance. Host plants provide habitat and food for larvae. Adaptation to one host might result in lower fitness on the alternative host (Diehl and Bush, 1989). Egg-laying experiments suggest that such negative genetic correlations for larval performance exist in T. conura, as heterophyllum flies developed only very poorly in C. oleraceum flower heads (T Diegisser, unpublished data).

Although we cannot infer yet whether Hexokinase and Peptidase D are under direct selection or linked to loci under selection, and which trait selection is acting on, the highly heterogeneous allozyme differentiation indicates that reproductive isolation between host races might primarily result from adaptive divergence. The question is whether reproductive isolation has evolved merely as a by-product of divergent selection (either in allopatry or in sympatry) or whether it has been driven by disruptive selection in contact areas, that is selection against intermediates. In the present study, we addressed this issue by comparing host differentiation in sympatry, parapatry and allopatry. Less differentiation in contact areas would have indicated that gene flow partly homogenises host-related divergence (scenario 1), while higher differentiation in contact areas (scenario 2) could have been interpreted in terms of selection against maladaptive events (ie hybridisation and/or wrong host choice). The latter process would be analogous to reinforcement (‘natural selection strengthening sexual isolation in response to maladaptive hybridisation following secondary contact of two taxa’ (Noor, 1999)), except that selection might act not only against hybridisation but also against oviposition into wrong hosts.

Neither of the above scenarios was clearly supported by the allozyme differentiation, which was mostly uniform among geographic settings. However, a uniform pattern might arise in contact areas in the presence of gene flow if it is counterbalanced by selection against maladaptive hybridisation/wrong host choice (scenario 3a). An indication of at least some ongoing gene flow was provided by linkage disequilibria between Hexokinase and Peptidase D in oleraceum flies. In contact areas, oleraceum flies had a small but significant excess of the heterozygote genotype Hex_100/107-PepD_92/100, which can be best explained by the existence of hybrid flies. Although double heterozygotes could result from intrahost-race matings, not a single fly (neither in heterophyllum nor in oleraceum populations) showed the most common double homozygote Hex/PepD-genotype of the alternative host race (see Table 4), hence the pattern suggests hybrids. There was no linkage disequilibrium in heterophyllum populations. Thus, gene flow among host races seems to occur only through mating between heterophyllum males and oleraceum females (which oviposit on C. oleraceum), while there is evidence neither for matings between oleraceum males and heterophyllum females, nor for oviposition into the wrong host plant.

The allozyme data also provide some, albeit weak, evidence for selection against maladaptive hybridisation/wrong host choice. The host-race divergent enzyme Hex was more divergent in contact areas but, generally, this difference was small compared to the uniform host-related divergence. The slight but significant difference may indicate that selection on Hex is in the final process of fine tuning host and/or mate preference. Although this process may counterbalance the effects of gene flow, the process itself does not depend on gene flow. Selection may continue against maladaptive traits even if there is no gene flow at all (Butlin, 1987). For example, even though there seems to be no gene flow via oviposition into the wrong host, this does not mean that such oviposition does not occur: if postzygotic isolation has already developed so strongly that oviposition into the wrong host plant is fatal to the larvae, no gene flow occurs but selection might still strengthen oviposition preference because such oviposition is maladaptive. If the (weak) reinforcement pattern is true for Hex, it may well be selection acting on habitat choice that is important.

The geographic pattern of differentiation suggests that T. conura host races are stable and that the major diversification process took place before today's geographical settings were established. However, although the allozyme data provide some evidence for ongoing gene flow and selection against maladaptive events, today both play only a fine-tuning role in diversification between the two host races that are largely reproductively isolated.