Introduction

Differential adaptation to natural or sexual environments can drive phenotypic divergence of populations, sometimes without any geographic isolation (Via, 2001, 2002; Kirkpatrick and Ravigne, 2002; Gavrilets, 2003). Although central to the models of sympatric speciation (Johnson and Gullberg, 1998), this concept is also essential for determining the effect on phenotypic differentiation when allopatric populations come into secondary contact. However, the extent to which natural selection and the adaptive responses of organisms can override the effects of gene flow and maintain species distinctness in spite of ongoing gene flow is far from clarified (Ehrlich and Raven, 1969; Schluter, 2001; Doebeli et al., 2005).

In the absence of selection, genetic drift between allopatric populations drives divergence of the whole nuclear genome at the same rate. When gene flow is not completely interrupted (incomplete reproductive isolation), it affects all genes to the same degree. Natural selection, on the other hand, increases the rate of divergence for loci under selection and acts against gene flow at those loci more than at neutral loci. As a result, the genes under selection (for example, those involved in local adaptation) may show a greater level of differentiation than the rest of the genome (Wu, 2001). An FST-based survey of genetically mapped loci might thus highlight the different genomic regions that are subject to strong disruptive selection or assortative mating (Beaumont, 2005) and help quantify levels of adaptive differentiation among populations and recently derived species. For example, in Drosophila pseudoobscura and some close relatives, some nuclear molecular markers vary greatly in their level of differentiation (Wang et al., 1997), an observation that is not compatible with a model of simple allopatric speciation. This species complex thus appears to fit a speciation model in which species continue to exchange genes at some loci and not at others (Machado et al., 2002; Noor et al., 2007). Comparing levels of differentiation between neutral markers and quantitative traits thus provides insight into the role of selection in maintaining phenotypic variation and ultimately into speciation mechanisms (Lande, 1992; Merilä and Crnokrak, 2001; Reed and Frankham, 2001; McKay and Latta, 2002). This approach can be used to test a null hypothesis of no influence of selection by determining whether the level of trait divergence attributable to additive genetic effects (as measured by QST) equals the level of divergence at neutral loci (as measured by FST) (Spitze, 1993; Steinger et al., 2002; Edmands and Harrison, 2003; Sanou et al., 2005; Volis et al., 2005).
Although trait divergence should be measured as the among-population proportion of additive genetic variance (QST, Leinonen et al., 2008), the total phenotypic variance is often used as a surrogate when environmental and non-additive genetic effects cannot be separated. This index of phenotypic divergence is called PST (for ‘phenotypic QST’ or ‘pseudo-QST’, Saether et al., 2007). PST is particularly useful for organisms for which a quantitative genetics design is not possible, as long as the underlying assumptions are understood and discussed (Merilä et al., 1997; Storz, 2002; Saint-Laurent et al., 2003; Raeymaekers et al., 2007).

The large white-headed gulls constitute a promising model to study whether selection can maintain species distinctness in spite of ongoing gene flow. Although interspecific genetic differentiation at neutral nuclear loci is surprisingly low (microsatellites FST between 5–10%; Crochet et al., 2003), several phenotypic characters exhibit clear-cut differences even between those species that are most similar in allozyme or microsatellite allele frequency. Genetic differentiation is also higher in the mitochondrial than in the nuclear genome, which has been explained by a combination of recent species origin and ongoing interspecific gene flow (Crochet et al., 2003). Indeed, interspecific hybrids between large gull species have frequently been recorded (Pierotti, 1987; Olsen and Larsson, 2004; Gay et al., 2007). There is thus imperfect reproductive isolation between large white-headed gull species, with ongoing interspecific gene flow counteracting the effect of genetic drift. The very different situation for phenotypic traits suggests that selection is important in generating and maintaining species-specific differences in this group.

The aim of this study was to determine whether evidence of selection could explain phenotypic differences between large gull species. We investigated differentiation in phenotypic traits and neutral markers within and between species in two zones of secondary contact (two replicates differing in their respective level of divergence) between Larus argentatus and Larus cachinnans in Europe and between Larus glaucescens and Larus occidentalis in North America. We used patterns of neutral genetic differentiation at nine microsatellite loci (FST) as an indicator of expected changes under neutral processes and compared them with phenotypic differentiation (PST) for a large number of traits (size, plumage melanism and coloration of bare parts). We expect evidence of selection only on a subset of traits because a large number of loci under selection in the genome would counteract the homogenizing action of gene flow on all neutral markers through genetic linkage.

Materials and methods

Study sites: two secondary contacts between large gulls

L. argentatus and L. cachinnans came into secondary contact in Western Europe at the end of the 20th century, following concomitant range expansion southwards (Neubauer et al., 2006) and westwards, respectively (Faber et al., 2001). Field observations (Panov and Monzikov, 1999), combined with a recent study based on molecular markers (Gay et al., 2007), demonstrated that hybridization and introgression occur across this secondary contact zone. Four populations were sampled along a transect spanning the range of both L. argentatus and L. cachinnans, including their zone of secondary contact (Figure 1a). The population sampled in northern Poland (Gdynia) represents pure L. argentatus only, whereas populations sampled in southern Poland (Tarnow) and Ukraine (Molochnyy Lyman) belong to L. cachinnans only. Both species occur in the population from central Poland (Włocławek). Samples were taken from adult gulls trapped on the nest during the breeding season from 1998 to 2004 and included blood stored in EDTA buffer, growing feather quills or muscle tissue preserved in ethanol (Table 1). These data are a subset of the data used by Gay et al. (2007) to estimate molecular and morphological patterns of introgression in this zone of secondary contact. Only the populations for which morphological data were available were included (four out of seven).

Figure 1

Distribution range and sampling sites for two secondary contact zones between large gulls (Olsen and Larsson, 2004): (a) in Europe between herring gull (Larus argentatus, in light grey) and Caspian gull (Larus cachinnans, in dark grey); N: northern Poland (Gdynia); C: central Poland (Włocławek); S: southern Poland (Tarnow); U: Ukraine (Molochnyy Lyman). (b) In North America between glaucous-winged gull (Larus glaucescens, in light grey) and western gull (Larus occidentalis, in dark grey). Sampled populations were numbered from 1 to 19; 1: Aleutian Island (AK, USA); 2: Kachemak Bay (AK, USA); 5: Vancouver Island North (BC, Canada); 6: Vancouver Island South (BC, Canada); 8: Tatoosh Island (WA, USA); 10: Destruction Island (WA, USA); 11: Grays Harbor, Goose Island (WA, USA); 14: Baker Bay at Columbia River mouth Clatsop, East Sand Island (OR, USA); 16: Yaquina Bay Lincoln (OR, USA); 17: Coos Bay near Coos head (OR, USA); 18: Sugarloaf Rock at Cape Mendocino Humboldt (CA, USA); 19: San Francisco Bay, Alcatraz Island (CA, USA).

Table 1 Summary statistics of molecular markers and phenotypic variables studied in 4 populations across the zone of secondary contact between Larus argentatus and Larus cachinnans and 12 populations across the zone of secondary contact between Larus glaucescens and Larus occidentalis

The zone of secondary contact between L. glaucescens and L. occidentalis lies along coastal Oregon and Washington, at the southern range limit of L. glaucescens and the northern range limit of L. occidentalis. Hybridization has been reported between these taxa since the beginning of the twentieth century. Previous studies suggested weak or no apparent barriers to gene flow between the two taxa within the 180-km long hybridization area (Bell, 1996, 1997; Good et al., 2000). Samples and morphological measurements were collected from 1985 to 1990 during the breeding season in coastal Alaska, British Columbia, Washington, Oregon and California (see Bell, 1996 for sampling details). All adult specimens were collected on nests or from the sea or land points near colonies. They were measured for qualitative and quantitative morphological traits in all 12 populations (Figure 1b). Tissue samples, including liver, heart, kidney or pectoral muscle, were frozen in liquid nitrogen and stored at −70 °C. Tissue samples for DNA analysis were later transferred to ethanol (Table 1). All individuals were sexed by looking at the gonads and measured by the same observer (DAB).

Morphological measurements

In Europe, 20 quantitative or ordinal variables characterizing morphometry, plumage melanism and coloration of bare parts were measured on adults in the four populations. Seven morphometric variables were measured with a calliper to the nearest millimetre, namely tarsus, toe, wing, head and bill length, bill depth at gonys and minimum bill depth. We characterized plumage melanism using the pattern of the outer and the central primaries. The outer primaries were described by four quantitative variables (length of black and of white on the 9th and 10th primary) and one ordinal variable (colour of the tongue on the inner web of the 10th primary). The central primaries were described by three quantitative variables (number of primaries with black, length of black on the 7th and 8th primary). The coloration of bare parts was characterized by five ordinal variables describing the leg colour (tarsus, toe and web colour), the orbital ring colour and the amount of dark spots in the iris. More details about morphological measurements can be found in Gay et al. (2007). Repeatability was significant and high for all traits, from 75 to 100% (Lessells and Boag, 1987). We summarized these highly correlated variables using multivariate analyses (PCA) on size, plumage melanism (separated into outer and central primaries) and leg colour in ADE-4 (Thioulouse et al., 1997) (Table 2). We used orbital ring colour and the amount of dark spots in the iris without transformation.

Table 2 Variables and number of individuals (n) included in the PCAs performed for morphometry (size and shape), plumage melanism (on mantle and wing tip) and coloration of bare parts in the two transects studied (a) in Europe between L. argentatus and L. cachinnans and (b) in North America between L. glaucescens and L. occidentalis

In North America, 14 morphometric variables were measured with a calliper to the nearest millimetre, namely length of wing, tail, 8th, 9th and 10th primary feathers, tarsus, middle toe, head, culmen and bill, bill width, bill depth at gonys, bill depth at nares (anterior border) and bill depth at nares (posterior border). To characterize the plumage melanism, we measured the coloration of the mantle and the dark elements of primary tips on wing preparations, and studied skins with a Munsell 37-step neutral value scale (Munsell, 1971). Five traits were measured to describe the coloration of bare parts, namely coloration of bill, iris, dark iris spots and orbital ring (measured independently for each eye). Measurements were made on freshly collected specimens using hand-held Munsell colour charts (matte finish) from the Munsell Book of Colour (Munsell, 1976). Each Munsell colour is defined by three characters—hue, value and chroma—from which dominant wavelength, percentage of spectral reflectance or brightness and excitation purity can be obtained with the aid of conversion tables (Munsell, 1968) (see Bell, 1996 for additional details on morphological measurements). We thus obtained a total of 15 measurements for the coloration of bare parts for each bird. As for the European data set, we used PCAs to summarize variables on size, shape, plumage melanism, orbital ring, bill, iris and dark iris spots colour. Table 2 recapitulates the correlated variables we grouped and summarized by PCA to form synthetic variables.

Microsatellite analysis

Total genomic DNA from blood samples was extracted using the DNeasy Tissue Kit (Qiagen, Courtaboeuf, France) following the recommended procedure. Feathers and liver, heart, kidney or pectoral muscle tissues were digested in 10% Chelex 100 (Bio-Rad, Hercules, CA, USA) with 5 μl of proteinase K followed by two 15-min boiling steps following the procedure described by Walsh et al. (1991).

Nine microsatellite loci (HG27, HG25, HG18, HG14, HG16, K31, K32, K67 and K71) were amplified on 178 individuals from the four populations along the European transect, and 273 individuals from 12 populations along the west coast of North America. Primer sequences and amplification protocols were described elsewhere (Gay et al., 2007). Allele frequencies at the nine microsatellite loci for the populations of the two transects are available in Supplementary Material 1.

We performed probability tests for deviations from the Hardy–Weinberg equilibrium at each microsatellite locus (the results for each locus are presented as Supplementary Material). We estimated the genetic differentiation (FST) between the most distant populations for each transect (northern Poland and Ukraine in Europe; Alaska populations 1 and 2 vs California populations 18 and 19 in North America—referred to as ‘allopatric FST’) for each of the nine microsatellite loci and overall using the software GENETIX (Belkhir et al., 1998). Pairwise FST values along the two transects were estimated by the parameter θ (Weir and Cockerham, 1984) using GENETIX. We estimated the 95% confidence interval for FST by bootstrapping over loci. This gives us an estimate of the sampling variance on neutral differentiation. We also estimated the mean number of alleles per locus to ascertain that differentiation did not decrease artificially with the polymorphism of a given locus (Beaumont, 2005).
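The bootstrap over loci described above can be illustrated with a minimal sketch. It assumes that each locus contributes a pair of variance components (an among-population component and a total), and that the multilocus estimate is the ratio of the summed components, as is usual for the Weir and Cockerham estimator. The function names and the example values are hypothetical; this is not the GENETIX implementation.

```python
import random

def multilocus_fst(components):
    """Ratio-of-sums estimate: summed among-population components
    divided by summed total components across loci."""
    numerator = sum(among for among, _ in components)
    denominator = sum(total for _, total in components)
    return numerator / denominator

def bootstrap_fst_ci(components, n_boot=10000, alpha=0.05, seed=0):
    """Percentile confidence interval obtained by resampling loci
    with replacement (hypothetical helper, not GENETIX code)."""
    rng = random.Random(seed)
    n_loci = len(components)
    estimates = []
    for _ in range(n_boot):
        resample = [components[rng.randrange(n_loci)] for _ in range(n_loci)]
        estimates.append(multilocus_fst(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical (among, total) components for nine loci, for illustration only.
example = [(0.05, 0.80), (0.12, 0.75), (0.09, 0.82), (0.15, 0.70), (0.07, 0.78),
           (0.11, 0.85), (0.30, 0.68), (0.04, 0.77), (0.10, 0.81)]
point_estimate = multilocus_fst(example)
ci_low, ci_high = bootstrap_fst_ci(example, n_boot=2000)
```

With only nine loci, the interval mainly reflects among-locus heterogeneity, which is why a single highly differentiated locus can widen it considerably.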

To check if our nine microsatellite loci provide an accurate assessment of the neutral drift variance among loci, we identified loci potentially affected by selection (outliers) by applying the method of Beaumont and Nichols (1996). The program FDIST2 (http://www.rubic.rdg.ac.uk/~mab/software.html) was used to simulate the expected relationship between heterozygosity and FST for the observed average FST and assuming selective neutrality. We defined 30 demes and ran 50 000 realizations with a stepwise mutation model. The relationships between observed per locus heterozygosity and FST were compared to this expected distribution (Luikart et al., 2003).

Phenotypic differentiation

Any evidence for selection uncovered by comparing differentiation between phenotypic traits and molecular markers should be supported by a quantitative genetics design to confirm phenotypic expression. The quantitative genetic differentiation coefficient (QST) is defined as follows (Spitze, 1993):

$$Q_{ST} = \frac{\sigma^2_{GB}}{\sigma^2_{GB} + 2\sigma^2_{GW}}$$

where $\sigma^2_{GB}$ and $\sigma^2_{GW}$ are the additive genetic variances between and within populations, respectively. In practice, QST can be measured by quantifying the additive genetic components of variance within and among populations in randomized common garden experiments controlling for non-additive components (for example, using a half-sib design). However, for large white-headed gulls as well as many other organisms, quantitative genetics designs are not practical and there are no long-term pedigree data available (Kruuk, 2004). One solution is to use PST, the phenotypic analogue of QST (Merilä et al., 1997; Storz, 2002; Saint-Laurent et al., 2003; Leinonen et al., 2006; Raeymaekers et al., 2007; Saether et al., 2007).

PST can be considered as a proxy for QST if we assume that the contribution of environmental and non-additive genetic variance to the differences between populations is low. More generally, if $\sigma^2_B$ and $\sigma^2_W$ are the between-population and within-population observed phenotypic variances, and h2between and h2within are between-population and within-population additive genetic proportions of differences, respectively, then

$$Q_{ST} = \frac{h^2_{between}\,\sigma^2_B}{h^2_{between}\,\sigma^2_B + 2\,h^2_{within}\,\sigma^2_W}$$

With h2between = h2within = 1, this expression reduces to PST.

To avoid artefacts owing to strong environmental effects, we estimated the phenotypic differentiation for a range of heritability values. The between-population additive genetic proportions of differences (h2between) were allowed to vary between 0 and 1, whereas the within-population proportion (h2within) equalled 1; 0.75; 0.5 or 0.25. These estimates taking into account the uncertainty in heritability are further called ‘pseudo-QST’. We estimated the phenotypic variances among populations by carrying out a two-factor analysis of variance (ANOVA) (sex and population) using the statistics software R (R-Development-Core-Team, 2004) on the seven synthetic variables (four PC1 scores for Europe (Table 2a): size, melanism of the outer and central primaries, PC2 for outer primaries, tarsus colour and raw data for orbital ring and iris spots; seven PC1 scores for North America (Table 2b): size, shape, plumage melanism, orbital ring, bill, iris and pigment colour). We used the mean square estimates to calculate the between-population ($\sigma^2_B$) and within-population ($\sigma^2_W$) variances. We calculated PST and pseudo-QST for each morphological trait, both between the most distant allopatric populations (northern Poland and Ukraine in Europe; Alaska populations 1 and 2 vs California populations 18 and 19 in North America; referred to as ‘allopatric PST’) and between all pairwise combinations. The methods available to estimate variance in QST estimates (delta method, parametric or non-parametric bootstrap, jackknife, simulation or Bayesian analysis) consider a global QST, necessitate a large number of populations and assume an island model. This assumption is not valid in the case of a zone of secondary contact. Instead, we used the 95% confidence interval estimated on FST by bootstrapping over loci to test whether the estimated QST differs significantly from neutral differentiation.
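As an illustration of the sensitivity analysis described above, the following sketch evaluates pseudo-QST over a grid of heritability values. It assumes the usual form of the index, h2between·σ²B / (h2between·σ²B + 2·h2within·σ²W), which equals PST when both proportions are 1. The function name, the grid step and the variance values are hypothetical and are not taken from the study.

```python
def pseudo_qst(var_between, var_within, h2_between=1.0, h2_within=1.0):
    """Pseudo-QST under the assumed form
    h2_between*var_between / (h2_between*var_between + 2*h2_within*var_within)."""
    numerator = h2_between * var_between
    denominator = numerator + 2.0 * h2_within * var_within
    if denominator == 0:
        raise ValueError("both scaled variance components are zero")
    return numerator / denominator

# Hypothetical phenotypic variance components for one synthetic trait.
var_between, var_within = 2.0, 1.0

# h2between varies from 0 to 1; h2within takes the four values used in the text.
h2_between_grid = [i / 10 for i in range(11)]
h2_within_values = [1.0, 0.75, 0.5, 0.25]

curves = {
    h2w: [pseudo_qst(var_between, var_within, h2b, h2w) for h2b in h2_between_grid]
    for h2w in h2_within_values
}

# PST is the special case with both proportions equal to 1.
pst = pseudo_qst(var_between, var_within)
```

Each entry of `curves` corresponds to one line in a plot of pseudo-QST against h2between; a trait whose curves stay above the neutral FST interval even at low h2between is the kind of case the text treats as robust to uncertainty in heritability.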

Effect of plasticity on differentiation in allopatry?

In parallel, we investigated the role of environmental variance on phenotypic divergence. First, we examined whether the morphological traits studied varied at the intraspecific level between homo-specific allopatric populations. To detect and exclude individuals with mixed ancestry, microsatellite genotypes of individuals from all populations (four populations in Europe and 12 populations in North America) were assigned to either L. argentatus or L. cachinnans (in Europe), L. glaucescens or L. occidentalis (North America) using the Bayesian assignment method (Pritchard et al., 2000) implemented in STRUCTURE (http://pritch.bsd.uchicago.edu/software.html). Assignment analyses were performed using microsatellite data, as detailed by Gay et al. (2007). Following recommendations by Pritchard et al. (2000), individuals were assigned to their putative species, L. argentatus, L. cachinnans, L. occidentalis or L. glaucescens, if their assignment probability was higher than 90%. Others were classified as ‘intermediate’. For each transect, we formed two data files containing only ‘pure’ individuals (L. argentatus and L. cachinnans in Europe; L. glaucescens and L. occidentalis in North America) and performed ANOVAs (using SAS) on each of the seven synthetic morphological variables (PCA scores). Pairwise post hoc Tukey's tests were performed to check whether there were significant differences between populations (using SAS). Levels of significance were adjusted using sequential Bonferroni corrections (Rice, 1989).
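The sequential Bonferroni adjustment mentioned above can be sketched as follows. This is a minimal illustration of the step-down procedure (the smallest P-value is compared with α/m, the next with α/(m−1), and so on, stopping at the first non-significant test); the function name and the example P-values are hypothetical and do not come from the study.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, in the input order, flagging the tests
    that remain significant after the step-down adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest P-value (k = rank + 1) is tested at alpha / (m - k + 1).
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            break  # all larger P-values are declared non-significant
    return significant

# Hypothetical P-values for four tests.
flags = sequential_bonferroni([0.02, 0.001, 0.5, 0.03])
```

In this example only the second test (P=0.001) passes: it is compared with 0.05/4, whereas the next smallest value (0.02) exceeds 0.05/3 and stops the procedure.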

As a second approach to investigate the role of environmental variance, we compared the mean interspecific differentiation of morphological traits in allopatry and sympatry using a paired t-test, where sympatric populations played the role of a common garden (same environmental conditions). If the phenotypic traits analyzed exhibited low plasticity, we expected similar levels of interspecific differentiation in allopatry and sympatry. This comparison was conducted only in the European transect because none of the North American populations had a sufficient number of individuals assigned as both parental species breeding in sympatry.

Evidence for selection on phenotypic traits: FST–PST comparison

If the dominant force driving population divergence is genetic drift, the rate of divergence for both microsatellites and quantitative traits is expected to be similar, and the slope of the regression of QST against FST should equal 1. Regression analysis was carried out on pairwise genetic and morphological differentiation estimates for each synthetic variable (PCA scores) with the statistics software SAS version 8 (SAS Institute, 2000). Because a matrix of pairwise comparisons is by construction not independent, we selected a subset of independent values to perform the regression test (using every second value in the diagonal, that is n/2 values for n populations). We tested whether the estimated slopes were significantly higher or smaller than 1 (FST=PST) using a Z-test (Lebreton et al., 1992). This test could only be performed for the North American transect because the European transect did not include enough populations.
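The slope test described above can be sketched in a few lines. The sketch assumes an ordinary least-squares regression of PST on FST, a Z statistic of the form (slope − 1)/SE(slope) with a two-sided normal P-value, and one possible reading of "every second value in the diagonal" as the non-overlapping pairs of neighbouring populations (1–2, 3–4, ...). All function names are hypothetical, and the analysis in the text was run in SAS, not with this code.

```python
import math

def independent_pairs(n_populations):
    """Non-overlapping neighbouring pairs (0,1), (2,3), ...: an assumed
    reading of 'every second value in the diagonal' (n/2 pairs)."""
    return [(i, i + 1) for i in range(0, n_populations - 1, 2)]

def slope_z_test(x, y, null_slope=1.0):
    """OLS slope of y on x, its standard error, and a two-sided Z-test
    of the slope against null_slope (normal approximation).
    Requires residual variance > 0 (a perfect fit gives SE = 0)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    z = (slope - null_slope) / se_slope
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return slope, se_slope, z, p_two_sided

# Hypothetical pairwise FST (x) and PST (y) values for six independent pairs.
fst = [0.02, 0.05, 0.10, 0.15, 0.20, 0.25]
pst = [0.05, 0.16, 0.27, 0.48, 0.55, 0.77]
slope, se_slope, z, p = slope_z_test(fst, pst)
```

With only six independent points per trait, the normal approximation is rough, so a slope test of this kind is best read alongside the pseudo-QST sensitivity analysis.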

Along the North American transect, we tested the effect of latitude on plumage melanism using a generalized linear model in R (R Development Core Team, 2004) with a γ-link function to homogenize the residuals, using the PCA score for the plumage melanism as the dependent variable and latitude, sex and multiple correspondence analysis score on microsatellite genotypes as explanatory variables (see Gay et al., 2007).

Results

European transect

In the European contact zone, we observed significant deviations from the Hardy–Weinberg equilibrium in all four populations when considering all loci together (Table 1). Looking at each locus separately (see Supplementary Material), the deficit of heterozygotes was significant for many loci in the sympatric population (central Poland), as expected after admixture of two differentiated gene pools, but not in northern Poland or southern Poland. In Ukraine, only the HG14 locus presented a deficit of heterozygotes. Overall, heterozygote deficiency should have a limited effect on the estimates of interpopulation differentiation, as FST is especially intended to separate deviations from the Hardy–Weinberg equilibrium owing to geographical structuring, as opposed to other causes. The original data for the nine microsatellite loci are available online as Supplementary Material. The genetic differentiation between L. argentatus and L. cachinnans allopatric populations (northern Poland vs Ukraine, Table 3a) was significant but low (FST=12.5% (0.07–0.17; P<0.001)). To identify loci deviating from equilibrium (potentially under selection), we plotted the expected relationship between heterozygosity at each locus and locus-specific FST (obtained by simulations) as well as the 95% distribution range and the observed heterozygosity and FST values for each of the nine loci (Figure 2a). One single locus (K32) fell outside the 0.95 quantile, but it was no longer significant after the Bonferroni correction (Rice, 1989).

Table 3 Pairwise FST values based on the nine microsatellite loci for the two transects
Figure 2

Overall FST values estimated from nine microsatellite loci (a) between L. argentatus and L. cachinnans and (b) between L. glaucescens and L. occidentalis plotted against heterozygosity. Lower, middle and upper lines denote the 2.5, 50, and 97.5 percentiles, respectively, of simulation-based expected distributions assuming an infinite allele mutation model (Beaumont and Nichols, 1996). The simulations were based on FST representing the average for all microsatellites included.

Phenotypic differentiation in allopatry for central primaries was high and always exceeded genetic differentiation in neutral markers, even assuming very low levels of additive genetic variance (Figure 3a). In contrast, differentiation of leg colour was always lower. The results for the outer primaries (PC2), iris pigmentation and orbital ring colour were more equivocal: these traits showed high levels of differentiation, but only when assuming that the environmental component of phenotypic variance between populations was small (high between-population additive genetic proportions of differences h2between) (Table 4). As we do not have any data on the heritability of these traits in these populations, it is not possible to compare FST with PST for these traits. Nevertheless, post hoc Tukey tests on PC1 scores (ANOVA with sex effect) suggest that the variance due to broad-scale environmental differences is limited: only the outer primaries (PC1 only) are significantly variable in L. cachinnans (between southern Poland and Ukraine, P=0.004) but not in L. argentatus. We also found no significant difference in primary differentiation between L. argentatus and L. cachinnans in sympatry (central Poland) vs allopatry (northern Poland–Ukraine; Student's test, t=−0.0232, P=0.982); however, the central primaries and outer primaries (PC2) did show slightly lower differentiation in sympatry than in allopatry.

Figure 3

Variation of pseudo-QST with the additive genetic proportion of differences between- (x-axis) and within-populations (four different values: 1; 0.75; 0.5 and 0.25). Pseudo-QST was estimated for each morphological trait between allopatric populations of (a) L. argentatus and L. cachinnans (northern Poland vs Ukraine) and (b) L. glaucescens and L. occidentalis (populations 1 and 2 vs 18 and 19). Estimate of neutral differentiation (average structuration index across nine loci) is represented by the black line±1 s.d. (dotted lines).

Table 4 Phenotypic differentiation in allopatry (PST, here assuming that the traits are fully heritable) for all synthetic traits summarized by PCA (see Table 2) for the two transects studied

North American transect

We observed a significant deficit of heterozygotes in only two populations (6 and 10) close to the centre of the hybrid zone (Table 1), which might be the consequence of sympatry. Most loci showed no or nonsignificant deviations from the Hardy–Weinberg equilibrium (see Supplementary Material 2). In addition, the original data for the nine microsatellite loci are available online as Supplementary Material. Genetic differentiation between allopatric L. glaucescens and L. occidentalis was high (FST=25.3% (0.162–0.323)), but highly variable depending on the locus considered (variance=0.023 vs 0.006 in Europe for the same loci). Although most loci exhibited rather high levels of differentiation (around 20–30%), one single locus showed greater structure (K71: FST=44.1%) and three other loci were particularly poorly differentiated (<10%). We did not find any significant decrease in the differentiation of a locus with increasing polymorphism, which excludes a Hedrick effect (Hedrick, 1999). Moreover, the locus K71 was identified as an outlier using the method of Beaumont and Nichols (1996), as illustrated in Figure 2b. The high differentiation at this locus resulted from a steep cline in the frequency of allele 150, one of the rare alleles that reached fixation in L. glaucescens populations while its frequency in L. occidentalis remained low.

Both plumage melanism and orbital ring colour showed remarkably high differentiation compared with neutral markers, even assuming very low levels of additive genetic variance (Figure 3b and Table 4). On the contrary, differentiation in the iris spots colour was always very low. In addition, the results for some traits were equivocal. The differentiation for iris colour, in particular, was high, but only assuming high between-population additive genetic proportions of differences (h2between). The results of the ANOVA test for intraspecific differences in phenotypes between populations strengthen the presumption of low plasticity for iris colour, as there was no significant difference between populations of either L. occidentalis or L. glaucescens. Overall, we found very few significant differences at the intraspecific level. Only two samples of very small size (<4 individuals) yielded significant differences in orbital ring colour and plumage melanism in L. occidentalis, and iris pigment and plumage melanism in L. glaucescens.

Pairwise estimates of neutral genetic differentiation (pairwise FST) explained a significant proportion of the pairwise differentiation (PST) for shape (R2=72.02%; P=0.033), plumage melanism (R2=69.53%; P=0.039), orbital ring (R2=78.34%; P=0.019) and iris colour (R2=94.13%; P=0.001) (Figure 4). The slope of the PST=f(FST) regression is significantly higher than one (PST>FST) for orbital ring colour (P=0.005), iris colour (P=0.0001) and plumage melanism (P=0.015). These latter slopes are closer to 2 or 3, indicating that these traits are two or three times more differentiated than neutral markers. For the colour of the iris pigment, the relationship is reversed (PST<FST) with a slope significantly less than 1 (P<0.0001). It should be noted that these results are based on PST estimates rather than on pseudo-QST. However, seeing that pseudo-QST estimates were relatively insensitive to variations in heritabilities for orbital ring colour, plumage melanism and colour of the iris pigment (Figure 3), our conclusions are robust.

Figure 4

FST–PST correlation for (a) plumage melanism, (b) orbital ring colour, (c) bill colour, (d) iris colour, (e) size, (f) shape and (g) iris spots colour between L. glaucescens and L. occidentalis for all pairwise comparisons between populations (12 populations), assuming complete heritability (h2between=h2within=1). Black dots stand for independent pairwise comparisons, used to test the significance of the correlation and to estimate the slope of the regression. The black line is the regression curve and the dotted line marks the y=x line.

Both latitude and genotype at neutral markers (summarized by the multiple correspondence analysis score) as well as their interaction had a significant effect on the plumage melanism (PCA score). This model explained 75% of the variability in plumage melanism among which 17% was explained by latitude, darker plumage being found at lower latitudes.

Discussion

Does morphological differentiation exceed neutral genetic differentiation?

Comparing the genetic differentiation in phenotypic traits and neutral molecular markers provides insights into the relative role of drift and disruptive selection in phenotypic divergence (Merilä and Crnokrak, 2001; McKay and Latta, 2002). In long-lived organisms such as large gulls, it is difficult to have access to the genetic component of phenotypic differentiation unless long-term data are available (Kruuk, 2004). Nevertheless, even assuming a very low heritability, the interspecific divergence between L. glaucescens and L. occidentalis in plumage melanism and orbital ring colour clearly exceeds neutral differentiation. Similarly, the melanism of the central primaries is highly divergent between L. argentatus and L. cachinnans. Such a difference cannot arise randomly and must be attributed to spatially varying selection.

How accurately does FST represent neutral differentiation?

To compare FST with phenotypic differentiation, it is necessary to ensure that the estimate of neutral genetic variance (FST on microsatellites) conforms to the expectations of neutral divergence. This is particularly relevant for the North American transect, where we observed a surprisingly high variance in differentiation estimates among microsatellite loci. First, size constraints (homoplasy) could explain this heterogeneity in single locus differentiation. Homoplasy may be most problematic when mutation rates are high, population sizes are large and when there are strong allele size constraints (Estoup et al., 2002; Adams et al., 2004). However, to create such heterogeneity in differentiation, homoplasy should affect certain loci more than others, for example if the mutation rate varies substantially between different loci.

Second, high levels of within-population variation, as frequently observed for microsatellites, could place an upper limit on the maximal value of FST (Hedrick, 1999), known as the ‘Hedrick effect’ (O’Reilly et al., 2004; Larsson et al., 2007). However, this effect occurs primarily when the distribution of genetic variation is largely determined by mutation rather than migration. Dispersal has been shown to be substantial in large gulls: in L. argentatus, 35% of birds initiated breeding outside their natal colony (Coulson, 1991) and dispersal distances up to 250 km have been documented (Cramp and Simmons, 1983). Moreover, we did not find any clear trend of decreased differentiation with increasing polymorphism of a locus in either transect. This suggests that the Hedrick effect may not be a major factor here.
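The upper limit mentioned above can be made concrete with a short sketch. It uses the bound for GST that follows when k equally sized populations have the same within-population heterozygosity HS and share no alleles: the total heterozygosity is then (k - 1 + HS)/k, so the largest attainable GST is (k - 1)(1 - HS)/(k - 1 + HS). This formula and the function name `max_gst` are assumptions introduced for illustration; they are not taken from the study's methods.

```python
def max_gst(h_s, k=2):
    """Largest attainable GST for k equally sized populations.

    Assumes every population has within-population heterozygosity h_s
    and that populations share no alleles (maximal differentiation).
    """
    if not 0.0 <= h_s <= 1.0:
        raise ValueError("h_s must lie between 0 and 1")
    if k < 2:
        raise ValueError("at least two populations are required")
    return (k - 1) * (1.0 - h_s) / (k - 1 + h_s)


# Hypothetical heterozygosity values spanning typical microsatellite ranges
for h_s in (0.5, 0.7, 0.8, 0.9):
    print(f"HS={h_s:.1f}  max GST={max_gst(h_s):.3f}")
```

Under these assumptions, a locus with HS=0.8 in two populations cannot exceed a GST of about 0.11, which is why a negative trend between locus polymorphism and differentiation would be the expected signature of the Hedrick effect.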

Third, variation in the level of differentiation across loci could simply result from sampling variance caused by the finite sample of individuals and alleles (Whitlock and McCauley, 1999). However, Whitlock and McCauley (1999) estimated that the sampling variance in FST decreases with the real FST value (and their work focused on the intraspecific scale, with FST estimates close to 1%). We thus expect sampling variance to have only a limited effect on differentiation estimates between L. glaucescens and L. occidentalis. However, there is also some evidence that the variance in FST may be inflated in a secondary contact zone depending on the degree of gene flow (Bierne et al., 2003, but see also Murray and Hare, 2006). Without a wider sampling of neutral diversity, it is not possible to exclude sampling variance as an explanation for the high variance in FST estimates between loci observed here.

Finally, the fourth caveat in using differentiation at molecular markers as an indicator of expected changes under neutral processes is genetic hitchhiking (Maynard Smith and Haigh, 1974). In the L. glaucescens/L. occidentalis complex in North America, a single locus (K71) exceeded the mean level of differentiation and was identified as an outlier. We suspect that K71 could be linked to gene(s) undergoing divergent selection. In a secondary contact, we expect hitchhiking to have a stronger influence on the variance in differentiation because barriers to gene flow (any form of hybrid counter-selection) create linkage disequilibrium between the two populations in contact. The more genes are under divergent selection (and thus involved in the barrier to gene flow), the more likely hitchhiking is to play a role. Hitchhiking with genes involved in the barrier to gene flow thus increases the variance in rates of introgression across the neutral genome, which further affects the variance in differentiation between loci. In the secondary contact between L. glaucescens and L. occidentalis, hitchhiking, enhanced by linkage disequilibrium, appears to be a reasonable explanation for the high variance in differentiation between microsatellite loci.

The high variance observed between loci in FST estimates emphasizes the need to use a large number of loci to obtain a reliable estimate of neutral differentiation from molecular markers.

How accurately do our PST estimates reflect the partitioning of genetic variance?

Similar to the Hedrick effect on microsatellites, we could envision that within-population variability could set an upper limit to the differentiation of quantitative traits. However, this would not affect our conclusion as we found that QST estimates were higher than FST.

Ideally, FST-QST comparisons should be based on purely genetic components of phenotypic variation, removing the environmental component (Spitze, 1993; Podolsky and Holtsford, 1995; Petit et al., 2001; Edmands and Harrison, 2003; Palo et al., 2003; Gomez-Mestre and Tejedo, 2004; Knopp et al., 2007). Nevertheless, the ‘broad-sense’ QST estimates (Leinonen et al., 2008) could be inflated by unaccounted non-additive genetic effects (dominance and epistasis) or persistent maternal effects (see Merilä and Crnokrak (2001) for a more complete discussion on the assumptions of QST). However, maternal effects on plumage melanism are unreported in birds.

In this study, we did not have information on the genetic variance and were thus constrained to use the phenotypic variance between species (PST), an approach previously taken in a variety of organisms (Rogers and Harpending, 1983; Merilä, 1997; Grapputo et al., 1998; Storz, 2002; Saint-Laurent et al., 2003; Leinonen et al., 2006; Raeymaekers et al., 2007; Saether et al., 2007). The PST approach helps to detect selection, but assumes that both the environmental variance and the non-additive genetic variance between and within populations are small (Saether et al., 2007). To avoid overestimating population divergence if phenotypic variation is mainly a plastic response to the environment, we analyzed how much our PST estimates varied with the additive genetic proportion of differences between and within populations (using ‘pseudo-QST’). We are confident that the high levels of differentiation in plumage melanism and orbital ring colour between L. glaucescens and L. occidentalis in North America reflect disruptive selection. The melanism of the central primaries in L. argentatus and L. cachinnans in Europe is also very likely undergoing disruptive selection.
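The sensitivity analysis described above can be sketched as follows. The sketch assumes that pseudo-QST weights the between-population and within-population phenotypic variances by separate additive genetic proportions (the h2between and h2within of the Figure 4 caption) in the usual QST form, h2between·Vb / (h2between·Vb + 2·h2within·Vw). The exact formula used in the study is not restated in this section, so this form, the function names and the variance values are all assumptions made for illustration.

```python
def pseudo_qst(var_between, var_within, h2_between=1.0, h2_within=1.0):
    """Pseudo-QST from phenotypic variance components (assumed form).

    h2_between and h2_within are the assumed additive genetic proportions
    of the between- and within-population phenotypic variances. With both
    set to 1 the value reduces to PST under complete heritability.
    """
    numerator = h2_between * var_between
    denominator = numerator + 2.0 * h2_within * var_within
    if denominator <= 0.0:
        raise ValueError("variance components must give a positive total")
    return numerator / denominator


def sensitivity_grid(var_between, var_within, h2_values):
    """Pseudo-QST for every combination of h2_between and h2_within."""
    return {
        (hb, hw): pseudo_qst(var_between, var_within, hb, hw)
        for hb in h2_values
        for hw in h2_values
    }


# Hypothetical variance components, not estimates from the study
grid = sensitivity_grid(var_between=4.0, var_within=0.5,
                        h2_values=(0.1, 0.5, 1.0))
for (hb, hw), value in sorted(grid.items()):
    print(f"h2_between={hb:.1f}  h2_within={hw:.1f}  pseudo-QST={value:.3f}")
```

Under this form, scaling both proportions by the same factor leaves the estimate unchanged, so only the ratio h2between/h2within matters. A trait whose pseudo-QST stays above FST even when that ratio is low supports the same conclusion as the PST estimate.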

In general, it appears that comparing FST with QST (based on genetic variance) or PST (based on phenotypic variance) often yields similar patterns (Lynch et al., 1999; Schluter, 2000). There is also no evidence of any systematic bias in PST studies, as would be expected if phenotypic variation were mainly a plastic response to spatially varying environments (Leinonen et al., 2008).

We found no or low intraspecific phenotypic differentiation (genetic or environmental effects) among populations of each of the four species studied, which suggests that plasticity is not a major component of the between-population variance in the phenotypic traits investigated. The absence of decreased interspecific phenotypic differentiation in sympatry compared with allopatry in the European transect more directly suggests low environmental variance. Yet, some plumage melanism traits showed significant differences at the intraspecific level (in both transects) as well as a tendency of decreased interspecific differentiation in sympatry compared with allopatry (European transect). Plumage melanism was expected to be one of the least plastic traits investigated because its genetic determinism is well established (Theron et al., 2001; Majerus and Mundy, 2003). We suggest that this intraspecific clinal variation in plumage melanism in both transects is shaped by a selection gradient (see below). Overall, plasticity was not sufficient to explain the higher divergence in plumage melanism compared with neutral markers in the two transects studied. Larger sample sizes and ideally controlled breeding experiments are ultimately needed to assess the relative importance of genetically and environmentally based variation in trait values (Storz, 2002).

Nature of the selective pressures

The FST-PST comparison provided evidence that selection could have facilitated divergence in plumage melanism in Europe and North America and in orbital ring colour in North America. Disentangling the selective pressures is, however, difficult without a proper experimental design (McDonald et al., 2001) or long-term pedigree data (Przybylo et al., 2000; Merilä et al., 2001; Hosken and Balloux, 2002). Indirect approaches, such as regression models, can reveal some of the environmental factors (habitat, diet, and so on) potentially involved in the phenotypic divergence (Storz, 2002; Palo et al., 2003).

According to Gloger’s rule, sun and humidity could generate strong selective pressures for more resistant, darker melanin-rich feathers (Hayes, 2001) because sun and humidity promote bacterial degradation of feathers and damage birds’ plumage (Burtt and Ichida, 2004). Other hypotheses state that the damage of heat stress could be reduced in dark-coloured species (Ward et al., 2002). In gulls, melanin pigments are responsible for the intensity of light grey to black in the feathers of the mantle and wing tips. Species living at lower latitudes would thus be more likely to exhibit darker plumage than similar species living at northern latitudes. This is compatible with the observed pattern of darker plumage in southern L. occidentalis in North America and L. cachinnans in Europe vs their more northerly counterparts. Clinal variation of plumage traits observed at the intraspecific scale (Bell, 1997) also matches the selective gradient of sun and humidity. We found that latitude explains another 17% of variation in plumage melanism in addition to the effect of drift expressed as the variation in neutral markers. Melanin pigments are also deposited in the iris (dark spots). This trait measured in L. argentatus and L. cachinnans similarly formed a clear latitudinal gradient from L. argentatus to L. cachinnans. Both melanin-based traits thus support Gloger’s rule, in Europe as well as in North America.

The second trait for which we found evidence of selection is the carotenoid-based orbital ring colour in North America. The amount of carotenoid pigments available is known to vary with diet and body condition (Blount, 2004), which could generate high within-population phenotypic variance. However, we found no significant difference between populations for either species. It was shown in a closely related species (Larus marinus) that the intensity of carotenoid coloration of the bare parts is an honest signal of individual quality (Kristiansen et al., 2006) and that the coloration of bare parts, especially bill and foot coloration, may be involved in mate choice in seabirds (Pierotti, 1987). Bell (1997) found significant assortative mating between gulls in mixed colonies of L. glaucescens, L. occidentalis and their hybrids. We suggest that the orbital ring colour could have diverged through sexual selection, if females of each taxon exhibit different preferences for their partner's orbital ring colour. In sympatry, such disruptive selection resulting from assortative mating could maintain the differentiation despite interspecific gene flow.

Speciation with gene flow

As is generally the case for most birds, species in the large white-headed gull complex probably initially diverged in allopatry (Liebers et al., 2004). A previous study suggested that interspecific gene flow is involved in the low level of genetic differentiation between the recently diverged large white-headed gull species, based on comparisons of mitochondrial and nuclear markers as well as geographic arguments (Crochet et al., 2003). However, even very little gene flow between populations is sufficient to maintain genetic homogeneity (Wright, 1931). Only selection can override the effects of gene flow on some traits and maintain species distinctness (Ehrlich and Raven, 1969). Thus, closely related taxa in which phenotypic cohesion and mate recognition are caused by a small number of loci are expected to share much of their genetic variation. This pattern has frequently been observed in birds despite highly structured phenotypic variability (Grapputo et al., 1998; Freeland and Boag, 1999; Podos, 2001; Clegg et al., 2002; Hosken and Balloux, 2002; Chan and Arcese, 2003). Our comparisons of differentiation levels in phenotypic traits and neutral markers confirmed that plumage melanism and orbital ring colour likely diverged through disruptive selection. Even if evidence of disruptive selection is not sufficient to conclude that there is speciation with gene flow, the absence of disruptive selection would have definitely ruled out this scenario. Our results thus substantiate a scenario of speciation with gene flow for the large white-headed gulls. More generally, some of the recently diverged species that hybridize when coming into secondary contact might actually only differ in restricted regions of the genome that, by virtue of their phenotypic effects, are undergoing strong disruptive selection and/or are subject to assortative mating.
This emphasizes the benefit of multilocus approaches in a population genetics framework for studies of speciation, especially to assess the role of gene flow and natural selection in species divergence.
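The statement that very little gene flow is enough to homogenize neutral variation can be illustrated with the classical equilibrium approximation for Wright's island model, FST ≈ 1/(4Nm + 1), where Nm is the effective number of migrants per generation. The sketch below assumes an idealized island model at migration-drift equilibrium with negligible mutation, which is a simplification and not a description of the gull populations studied here. The function name is hypothetical.

```python
def island_model_fst(n_migrants):
    """Expected neutral FST at equilibrium under Wright's island model.

    n_migrants is Nm, the effective number of migrants per generation.
    Uses the approximation FST = 1 / (4 * Nm + 1), which assumes
    migration-drift equilibrium and negligible mutation.
    """
    if n_migrants < 0:
        raise ValueError("the number of migrants cannot be negative")
    return 1.0 / (4.0 * n_migrants + 1.0)


# Expected neutral differentiation for a range of migration levels
for nm in (0.1, 1, 5, 10):
    print(f"Nm={nm:<4}  expected FST={island_model_fst(nm):.3f}")
```

Under these assumptions, a single effective migrant per generation already holds neutral FST near 0.2, and ten migrants hold it below 0.03. Loci or traits that remain strongly differentiated despite such gene flow are therefore candidates for divergent selection.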