Introduction

Prezygotic barriers to reproduction between taxa appear to evolve in a rather capricious manner (Butlin and Tregenza, 1998). They may arise on a surprisingly short time-scale and cause rapid speciation, especially when divergence of the mating system is driven by natural selection (Schluter, 2001) or sexual conflict (Rice, 1998). However, it is also possible, at the other extreme, for prezygotic barriers between two taxa to remain incomplete despite ecological differentiation and the accumulation of genetic incompatibilities, as seen in many hybrid zones (Harrison, 1993). Knowledge of the nature and strength of prezygotic barriers to gene flow is key to understanding differentiation between two taxa, and yet methods of estimating the magnitude of prezygotic isolation in the wild are poorly developed. Here, we present two novel approaches to estimate the strength of assortative mating in natural hybrid populations from genotypic data at marker loci. We illustrate these methods with a data set from the Bombina hybrid zone in Croatia.

Depending on the biology of the particular hybridising taxa, it may be possible to infer levels of assortment from field observations of mated pairs (eg in insects: McLain, 1985; Hewitt et al, 1987; snails: Johannesson et al, 1995; Ribi and Oertli, 2000; and birds: Moore, 1987; Grant and Grant, 1992; Sætre et al, 1997), or from genotypes of one parent (usually the mother) and its offspring (Hewitt et al, 1987; Mallet et al, 1998). However, these sampling regimes are often impractical, for example, in species with sporadic reproduction throughout an extended breeding season and external fertilisation without parental care. Nonetheless, as we demonstrate in this paper, information about mating patterns can still be extracted from genotypes of field-collected offspring (eg egg batches) and from adults that are associated with a given breeding site. Our approach is akin to genetic methods of paternity assignment, but differs from these in that the populations under study may deviate from Hardy–Weinberg and/or linkage equilibrium. It also extends the study of assortative mating in the field to hybrid populations that contain a wide range of recombinants rather than only a few genotypic classes such as parentals, F1s and backcrosses (eg Rieseberg et al, 1995; Goodman et al, 1999).

Our methods are intermediate between two distinct approaches to the analysis of offspring genetic data. The first comprises techniques for assigning parentage, which likewise require samples of adults (reviewed by Jones and Ardren, 2003). Parentage may be assigned either categorically (eg Sanchristobal and Chevalet, 1997; Marshall et al, 1998) or fractionally, for example, to infer the relative reproductive success of a particular subset of the adult population (Morgan and Conner, 2001; Nielsen et al, 2001). Nongenetic (eg behavioural) data can also be used to estimate prior probabilities of paternity given known mother–offspring pairs (Neff et al, 2001), or in cases where neither parent is known (see also Marshall et al, 1998). The second approach estimates how many parents contributed to a given batch of offspring in the absence of an adult sample (Emery et al, 2001) and, more generally, clusters samples of independent offspring into sibships (Lynch and Ritland, 1999; Thomas and Hill, 2000; Smith et al, 2001).

The development of our approach was motivated by our work on the hybrid zone between the fire-bellied toads Bombina bombina and B. variegata. During about four million years of independent evolution, these taxa have undergone substantial ecological and molecular divergence (Nei's genetic distance: 0.37–0.59; Szymura, 1993). Nevertheless, a wide spectrum of fertile hybrids is found in typically narrow hybrid zones, wherever their parapatric ranges adjoin in Central and Eastern Europe. The hybrid zone is stabilised in part by endogenous selection against hybrids (Kruuk et al, 1999) and, presumably, also by selection against toads in the wrong habitat: B. bombina typically reproduce in semi-permanent ponds in the lowlands, whereas B. variegata lay their eggs in ephemeral water bodies at higher elevations. The mating systems of the pure types differ in a number of features, including the size and structure of male aggregations and the male call (Lörcher, 1969). Thus, in principle, the stage is set for prezygotic barriers to gene flow.

The differential use of breeding habitat carries over into the hybrid zone, where even adults of mixed ancestry show a habitat preference that correlates with their individual hybrid index, with more B. bombina-like hybrids being more likely to be found in ponds and B. variegata-like hybrids in puddles (MacCallum et al, 1998). This results in partial assortative mating by habitat. Its effect on the genetic structure of hybrid populations is seen most clearly in the transect near Pešćenica, Croatia, which has at its centre a mosaic distribution of ponds and puddles. Most toads are found in the sites that are predicted from their marker genotype, but individuals in the ‘wrong’ habitat inflate linkage disequilibrium (D) and heterozygote deficit (F) locally (Kruuk, 1997; MacCallum et al, 1998). But, judging by the difference in allele frequency between habitats, this effect can explain only part of the observed heterozygote deficit. The remainder may be due to nonrandom mating within sites (Kruuk, 1997; MacCallum et al, 1998). Here, we want to estimate the net contribution of the divergent mating systems of the pure taxa to prezygotic isolation in the Bombina hybrid zone. Laboratory experiments, for example, on female preferences for homotypic male calls, would complement our study and might reveal the traits that mediate any effect seen at the population level (Doherty and Gerhardt, 1984; see also Smajda et al, 2004).

Our approach to inferring the mating pattern starts with samples of field-collected egg batches consisting of full sibs. Given a sufficiently large offspring sample, we can accurately reconstruct the parental genotypes at a given diagnostic marker locus: the occurrence of only ‘bv’ and ‘vv’ genotypes in the offspring (where b stands for alleles of bombina and v for those of variegata origin) implies that one parent was a heterozygote and the other a B. variegata homozygote. Next, the parental genotypes inferred at each marker locus are listed across loci. For example, we might infer the joint parental genotype ((bb,bv),(bb,vv),(bv,bv)) for three loci. But, from these data, we cannot determine the multilocus genotype of either parent: in this example, the parents might be ((bb,bb,bv),(bv,vv,bv)) or ((bb,vv,bv),(bv,bb,bv)).
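The two inference steps just described, reading a mating type off the offspring classes at each locus and then listing the ways the joint parental genotype can be split into two multilocus parents, can be sketched as follows. This is a minimal illustration, not the authors' code: `infer_mating_type` and `parent_splits` are hypothetical names, the mapping is the most parsimonious one and ignores the sampling error discussed under Joint parental genotypes, and the offspring scores are invented to reproduce the three-locus example above (bb = 0, bv = 1, vv = 2).

```python
from itertools import product

# Offspring classes observed at one locus -> most parsimonious mating type.
# Genotypes are scored as the number of v alleles: bb = 0, bv = 1, vv = 2.
MATING_TYPE = {
    frozenset({0}): (0, 0),
    frozenset({2}): (2, 2),
    frozenset({1}): (0, 2),
    frozenset({0, 1}): (0, 1),
    frozenset({1, 2}): (1, 2),
    frozenset({0, 1, 2}): (1, 1),
}


def infer_mating_type(offspring):
    """Infer the unordered parental genotype pair at one locus from offspring scores."""
    classes = frozenset(offspring)
    if classes not in MATING_TYPE:  # eg {0, 2} without heterozygotes
        raise ValueError(f"offspring classes {sorted(classes)} fit no single mating type")
    return MATING_TYPE[classes]


def parent_splits(joint):
    """All distinct ways to split a joint parental genotype into two multilocus parents."""
    splits = set()
    for flips in product((False, True), repeat=len(joint)):
        p1 = tuple(b if f else a for (a, b), f in zip(joint, flips))
        p2 = tuple(a if f else b for (a, b), f in zip(joint, flips))
        splits.add(frozenset((p1, p2)))  # unordered: mother and father are not identified
    return splits


# The three-locus example from the text: ((bb,bv),(bb,vv),(bv,bv))
joint = [infer_mating_type(o) for o in ([0, 1, 1, 0], [1, 1, 1], [0, 1, 2, 1])]
print(joint)  # [(0, 1), (0, 2), (1, 1)]
for split in parent_splits(joint):
    print(sorted(split))
```

Only loci at which the two parents differ contribute to the ambiguity: with h such loci there are 2^(h−1) distinct splits, here two.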

Under ideal conditions of exhaustive sampling of adults and high genotypic variance, there may be only one matching male–female pair to which parentage of the clutch can be assigned. Often, however, limited allelic diversity at the marker loci and/or unsampled adults will render this approach impractical (Marshall et al, 1998). Nevertheless, the adult sample does contain information that we can use to determine the composition of the population of mated pairs. Although we may not be able to determine the precise genotypes of the parents, or the full set of genotype frequencies in the parental population, we can estimate parameters such as the mean allele frequency, heterozygosity, and linkage disequilibria. We use two complementary methods. First, we estimate the multilocus genotype frequencies in the population of adults, under the simplifying assumption that it was formed by the mixing of two differentiated populations (Barton, 2000). We use this information to fit a model to the offspring data, which jointly specifies the most likely parental genotypes for each sampled clutch and the maximum likelihood degree of assortative mating for the whole sample. Second, we estimate the composition of the parental population solely from the offspring data, using an extension of the method set out by Barton (2000). This approach provides information about the degree of assortment from levels of correlations between the allelic states of genes in mother and father. Both approaches are outlined below (see Materials and methods) and presented in more detail in Barton (2000) and in Supplementary Appendix II. We apply these methods to a hybrid population from the Pešćenica transect in Croatia and demonstrate that they provide surprisingly robust estimates of assortment even for a relatively small data set. Mathematica (Wolfram, 1996) notebooks which implement these methods are available from www.helios.bto.ed.ac.uk/evolgen/.

Materials and methods

Field collections

This study concentrates on a hybrid population (site no. 103) in the Pešćenica transect, which is located 20 km south-east of Zagreb, Croatia (see MacCallum et al (1998) for a detailed description). In this area, the distance across the hybrid zone between pure B. bombina populations on arable flood plains and pure B. variegata populations in low, forested hills was approximately 10 km. Site 103 is located on the B. variegata side, 1.34 km from the estimated centre of the cline (MacCallum, 1994). It is a large, shallow puddle (7.0 m × 0.7 m, 0.1 m deep) that formed on compacted soil on a logging track in lowland forest, and has little aquatic vegetation. Egg clutches and adults were sampled during 10 visits between 29 April and 11 June 1995. Toads were anaesthetised in 0.2% MS222 (3-aminobenzoic acid ethyl ester, Sigma) before a single toe was removed as a tissue sample and immediately transferred to liquid nitrogen. Each individual was given a code in which the first digit refers to the year of first capture (eg ‘2’=1992), the next three digits represent the site code, and the number after the point is a running index (eg 3103.5 would be a toad first seen in 1993 at site 103, being the fifth individual in that year's sample from site 103). To maximise the number of families sampled and to minimise disturbance to the populations, egg collections were made on different dates and from batches several metres apart. Whenever possible, 12 eggs were taken from each batch, but some batches contained fewer offspring, for example, after the removal of unfertilised eggs. Batches from which eggs had been removed were flagged to prevent resampling on subsequent visits.
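The individual coding scheme can be made concrete with a small parser. This is a hypothetical helper (`parse_toad_id` is our name), and it assumes that all years of first capture fall in the 1990s, as they do in this study.

```python
def parse_toad_id(code, decade=1990):
    """Split an individual code such as '3103.5' into (year, site, running index).

    Hypothetical helper: assumes a four-digit prefix (one year digit plus a
    three-digit site code) and that the year of first capture lies in `decade`.
    """
    prefix, _, index = code.partition(".")
    if len(prefix) != 4 or not prefix.isdigit() or not index.isdigit():
        raise ValueError(f"unexpected individual code: {code!r}")
    return decade + int(prefix[0]), int(prefix[1:]), int(index)


print(parse_toad_id("3103.5"))  # (1993, 103, 5)
```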

Eggs were transported to the field laboratory, and tadpoles were reared to the age of 3 weeks before being anaesthetised and preserved in liquid nitrogen. This procedure was chosen to allow for both allozyme and PCR-based genotyping. Rearing was carried out at low density and with ad libitum food to minimise any confounding effects of selection.

Molecular analyses

Upon return to the laboratory, the samples were stored in a −60°C freezer prior to processing. DNA extraction followed standard protocols. Briefly, the tissue samples were digested overnight with Proteinase K. After centrifugation, the supernatant was extracted once with chloroform. DNA was precipitated in two volumes of ethanol, washed once in 70% ethanol, air-dried and then resuspended in ultrapure water (Merck). Stock solutions were stored at −20°C.

The genetic analysis was based on one microsatellite (Bv12.19), one simple length polymorphism (Bb6.14), and four SSCPs (Bv24.11, Bb3.34, Bb3.36, Bb7.4). For the present analysis, we categorise all alleles at these diagnostic, codominant markers as either b or v, and thus disregard intraspecific variation (1–3 alleles per taxon and locus). The loci are part of a larger set of mapped markers in the Bombina genome (Nürnberger et al, 2003). They are unlinked except for one locus pair (Bv24.11 and Bb3.34), which shows weak linkage (r=0.36), and one tightly linked pair (Bb7.4 and Bb6.14, r=0.06). We treat the former pair as unlinked and take linkage into account as required in the latter pair. Given the sensitivity of the statistical analysis to even single isolated errors, the entire data set was assembled twice from independent dilutions of the DNA stocks. Locus/sample combinations that failed to give consistent and unambiguous results even after additional amplifications were left blank.

PCR reactions were set up in a total volume of 30 μl with 50–100 ng template DNA, 50 mM KCl, 10 mM Tris (pH 9.0 at RT), dNTPs (0.2 mM per nucleotide), 10 pmol of each primer, and 0.5 U Taq polymerase (rTaq, Amersham Biosciences). MgCl2 concentrations varied among loci between 1.5 and 2.5 mM. Amplification was carried out on a Hybaid Touchdown thermocycler with oil overlay. After initial denaturation for 3 min at 94°C, the cycling profile was as follows: 15 s at 94°C, 30 s at x°C and 30 s at 72°C for 32–35 cycles, where x is the locus-specific annealing temperature (53–59°C).

The PCR products were separated on polyacrylamide gels and visualised with silver staining. Length polymorphisms were separated in vertical denaturing gels (acryl-bis ratio 19:1) in 1 × TBE buffer at constant voltage. The SSCPs were electrophoresed on native horizontal polyacrylamide gels (acryl-bis ratio 37.5:1) at constant low temperature and constant voltage for 3–4 h (MultiPhor gel rigs, Amersham Biosciences) with an electrode buffer of 2 × TBE and a gel buffer of 1 × Tris-acetate. The locus-specific parameters are available upon request from BN.

Statistical analysis

We apply two methods of inferring the mating pattern, both of which are based on correlations of allelic states within and between individuals or, more precisely, on the moments of the genotypic distribution. Within individuals, correlations between alleles either at a single locus (heterozygote deficit or excess) or at two loci inherited from the same parent (gametic disequilibrium) are examples of second-order moments. Higher-order moments represent correlations between alleles at more than two loci and may exist within and across the haploid (ie maternal and paternal) genomes of a diploid. When correlations arise from the mixing of two differentiated populations, all moments $c_{j,k}$ involving j loci from the maternal and k loci from the paternal genome are expected to be the same when scaled relative to the differences in allele frequency between the source populations. This assumption greatly reduces the number of parameters that need to be estimated (Barton, 2000).
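For the simplest case, pairwise moments within a gamete, this scaling property can be checked directly: mixing two source pools that are each in linkage equilibrium gives $D_{ij} = \varphi(1-\varphi)\,\Delta p_i\,\Delta p_j$, where $\varphi$ is the mixing fraction and $\Delta p_i$ the allele-frequency difference between sources at locus i, so every scaled disequilibrium equals $\varphi(1-\varphi)$. The sketch below verifies this with exact fractions. The source frequencies and $\varphi$ are hypothetical, and it covers only second-order gametic moments, not the full set of $c_{j,k}$ used by Barton (2000).

```python
from fractions import Fraction as Fr
from itertools import combinations


def scaled_pairwise_ld(p_a, p_b, phi):
    """Scaled pairwise D in a gamete pool formed by mixing two source pools.

    p_a, p_b: v-allele frequencies per locus in sources A and B, each assumed
    to be in linkage equilibrium; phi: fraction of gametes drawn from A.
    Returns D_ij / ((p_a[i] - p_b[i]) * (p_a[j] - p_b[j])) for every locus pair.
    """
    p_mix = [phi * a + (1 - phi) * b for a, b in zip(p_a, p_b)]
    scaled = {}
    for i, j in combinations(range(len(p_a)), 2):
        # frequency of gametes carrying v at both locus i and locus j
        p_ij = phi * p_a[i] * p_a[j] + (1 - phi) * p_b[i] * p_b[j]
        d_ij = p_ij - p_mix[i] * p_mix[j]
        scaled[(i, j)] = d_ij / ((p_a[i] - p_b[i]) * (p_a[j] - p_b[j]))
    return scaled


# Hypothetical source frequencies at three loci, mixing fraction phi = 1/3
res = scaled_pairwise_ld([Fr(9, 10), Fr(4, 5), Fr(1)], [Fr(1, 10), Fr(0), Fr(1, 5)], Fr(1, 3))
print(set(res.values()))  # {Fraction(2, 9)}, ie phi * (1 - phi) for every pair
```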

We extend this approach to allelic correlations within mated pairs. These take the form $c_{i,j,k,l}$, where the subscripts give the numbers of loci in the maternally and paternally derived genomes of the female ($i$, $j$) and of the male ($k$, $l$) of a pair. For example, terms of the type $c_{0,0,0,2}$ represent the standard pairwise linkage disequilibrium and those of type $c_{0,0,1,1}$ the heterozygote deficit ($=Fpq$, where F is the inbreeding coefficient and p and q are the frequencies of the B. variegata and B. bombina alleles, respectively). Terms of the type $c_{0,1,0,1}$ represent the covariance between the state of a gene in the mother and a gene in the father or, in other words, the covariance between the proportions of B. variegata genes in the two parents. Other moments describe more complex associations between the two parents, for example between the level of heterozygosity in one and of linkage disequilibrium in the other (such as $c_{1,1,0,2}$). We do not attempt to estimate all these parameters: only a few lower-order coefficients are of interest.

In our first approach, the aim is to find the likelihood of a model of mating patterns, given the observed offspring sample and an estimate of the composition of the adult population. Direct estimation of adult genotype frequencies from the data is not an option, because the number of possible genotypes is very large (729, ie 3^6, for six biallelic loci), so that most of them are unobserved. Generally, the expected frequency of any multilocus marker genotype is a function of the allele frequencies and of the linkage disequilibria in the population. Note that, for a population in Hardy–Weinberg and linkage equilibrium, the computation is greatly simplified, as all correlations are zero and expected multilocus frequencies are just products of the allele frequencies. In the present context of a hybrid population, however, reliable estimates are possible only if disequilibria, including those of higher order, are taken into account, even though most of them will be small in magnitude. These disequilibria can be estimated from a sample of adults at the breeding site; in the present study, they were estimated by maximum likelihood, assuming a model of population admixture (Barton, 2000).
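The equilibrium special case is easy to write down. Under Hardy–Weinberg and linkage equilibrium, each of the 3^6 = 729 genotype frequencies is a product of single-locus genotype frequencies, and collapsing them onto the hybrid index gives the kind of distribution shown as black bars in Figure 3 (R=0, F=0). The sketch uses hypothetical allele frequencies, not the estimates in Table 2, and deliberately omits the disequilibria that the admixture model adds.

```python
from itertools import product


def hwle_genotype_freqs(p):
    """Expected frequency of every multilocus genotype (scores 0/1/2 per locus),
    assuming Hardy-Weinberg proportions and linkage equilibrium."""
    single = [{0: (1 - pi) ** 2, 1: 2 * pi * (1 - pi), 2: pi ** 2} for pi in p]
    freqs = {}
    for geno in product((0, 1, 2), repeat=len(p)):
        f = 1.0
        for locus, k in enumerate(geno):
            f *= single[locus][k]  # plain product over loci: no disequilibria
        freqs[geno] = f
    return freqs


def hybrid_index_distribution(freqs):
    """Collapse genotype frequencies onto the hybrid index (sum of v alleles)."""
    dist = {}
    for geno, f in freqs.items():
        dist[sum(geno)] = dist.get(sum(geno), 0.0) + f
    return dist


p = [0.30, 0.25, 0.35, 0.30, 0.28, 0.32]  # hypothetical v-allele frequencies
freqs = hwle_genotype_freqs(p)
print(len(freqs))  # 729
hi = hybrid_index_distribution(freqs)
print({h: round(f, 4) for h, f in sorted(hi.items())})
```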

The second method infers correlations in allelic state in the parents from the moments in the sibships alone, that is, without recourse to the adult sample. While we remain ignorant about the genotypes of any particular parent pair, we can draw inferences about the correlation structure within parent genotypes and, critically, between mated adults. Consider the six possible mating types for a biallelic locus, whose genotypes we code by the number of B. variegata alleles (0, 1 or 2) in each parent. For example, a mating between a heterozygote and a B. variegata homozygote (denoted by (1,2)) produces ‘1’ and ‘2’ offspring in equal proportion (Figure 1). Note that we conflate (1,2) and (2,1) since these are indistinguishable. The variance within sibships informs us about the heterozygosity in the parents, since only heterozygotes can generate variance within families (horizontal axis in Figure 1). The heterozygosity in the offspring is inversely related to the correlation between parental scores (vertical axis).
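The six mating types and the two offspring properties used in Figure 1 can be tabulated exactly under Mendelian segregation. The sketch below (hypothetical helper names) shows that only matings involving a heterozygote produce variance within sibships, and that offspring heterozygosity is highest for the (0,2) mating and zero for (0,0) and (2,2).

```python
from fractions import Fraction as Fr
from itertools import combinations_with_replacement


def offspring_distribution(a, b):
    """Mendelian offspring score distribution for parents scored a and b."""
    pa, pb = Fr(a, 2), Fr(b, 2)  # chance that each parent transmits a v allele
    dist = {0: (1 - pa) * (1 - pb), 1: pa * (1 - pb) + pb * (1 - pa), 2: pa * pb}
    return {k: f for k, f in dist.items() if f}  # drop impossible classes


def mating_type_table():
    """Offspring classes, heterozygosity and within-sibship variance per mating type."""
    table = {}
    for a, b in combinations_with_replacement((0, 1, 2), 2):  # six unordered types
        dist = offspring_distribution(a, b)
        mean = sum(k * f for k, f in dist.items())
        table[(a, b)] = {
            "classes": sorted(dist),
            "het": dist.get(1, Fr(0)),
            "var": sum((k - mean) ** 2 * f for k, f in dist.items()),
        }
    return table


for mt, row in mating_type_table().items():
    print(mt, row["classes"], "het =", row["het"], "var =", row["var"])
```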

Figure 1

Correlations between offspring and parent genotypes. The six possible mating types at a biallelic locus are ranked in terms of the proportion of heterozygous offspring and the allelic variance within sibships. For example, two heterozygous parents (mating type (1,1)) produce all three possible genotypes among their offspring (→ 0,1,2). Boxed mating types behave identically. Note that nondistinguishable mating types (eg (1,2) and (2,1)) were conflated. See text for the numeric coding of genotypes.

More generally, we make use of the fact that the moments among offspring are given by a linear combination of the moments among genes in the parents. For example, if b and v alleles are coded as 0 and 1, respectively, the variance within families depends linearly on the variance in the population of parents, and on the covariance between parents. We are mainly interested in the parental heterozygosity (H) and pairwise linkage disequilibrium (D) and in correlations between the parents in the hybrid index (HI), which we define as the number of B. variegata alleles across all marker loci. These latter correlations provide estimates of assortment. A full account of the computations is provided in Supplementary Appendix II and at http://helios.bto.ed.ac.uk/evolgen/barton/index.html.
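For a single locus, these linear relations can be written down exactly. With parental scores a and b (number of v alleles), each heterozygous parent contributes 1/4 to the expected within-family variance, and the expected offspring heterozygosity is $(a+b)/2 - ab/2$, so the mean product of parental scores equals $2(\bar{z} - h_{\mathrm{off}})$, where $\bar{z}$ is the mean offspring score. The sketch below, with hypothetical helper names and mated pairs, recovers parental heterozygosity and the between-parent covariance from expected offspring moments and checks them against a direct calculation. It uses exact expectations (effectively infinite sibships) and treats the two parents as exchangeable; it is not the multilocus estimator of Supplementary Appendix II.

```python
from fractions import Fraction as Fr


def offspring_moments(a, b):
    """Expected mean, heterozygosity and variance of offspring scores for an (a, b) pair."""
    pa, pb = Fr(a, 2), Fr(b, 2)
    mean = pa + pb
    het = pa * (1 - pb) + pb * (1 - pa)  # = (a + b)/2 - a*b/2
    var = pa * (1 - pa) + pb * (1 - pb)  # 1/4 per heterozygous parent
    return mean, het, var


def parental_moments_from_offspring(pairs):
    """Recover parental heterozygosity and between-parent covariance from the
    expected offspring moments alone (parents treated as exchangeable)."""
    n = len(pairs)
    moments = [offspring_moments(a, b) for a, b in pairs]
    z_bar = sum(m[0] for m in moments) / n  # equals the mean parental score
    h_off = sum(m[1] for m in moments) / n
    v_within = sum(m[2] for m in moments) / n
    h_par = 2 * v_within  # proportion of heterozygous parents
    cov = 2 * (z_bar - h_off) - z_bar ** 2  # mean of a*b minus squared mean
    return h_par, cov


def parental_moments_direct(pairs):
    """The same quantities computed from the (normally unknown) parents."""
    scores = [s for pair in pairs for s in pair]
    g_bar = Fr(sum(scores), len(scores))
    h_par = Fr(sum(1 for s in scores if s == 1), len(scores))
    cov = Fr(sum(a * b for a, b in pairs), len(pairs)) - g_bar ** 2
    return h_par, cov


pairs = [(0, 0), (0, 1), (2, 2), (1, 2), (0, 2), (2, 2)]  # hypothetical mated pairs
print(parental_moments_from_offspring(pairs) == parental_moments_direct(pairs))  # True
```

A positive covariance indicates assortative mating by score, a negative one disassortative mating.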

Results

The adult sample

The adult sample collected in 1995 from site 103 consisted of 23 toads. Most of these individuals were seen only once, but some were recaptured, and one male in particular (individual 2103.10) was seen on most sampling dates. Genotypes are available for 20 adults and gave a mean frequency of B. variegata alleles, $\bar{p}$, of 0.31. As it became clear that the pool of adults contributing to the egg clutches was much larger than the set of sampled adults (see below), we added the genotypes of another adult sample (n=8, $\bar{p}=0.27$) collected in 1995 at the nearest ecologically similar site (183). Site 183 lies 800 m from site 103 (ie within the estimated annual adult dispersal range of at most 1.5 km) and at the same distance from the centre of the hybrid zone as site 103 (MacCallum, 1994). In the following, we base our analyses on the pooled set of 28 genotyped adults (Table 1).

Table 1 Genotypes of the pooled adult sample (sites 103 and 183) at six molecular loci

The distribution of the HI, defined as the sum of B. variegata alleles across loci per individual, is bimodal, and B. bombina-like individuals predominate. Eight individuals have an HI of zero, whereas none has a pure B. variegata marker genotype (HI=12, Figure 2a). The number of heterozygous loci per individual also has a bimodal distribution (Figure 2b). Here, the right mode represents the offspring of matings between opposite genotypes, which are brought into spatial proximity in this mosaic hybrid zone and produce highly heterozygous offspring. The structure of the adult population is characterised by a strong deficit of heterozygotes (F=0.342, Table 2) and strong pairwise linkage disequilibria (standardised $\bar{R}$; Table 2). Owing to the statistical associations among loci, it is difficult to test for significant heterogeneity among pairwise estimates of R. On the whole, they are broadly similar and, surprisingly, even the tightly linked pair Bb6.14/Bb7.4 does not stand out from the rest. None of these measures differs appreciably between the sexes.
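The per-individual summaries plotted in Figure 2 and a heterozygote-deficit estimate can be computed from a matrix of genotype scores as sketched below. The ratio-of-sums estimator of F used here is one simple choice and not necessarily the estimator behind Table 2; missing genotypes are not handled, and the adult scores are hypothetical.

```python
def adult_summary(genotypes):
    """Per-individual hybrid index and heterozygous-locus count, plus a
    multilocus F = 1 - sum(H_obs) / sum(H_exp) across loci."""
    n_ind, n_loci = len(genotypes), len(genotypes[0])
    hi = [sum(g) for g in genotypes]  # number of v alleles per individual
    n_het = [sum(1 for k in g if k == 1) for g in genotypes]
    h_obs = h_exp = 0.0
    for locus in range(n_loci):
        scores = [g[locus] for g in genotypes]
        p = sum(scores) / (2 * n_ind)  # v-allele frequency at this locus
        h_obs += sum(1 for k in scores if k == 1) / n_ind
        h_exp += 2 * p * (1 - p)  # Hardy-Weinberg expectation
    return hi, n_het, 1 - h_obs / h_exp


adults = [[0, 0, 0, 0, 0, 0], [2, 2, 1, 2, 2, 2], [0, 1, 0, 0, 0, 0],
          [1, 1, 1, 0, 1, 1], [0, 0, 0, 0, 1, 0]]  # hypothetical scores
hi, n_het, F = adult_summary(adults)
print(hi, n_het, round(F, 3))
```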

Figure 2

Genetic composition of the adult sample. Shown is the distribution of (a) the hybrid index (=the sum of B. variegata alleles across loci per individual) and (b) the number of heterozygous loci per individual. In plot (a), the data have been binned in groups of two: (0,1), (2,3), etc. The poorly characterised individual 5103.5 has been excluded from these plots.

Table 2 Mean B. variegata allele frequencies ($\bar{p}$), heterozygote deficit (F) and standardised pairwise linkage disequilibria ($\bar{R}$) for the adult sample

The egg sample

We genotyped 168 tadpoles from 18 batches of eggs (Supplementary Appendix I). As a first step, some checks on the data set are required. Given the oviposition habits of Bombina, two problems might arise: (a) an apparently homogeneous batch of eggs may consist of two separate sibships and (b) a given pair may distribute its offspring in several batches across the site. We investigate these possibilities at the appropriate stages of the analysis.

Especially in sparsely vegetated temporary habitats, more than one female may attach her eggs to a given plant stem. This may not be apparent in the field, but can be inferred from the genotypic data (Vines and Barton, 2003). For unlinked loci, we expect no correlations between genotypes at two different loci within a family. Even for two linked loci, a correlation can arise only when at least one of the parents is heterozygous at both loci. Mixing of families, however, does generate correlations, which become stronger the more the genotypic compositions of the two families differ. We use two measures to investigate mixing: the variance in the hybrid index (which, because randomisation leaves the single-locus variances unchanged, effectively tests the sum of pairwise covariances across loci) and the squared covariance between loci, summed over all pairs of loci. The former should reveal mixing of genotypically different families, whereas the latter can detect mixing even when the two families have a similar overall hybrid index. To obtain null distributions, the observed genotypes at each locus were randomised 1000 times among the offspring of a family. The tightly linked pair of loci was excluded.
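The randomisation test described above can be sketched as follows. Genotypes are permuted independently at each locus among the offspring of a batch, which keeps single-locus genotype counts fixed but destroys any between-locus association. The function names, the one-sided P-value convention (hits + 1)/(n_perm + 1) and the example batch are our assumptions, not details given in the text.

```python
import random
from itertools import combinations


def _cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def hi_variance(family):
    """Variance of the hybrid index among the offspring of one batch."""
    hi = [sum(row) for row in family]
    return _cov(hi, hi)


def summed_squared_cov(family):
    """Squared covariance between loci, summed over all pairs of loci."""
    cols = list(zip(*family))
    return sum(_cov(a, b) ** 2 for a, b in combinations(cols, 2))


def mixing_test(family, statistic, n_perm=1000, seed=1):
    """Randomise genotypes independently at each locus among offspring and
    return the one-sided permutation P-value of the observed statistic."""
    rng = random.Random(seed)
    observed = statistic(family)
    cols = [list(c) for c in zip(*family)]
    hits = 0
    for _ in range(n_perm):
        for col in cols:
            rng.shuffle(col)  # preserves single-locus genotype counts
        if statistic([list(r) for r in zip(*cols)]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical batch: five offspring of a (0,0) and five of a (2,2) mating, five loci
mixed = [[0] * 5] * 5 + [[2] * 5] * 5
print(mixing_test(mixed, hi_variance), mixing_test(mixed, summed_squared_cov))
```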

Batches 5, 8 and 18 showed significant correlations based on the variance in the hybrid index (adjusted for multiple tests; Supplementary Appendix I). The squared covariance method identifies only batch 5 as mixed, because the single aberrant genotypes in batches 8 and 18 generate only small covariances. We conclude from these tests that egg batches 5, 8 and 18 are in fact mixtures of different sibships. In the following, we omit the single outliers from batches 8 and 18 and split batch 5 into two (5a and 5b).

Mixing may also go unnoticed: in the extreme case, a mixture of families with the same parental genotypes would be undetectable. Following Vines and Barton (2003), we investigated the power to detect mixed batches by simulating random mating, given the genotypic composition of the adult population (see below). Among 1000 simulated batches, each formed by joining 5+5 eggs from two random matings, only slightly more than 50% were detected as mixed by the covariance method, which is the more efficient of the two methods in this case. This moderate resolution does not, however, invalidate our approach. The batches of greatest concern to us are mixtures of B. bombina-like and B. variegata-like matings, because they lead to an erroneous over-representation of hybrids in the inferred parent pool, and such mixtures were detected at a much higher rate in the simulations. Moreover, assortative mating would tend to generate matings that are either rather similar or rather different, which increases the detection rate of ‘troublesome’ cases of mixing relative to our random-mating simulation. From that simulation, we determined the net effect of undetected mixed batches on the inferred parent pool by comparing sets of entirely unmixed batches with sets in which 10% of batches are mixed but undetected (Table 3). As expected, mixing increases the proportion of hybrids among the inferred parents, here mostly at the expense of the (0,0) class of matings, but the effect is moderate.

Table 3 Effect of undetected mixed egg batches on the inference of parental genotypes

Joint parental genotypes

As a first step towards inferring the parents, we determine the most likely set of parental genotypes per locus for all families (Table 4). This categorical assignment will, however, not be without error. Recall our earlier example of a sibship that yields for a given locus a mixture of heterozygotes (coded as 1, according to the number of B. variegata alleles) and B. variegata homozygotes (2). This segregation implies that one parent was a heterozygote and the other a B. variegata homozygote (mating type: 1,2). If a single ‘0’ score were added, the parental assignment would change to (1,1). The probability that not all possible offspring genotypes were sampled for a given batch is highest for (1,1) matings, as they generate the most diverse set of offspring. For the average batch size of nine and assuming Mendelian ratios, there is a $2 \times (3/4)^9 \approx 15\%$ chance that a (1,1) mating will be misclassified as either (0,1) or (1,2), because one of the two homozygotes was not observed. The only other source of error comes from (0,1) and (1,2) matings. In both cases, the probability that one of the two expected genotypes is not observed equals $2 \times (1/2)^9 \approx 0.4\%$. All of these misclassifications reduce the estimated heterozygosity of the parents in this preliminary assignment (see below). Note, however, that the mating model below explicitly deals with these uncertainties and takes the exact sample sizes per batch into account.
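The misclassification arithmetic generalises to any batch size n. The figure $2 \times (3/4)^n$ is a slight overestimate for the (1,1) case: by inclusion–exclusion, the chance of missing exactly one homozygous class is $2(3/4)^n - 2(1/2)^n$, because a batch lacking both homozygotes (all heterozygotes) is scored as (0,2) rather than as (0,1) or (1,2). The helper names below are hypothetical.

```python
def p_misclassify_11(n, exact=True):
    """Chance that a batch of n offspring from a (1,1) mating lacks exactly one
    homozygous class and is therefore scored as (0,1) or (1,2)."""
    approx = 2 * (3 / 4) ** n  # the approximation used in the text
    return approx - 2 * (1 / 2) ** n if exact else approx


def p_misclassify_01(n):
    """Chance that a (0,1) or (1,2) mating shows only one of its two offspring classes."""
    return 2 * (1 / 2) ** n


for n in (5, 9, 12):
    print(n, round(p_misclassify_11(n, exact=False), 3),
          round(p_misclassify_11(n), 3), round(p_misclassify_01(n), 4))
```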

Table 4 Joint parental genotypes inferred from 19 egg batches (after the splitting of batch 5 and removal of single outliers from batches 8 and 18)

As highlighted in Table 4, there are several pairs and one triplet of batches with identical joint parental genotypes (1+2, 3+4, 7–9, 11+12, 13+14). While batches from three of these sets were sampled on the same date, samples from the remaining two sets were collected 2 days apart (cf. Supplementary Appendix I), and the later collections were at a more advanced developmental stage. These observations suggest that some pairs distributed their eggs over several batches. Such repeated joint parental genotypes are indeed unlikely to arise by chance: in 1000 simulations of 18 pairs drawn at random from the estimated adult population, two and four sets of identical joint parental genotypes were observed in fewer than 17% and 1.1% of the replicates, respectively. We therefore merge the batches within each set into a single family and arrive at a total of 13 separate families. This procedure is conservative, because it yields the smallest possible number of matings, even though some egg batches with identical joint parental genotypes might actually represent distinct families.

Across the entire offspring data set, there are 10 genotype scores which, if changed, would alter the joint parental genotype of the respective egg batch (highlighted in bold in Supplementary Appendix I). These singletons persisted after additional laboratory tests. As expected, they tend to occur in small clutches. Overall, the frequency of singletons is not greater than expected from Mendelian segregation (P=0.033 for bv × vv crosses, but P>0.2 for bv × bv and bv × bb crosses). As a final check on the data, we inspected the segregation ratios based on the inferred mating type per family and locus. Out of 18 tests (ie six loci and three kinds of segregating cross), only one significant deviation was found: for locus Bv12.19 in bv × vv crosses (b:v segregation=18:5, G=9.63, P=0.002). Given the large number of tests, we do not consider this deviation significant overall.

Closer examination of the joint parental genotypes reveals that they are not a random subset of the adult population (Table 5). While the heterozygote deficit is very similar in parents and adults (F=0.34 versus 0.36, respectively), the mean frequency of B. variegata alleles is higher in the parents (0.55 versus 0.30). This difference is highly significant: 1000 sets of 26 individuals, randomly sampled from a simulated adult population with the same genotype distribution as the observed sample, yielded a mean of $\bar{p}=0.31$ (standard error: 0.002). It appears that B. variegata-like individuals have a higher propensity to mate at this site. We now turn to a more detailed statistical examination of mating patterns, which takes such associations into account.

Table 5 Comparison of adult and inferred parental genotypes

Inferring the structure of the adult population

Only three families could be produced by pairs from our adult sample. Moreover, the sampling dates for most of these adults do not agree with those of the clutches. The high proportion of families for which we do not find matching parents suggests a large pool of unsampled adults and rules out direct assignment of parentage to the sampled adults. The problem of unidentified parents persists if we conduct the search more rigorously with a likelihood approach that takes the potential sampling error in the list of joint parental genotypes (Table 4, see above) into account.

As described above, the central difficulty in analysing these family data is that the correct split of the joint parental genotype into a maternal and a paternal component is unknown. Nevertheless, we can determine the most likely parental genotypes of a family from their expected frequency in the adult population and from the observed genotype distribution in the egg sample. To this end, the sample of adult genotypes was used to obtain estimates of allele frequencies and allelic correlations (see Statistical analysis above). The marked effect of these correlations on the distribution of the hybrid index is illustrated in Figure 3.

Figure 3

Expected distribution of the hybrid index in adults. Expected genotype frequencies were computed (a) from the allele frequencies at the six marker loci only (ie R=0 and F=0, black bars) and (b) by also taking into account the estimated moments, which represent associations among alleles at one or more loci (grey bars).

The mating model

An estimate of assortative mating was derived using the following model (after Lande, 1981). The probability of observing our families is

where $g[X]$ and $g[Y]$ are the frequencies of the adult genotypes X and Y in the population, based on the estimated moments. $P[F_i \mid X, Y]$ gives the probability that these individuals produced family $F_i$, given their genotypes, Mendelian segregation, and the sizes of the egg samples. $Z_X$ and $Z_Y$ represent values of an additive trait (here, the hybrid index) of the genotypes X and Y, respectively, and $\bar{Z}$ is the corresponding population mean. The strength of assortment is measured by α, and β represents the bias towards B. variegata-like genotypes in the parents relative to the adult sample. We sum over all potential parent pairs per family and multiply across the 13 families.
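The structure of this likelihood, a sum over all candidate parent pairs within each family and a product across families, can be sketched for a single locus. The exponential pair weighting in `pair_weights` is an assumed form chosen for illustration, not necessarily the authors' equation: it reduces to random mating, $g[X]\,g[Y]$, when α = β = 0, rewards pairs whose trait deviations have the same sign when α > 0, and favours high-Z parents when β > 0. Here the trait Z is simply the one-locus score, whereas the paper uses the multilocus hybrid index and genotype frequencies built from the estimated moments. All frequencies and families are hypothetical.

```python
import math
from itertools import product


def offspring_prob(k, a, b):
    """Mendelian probability of offspring score k from parents scored a and b."""
    pa, pb = a / 2, b / 2
    return ((1 - pa) * (1 - pb), pa * (1 - pb) + pb * (1 - pa), pa * pb)[k]


def family_prob(family, a, b):
    """P[F | X, Y] for one batch; the multinomial coefficient is omitted
    because it does not depend on the parents."""
    return math.prod(offspring_prob(k, a, b) for k in family)


def pair_weights(g, alpha, beta):
    """ASSUMED mating distribution over ordered pairs (X, Y): g[X] g[Y] reweighted
    by exp(alpha dX dY + beta (dX + dY)), d = Z - Z_bar, then normalised."""
    z_bar = sum(z * f for z, f in g.items())
    raw = {
        (x, y): g[x] * g[y] * math.exp(alpha * (x - z_bar) * (y - z_bar)
                                       + beta * ((x - z_bar) + (y - z_bar)))
        for x, y in product(g, repeat=2)
    }
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()}


def log_likelihood(families, g, alpha, beta):
    """Sum over parent pairs within each family; add log-probabilities across families."""
    w = pair_weights(g, alpha, beta)
    return sum(
        math.log(sum(wp * family_prob(fam, x, y) for (x, y), wp in w.items()))
        for fam in families
    )


g = {0: 0.5, 1: 0.3, 2: 0.2}  # hypothetical one-locus adult genotype frequencies
families = [[0, 0, 0, 0], [2, 2, 1], [0, 1, 1]]  # hypothetical offspring scores
grid = [i / 10 for i in range(-20, 21)]
best = max(((a, b) for a in grid for b in grid),
           key=lambda ab: log_likelihood(families, g, *ab))
print(best)
```

A grid search is used only for transparency; support limits such as those reported below would be read off where the log-likelihood falls two units below its maximum.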

The maximum likelihood estimates are α=0.007 (support limits: −0.031 to 0.046) and β=0.23 (support limits: 0.114 to 0.368). Thus, there is no evidence of assortative mating, but a strong bias towards B. variegata-like parents. Figure 4 illustrates these results. We can check the fit of the model by comparing the mating types predicted from this model with those inferred directly from the offspring data. Simulation of pairs using the parameters of this model gives a distribution that is very close to the inferred mating types (Table 6). The greatest discrepancies are seen for mating types (0,2) and (1,1). Their relative over- and under-representation, respectively, in the joint parental genotypes follows the pattern that is predicted for errors in the categorical assignment (see above). Note that the mean allele frequencies agree closely (0.53 versus 0.55).
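Why categorical assignment should inflate (0,2) at the expense of (1,1) can be illustrated with a short calculation. Assuming, for illustration, that a family's mating type at a locus is assigned as the parental pair that makes the observed egg genotypes most probable, a (1,1) family whose n sampled eggs all happen to be heterozygous is assigned to (0,2): that pair produces only heterozygotes and explains the sample with probability 1, whereas every other compatible pair gives at most 0.5^n. The maximum-likelihood assignment rule and all names below are assumptions, not necessarily the procedure used for Table 6.

```python
import math
from itertools import combinations_with_replacement
from math import factorial

TYPES = list(combinations_with_replacement((0, 1, 2), 2))  # unordered parental pairs


def offspring_dist(mating_type):
    """Single-locus offspring genotype probabilities (0, 1, 2) for a parental pair."""
    pm, pd = (k / 2.0 for k in mating_type)
    return ((1 - pm) * (1 - pd), pm * (1 - pd) + (1 - pm) * pd, pm * pd)


def assign(counts):
    """Categorical (maximum-likelihood) mating type for observed offspring counts."""
    def lik(t):
        return math.prod(p ** c for p, c in zip(offspring_dist(t), counts))
    return max(TYPES, key=lik)


def misassignment_prob(true_type, assigned_type, n):
    """P(a family of n eggs from true_type is assigned to assigned_type)."""
    probs = offspring_dist(true_type)
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            counts = (n0, n1, n - n0 - n1)
            multinom = factorial(n) / (factorial(n0) * factorial(n1) * factorial(counts[2]))
            p = multinom * math.prod(q ** c for q, c in zip(probs, counts))
            if p > 0 and assign(counts) == assigned_type:
                total += p
    return total
```

For a true (1,1) family, `misassignment_prob((1, 1), (0, 2), n)` equals 0.5^n, whereas a true (0,2) family is never assigned to (1,1), so errors run in one direction only.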

Figure 4

The distribution of mating probabilities for a hypothetical pure B. variegata individual (HI=12). Shown are results for our study population according to the mating model (see text for details). The hatched, grey bars represent the case of random mating (α=0) and equal mating propensity for all adults (β=0). The maximum likelihood parameters (α=0.007, β=0.23) are illustrated by black bars. Note that α in this case does not differ significantly from zero. The plain grey bars show the model fit that ignores any variation in mating propensity (β set to 0) and so gives the erroneous impression of strong assortment (α=0.068).
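The contrast between the black and plain grey bars can be reproduced qualitatively with a small sketch. For a focal individual whose hybrid index lies above the population mean, a positive α and a positive β both shift its partner distribution towards high hybrid index values, so a fit with β fixed at zero must attribute the whole shift to α. The code uses the same ASSUMED weight form as a plausible reading of the mating model (not a confirmed reproduction of it) and takes a hypothetical vector g of adult hybrid index frequencies.

```python
import math
from typing import List, Sequence


def partner_distribution(g: Sequence[float], focal_z: int,
                         alpha: float, beta: float) -> List[float]:
    """Probability that a focal individual with hybrid index focal_z mates with a
    partner of hybrid index z = 0 .. len(g) - 1, under the ASSUMED weight
    exp(alpha*(Zx - Zbar)*(Zy - Zbar) + beta*((Zx - Zbar) + (Zy - Zbar)))."""
    total_g = sum(g)
    zbar = sum(z * gz for z, gz in enumerate(g)) / total_g  # mean hybrid index
    dx = focal_z - zbar
    raw = [gz * math.exp(alpha * dx * (z - zbar) + beta * (dx + z - zbar))
           for z, gz in enumerate(g)]
    total = sum(raw)
    return [r / total for r in raw]  # the focal individual's own beta term cancels here
```

With α=β=0 the result is simply the adult distribution g, which corresponds to the hatched bars.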

Table 6 Comparison of the mating types, parental genotypes and allele frequencies inferred (a) from the joint parental genotypes, (b) from the maximum likelihood mating model, and (c) from the same model with β set to zero

Estimating assortment solely from the offspring genotypes

We now turn to our second approach and estimate the degree of assortment from the moments in the offspring sample, without reference to the adults. The results are shown in Table 7, together with the means and standard deviations from simulations. For these simulations, 1000 replicate sets of 13 families with the same sizes as in our sample were generated from two different base populations: first, one in Hardy–Weinberg proportions and linkage equilibrium, and second, one with allele frequency, heterozygote deficit and linkage disequilibrium equal to those estimated here from the offspring genotypes by the method of moments (ie p̄=0.542, F=0.263, D=0.191). For the second base population, genotype frequencies were constructed by assuming a hypergeometric distribution for the number of B. variegata alleles in gametes, with variance determined by the linkage disequilibrium, and then combining a fraction 1−F of gametes at random and pairing a fraction F with identical copies of themselves to form entirely homozygous diploids.
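The construction of the second base population can be sketched as follows. Instead of the hypergeometric distribution used in the text, this sketch swaps in a beta-binomial for the number of B. variegata alleles per gamete, because its intra-class correlation ρ directly sets the pairwise between-locus covariance ρpq; D is treated here as that raw (unstandardised) covariance, which is an assumption. Gametes are then combined at random with probability 1−F, or doubled into a fully homozygous diploid with probability F. The function name and parameters are hypothetical.

```python
import random
from typing import List


def simulate_hybrid_indices(p: float, F: float, D: float, n_loci: int,
                            size: int, rng: random.Random) -> List[int]:
    """Simulate hybrid indices for `size` diploids.

    ASSUMPTION: the number of B. variegata alleles per gamete is beta-binomial
    with intra-class correlation rho = D / (p * q), so that every pair of loci
    has covariance D. This stands in for the hypergeometric construction.
    """
    q = 1.0 - p
    rho = D / (p * q)
    if D < 0 or rho >= 1.0:
        raise ValueError("this sketch requires 0 <= D < p * q")

    def gamete() -> int:
        if rho == 0.0:
            theta = p                      # linkage equilibrium: loci independent
        else:
            s = 1.0 / rho - 1.0            # beta concentration a + b
            theta = rng.betavariate(p * s, q * s)
        return sum(rng.random() < theta for _ in range(n_loci))

    out = []
    for _ in range(size):
        g1 = gamete()
        # with probability F the gamete is doubled into a fully homozygous diploid
        g2 = g1 if rng.random() < F else gamete()
        out.append(g1 + g2)
    return out
```

Drawing egg samples from such a population (or, as in the text, building families from its genotype frequencies) gives replicate data sets against which the observed moments can be compared.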

Table 7 Composition of the populations of adults and parents based on the method of moments

As noted before, the allele frequency estimated for the breeding population was substantially higher than for the sampled adults. The heterozygote deficit in the breeding parents was similar in magnitude to that in the adults (Table 7: Fpq=0.065 versus 0.074, standard deviation: 0.042). Linkage disequilibrium is estimated as being substantially stronger among the breeders than among adults (D=0.191 versus 0.122). Estimates as low as 0.122 were observed in only 4.5% of simulated replicates, but, as with heterozygote deficit, it is hard to judge differences in linkage disequilibrium when allele frequencies also differ between breeders and sampled adults (standardised and 0.581, respectively). Simulations of the null hypothesis of Hardy–Weinberg and linkage equilibrium in the parents show that the observed values of F and D in the inferred parent population are highly significantly different from zero (Table 7).
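Statements such as "estimates as low as 0.122 were observed in only 4.5% of simulated replicates" are empirical tail proportions taken from the simulated replicates. A minimal sketch follows; the function name is hypothetical, and the (hits + 1)/(N + 1) variant, which some authors prefer so that a p-value is never exactly zero, is mentioned only as an alternative.

```python
from typing import Sequence


def empirical_tail(observed: float, replicates: Sequence[float],
                   lower: bool = True) -> float:
    """Fraction of simulated replicates at least as extreme as `observed`.

    lower=True counts replicates <= observed (eg 'estimates as low as ...');
    lower=False counts replicates >= observed (eg 'a value exceeded in ...').
    """
    if lower:
        hits = sum(r <= observed for r in replicates)
    else:
        hits = sum(r >= observed for r in replicates)
    return hits / len(replicates)
```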

The association between B. variegata alleles in each parent, c0,1,0,1, is small in magnitude, as are all the other associations between parents that reflect nonrandom mating. The standard deviation of c0,1,0,1 is 0.034, which corresponds to a standard deviation in the correlation of 0.034/pq = 0.137. This shows that the method has the power to detect moderate correlations between mates even with only 13 families. Overall, only one of the cross-parent correlations is notable: the association between linkage disequilibrium in the mother and in the father (D*D) is 0.013, a value exceeded in only 2.9% of the replicates. However, because there are six such tests, this estimate is not significant once multiple comparisons are taken into account.
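The arithmetic in this paragraph can be checked directly. The sketch below standardises the standard deviation of the association by pq, using the offspring estimate p̄=0.542, and shows two common multiple-testing corrections for six tests: the Bonferroni threshold and the Šidák family-wise probability, the latter assuming the six tests are independent (an assumption, since the cross-parent associations need not be). The function names are illustrative.

```python
def standardised_sd(sd_c: float, p: float) -> float:
    """SD of an allelic association divided by pq gives the SD of the correlation."""
    return sd_c / (p * (1.0 - p))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def sidak_adjusted(p_value: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests is this extreme."""
    return 1.0 - (1.0 - p_value) ** n_tests


print(round(standardised_sd(0.034, 0.542), 3))   # → 0.137
print(round(bonferroni_threshold(0.05, 6), 4))   # → 0.0083
print(round(sidak_adjusted(0.029, 6), 2))        # → 0.16
```

Under either correction the tail probability of 2.9% for D*D falls well short of significance.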

We note that, for all estimators shown in Table 7, the simulations show some bias, which in most cases is statistically significant. However, the bias is small in all cases relative to the standard deviation, and so is no cause for concern.

Discussion

In this paper, we present methods that allow the mating pattern in hybrid populations to be inferred from genotypes of egg batches and adults at a number of diagnostic marker loci. The composition of the breeding population that produced the egg batches could then be compared with that of the sampled adults in two ways. First, parameters of an explicit model of mate choice were estimated by calculating the probability that the egg batches would be observed given that model and estimates of adult genotype frequencies. Second, the associations among breeding parents were estimated solely from a set of full sibships. These approaches should prove useful in any hybrid population in which the actual parents cannot be efficiently sampled, so that no direct assignment of parentage is possible. As we have demonstrated, the genotypic information also allows us to account for complicating features of the organisms' biology, such as the mixing and dispersal of egg batches.

Both methods are quite general and can be applied to populations of any structure, whether or not they are in linkage disequilibrium. However, as allelic correlations weaken, the variance of the hybrid index decreases, and so assortment becomes increasingly difficult to demonstrate. At the other extreme, with strong linkage disequilibria, the population consists of just a few genotypic classes such as parentals, F1s, backcrosses and so on. Simpler methods that compare the observed frequencies of these categories with those expected under random mating might then be used to determine the strength of assortative mating. Our approach should therefore be most valuable in hybrid populations with abundant recombinants that are nevertheless structured by linkage disequilibria and in which a heterozygote deficit suggests a role for assortative mating.
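Why weaker allelic correlations shrink the variance of the hybrid index can be made explicit. Write the hybrid index as Z = Σ_i (x_i + x_i′), where x_i and x_i′ indicate a B. variegata allele at locus i on the maternal and paternal gamete. Assuming, for this illustration, that both gametes carry the same within-gamete disequilibrium D_ij and that the between-gamete cross-locus association D′_ij is symmetric, the variance decomposes as

```latex
\operatorname{Var}(Z)
  = 2\sum_{i} p_i q_i \,(1 + F_i)
  + 2\sum_{i \neq j} \left( D_{ij} + D'_{ij} \right)
```

When F_i, D_ij and D′_ij all vanish, only 2Σ p_i q_i remains; for six loci at p=0.5 this is a variance of 3 (SD of about 1.7 on the 0 to 12 scale), leaving little spread in hybrid index on which assortment could act.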

The tests have surprisingly high power even with our small data set, as the support limits and standard deviations are relatively narrow. This is in part because each family is derived from four parental haploid genomes. Moreover, we combine information from several loci and base our inference about the mating pattern on an aggregate measure of genotype, the hybrid index. These aspects contribute to the consistency of the results that the two methods provide.

While pre-defined full-sib families are an efficient starting point for inferring mating patterns, we showed (at the toads' prompting) that less-than-perfect information about family groups can be accommodated, because mixed egg batches generate detectable correlations in multilocus genotypes even with our minimally polymorphic markers. If a greater number of more variable markers is used, the power of detection increases (Vines and Barton, 2003), and highly polymorphic markers offer the possibility of assembling sibships from genetic data alone (eg Thomas and Hill, 2000). Thus, our method of inferring patterns of parentage does not depend on recognisable sibling clusters in the field. Moreover, while we have focussed on patterns of mating, the genetic composition of the offspring cohort could also be affected by other processes, such as assortative fertilisation (Howard et al, 1998). Their impact could be quantified if, for example, independent mating observations in the field were available. As it stands, our method provides a means to estimate the net contribution of the mating system to the gene flow barrier in a hybrid zone.
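The reason mixed egg batches leave a multilocus signature can be sketched with a Wahlund-type calculation. Within a single full-sib family, genotypes at unlinked loci segregate independently, so all between-locus covariance in a mixed batch comes from differences among the families' expected genotype scores. The function below, whose name and inputs are hypothetical, computes that covariance for a weighted mixture of families; it assumes unlinked loci, and its diagonal holds only the between-family part of each locus's variance.

```python
from typing import List, Sequence


def mixture_covariance(family_means: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[List[float]]:
    """Between-locus covariance of offspring genotype scores in a mixed egg batch.

    family_means[f][l]: expected B. variegata allele count at locus l in family f.
    weights[f]: share of family f in the batch (normalised here).
    Assumes unlinked loci, so covariance within any single family is zero.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    n_loci = len(family_means[0])
    overall = [sum(wf * m[l] for wf, m in zip(w, family_means)) for l in range(n_loci)]
    cov = [[0.0] * n_loci for _ in range(n_loci)]
    for wf, m in zip(w, family_means):
        for a in range(n_loci):
            for b in range(n_loci):
                cov[a][b] += wf * (m[a] - overall[a]) * (m[b] - overall[b])
    return cov
```

For example, an equal mixture of a family of pure B. bombina offspring (means 0, 0) and one of pure B. variegata offspring (means 2, 2) yields a between-locus covariance of 1, whereas any single family yields zero.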

In the present example of the Croatian Bombina hybrid population, both estimation methods provide no evidence of assortative mating, but do show a significantly higher mating propensity of individuals with a more B. variegata-like genotype. While our method provides similar estimates of heterozygote deficit in the adult and in the parent population, it suggests relatively stronger linkage disequilibrium among the parents. This implies that adults with more recombined genotypes are contributing less to the next generation than expected. As noted above, it is difficult to assess the significance of the observed difference, since we are comparing estimates from populations with different mean allele frequencies.

The fact that we cannot identify potential parent pairs among the adults for most clutches suggests that only a relatively small proportion of the adults that visited site 103 were sampled. Given the size and structure of this site, our capture efficiency of individuals known to be present during any one visit was likely to be at the high end of the range estimated previously for this transect (30–80%, MacCallum, 1994). Our data are therefore in agreement with earlier analyses, which demonstrated substantial within-season turnover in the adult population per site (MacCallum et al, 1998). Individuals seen early in the season were consistently absent later on, and even towards the end of a 6-week observation period new individuals were detected, while still others were caught repeatedly throughout (MacCallum, 1994). At site 103, the average residence time was approximately 5 days, and the majority of individuals were caught only once. Movement into and out of sites, together with a sizeable group of residents within a season, was similarly observed in B. variegata populations in Switzerland (Barandun and Reyer, 1998). In a recent study on B. variegata-like hybrid populations in Romania, the number of toads that were intercepted overnight at fences around five sites was up to four times larger than the mean number sampled within the water during the daytime (Vines et al, in preparation). Taken together, these data suggest that residence times are highly variable and, for a large fraction of the adult cohort per site, very short. Many of the parents that we missed at site 103 may have been such short-term visitors.

Our data indicate that the sample of adults at the study site was a mixture of two subsets of individuals: those that were accepting this habitat for reproduction and others that were simply passing through, perhaps on their way to more suitable breeding sites. Site 103 was a typical puddle, and the disproportionate contribution of B. variegata-like adults to the offspring cohort agrees with the documented habitat preference in this transect (MacCallum et al, 1998). The inferred parental genotypes covered a wide spectrum, so that moderately strong assortative mating should have been detectable if present: the SD of the estimated correlation is 0.137. Yet neither of our two methods of parentage inference provides evidence for any departure from random mating in this hybrid population, despite the anciently diverged mating systems of the two pure taxa. This result also suggests that the high heterozygote deficit observed within local adult breeding aggregations in the Croatian transect is due to other causes, such as natural selection against hybrids. A more extensive analysis of assortative mating in hybrid Bombina populations is under way in our current study area near Cluj, Romania (Vines et al, in preparation), to investigate the generality of our findings across a wider range of breeding habitats.