Introduction

The self-incompatibility system of many flowering plants ensures that pollen cannot fertilize a plant's own ovules. In most self-incompatible species, this is controlled by alleles at a single S-locus (de Nettancourt, 1977). S-alleles determining rare specificities have a reproductive advantage over alleles for common incompatibility types, and many different alleles are expected to be maintained at approximately equal frequencies for long periods of time, even in finite populations (Vekemans and Slatkin, 1994; Clark, 1996). Very high levels of amino-acid and silent-site polymorphism are thus expected, and observed, at the S-locus (Clark, 1993).

As alleles may persist for long periods of time, large sequence differences can develop if recombination between the S-alleles does not occur, or is very rare, consistent with the extreme differences found between allele sequences at incompatibility loci (eg Richman et al, 1996; Awadalla and Charlesworth, 1999; Richman and Kohn, 2000; Vieira and Charlesworth, 2002). Hypervariability in certain regions of the S-locus has been taken as indicating parts of the gene that encode the regions of the stigmatic S protein involved in specificity differences (reviewed in Awadalla and Charlesworth, 1999). In the absence of recombination, sequence variants within a functional allelic class (ie sequences with the same specificity) will indeed be associated with that specificity until separated by recombination (Strobeck, 1972), unless recurrent mutation occurs at the same site. If only a few sites determine specificity differences, peaks of variability are expected in regions close to these sites (Nordborg et al, 1996), as in the MHC loci (Takahata and Satta, 1998a, 1998b).

However, identifying local peaks of variability in sequences as regions under balancing selection (ie recognition sequences) is valid only if recombination or gene conversion occurs, separating sites under selection from associations that arise by mutation. Without such exchange, higher and lower variabilities can arise, due to differences in selective constraints, but the balanced polymorphism at the S-locus will increase variability throughout the gene, as is indeed observed for silent and intron sites as well as for nonsynonymous sites (Awadalla and Charlesworth, 1999; Schierup et al, 2001; Vieira and Charlesworth, 2002). It is therefore important to determine whether S-loci show recombination or not. If recombination occurs, the number of peaks in variability could also help distinguish whether balancing selection acts at many sites in the sequence, or at only a few sites.

Until recently, the gametophytic self-incompatibility locus was thought not to recombine (Clark, 1993). In two species of Solanaceae, Lycopersicon peruvianum (Bernatzky, 1993) and Petunia hybrida (Entani et al, 1999), the S-locus maps to the centromeric region, and the organization of this region is thought to be conserved in other species of Solanaceae (ten Hoopen et al, 1998). Centromeric regions have suppressed crossing over in a wide range of species, including plants (reviewed by Charlesworth and Charlesworth, 1998). The S-loci of these species may therefore be in a low-recombination region of the genome. In Rosaceae, however, the data suggest a noncentromeric localization of the S-locus (Ushijima et al, 2001); recombination could nevertheless be suppressed in the region. Furthermore, even in low-recombination regions, exchange may occur by gene conversion. Crossing over and gene conversion rates need not be strongly correlated (Langley et al, 2000; Jensen et al, 2002).

Consistent with the view that S-loci rarely recombine is the observation that the flanking regions of S-loci of some species differ greatly in sequence between alleles with different specificities (Coleman and Kao, 1992; Chung et al, 1995; Matton et al, 1995). There are, however, few comparisons between variability in the S-locus region and that of flanking regions of other genes in the same species. Thus, it is not yet known whether diversity in the S-locus region is unusual in the genome. Explicit tests for genetic exchange (recombination or gene conversion) are thus needed at S-loci.

Attempts to test for recombination in the gametophytic S-locus have produced varying conclusions. Clark and Kao (1991) did not detect intragenic recombination in S-allele sequences of four species of Solanaceae, using two tests based on clustering of polymorphic sites (Stephens, 1985; Sawyer, 1989), but their sample size was small. However, some intragenic recombination at the S-locus has been inferred for several species of Solanaceae. S-locus sequence diversity is higher than at S-linked loci (unpublished results in McCubbin and Kao, 1999; Li et al, 2000) and inconsistent evolutionary histories were observed for the 5′ and 3′ regions of the S-locus in two sets of four closely related P. inflata S-alleles, suggesting recombination (Wang et al, 2001). Schierup et al (2001) used the informative sites test (Worobey, 2001) and the r2 test of recombination, and also found evidence for recombination in two species of Solanaceae, but not in P. inflata.

To test whether intragenic recombination is a general feature of the gametophytic S-locus, we here use the relationship between linkage disequilibrium and distance between variable sites (Awadalla and Charlesworth, 1999) to test for recombination in S-loci of 21 species of Solanaceae, Rosaceae and Scrophulariaceae.

Methods

We obtained data from 21 species for which five or more cDNA S-allele sequences more than 170 bp long were available. Most are partial sequences between conserved regions C2 and C5 (see Richman et al, 1996; Richman and Kohn, 2000; Vieira and Charlesworth, 2002). For each species, we combined the cDNA sequences with amino-acid sequences from exons deduced from genomic S-RNase gene sequences, where available (see Table 2). The amino-acid sequences were aligned using ClustalX v. 1.64b (Thompson et al, 1997). There are some alignment gaps, mostly in the hypervariable regions. Balancing selection acting on S-alleles should ensure that there is little differentiation between populations (Schierup et al, 2000), so alleles sampled from the species as a whole, as here, are suitable for testing for recombination.

S-alleles are under balancing selection, so the infinite sites model, which underlies most available methods for testing for or estimating recombination in DNA sequence data, is violated (see discussion in Awadalla and Charlesworth, 1999). The aligned amino-acid sequences were therefore tested for a relationship between measures of linkage disequilibrium and nucleotide distances between variable sites, using Spearman's rank correlation. Linkage disequilibrium measures depend on the variant frequencies at the sites compared (Lewontin, 1988; McVean, 2001). We therefore used both D′, which corrects for variant frequencies (Devlin and Risch, 1995; Jorde and Bamshad, 2000), and r2 values. The D′ and r2 values were calculated using DnaSP software (Rozas and Rozas, 1999). To obtain P-values, 1000 data sets were generated with the D′ and r2 values obtained, but with randomized distances between sites (Awadalla and Charlesworth, 1999). Sequential Bonferroni correction for multiple nonindependent comparisons was applied (Rice, 1989) to each type of test (see below).
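The test described above can be sketched in a few lines. The following is a minimal illustration, not the DnaSP implementation: it computes |D′| and r2 for a pair of two-variant sites, then compares the observed Spearman correlation with correlations obtained after shuffling the distances among site pairs. The function names, the one-sided P-value and the interpretation of "randomized distances" as a shuffle are assumptions made for this sketch.

```python
import random

def ld_measures(site_a, site_b):
    """Return (|D'|, r2) for two polymorphic two-variant sites (equal-length state lists)."""
    n = len(site_a)
    a0, b0 = site_a[0], site_b[0]          # arbitrary reference variants
    p_a = sum(x == a0 for x in site_a) / n
    p_b = sum(y == b0 for y in site_b) / n
    p_ab = sum(x == a0 and y == b0 for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return abs(d) / d_max, r2

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0                          # no variation: correlation undefined
    return cov / (vx * vy) ** 0.5

def permutation_p(ld_values, distances, n_perm=1000, seed=0):
    """Observed rho and the fraction of shuffles with rho <= observed (one-sided; an assumption)."""
    rng = random.Random(seed)
    observed = spearman(ld_values, distances)
    shuffled = list(distances)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(ld_values, shuffled) <= observed:
            hits += 1
    return observed, hits / n_perm
```

In this sketch each element of `ld_values` is the D′ or r2 value for one pair of sites, and the matching element of `distances` is the nucleotide distance between them.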

Gene conversion or crossing over both lead to a decline of linkage disequilibrium with distance, provided that the length of conversion tracts is similar to the size of the region examined (Takahata and Satta, 1998a, 1998b; Wiehe et al, 2000). In our data sets, most sites are less than 700 bp apart. Although the average length of a typical plant gene conversion tract is not known, it is probably often less than this (Dooner and Martinez-Férez, 1997; Drouin et al, 1999; Fu et al, 2002). In Brassica S-loci, linkage disequilibrium was found to decay within 400 nucleotides (Awadalla and Charlesworth, 1999).

We did four analyses for species of Solanaceae, and three of them for the Rosaceae and Scrophulariaceae, whose intron lengths differ too much for the fourth analysis (see below). The first analysis (column labelled A in Table 1) used all nonsingleton polymorphic sites with two variants, excluding alignment gaps. Since selection might lead to concordant polymorphic amino-acid variants in functionally different alleles, which could mimic recombination (Sawyer, 1989), we also tested using third codon positions only, using all pairs of nonsingleton sites (column B in Table 1).
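The site filters for analyses A and B can be illustrated as follows. This is a sketch under stated assumptions: the alignment is a list of equal-length, in-frame nucleotide strings beginning at a first codon position, gaps are written as "-", and "nonsingleton" is taken to mean that the rarer variant occurs at least twice. The function name `informative_sites` is hypothetical.

```python
def informative_sites(alignment, third_positions_only=False):
    """Return 0-based columns with exactly two variants, no gaps, and no singleton variant."""
    keep = []
    for col in range(len(alignment[0])):
        # Analysis B: restrict to third codon positions (columns 2, 5, 8, ...).
        if third_positions_only and col % 3 != 2:
            continue
        states = [seq[col] for seq in alignment]
        if "-" in states:
            continue                        # exclude alignment gaps
        counts = {}
        for s in states:
            counts[s] = counts.get(s, 0) + 1
        if len(counts) == 2 and min(counts.values()) >= 2:
            keep.append(col)
    return keep
```

All pairs of the retained columns would then be passed to the linkage disequilibrium calculation.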

Table 1 Spearman's rank (ρ) correlations (× 10³) of D′ and r2 with distance

Introns are known in gametophytic S-allele sequences of several species (reviewed in Vieira and Charlesworth, 2002). All S-allele genomic sequences so far obtained from species of Solanaceae (N=14), Rosaceae (N=18) and Scrophulariaceae (N=36) have one intron in the HVa region, and in the genomic sequences from Scrophulariaceae the intron lengths vary (Vieira and Charlesworth, 2002). Five of the 18 genomic S-allele sequences from Rosaceae have a second intron at the cleavage site between the signal peptide and the C1 region (Ma and Oliveira, 2000). In Rosaceae and Scrophulariaceae, the distances between pairs of polymorphic sites that are separated by introns therefore differ between different pairs of alleles in a species. Linkage disequilibrium should still decay with distance, but the relationship with distance may be obscured by the uncertainty of the distances, that is, it will be weaker than if we knew the true distances. Tests that include such pairs are therefore conservative, as the uncertainty reduces the chance of detecting recombination. We also did a third test using only pairs of polymorphic sites that are not separated by introns in any of the sequences compared (column C in Table 1). For Solanaceae, the 13 introns that have been described are of similar sizes (ranging from 87 to 125 bp, average 103.62; standard error of the mean 3.62). For sequences from this family, we also performed an additional test by adding the average size of the intron to the cDNA distances between sites that are separated by an intron (column D in Table 1).
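The two ways of handling intron-separated site pairs (columns C and D) can be sketched as below. The sketch assumes a single intron whose location is given as the cDNA position it follows (`intron_after`, a hypothetical parameter); species with a second intron would need an extended version.

```python
from itertools import combinations

def site_pair_distances(positions, intron_after, mean_intron_len=None):
    """Return (site_i, site_j, distance) for all site pairs.

    mean_intron_len=None drops pairs that span the intron (as in column C);
    otherwise the mean intron length is added to their cDNA distance (as in column D).
    """
    out = []
    for i, j in combinations(sorted(positions), 2):
        spans_intron = i <= intron_after < j
        if spans_intron and mean_intron_len is None:
            continue
        extra = mean_intron_len if spans_intron else 0
        out.append((i, j, j - i + extra))
    return out
```

For example, with sites at cDNA positions 10, 50 and 120 and an intron after position 60, only the pair (10, 50) is kept under the column C rule, whereas the column D rule keeps all three pairs and lengthens the two that span the intron by 103.62 bp.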

Where possible, the analyses were also repeated using data sets excluding highly diverged sequences. Pairwise Ks values were estimated by Nei and Gojobori's (1986) measure with Jukes–Cantor correction, which is suitable for highly variable sequences. Sets of sequences were then formed in which five or more sequences remain after excluding all pairs with Ks>0.45. This analysis could not be carried out for P. hybrida, L. peruvianum, L. andersonii, S. carolinense, S. chacoense, N. alata, or any of the Antirrhinum species because all sequences were highly diverged. Two of the nine species for which sets could be formed had two suitable nonoverlapping sets of sequences (W. maculata and P. longifolia).
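The Jukes–Cantor correction converts an observed proportion of synonymous differences p into Ks = −(3/4) ln(1 − 4p/3). The sketch below applies it and then searches for sets in which no pair exceeds the threshold. The text does not state how the sets were formed, so the exhaustive search here is only one possible reading, practical only for small samples; the function names and the `frozenset`-keyed distance table are assumptions.

```python
import math
from itertools import combinations

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("correction undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

def low_divergence_sets(ks, names, threshold=0.45, min_size=5):
    """Return the largest sets of names in which every pairwise Ks is <= threshold.

    ks maps frozenset({name1, name2}) to the pairwise Ks value.
    Returns an empty list if no set of at least min_size sequences exists.
    """
    for size in range(len(names), min_size - 1, -1):
        found = [
            subset for subset in combinations(names, size)
            if all(ks[frozenset(pair)] <= threshold
                   for pair in combinations(subset, 2))
        ]
        if found:
            return found
    return []
```

An empty result corresponds to the species for which the analysis could not be carried out.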

Results and discussion

We found significant negative correlations for both D′ and r2 for a number of species. There was no evidence for recombination in the data from Antirrhinum species (Scrophulariaceae). Although the correlations are very small, three from the Solanaceae and Rosaceae are significant after sequential Bonferroni correction (W. maculata, L. andersonii and Malus × domestica; Table 1, part I). None of the species gave significant negative correlations for both D′ and r2 with all the different analyses applied, but L. andersonii gave significant negative correlations for both measures with three of them (Table 1, part I).
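The sequential Bonferroni procedure (Rice, 1989) used to judge significance here can be sketched as follows: the P-values within a family of tests are ordered from smallest to largest, and the i-th smallest (counting from zero) is compared with α/(m − i), stopping at the first failure. How the tests were grouped into families, and the value α = 0.05, are assumptions of this illustration.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which tests remain significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P-value is tested at alpha/m, the next at alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break                           # all larger P-values are nonsignificant
    return significant
```

With P-values of 0.001, 0.04, 0.03 and 0.012, for instance, the first and last remain significant, because 0.03 exceeds 0.05/2 and testing stops there.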

These conclusions differ from those of Schierup et al (2001), who found no evidence for recombination in L. andersonii by either method used, while the r2 test suggested recombination for P. crassifolia and S. carolinense. There are several possible reasons for the differences. First, Schierup et al (2001) exclude segregating sites at frequencies below 30%. For S. carolinense, when the Schierup et al (2001) data set is used, our approach detects no significant correlations between either D′ or r2 and distance, while Schierup et al (2001) found weakly significant correlations. Second, for L. andersonii, Schierup et al (2001) analyzed more alleles (22, while our data set had 11), but a two-fold smaller region of the S-locus. Distances between the segregating sites compared were thus much shorter than in our data and the number of data points used in the correlations is 7.8 times smaller. Applying our methods to the data set of Schierup et al (2001), D′ declines significantly with distance (P<0.001). Third, in the data set of Schierup et al (2001), all pairs of segregating sites were less than 150 bp apart, so it is surprising that recombination was detected by them but not by us, although clearly larger numbers of sequences make it more likely that clear patterns will be detected, provided that the length of sequence is sufficient. Applying our methods to the P. crassifolia data set of Schierup et al (2001), a significant correlation between r2 and distance is observed (data not shown). Finally, different degrees of coadaptation between different amino-acid sites may cause differences between the two studies. If coadaptation is primarily between amino acids in different parts of the molecule, linkage disequilibrium could extend across considerable distances, and a decline with distance would be undetectable unless only closely segregating sites are analysed.

For P. inflata, both we and Schierup et al (2001) found only weak evidence for recombination, in disagreement with the results of Wang et al (2001). Highly divergent sequences in our data set, and in that of Schierup et al (2001), might, however, obscure evidence for recombination. Evidence for exchange in the distant past could have been obliterated by subsequent mutations (Clark, 1993) and, since most S-alleles are old, the same mutation could have occurred twice at the same site. It is also possible that recombination rarely happens between very dissimilar S-allele sequences. Wang et al's (2001) approach of removing the most divergent sequences from the data sets could thus be preferable for testing for recombination. Although some known variants are omitted, there is no reason to think that this would falsely produce the appearance of recombination. Part II of Table 1 shows results of analyses of the data sets in which five or more sequences remain after excluding highly diverged sequences (see Methods). Negative correlations significant after Bonferroni correction were found for both D′ and r2 for three data sets (P. inflata, P. dulcis and Malus × domestica). P. inflata gave significant negative correlations for both D′ and r2 with all four different ways of analyzing the data (columns A, B, C and D in Table 1, part II), and P. dulcis with two of the three methods used for this species. Two other species yielded nonsignificant test results, whereas their sequences suggested recombination when all sequences were included. These were the two subsets of W. maculata and P. longifolia sequences (W. maculata 1, W. maculata 2, and P. longifolia 1 and P. longifolia 2, in Table 1, part II). The difference may be due to the small size of these data sets, with consequent low power to detect recombination.

Data sets that produced significant negative correlations of both D′ and r2 with true or estimated genomic distance (columns C and D in Table 1, respectively) are illustrated in Figure 1a and b, respectively. L. andersonii and W. maculata show marked decreases of r2 with distance, in the analysis using all sequences. For P. inflata and P. dulcis, our analysis suggests recombination only when the most highly divergent sequences are excluded. Wang et al's (2001) analyses used four of the five S-allele sequences included in our analysis, so the agreement with their conclusion is expected.

Figure 1 Relationship between linkage disequilibrium, measured by r2, and the physical distance, measured in number of base pairs, using the data sets that showed significant results in which true (a) and estimated genomic distances (b) are known.

Our tests use related species, so that they are not independent, given that S-alleles may be maintained for very long evolutionary times. An S-allele from one species may therefore be more closely related to an S-allele from another species, or even from a different genus, than to another S-allele from the same species (sometimes called trans-specific evolution; Clark, 1993). Recombination events in an ancestor could therefore be detectable in more than one descendant species. Different results obtained for related species (eg P. avium and P. dulcis) may be due to true differences, or to low power to detect recombination in some data sets. Despite some inconsistent test results (perhaps not surprising, given the small sample sizes and sequence lengths available, and the well-known difficulties of detecting linkage disequilibrium as illustrated above), signs of genetic exchange are repeatedly found, and therefore seem difficult to ignore.

Although we cannot estimate the recombination frequency for gametophytic S-loci, the high level of silent site differences between S-alleles suggests that such recombination is rare. It is also not yet clear whether similar sequences experience much higher recombination rates than highly divergent ones. Nevertheless, even rare recombination could be an important factor in the evolution of these loci (Schierup et al, 2001), and in addition to mutation, could potentially generate new specificities (Wang et al, 2001).

Table 2 Accession numbers of the S-allele sequences of Solanaceae, Rosaceae and Scrophulariaceae used