Introduction

The self-incompatibility (SI) of flowering plants is a classic of genetics and population genetics. Self pollen, and pollen carrying ‘S’-alleles in common with a potential recipient plant, are recognized and prevented from fertilizing ovules. This recognition system interests cell biologists, though the mechanism is not yet well understood at the cellular level (Goring & Rothstein, 1996; Kao & McCubbin, 1996; Li et al., 1995; Rudd et al., 1996), while the impressive S-allele polymorphism has attracted the interest of population geneticists. In this article, we review recent results that are helping further our understanding of the evolution of self-incompatibility, focusing particularly on sequence data from self-incompatibility loci of Brassica species. These data are helping to illuminate evolutionary processes, but it is still difficult to make inferences about function.

In ‘gametophytic’ SI systems, pollen incompatibility types are controlled by the grains' own haploid genotypes (reviewed by Kao & McCubbin, 1996). Gametophytic inheritance with a single incompatibility (S) locus is known in Scrophulariaceae, Onagraceae, Papaveraceae, Solanaceae, Rosaceae, and several other flowering plant families (reviewed in Weller et al., 1995), and systems with two or more loci occur in several families, including grasses (Li et al., 1995). Single-locus ‘sporophytic’ inheritance (pollen incompatibility type determined by the genotype of the diploid plant producing the pollen) exists in Brassicaceae, Asteraceae, and several other flowering plant families (Kowyama et al., 1980; Goodwillie, 1997). Dominance/recessivity of alleles is common in both pollen and pistil in these homomorphic SI systems (Sampson, 1967). In both types of SI, the S-loci have spectacular polymorphisms, with many S-alleles in populations (e.g. Sampson, 1967; Wright, 1939). Different populations sometimes, but not always, have strongly overlapping sets of alleles (Nou et al., 1993; O'Donnell et al., 1993). In terms of allele numbers, the S-loci are among the most highly polymorphic loci known, similar to mammalian MHC loci (O'hUigin, 1995), or fungal incompatibility loci (May & Matzke, 1995).

Evolution and maintenance of self-incompatibility

Unlike the origin of self-incompatibility (Holsinger & Steinbachs, 1997), the maintenance of variability at the self-incompatibility loci is well understood (Wright, 1939). Rare alleles have a fertility advantage because pollen carrying such alleles will not be rejected by incompatibility reactions of recipient plants. In a large enough population, this ‘frequency-dependent’ selection favours each new incompatibility-type allele until an equilibrium is reached with equal allele frequencies. The equilibrium number of alleles will thus be high, depending on population sizes (since alleles will be lost by chance in finite populations) and mutation rates to new specificities (since frequent mutation increases allele numbers, see Wright, 1939). The fertility advantage to low-frequency alleles also means that losses by genetic drift are soon restored if there is gene flow from other populations. Alleles should thus be maintained in species for long evolutionary times (Vekemans & Slatkin, 1994).
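The rare-allele advantage described above can be illustrated with a small deterministic sketch. It assumes a single-locus gametophytic system in an infinite population with no mutation or drift, in which pollen is rejected only by plants carrying the same allele; the function names and starting frequencies are hypothetical choices for illustration, not taken from any of the cited studies.

```python
from itertools import combinations


def allele_freqs(geno, n_alleles):
    """Allele frequencies implied by heterozygous genotype frequencies."""
    p = [0.0] * n_alleles
    for (i, j), g in geno.items():
        p[i] += g / 2
        p[j] += g / 2
    return p


def next_generation(geno, n_alleles):
    """One generation of gametophytic SI (assumed model, infinite population).

    Pollen carrying allele k is accepted only by plants lacking allele k,
    so every plant in the population is a heterozygote.
    """
    p = allele_freqs(geno, n_alleles)
    new = {pair: 0.0 for pair in combinations(range(n_alleles), 2)}
    for (i, j), g in geno.items():
        compatible = 1.0 - p[i] - p[j]  # pollen this mother can accept
        if g == 0 or compatible <= 0:
            continue
        for k in range(n_alleles):
            if k in (i, j):
                continue
            share = g * p[k] / compatible
            for maternal in (i, j):  # each maternal allele transmitted half the time
                new[tuple(sorted((maternal, k)))] += 0.5 * share
    total = sum(new.values())
    return {pair: v / total for pair, v in new.items()}


def simulate(geno, n_alleles, generations):
    for _ in range(generations):
        geno = next_generation(geno, n_alleles)
    return geno


# Hypothetical starting point: allele 3 is rare, alleles 0-2 are common.
N = 4
start = {pair: (0.01 if 3 in pair else 0.97 / 3)
         for pair in combinations(range(N), 2)}
end = simulate(start, N, 200)
print("start:", [round(p, 3) for p in allele_freqs(start, N)])
print("end:  ", [round(p, 3) for p in allele_freqs(end, N)])
```

In this sketch the rare allele increases from the first generation onward and all four alleles approach equal frequencies. Finite population size and mutation to new specificities, which determine the equilibrium number of alleles in Wright's treatment, are deliberately left out.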

Until recently, high allelic diversity hindered study of alleles in populations, as probes based on one cloned allele rarely detect other alleles by Southern blotting, even under low stringency. Sequence-based approaches are now permitting renewed evolutionary and population studies of the self-incompatibility polymorphism. In several self-incompatible plants, alleles have now been cloned that segregate with the incompatibility types of plants in families, and encode sequences of cosegregating pistil proteins (Anderson et al., 1989; Li et al., 1995; Nasrallah et al., 1987; Walker et al., 1996).

An important result from sequencing studies is that self-incompatibility loci have evolved from independent origins several times in flowering plants. The gene family to which the S-locus belongs has now been identified in several angiosperm families. The pistil S-proteins in Solanaceae, Rosaceae and Scrophulariaceae are all related to RNases, though this may be due either to independent evolution or common origin (Sassa et al., 1996). A quite different gene product is implicated in Papaver rhoeas (Rudd et al., 1996) and the only pollen-expressed gene yet characterized (in the grass Phalaris coerulescens) shows thioredoxin activity (Li et al., 1995).

The system in the Brassicaceae, the only sporophytic system characterized at the molecular level, is quite different (Fig. 1a). It is unique in that two linked loci, SLG and SRK (encoding a stigmatically expressed S-locus glycoprotein and a receptor kinase, respectively), probably play essential roles in incompatibility. The intronless SLG locus encodes an extracellular protein probably involved in recognition, consistent with the finding that it is highly polymorphic. The evidence for its necessary role in self-incompatibility is not yet complete. The polymorphism of the closely linked SRK gene could conceivably produce polymorphism at SLG (see below), so direct evidence of SLG being essential for functional incompatibility is needed. To date, the best evidence comes from B. campestris rendered self-compatible by mutation at an unlinked locus that affects stigma but not pollen function; the mutants have lowered SLG RNA and protein products (and of some other loci in the gene family), but SRK expression is unaffected (Nasrallah et al., 1992).

Fig. 1

(a) Schematic diagram of the general features of the S-gene region in Brassica. Introns in the kinase domain of the SRK gene are not to scale. Numbers of genes in the S-locus region differ between species and some may be pseudogenes (Suzuki et al., 1997). The directions of transcription are for B. napus (Yu et al., 1996). In B. oleracea, transcripts in both directions are generated from SRK (reviewed in Pastuglia et al., 1997b). (b) Comparisons of synonymous and nonsynonymous differences per site between SLG and the S-domain of SRK-alleles, within haplotypes and between different haplotypes. The regions of the two loci compared are indicated by the horizontal bars in the figure.

The SRK locus contains an S-domain, homologous in sequence to the SLG locus, an apparent transmembrane domain and a domain with six introns that has homology to members of a serine/threonine kinase gene family (Stein et al., 1991) present in other plant species (Dwyer et al., 1994). The kinase is surmised to be an extracellular receptor (Stein et al., 1991). It also resembles an immunoglobulin-like repeat sequence (Glavin et al., 1994) and may be related to plant proteins involved in defence against pathogens, another plant recognition system (Pastuglia et al., 1997a). Mutational loss of the kinase function causes self-compatibility, so this gene product is essential for incompatibility (Goring & Rothstein, 1996), though not necessarily for the recognition function.

These findings strongly suggest that SI systems evolved independently in different families. This is consistent with information on the distribution of SI (Weller et al., 1995), but not with the view that incompatibility evolved in the ancestor of all angiosperms and was subsequently modified into gametophytic and sporophytic systems (Whitehouse, 1950; Beach & Kress, 1980).

Quantitative measures of the self-incompatibility locus polymorphism

The first sequence studies of small numbers of alleles laboriously and independently cloned from cultivated or laboratory strains of plants immediately revealed astonishing divergence between alleles for different incompatibility types, in both Solanaceae (Anderson et al., 1989) and Brassicaceae (Nasrallah et al., 1987). Both silent differences and multiple amino acid differences were found between S alleles. Relatively conserved regions were identified that can be used to design primers for PCR-based analysis of alleles. Combining PCR with restriction enzyme digestion has made it possible to obtain allele-specific bands, for typing plants of self-incompatible horticultural species (Brace et al., 1993; Janssens et al., 1995). This opens the way for sequencing and thus study of allelic diversity among plants in natural populations, even when, as is usual, they are heterozygotes.

Study of alleles from natural populations is needed in order to go beyond counting alleles, to describe quantitatively the amino acid and base sequence diversity between S-alleles, and to subject the polymorphisms to molecular evolutionary analyses that can test for recombination and selection at sites in these loci. Ideally, the incompatibility types of the alleles sequenced should be known, and the alleles should be randomly picked. No study has yet achieved this ideal, and so far, apart from the very interesting finding of silent sequence differences between P. rhoeas alleles of the same SI type (Walker et al., 1996), sequences have been compared only between S-alleles for different incompatibility types. These will on average differ more than randomly picked alleles. The first study of alleles from a natural population was done in a gametophytic system, in two species of Solanaceae, though the incompatibility types of the alleles sequenced are not yet known (Richman et al., 1996). Another large set of sequence data, from Brassica alleles of known incompatibility type, is now available from GenBank (Kusaba et al., 1997). As they are from cultivated strains, these sequences probably include only part of the natural diversity. We here review some of the results from analyses of SI sequences, focusing mainly on Brassica.

Measuring the polymorphism by the diversity per nucleotide site (Nei, 1987), differences between Brassica S-alleles are lower than in the natural populations of Solanaceae surveyed (Fig. 2) or the two P. rhoeas alleles compared (Walker et al., 1996), and similar to differences at MHC loci (Hughes & Nei, 1989). In all species so far studied, polymorphism is found throughout the S-locus sequences, particularly in certain (‘hypervariable’ or HV) regions, where diversity is spectacularly high (Table 1).
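As a minimal sketch of the measure used here, diversity per nucleotide site can be computed as the mean proportion of differing sites over all pairs of aligned alleles. The toy sequences below are invented for illustration and are not S-allele data; gap handling (skipping any column with a `-`) is an assumption of this sketch.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    sites = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_diversity(seqs):
    """Mean pairwise difference per site over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Invented toy alignment of three 'alleles' (not real data).
toy_alleles = ["ATGCATGC", "ATGCATGA", "ATGAATGA"]
diversity = mean_pairwise_diversity(toy_alleles)
print(round(diversity, 4))
```

The same function can be applied to pairs drawn from within one species or to pairs consisting of one allele from each species, which is the comparison summarized in Fig. 2.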

Fig. 2

Distribution of pairwise diversity values between alleles within species, in S- and MHC-loci. (a) shows alleles in populations with gametophytic SI, together with some MHC alleles. The data for sporophytic SI (b) are based on 40 GenBank-derived SLG alleles of known incompatibility types in Brassica (B. oleracea, 21 alleles; B. campestris, 19 alleles). The sequences analysed include only coding regions of functional alleles, and only type 1 (dominant) incompatibility alleles, as most available sequences are from this type. The few type 2 alleles appear extremely different in sequence (Kusaba et al., 1997). After alignment, diversities were estimated separately for each codon, using pairs of alleles from within each species, or pairs consisting of one from each species (between-species values).

Table 1. Distribution of diversity values expressed as mean pairwise differences in different parts of the SLG and SRK loci of Brassica oleracea (19 SLG and 3 SRK alleles) and campestris (21 SLG and 3 SRK alleles). Proportions of synonymous and nonsynonymous differences per site (Ps and Pn, respectively) were estimated using MEGA (Kumar et al., 1994), and conservative and nonconservative amino acid substitutions (see text) were estimated using a FORTRAN program provided by Dr T. Ota. C-domain (total of 30 nucleotides).

Age of the alleles

Alleles at the S-loci must be very ancient, because long periods of time, and low rates of recombinational exchange (Nr) are required for silent substitutions to accumulate between alleles within each species (Hudson, 1990). Furthermore, unlike most loci, whose alleles cluster together within species, S-allele sequences are sometimes more similar to alleles from related species than to others from their own species: Solanum alleles mingle in the allele tree with those from species in the genera Petunia, Lycopersicon, and Nicotiana, and the same is seen in Brassica species (Hinata et al., 1995; Kusaba et al., 1997). This ‘trans-specific clustering’ suggests that the S-alleles have been polymorphic since before the species became isolated from one another (Dwyer et al., 1991; Ioerger et al., 1990; Richman et al., 1996).

Quantitative analysis of Brassica oleracea and campestris S-allele sequences reveals, astonishingly, that average differences between SLG type 1 (dominant) alleles are as large when alleles are compared within species as when alleles from different species are compared (Fig. 2). This is true even for synonymous sites, and even in the most conserved regions, and also for the S-domain of SRK, discussed in more detail below (Table 1). Unless sequences can be exchanged by hybridization, these data suggest that the polymorphism must date from before the species split, though, unlike MHC alleles, no motifs common across different species (see O'hUigin, 1995) are discernible in S-allele sequences. Turnover of alleles is also probably infrequent, since replacement of alleles by descendant alleles with new specificities will increase within-species similarity.

The alternative, that sequence similarity between alleles from different species could occur by chance ‘convergence’, seems unlikely for the numerous amino acid and silent differences seen between S-alleles, but no general test for convergence yet exists to rule it out. It may occur in MHC loci, as different codons are used for the same amino acids in different species (Gustafsson & Andersson, 1994; O'hUigin, 1995).

Nature of the differences between SLG alleles

The great age of the alleles makes it difficult to test for differences in functional constraints between different parts of the S-loci, and it is not yet certain that a meaningful pattern of diversity differences exists. Stochastic variation in diversity is expected at different sites within a sequence, even when all are evolving neutrally (Hudson, 1990). This seems unlikely to explain the diversity differences in the S-locus (hypervariable regions), which appear robust as sequence data have accumulated from more and more alleles (Hinata et al., 1995). It is, however, very unlikely that all the variants in these regions are maintained by balancing selection. Much of the diversity may be a consequence of linkage to a few sites under balancing selection. If there is little or no recombinational exchange between alleles, each functional allele class will behave like an isolated population. Because many SI types are maintained in species, effective population sizes of individual classes will be low. Even nonsynonymous mutations may therefore drift to high frequency or fixation within classes, so differences additional to those involved in determining SI type will accumulate, provided they are not strongly detrimental to function. This will produce variability at sites not themselves under such selection, as observed for the Drosophila Adh locus, where an extreme peak in diversity exists around the site of the F/S allozyme polymorphism (Hudson, 1990). At the S-loci, because multiple different alleles are maintained, the polymorphism must involve many sites, so we might expect a wide region of increased diversity due to drift.

As in other attempts to determine functionality from sequence data, it is remarkably difficult to decide between the extreme opposites: polymorphic sites such as the HV regions may be the most important (where amino acid differences determine the incompatibility types), or they may be regions of minor functional importance that are more free to vary than other parts of the locus. This dashes hopes that regions of the S-locus important in determining functional differences between incompatibility types can be pinpointed simply by identifying the most variable parts of the sequence. It even implies that the high polymorphism of the SLG locus is not a sure sign of its involvement in recognition. More direct evidence is necessary. For instance, comparisons between alleles of the same SI type should help identify sites which can change without affecting incompatibility type (the one such study to date found only silent differences; Walker et al., 1996). The direct evidence that two B. campestris SLG alleles with different specificities have identical hypervariable region amino acid sequences suggests that other regions can determine specificity (Kusaba et al., 1997). Despite many difficulties, some evidence from transgenic approaches is becoming available. Replacement of the HV region of one Solanum allele by that from another allele with a very similar sequence has been shown to change the pistil incompatibility appropriately (Matton et al., 1997).

Evidence for diversifying selection

Instead of merely showing that diversity is high, a more sensitive method of testing for the operation of diversifying selection is to quantify amino acid changes, and compare their frequency with that of silent changes, by Pn/Ps ratios (Nei, 1987). Natural selection usually eliminates alleles coding for proteins with variant amino acid sequences, so high values (e.g. 2–3 for the MHC antigen recognition sites) suggest balancing selection (Hughes & Nei, 1989). For the four variable regions identified in the SLG locus (Kusaba et al., 1997), Pn/Ps averages 0.95 for the Brassica species, i.e. as expected for neutrally evolving coding sequences (Nei, 1987). For the regions classified as conserved, values are lower (about 0.44) but still very high compared with other genes. Values derived from comparing sequences of nuclear genes between maize and rice average about 0.12 (Wolfe et al., 1989); few data are available for plant DNA polymorphisms, but a similar value (0.12) was estimated for Adh loci within species of the plant genus Leavenworthia (Charlesworth et al., 1998). Correction for saturation of the synonymous substitutions would increase the denominator in the calculations and reduce the values for the S-allele data, but even the nonsynonymous sites approach saturation, so the effect of correction is slight. As explained above, however, the possibility that polymorphisms may be present because of linkage to selectively maintained sites elsewhere in the locus makes it impossible to conclude from these results that the variable regions are more functionally important than the conserved ones.
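To make the Pn/Ps calculation concrete, the following is a simplified sketch in the spirit of Nei–Gojobori counting for a single pair of aligned coding sequences. It is not the MEGA implementation used for Table 1. It assumes the standard genetic code, an ungapped alignment whose length is a multiple of three, and (as a simplification of this sketch) treats stop codons as just another state, so that changes to or from a stop count as nonsynonymous. The example sequences are invented.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (0 to 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all
    orders in which the differing positions could have changed."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    sd = nd = 0.0
    paths = list(permutations(diff_pos))
    for path in paths:
        current = c1
        for pos in path:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = step
    return sd / len(paths), nd / len(paths)


def pn_ps(seq1, seq2):
    """Return (Pn, Ps): nonsynonymous and synonymous differences per site."""
    codons = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    syn_total = sum((syn_sites(a) + syn_sites(b)) / 2 for a, b in codons)
    non_total = 3 * len(codons) - syn_total
    sd = nd = 0.0
    for a, b in codons:
        s, n = codon_diffs(a, b)
        sd += s
        nd += n
    return nd / non_total, sd / syn_total


# Invented example: two silent changes and one amino acid change.
pn, ps = pn_ps("TTTCTAGGA", "TTCCTGGCA")
print(round(pn, 4), round(ps, 4), round(pn / ps, 4))
```

For polymorphism data such as those in Table 1, values like these would be averaged over all pairs of alleles. No correction for multiple substitutions at the same site is applied here, in line with the uncorrected proportions discussed in the text.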

Diversifying selection should also lead to nonconservative amino acid substitutions, with changed charge or polarity, whereas other parts of the locus should have mainly amino acids with similar properties (conservative differences). Excess nonconservative amino acid substitutions suggest the operation of diversifying selection at other recognition loci (e.g. Hughes et al., 1990). But in the B. oleracea and campestris S-allele sequences, nonconservative amino acid substitutions are slightly less likely than conservative changes, though much more frequent than in the kinase domain of SRK (Table 1). These data do not therefore indicate strong diversifying selection. Rather, sites within these regions appear to be evolving neutrally.

Finally, as in several other systems, the accumulation of differences at nonsynonymous sites in both the gametophytic and Brassica S-loci seems to be slowing down with evolutionary time. This is inferred because average numbers of nonsynonymous differences between allele pairs increase more slowly than synonymous ones (which increase roughly in proportion to divergence times but slow down due to saturation of substitutions). This difference becomes more pronounced as the number of synonymous differences between the alleles compared gets bigger (Hinata et al., 1995; Uyenoyama, 1997). This is the opposite of the effect of saturation and suggests diversifying selection, as differences accumulating neutrally should show a linear relation (Tanaka & Nei, 1989). This effect might help test which regions of the locus are under diversifying selection pressure. Surprisingly, such tests applied to the Brassica data show just as clear evidence for the conserved as for the HV regions, suggesting that both are selected similarly. This could be due to a slowing down of the rate of replacement of alleles by descendants with new incompatibility types. Accumulation of nonsynonymous substitutions (see above) or slightly deleterious mutations at linked sites (Uyenoyama, 1997) implies that heterozygous progenitor/descendant genotypes will be homozygous for deleterious variants of the progenitors, disfavouring new alleles. Their rarity advantage (explained above) diminishes as allele numbers in a population increase, so it becomes increasingly unlikely that new SI alleles will replace their progenitors, resulting in a ‘frozen polymorphism’ in which only synonymous substitutions occur.

Polymorphism at other loci in Brassica

To evaluate and interpret data from S-loci, heterozygosity and genetic diversity data for non-S reference loci will be helpful, as a basis for comparison, which can help the search for evidence of selection (Hudson, 1990) and hybridization. As already mentioned, sequence diversity has scarcely been studied in plant populations. Plants probably do not differ greatly in this respect from animals such as Drosophila, whose sequence diversity is well studied (Moriyama & Powell, 1995), but outbreeding plants may have higher diversity (Henry & Damerval, 1997; Liu et al., 1998).

Comparisons of diversity between different S-locus regions may also be informative, despite the difficulties mentioned earlier. Both the SLG- and SRK-loci are clearly exceptionally variable, while other loci in the S-gene family, whose products are nonessential for incompatibility, apparently have much less allelic polymorphism, though natural populations have not been studied (Hinata et al., 1995). Current scanty data on S-locus related (SLR) genes, nonessential for recognition, suggest sequence conservation within and between species (Hinata et al., 1995). For the unlinked SRA and SRB loci, silent differences (i.e. coalescence times) are greater between than within species, unlike SLG and SRK, though SRB shows moderate polymorphism. The S-linked anther-expressed B. oleracea locus SLA is nonessential for incompatibility and has low polymorphism, as have several other loci in the S-gene region whose expression is not confined to pistils, implying that they are not involved in pollen recognition processes (Nasrallah et al., 1991; Pastuglia et al., 1997b). These polymorphism differences, even involving loci linked to the S-locus, suggest some independence in their evolution.

This independence can also be detected in comparisons of SLG and SRK. The polymorphism in the S-domain of the few SRK alleles currently available appears similar to that at the SLG locus (both have mean diversity per base of 0.13). Pn/Ps values are as high as those for corresponding SLG regions. Over most of the variable regions, radical amino acid substitutions per site are again nearly as likely as conservative ones. When alleles from the two species are compared, Ps values are again no larger than for within-species comparisons, except perhaps for the conserved regions. These extensive differences suggest that the SRK S-domain may have a balanced polymorphism similar to that in SLG, and thus that it participates in recognition of incompatibility types.

In contrast, the SRK kinase coding sequence has fewer nonsynonymous differences (Table 1). Pn/Ps averages 0.48, similar to the conserved regions of SLG or the SRK S-domain. Consistent with this domain's overall lower variability, probabilities of radical amino acid differences between alleles within both species average less than half those of conservative differences. Finally, unlike the observations on SLG or the S-domain of SRK, Ps estimates from the kinase domain are somewhat less within each species than between them (Table 1). There are thus clear signs of selective constraint acting at the SRK locus, and its polymorphism may be less ancient than in the other regions, though still very old, since the introns are highly variable, implying long times to common ancestry of alleles (Nishio et al., 1997). This regional variability suggests that the kinase domain polymorphism is caused by linkage to sites elsewhere in SRK that are under balancing selection, with some recombination in the locus. With no recombination, all parts of the locus have identical coalescence times, so we expect no regional heterogeneity in silent diversity, even if selective constraints differ.

Linkage disequilibrium and recombination in the SLG-SRK region

The sequences of the SLG gene and the S-domain of the SRK gene from any given haplotype tend to be similar, whereas those from different haplotypes differ extensively. This has suggested that functional self-incompatibility requires alleles from the two loci to be similar and that recombination between the two loci is suppressed to maintain this matching (Stein et al., 1991). High sequence divergence in the region between the loci (Boyes et al., 1997) and near Petunia inflata S-loci (Coleman & Kao, 1992) supports rarity of recombinational exchange, but divergence could be due to relaxed selection in these flanking regions. Sequence differences are not only a feature of S-loci, but are also found in maize intergenic regions (Sanmiguel et al., 1997). Further sequence data from Brassica show that SLG alleles are not always similar to SRK alleles from the same haplotype (Goring & Rothstein, 1996; Kusaba et al., 1997). For the few haplotypes sequenced for both loci, both synonymous and nonsynonymous within-species differences between haplotypes average almost double their within-haplotype values (Fig. 1b); in other words, there is some linkage disequilibrium. This is consistent with the idea of selectively maintained within-haplotype similarity between the two loci. But an alternative is that sequence similarity is caused by exchange of sequence information between them (e.g. by gene conversion).

We need more evidence on whether genetic exchanges, by recombination or gene conversion, occur within the S-locus region. No estimates of recombination rates from the sequence data (see Hudson, 1990) have yet been attempted. Such estimates may be impossible, because the high variability may obscure patterns, and apparent ‘recombinant’ sequence motifs may be caused by independent origination of similar sequences. Some evidence, however, suggests recombination. If recombination were totally suppressed, we should expect strong linkage disequilibrium between sites in the S-loci, which is not apparent in the data. Rather, the data show a pattern similar to that in other loci (e.g. Miyashita et al., 1993): linkage disequilibrium is found only for very close sites, suggesting that, even within the SLG locus, more distant sites recombine over evolutionary time, consistent with the evidence for some measure of independence in the patterns of evolution of the SRK and SLG loci (above). This view is also consistent with the observation that different sequence regions within the S-locus produce different allele trees (Kusaba et al., 1997), though the significance of such differences is uncertain as the sequence regions are short. The presence of nonpolymorphic loci in the S-locus region also suggests recombination, such that each locus evolves essentially independently.
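One way to examine how linkage disequilibrium changes with distance between sites is sketched below, using the squared correlation r² between pairs of biallelic sites in a set of aligned haplotypes. The choice of r² as the statistic, the restriction to biallelic sites, and the toy haplotypes are assumptions of this illustration rather than the procedure of the studies cited.

```python
from itertools import combinations


def biallelic_sites(haplotypes):
    """Map site index -> 0/1 column for sites with exactly two bases."""
    columns = {}
    for idx in range(len(haplotypes[0])):
        bases = [h[idx] for h in haplotypes]
        if len(set(bases)) == 2:
            reference = bases[0]
            columns[idx] = [1 if b == reference else 0 for b in bases]
    return columns


def r_squared(col_a, col_b):
    """Squared correlation between two biallelic sites."""
    n = len(col_a)
    pa = sum(col_a) / n
    pb = sum(col_b) / n
    pab = sum(a * b for a, b in zip(col_a, col_b)) / n
    d = pab - pa * pb  # disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def ld_by_distance(haplotypes):
    """Return {(site_i, site_j): (distance, r2)} for all biallelic pairs."""
    columns = biallelic_sites(haplotypes)
    result = {}
    for i, j in combinations(sorted(columns), 2):
        result[(i, j)] = (j - i, r_squared(columns[i], columns[j]))
    return result


# Invented toy haplotypes (not real S-locus data).
toy_haplotypes = ["ACGT", "ACGA", "TGGT", "TGCA"]
ld = ld_by_distance(toy_haplotypes)
for (i, j), (dist, r2) in sorted(ld.items()):
    print(i, j, dist, round(r2, 3))
```

With real data, r² values near 1 that persist between distant sites would be the pattern expected under complete suppression of recombination, whereas a decline with distance is the pattern described in the text.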

The pollen locus

A major question concerning the recognition function of self-incompatibility systems is whether there are separate pistil and pollen loci, as often assumed (e.g. Stephenson et al., 1997). This view involves several difficulties. It requires strong linkage disequilibrium, and very infrequent recombination, so that each pistil type rejects only its own pollen. Furthermore, recognition is relatively easy to imagine if one locus is involved and identity causes rejection. But if two loci are involved, each pistil allele requires a different recognition reaction for pollen carrying just one allele product of the other locus. Finally, mutation at either locus will not create new incompatibility types but merely produce self-compatibility, so it is difficult to imagine how new SI types can arise.

To date, no candidate for the pollen locus has been discovered in any self-incompatibility system other than Phalaris coerulescens, but this gene is not expressed in pistils and the gene controlling the pistil incompatibility reaction is not known in this, or any other, grass species (Li et al., 1995). Loci for the pollen and pistil recognition functions must be tightly linked, otherwise recombinants should exist with different pollen and pistil incompatibility types; these are known in distylous species (Ernst, 1936) but have not been found in species with homomorphic SI (Lewis, 1963; Nasrallah et al., 1991; Nasrallah et al., 1992).

If the S-locus region rarely recombines, it may have accumulated repetitive sequences, making it hard to find coding sequences within it by approaches such as chromosome walking. Transgenic approaches may also fail due to cosuppression (e.g. Jorgensen, 1990), as expression of additional S-locus copies in pollen (e.g. in the diploid pollen of tetraploids) often causes self-compatibility (Lewis, 1963). It might be possible to identify the pollen locus by in vitro testing of pollen proteins, though work in Brassica has so far not definitively identified an S-linked component causing allele-specific rejection (Stephenson et al., 1997). Alternatively, molecular analysis of self-compatible mutants in which either the pollen or pistil SI type is not expressed, while the other function remains intact (see Lewis, 1963; Nasrallah et al., 1991), should be informative. If ‘pollen-part mutations’ involve the S-locus encoding the pistil S-protein, this would cast doubt on the existence of a separate locus. Such information is difficult to obtain, because mutations of known progenitor alleles are needed. Otherwise the high polymorphism in these loci makes it impossible to know that any sequence difference found really causes the mutant phenotype (Pastuglia et al., 1997b). In the one study yet published, a mutant Japanese pear with deleted stylar S-RNase appears to have normal pollen reactions (Sassa et al., 1997); unless the plant is merely a chimaera, this implies a separate pollen locus, despite the difficulties of this hypothesis. Given the long history of astonishing results from SI systems, outcomes surprising to geneticists are now almost expected.