Introduction

Major histocompatibility complex (MHC) antigen-presenting genes (classes I and II) are the most variable loci in vertebrate genomes. MHC class I molecules are proteins that are expressed on the surface of all nucleated cells and present peptides to CD8+ T cells. Class II MHC molecules, which consist of non-covalently associated α and β chains, are expressed only on antigen-presenting cells such as B cells, dendritic cells and macrophages. The α1 and β1 domains, which are encoded by separate genes, form the peptide-binding region (PBR), in which peptides are bound and recognized by CD4+ helper T cells. Three distinct isoforms of class II MHC molecules are expressed in humans, denoted HLA-DR, -DQ and -DP. The DRB1 locus is the most polymorphic class II locus known in humans, with approximately 360 alleles described (Robinson et al., 2003). The extreme level of polymorphism observed for these genes is thought to arise from the need to recognize a diverse repertoire of pathogens that exhibit antagonistic co-evolution with the host to evade immune recognition (Doherty and Zinkernagel, 1975), and from potential roles in mate recognition (or inbreeding avoidance) for some species (Jordan and Bruford, 1998; Penn and Potts, 1999; Reusch et al., 2001). The pathogen-driven selection hypothesis is supported by the characteristic patterns of molecular evolution exhibited by MHC class I and class II genes in most species, including significant excesses of non-synonymous mutations over synonymous mutations for residues involved in peptide binding, evidence of balancing selection (Hughes and Nei, 1988) and a ‘trans-species’ mode of evolution, where many alleles appear to be older than the species in which they are found (Figueroa et al., 1988; Klein et al., 1993).

While MHC variation is receiving increasing attention for a growing number of species (Bernatchez and Landry, 2003), there is still debate over the relative importance of different mechanisms for generating the large allelic diversity seen at these MHC loci. From the first MHC studies at a molecular level, most attention has focused on pathogen-mediated selection as the primary mechanism driving the evolution of MHC diversity. Studies focusing on the pattern of molecular evolution show clear evidence of the imprint of selection by pathogens (Hughes and Yeager, 1998; Hughes, 2002). In contrast, for natural or seminatural population studies, direct empirical demonstration of the differential fitness of alternate MHC genotypes in response to pathogens has been more challenging, with few studies possessing the necessary supporting data and statistical power to detect subtle effects (Paterson et al., 1998; Lohm et al., 2002; Penn et al., 2002; Wegner et al., 2003).

Despite the importance of recombination in population genetics, remarkably little is known regarding the extent and rate of recombination, particularly in wild populations. Recombination rates are known to change across the human genome (McVean et al., 2004), but also between very closely related species (Ptak et al., 2004). Recent work has suggested that intragenic recombination may also be an important mechanism for the generation of allelic diversity at MHC class I and class II loci (Gyllensten et al., 1991; She et al., 1991; Andersson and Mikko, 1995; Satta, 1997; Bergstrom et al., 1998; Takahata and Satta, 1998; Mikko et al., 1999; Richman et al., 2003a; Schaschl et al., 2005). While mutations generate novel variation at individual sites, recombination can produce multiple substitutions per event with the potential for greater change in the antigen-binding properties of MHC molecules. Therefore, novel molecules generated by recombination may have a greater selective advantage (She et al., 1991; Andersson and Mikko, 1995). Assessing the relative importance of selection, recombination and mutation in generating the diversity seen at MHC genes relies on being able to make quantitative estimates of these parameters, but robust quantitative population genetic estimators for the amount of recombination have been lacking until recently. Therefore, previous studies of recombination in the MHC have often relied on non-parametric methods that can detect recombination in a sample of sequences, but without estimating the rate at which it occurs (e.g., Andersson and Mikko, 1995). Although previous quantitative analysis of recombination has been impeded by lack of a strong theoretical framework, new developments in this area are now permitting progress (Hudson, 2001; McVean et al., 2002; Stumpf and McVean, 2003). The approach of McVean et al. 
(2002) is a composite likelihood method (Hudson, 2001), based on coalescent theory, employing a finite sites mutation model (i.e., one allowing recurrent mutations) to estimate the population recombination rate (4Ner). This is combined with a powerful likelihood-permutation test to determine if observed values of 4Ner are significantly different from zero. Both the estimator and significance test are robust to some misspecification of the model of sequence evolution (Richman et al., 2003b). This new quantitative method has so far been applied to a limited number of taxa for the analysis of MHC sequences. For example, Richman et al. (2003a, 2003b) found evidence of extensive recombination at the MHC class II DRB gene in deer mouse (Peromyscus) taxa; the population recombination rate is significantly different from zero and greatly exceeds the population mutation rate. However, whether the pattern observed in Peromyscus species is representative of other taxa, and how important recombination is as a general mechanism in the evolution of MHC diversity, remain to be demonstrated. More comparative surveys of multiple species are an important contribution to this debate as population demography also plays a key role, and the apparent importance of selection and recombination may change after fluctuations in population size (Richman et al., 2003a). Quantifying the relative importance of recombination in MHC genes has important implications for our understanding of the evolution of these loci. Failure to account for recombination may significantly influence population genetic and phylogenetic inferences (Schierup and Hein, 2000; Schierup et al., 2001), affecting interpretations of MHC allelic genealogies, and estimates of the strength and mode of selection.

In this paper, we present a comparative analysis of MHC class II DRB diversity in 15 ungulate species and, in particular, we test whether recombination has played a significant role in the evolution of MHC class II DRB diversity.

Materials and methods

Samples and DNA isolation

Liver samples were obtained from 27 Alpine ibex (Capra ibex) from one population in Switzerland (Albris, Kanton Graubuenden, Switzerland), seven Spanish ibex (Capra pyrenaica) from the populations Tortosa-Beceite and Parque Natural Sierra de las Nieves – Ronda (Spain) and 18 Himalayan tahr (Hemitragus jemlahicus) from New Zealand. These animals were taken during hunts for population management purposes by licensed managing authorities, and were not killed specifically for this study. A blood sample of one tahr individual was collected at Zoo Schönbrunn (Vienna, Austria). Genomic DNA was extracted from frozen liver using standard phenol/chloroform–isoamyl alcohol extraction or with the Qiagen DNA extraction kit. The tahr blood sample was used to isolate mRNA, and this was converted into cDNA using the Titan One Tube RT-PCR-System from Roche Diagnostic GmbH, Vienna, Austria.

PCR, cloning and sequencing

MHC class II DRB exon 2 (β1 domain) sequences were obtained by PCR using locus-specific primers in a seminested PCR approach as described in Schaschl et al. (2004). PCR products were cloned into the TA vector pCR2.1 (Invitrogen Ltd., Karlsruhe, Germany), and then transformed into competent Escherichia coli cells. From each individual, between five and seven positive (blue/white selection) clones were picked and grown overnight in 3 ml of LB-Medium. Plasmids were isolated with the QIAprep Spin Miniprep Kit (Qiagen) and subsequently checked for the correct inserts by EcoRI digestion or by running a second PCR with M13 primers. Clones that contained the correct inserts were sequenced bi-directionally with the Applied Biosystems BigDye Terminator v2.0 Cycle Sequencing Kit, using M13 universal and M13 reverse primers, on the ABI PRISM 377 DNA Sequencer. Samples with poor cloning efficiency and sequences produced by PCR recombination were discarded. The Alpine ibex samples were sequenced directly, because we expected a low level of heterozygosity. However, for five random samples, the PCR products were cloned and sequenced as described above. Sequence data were prepared for further analysis with BioEdit (Hall, 1999).

Sequence analysis and detecting positive selection

Sequences were aligned using the program ClustalX (Thompson et al., 1997). The selective pressure at the amino-acid level is measured by the non-synonymous/synonymous rate ratio (ω=dN/dS). An ω ratio greater than 1 (ω=dN/dS>1) indicates positive selection (e.g., Yang and Bielawski, 2000). We used the program CODEML of the PAML 3.14 package (Yang, 1997) to test for the presence of codon sites affected by positive selection and to identify those sites by a Bayesian approach implemented in CODEML. The models considered in this study were M7 (beta) and M8 (beta&ω) (Yang et al., 2000). While recombination can potentially generate false positives in the detection of positive selection, these two models are much more robust against the occurrence of recombination than the other models implemented in CODEML (Anisimova et al., 2003). Bayes' prediction of sites under positive selection also appears to be robust to recombination effects (Anisimova et al., 2003). Under the model M7 (beta), the ω ratio varies among sites according to the beta distribution (with parameters p and q) and does not allow for positively selected sites (0<ω<1), and thus serves as the null model for comparison with model M8 (beta&ω). Model M8 adds an additional site class to the beta model to account for sites under positive selection (ω>1). For each species, the M7 and M8 models can be compared using a likelihood-ratio test (Nielsen and Yang, 1998). The likelihood-ratio test statistic is twice the log-likelihood difference between the two models (2Δℓ), which is compared with a χ2 distribution with degrees of freedom equal to the difference in the number of parameters between the two models. To provide phylogenetic information required for the analysis (Yang, 1997), the best tree for DRB sequences in each species was identified by maximum likelihood under the one-ratio model (M0) in CODEML. 
The average number of non-synonymous substitutions per non-synonymous site (dN) and synonymous substitutions per synonymous site (dS) was estimated with the MEGA 2.1 program (Kumar et al., 2001), using the distance of Nei and Gojobori (1986) with the Jukes and Cantor (1969) correction for multiple substitutions. Standard errors were calculated by 1000 bootstrapping replicates.
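
The likelihood-ratio comparison of M7 and M8 described above can be illustrated with a minimal Python sketch. It is not part of CODEML or of the original analysis. The log-likelihood values are hypothetical, the function names are illustrative, and the default of two degrees of freedom assumes that M8 adds two free parameters to M7 (the proportion of sites in the extra class and its ω). The closed-form χ2 tail probability used here is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df=2):
    """Return (2 * delta log-likelihood, P-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for M7 (null) and M8 (alternative).
stat, p_value = likelihood_ratio_test(lnl_null=-1250.0, lnl_alt=-1235.0)
print(f"2*delta_lnL = {stat:.2f}, P = {p_value:.2e}")
```

With real data, the two log-likelihoods would be read from the CODEML output for each species.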

Estimation of population selection, mutation and recombination rates

Since we lack information about effective population size for each species, estimates of the rates of selection, recombination and mutation have to be limited to the composite population parameters S, ρ and θ, respectively, which are the rates of selection (s), recombination per base (r) and mutation per base (u), scaled by effective population size. In trying to estimate these parameters simultaneously, however, an inherent problem arises, in that the model framework for estimating S assumes an absence of recombination, and estimates for ρ are made under the assumption of no selection. Yet, in the course of this work, we show that both these processes influence the sequences being studied. Current methodologies offer no easy fix to these important violations of underlying assumptions, but attempts have been made to try and quantify the bias likely to be introduced (Satta, 1997; Fearnhead and Donnelly, 2001; Richman et al., 2003a; Smith and Fearnhead, 2005). The main objective of this study is to try and assess the relative magnitudes of selection, recombination and mutation. The existing simulation studies suggest that while absolute values of parameters are likely to be compromised by violation of model assumptions, relative values can be accepted with more confidence. Therefore, we proceed by acknowledging the likely limitations on this work, starting by outlining key model assumptions in the following methods, and then provide a discussion of the limitations of current statistical methods in relation to our results in the final part of the paper.

Estimation of S

Under a model of symmetric balancing selection (Kimura and Crow, 1964; Takahata and Nei, 1990), population selection can be defined as S=2Ns, where N is the number of breeding individuals (effective population size) and s is the selective disadvantage of homozygotes. In this model, there are a fixed number of alleles (n) through time, allele frequencies are approximately evenly distributed and there is allelic turnover, so that when a new allele arises, it either replaces its parental allele in the genealogy or one of the other allelic lineages becomes extinct and is replaced. Here ‘alleles’ refers to functional alleles, and each new allele arises through a single non-synonymous mutation in the PBR according to an infinite allele model. Additional assumptions are finite sites, no recombination and there is no inherent correction for multiple mutations at the same site. Takahata et al. (1992) show that under these conditions, S is proportional to the number of functional alleles and the selection intensity, which can be estimated as:

where n is the number of functional alleles in the population and γ is the ratio of the number of non-synonymous substitutions per non-synonymous site to the number of synonymous substitutions per synonymous site ((Kn/Ln)/(Ks/Ls)). Here n is approximated from Kn, which is the average pairwise number of non-synonymous substitutions in the PBR among sequences (Takahata et al., 1992; Richman et al., 2003a). Kn, Ln, Ks and Ls were estimated from the data, using the Jukes and Cantor (1969) model, within the MEGA 2.1 program (Kumar et al., 2001). Takahata et al. (1992) advocate that if correction for multiple substitutions is to be made, it should be done by selecting the data so as to include only ‘young’ alleles in which sites have not had a chance to experience saturation. However, since recombination is likely to make assessing the apparent age of alleles problematic, to ensure consistent treatment of data across all species (and with the data used to estimate ρ), while maintaining sample sizes for species with low numbers of sequences, we chose to include all sequences for each species rather than to attempt a correction.
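
As a small worked illustration of the ratio γ defined above, the following Python sketch computes (Kn/Ln)/(Ks/Ls) from four summary quantities. The function name and the input values are hypothetical and are not taken from the data in this study. The sketch covers only γ, not the estimator of S itself.

```python
def gamma_ratio(kn, ln, ks, ls):
    """Ratio of non-synonymous to synonymous substitutions per site.

    kn, ks: mean pairwise numbers of non-synonymous and synonymous
            substitutions; ln, ls: numbers of non-synonymous and
            synonymous sites.
    """
    if ln <= 0 or ls <= 0:
        raise ValueError("site counts must be positive")
    if ks <= 0:
        raise ValueError("gamma is undefined without synonymous substitutions")
    return (kn / ln) / (ks / ls)


# Hypothetical values, for illustration only.
gamma = gamma_ratio(kn=12.0, ln=48.0, ks=2.0, ls=16.0)
print(gamma)
```

The explicit check on Ks matters in practice, because species with no silent variation would otherwise produce a division by zero.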

Estimation of ρ and θ

The LDhat program was used to estimate rates of population mutation (from Watterson's θ, where θ=4Neμ) and population recombination (ρ, where ρ=4Ner) (see McVean et al., 2002). This program estimates population recombination from a set of aligned sequences using the composite-likelihood method (Hudson, 2001) within a coalescent framework. For the estimation of θ and ρ, the analysis was restricted to bi-allelic sites where the frequency of the alternate allele was ≥0.1. This approach was taken because the method does not consider among-site rate variation and excluding rare allelic variants can increase the power of the analysis (McVean et al., 2002). We used the likelihood permutation-based approach to test significance against the null hypothesis of no recombination (i.e., 4Ner=0). The LDhat program recovers accurate estimates of relative recombination rate even when assumptions of the standard model are violated, such as through the action of selection (Richman et al., 2003a; Smith and Fearnhead, 2005). However, under these circumstances, technically it may no longer be a true estimator of the population recombination rate (4Ner). We also calculated the ratio ρ/θ as an estimate of the relative amount of recombination compared to point mutations, which again is robust to deviations from the underlying coalescent model (Fearnhead and Donnelly, 2001), and the corresponding ratio S/θ. Meunier and Eyre-Walker's (2001) G4 and Awadalla et al.'s (1999) r2 statistics, which assess the correlation of linkage disequilibrium among pairs of polymorphic sites with the distance between them, were calculated as alternate indicators of recombination. In these cases, recombination is inferred if there is a significant decay of linkage disequilibrium with distance. 
Finally, we used the program DnaSP 4 (Rozas et al., 2003) to calculate the minimum number of inferred recombination events (RM) (Hudson and Kaplan, 1985) for each species, which can be interpreted as a minimal number of different positions at which recombination has occurred.
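
Two of the quantities described above, Watterson's θ and the pairwise r2 statistic, can be sketched in plain Python. This is an illustrative sketch on a toy alignment, not the LDhat implementation and not the code used in this study. The function names, the toy sequences and the choice to compute θ per site over all segregating sites are assumptions made for the example. The frequency filter mirrors the restriction to bi-allelic sites with a minimum alternate-allele frequency.

```python
from itertools import combinations


def biallelic_sites(seqs, min_maf=0.1):
    """Return (position, minor allele) for bi-allelic columns passing the filter."""
    n = len(seqs)
    kept = []
    for pos in range(len(seqs[0])):
        column = [s[pos] for s in seqs]
        alleles = sorted(set(column))
        if len(alleles) != 2:
            continue
        count = column.count(alleles[0])
        minor = alleles[0] if count <= n - count else alleles[1]
        if min(count, n - count) / n >= min_maf:
            kept.append((pos, minor))
    return kept


def watterson_theta_per_site(seqs):
    """Watterson's estimator: segregating sites / a_n, scaled per site."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(1 for pos in range(length)
                      if len({s[pos] for s in seqs}) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n / length


def r_squared(seqs, site_a, site_b):
    """Squared allelic correlation between two bi-allelic sites."""
    (pos_a, a), (pos_b, b) = site_a, site_b
    n = len(seqs)
    p_a = sum(s[pos_a] == a for s in seqs) / n
    p_b = sum(s[pos_b] == b for s in seqs) / n
    p_ab = sum(s[pos_a] == a and s[pos_b] == b for s in seqs) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def ld_by_distance(seqs, min_maf=0.1):
    """List of (distance in sites, r2) for all pairs of retained sites."""
    sites = biallelic_sites(seqs, min_maf)
    return [(b[0] - a[0], r_squared(seqs, a, b))
            for a, b in combinations(sites, 2)]


# Toy alignment (hypothetical): sites 0-2 are fully linked, site 3 is not.
toy = ["AACA", "AACG", "GGTA", "GGTG"]
theta = watterson_theta_per_site(toy)
pairs = ld_by_distance(toy)
print(theta, pairs)
```

A decay of linkage disequilibrium would show up as a negative relationship between the distance and r2 values in such a list. Its significance would still need a separate test, for example by permuting site positions.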

Sequence nomenclature and sequences from GenBank

In accordance with the proposed nomenclature for MHC in non-human species (Klein et al., 1990), we designated the exon 2 alleles Caib-DRB for Alpine ibex (Capra ibex), Capy-DRB for Spanish ibex (Capra pyrenaica) and Heje-DRB for Himalayan tahr (H. jemlahicus) with serial numbers attached. Further, MHC class II DRB exon 2 sequence data from different ungulate species were taken from GenBank with the following accession numbers: Alpine chamois (Rupicapra r. rupicapra), Accession nos.: AY368437–AY368455; Pyrenean chamois (Rupicapra pyrenaica), Accession nos.: AY212149–AY212157; bighorn sheep (Ovis canadensis), Accession nos.: AF324840–AF324861; wild goat (Capra aegagrus), Accession nos.: U00183–U00202; sheep (O. aries), Accession nos.: U00204–U00233; domestic cattle (Bos taurus), Accession nos.: U00124–U00144; water buffalo (Bubalus bubalus), Accession nos.: AF385473–AF385480; African buffalo (Syncerus caffer), Accession nos.: AF059233–AF059241 (the sequence AF059236 was later excluded from the analyses due to potential sequencing artifacts); South African antelope, consisting of the two subspecies bontebok (Damaliscus pygargus pygargus) and blesbok (Damaliscus p. phillipsi), Accession nos.: AJ302736–AJ302762; white-tailed deer (Odocoileus virginianus), Accession nos.: AF082161–AF082175 and AF407169–AF407171; reindeer (Rangifer tarandus), Accession nos.: AF012716–AF012724; and moose (Alces alces), Accession nos.: X83278–X83286. Although much DRB sequence data is available for other ungulate species, we restricted this analysis to data that can be considered unambiguously derived from single loci and potentially functional, in order to avoid the bias that would be introduced by the inadvertent inclusion of multiple loci, as might be possible when non-locus-specific primers are used to amplify MHC class II loci from genomic DNA.

Results

Sequence analysis of novel exon 2 sequences

The novel DRB exon 2 (β1 domain) sequences obtained for this study were 284 bp in length including primer sequences. The nucleotide sequences excluding primers (236 bp) of the exon 2 from each species have been deposited in GenBank (Accession numbers: AY706312 for Alpine ibex and AY706313–AY706317 for Himalayan tahr). No more than two sequences were detected for any individual in the analyzed samples, and we are confident that products were amplified from a single MHC class II DRB locus. We obtained four exon 2 alleles from the Himalayan tahr samples from New Zealand and a fifth distinct allele (as a homozygous genotype) from the zoo sample (Figure 1). Among these alleles, 15.7% of 236 nucleotide sites and 29.5% of 78 amino-acid sites were polymorphic. Two exon 2 alleles were obtained from the Spanish ibex samples (Figure 1). These two alleles corresponded to two published exon 2 sequences on GenBank, named Capy-DRB1-1 and Capy-DRB1-2 (Accession nos.: AF461692 and AF461693; Amills et al., 2004). An additional four Spanish ibex alleles (Capy-DRB1-3, Capy-DRB1-4, Capy-DRB1-5 and Capy-DRB1-6) have previously been identified (GenBank Accession nos.: AF461694–AF461696, and AY351788; Amills et al., 2004). These alleles were included in all subsequent analyses. For these exon 2 sequences, 22.9% of 236 nucleotide sites and 38.5% of 78 amino-acid sites were polymorphic. From the 27 Alpine ibex individuals, we obtained only a single exon 2 sequence (Figure 1). This single Alpine ibex sequence was excluded from all subsequent analyses.

Figure 1

Alignment of the putative amino-acid sequences for MHC class II DRB exon 2 from Alpine ibex (Caib-DRB), Spanish ibex (Capy-DRB) and Himalayan tahr (Heje-DRB). Dots indicate identity with the consensus sequence, and a cross indicates codons involved in the PBR in humans (Brown et al., 1993).

Detecting positive selection at sites using maximum-likelihood analysis

Likelihood-ratio tests comparing the two models M7 and M8 demonstrated that M8, the model that accounts for sites under positive selection (ω>1), fitted the data from all species significantly better (P<0.001) than M7, the model that does not allow for positive selection (0<ω<1) (Table 1). In all species, the ω ratio is over 1, indicating positive selection in the MHC class II DRB sequences. Sites identified as being under positive selection in the Bayesian analysis are listed in Table 2. Those sites that were identified as positively selected (at the 95% confidence level) are mostly in accordance with the human PBR sites (HLA-DRB1 gene) (Brown et al., 1993). The identification of sites under positive selection is remarkably consistent across species, with many sites identified in most species (e.g., site 11 is identified in all except reindeer), and with the majority identified in at least two species. The average number of synonymous substitutions per synonymous site (dS) and non-synonymous substitutions per non-synonymous site (dN) (plus s.e.'s) estimated for the exon 2 are given in Table 1.

Table 1 Likelihood ratio test comparing the models M7–M8 for evidence of positive selection and model parameters estimates (ω is the selection parameter and pn is the proportion of sites that fall into ωn site class)
Table 2 The plus symbol represents sites under positive selection (model M8, at the >95% confidence level) identified by a Bayesian method implemented in CODEML

Analysis of population selection, mutation and recombination

Estimates of population selection (S), mutation (θ) and recombination (ρ) show that each of these processes has played a significant role in generating the diversity seen among ungulate DRB alleles (Table 3). The likelihood-permutation test, implemented in the LDhat program, indicated that the estimates of ρ were significantly different from zero for all taxa in the study (Table 3). With the exception of Spanish ibex, which had a small sample size, all the results were returned with P-values <0.01. Estimates of recombination rate are large for almost all species, and in three cases (domestic sheep, white-tailed deer and moose) exceed the maximum value of ρ assessed (ρ>100). The two alternative statistical tests for recombination (r2, Awadalla et al. (1999); G4, Meunier and Eyre-Walker (2001)) also detected recombination rates significantly different from zero in most cases. For r2, this was the case in all species except Spanish ibex and bighorn sheep (Table 3; data for G4 are not shown). The DnaSP recombination test identified recombination events (RM) in all species (Table 3). The ratios ρ/θ and S/θ are also given in Table 3 as an indicator of the likelihood of nucleotides being involved in recombination and selection, respectively, relative to point mutation, and these should be less sensitive to the influence of population size. They show a similar pattern to that exhibited by ρ and S.

Table 3 Statistics and probability (P) values for population selection (S=2Ns; Takahata et al., 1992), population mutation (Watterson's θ=4Neμ) and population recombination (ρ=4Ner; McVean et al., 2002)

Discussion

In this study, we examined genetic diversity in the MHC class II DRB gene and demonstrated, via several complementary analysis approaches, that recombination has made a significant contribution to allelic diversity in a range of ungulate species, in addition to the expected role of positive selection. While a role for recombination in generating MHC diversity has previously been suspected, very few studies have been able to provide quantitative estimates of the extent of recombination (e.g., Richman et al., 2003a, 2003b). In combination with the previous results for rodents, this study suggests that recombination is likely to play a key role in generating MHC diversity for a wide range of taxa, with important implications for interpretation of parameters relating to molecular evolution or inferences that rely on genealogies of MHC alleles.

In common with most other taxa with well-characterized MHC loci, we identified the signature of strong positive selection acting on DRB alleles, with the ω ratio>1 (dN/dS; Yang and Bielawski, 2000) in all the ungulate species. Bayesian analysis identified the individual positively selected sites as being congruent with human HLA-DRB1 PBR sites (Brown et al., 1993), and the sites were largely consistent across different species. The important role of selection is also indicated by the very high values of S and S/θ (Table 3), obtained for all species except moose. It is tempting to try and assess the relative importance of selection, recombination and mutation in generating DRB diversity, by comparing values of S, ρ and θ. However, this is problematic for a number of reasons discussed below. Despite this, the relative ranking of S>ρ>θ is consistent for all species except moose (which is known to have undergone a severe recent bottleneck), tentatively suggesting the history of most species has been shaped by a similar pattern of evolutionary processes.

Effects of demographic history

The interpretation of these results must be done in the context of the demographic history of each species, since changes in population size will potentially have an important influence on the magnitude of parameter estimates and how they interact. Ungulates are a large and diverse group of mammals with over 200 species. The natural distribution covers all continents, with the exception of Australia and Antarctica. Most ungulates have a polygynous mating system, and females usually raise offspring alone. The species included in this study have experienced a range of different demographic histories; for example, buffaloes are likely to have had relatively stable, large population sizes, while Alpine ibex have recently experienced an extreme bottleneck. The pattern of sequence variation we observe is consistent with such differing demographic histories, as revealed by the remarkable differences in nucleotide diversity, and in particular silent variation, among the 15 studied ungulates (Table 1), ranging from dS=0.006 for Alpine chamois to dS=0.106 for water buffalo. Moreover, for many species low levels of silent variation are present despite high levels of non-silent variation, for example, in Pyrenean ibex and white-tailed deer. Such patterns can arise owing to the interaction between balancing selection and recombination during and after population bottlenecks, as in combination these processes act to distribute non-silent variation among alleles more effectively than silent variation (Schierup et al., 2001). While both silent and non-silent variation is exposed to the bottleneck, silent variation is less likely to be sampled and make it through the population contraction (Richman et al., 2003a), as selection will favor alleles with functional differences, not silent variation. 
Thus, the pattern of loss of variation in the DRB gene by Alpine ibex is consistent with the severe bottleneck this species experienced due to over-hunting throughout the 16th and 18th centuries, causing a precipitous decline that brought the species close to extinction (Maudet et al., 2002). Comparably low amounts of MHC class II DRB variation were found in other ruminant species that experienced severe bottlenecks. For example, in Arabian oryx (Oryx leucoryx), only three different exon 2 alleles have been identified at the homologous locus (Hedrick et al., 2000). In roe deer (Capreolus capreolus), in samplings from Norway and Sweden, only four alleles were found, and in moose from northern Europe and North America, 10 different DRB exon 2 alleles have been identified with probably no silent diversity (Mikko and Andersson, 1995; Mikko et al., 1999). The latter study also revealed that musk-ox (Ovibos moschatus) and fallow deer (Dama dama) were virtually monomorphic at the DRB locus. However, such low levels of diversity are not representative of ungulates: as for other vertebrates, high levels of nucleotide diversity in the MHC are the norm (Mikko et al., 1999).

Limitations of current statistical methodologies

We have taken a comparative approach and tried to estimate the effective population size-scaled parameters (S, ρ, θ) that quantify sequence diversity due to selection, recombination and mutation. A major issue with these methods is that, while they are currently among the best-explored approaches available in the literature, they rely on important assumptions: ρ is only an estimator of the absolute population recombination rate in the absence of selection, and S is only an estimator of absolute population selection in the absence of recombination. Therefore, obtaining absolute estimates for each parameter under the action of the other is problematic. This is compounded by the fact that we lack methods for estimating confidence intervals for these parameters. While such intervals are highly desirable, for ρ and θ at least, obtaining them is an intractable problem under the current theoretical framework used to estimate them (McVean et al., 2002). Further, there is the additional complication that there are likely to have been fluctuations in effective population size for most of the species. Therefore, we are left in a position where it is not easy to be confident of the precision of the absolute values of our parameter estimates. However, results from a range of simulation studies (Satta, 1997; Fearnhead and Donnelly, 2001; Richman et al., 2003b; Smith and Fearnhead, 2005) indicate that relative values may be more robust to model violations, and it is these in which we are most interested. We therefore argue that there is good evidence for selection and recombination shaping sequence diversity in ungulate DRB genes, as far as can be determined with these methods. Similar caution should be applied to studies of other taxa likely to be subject to such biases with these methods.

The impact of violation of model assumption on S is perhaps least well explored (Satta, 1997; Schierup et al., 2001), and differences in potential biases are likely to arise under different demographic histories. Firstly, Takahata et al.'s model (1992) has a number of restrictive assumptions, which are not necessarily met by real data, in particular, the assumption of a fixed number of alleles through time. The model also assumes an infinite allele model where mutation occurs through the substitution of single nucleotides, whereas recombination may generate alleles that differ at multiple positions relative to the parental alleles. Biases in S can then arise through the effect recombination may have on the number of alleles segregating in a population and the number of nucleotide differences between them. Under the model assumptions of Takahata et al., the number of alleles in the population should be approximately equal to Kn, the mean number of pairwise non-synonymous differences in the PBR among sequences (hence S can be approximated using Kn calculated from the data). However, the simulation study of Satta (1997) showed that recombination can greatly increase the number of alleles in a population (n), while decreasing Kn (since recombination will ultimately act to homogenize variation along the length of the sequence). In consequence, recombination may act to bias S upwards if calculated directly from n, but could bias it downwards slightly if n is estimated from Kn. This effect will show a strong interaction with demographic history, as bottlenecks in particular can have a substantial effect on the number of alleles and sequence diversity among them. If bottlenecks, in conjunction with recombination, remove synonymous variation at loci under balancing selection (Schierup et al., 2001; Richman et al., 2003a), then in these circumstances S could be overestimated. 
Such a pattern may be visible in our data, where the highest values of S were obtained for Spanish ibex and Alpine chamois, species known to have experienced recent bottlenecks. Additional sources of error include potential misidentification of PBR sites in these taxa, functional significance of non-synonymous variation outside the PBR and lack of correction for multiple hits. These could all bias S downwards, as they would lead to an underestimation of Kn. Recombination is also likely to be an important consideration when estimating the magnitude of positive selection acting on a sequence, and when identifying individual sites under selection using maximum-likelihood phylogenetic approaches (Anisimova et al., 2003; Wilson and McVean, 2006), because with recombination there will be multiple evolutionary trees along the sequence. This can lead to an upward bias of ω and to false positives for sites under selection. That said, the methods used in the current study do appear to be relatively robust (Anisimova et al., 2003). Ultimately, while caution must be used in assessing the precision of our estimators of selection, the fact that strong signatures of selection are obtained from different methods lets us be reasonably confident that selection has been an important process in shaping variation for these sequences.
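
To make the two summary quantities discussed above concrete, the following minimal Python sketch computes the number of distinct alleles (n) and a simple proxy for Kn from a set of aligned, in-frame coding sequences. It is an illustration under stated assumptions, not the procedure used in this study: Kn is approximated by counting PBR codons whose encoded amino acid differs between two alleles, without any correction for multiple hits, and the function names, the toy sequences and the PBR codon indices are all hypothetical.

```python
from itertools import combinations

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def replacement_differences(seq_a, seq_b, pbr_codons):
    """Count PBR codons whose encoded amino acid differs between two alleles.

    pbr_codons holds 0-based codon indices (an assumption of this sketch).
    """
    count = 0
    for pos in pbr_codons:
        codon_a = seq_a[3 * pos:3 * pos + 3]
        codon_b = seq_b[3 * pos:3 * pos + 3]
        if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
            count += 1
    return count


def allele_summary(alleles, pbr_codons):
    """Return (n, Kn proxy) for aligned, in-frame coding sequences."""
    distinct = sorted(set(alleles))  # identical sequences count as one allele
    n = len(distinct)
    pairs = list(combinations(distinct, 2))
    if not pairs:
        return n, 0.0
    total = sum(replacement_differences(a, b, pbr_codons) for a, b in pairs)
    return n, total / len(pairs)


# Hypothetical toy alleles (three codons each); codons 1 and 2 play the PBR.
toy_alleles = ["ATGGCTAAA", "ATGGCCAAA", "ATGGATAAG", "ATGGATAGA"]
n, kn = allele_summary(toy_alleles, pbr_codons=[1, 2])
print(n, round(kn, 3))
```

In this toy set the first two alleles differ only by a synonymous change, so they add to n without adding to the Kn proxy. This is the kind of divergence between n and Kn that makes the choice of estimator matter for S.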

In a recent simulation study, Smith and Fearnhead (2005) evaluated the accuracy and robustness of three population-scaled recombination rate estimators, including that calculated by LDhat. They concluded that while absolute values were likely to be sensitive to violations of standard model assumptions, relative recombination rates were inferred reasonably well. Although a balancing selection scenario was not included in their study, they did examine bias due to a selective sweep, finding only a weak influence on all three methods, with the approach used by LDhat showing a 2% upward bias. They also found some evidence that bottlenecks may bias estimates of recombination downwards by 35–45%, and that population growth can lead ρ to be overestimated by 10–30%. These demographic influences seem plausible, since in very low-diversity populations recombination is less likely to generate new allelic variants than in cases where some variation is retained. Further support for the robustness of this inference approach comes from the fact that ρ values obtained by LDhat have been shown to be in strong concordance with ρ estimates derived from pedigree studies and from sperm-typing experiments (Myers et al., 2005). Thus, our observations of significant recombination rates are likely to be robust, although again the absolute values should be interpreted with caution. Similarly, Fearnhead and Donnelly (2001) demonstrated that the ratio ρ/θ is also robust to violations of the standard model, and should be less sensitive to variation in N. This lends some support to our observation that, overall, recombination rates are likely to exceed mutation rates for ungulates (although the absolute values should be treated cautiously), and therefore that recombination may play a more significant role than mutation in generating allelic diversity at MHC loci for these species.
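
As a rough illustration of how the bias ranges quoted above might bracket an estimate, the sketch below back-calculates the range of true ρ values implied by a given relative bias, and forms the ratio ρ/θ. It rests on assumptions that are not part of the original analysis: the percentages are treated as simple multiplicative biases, the simulated scenarios of Smith and Fearnhead (2005) are assumed to apply, θ is left uncorrected, and the function name and numerical values are hypothetical.

```python
def implied_range(estimate, bias_low, bias_high, direction):
    """Return (low, high) true values implied by a relative bias range.

    direction="down": estimate = true * (1 - bias)  (underestimation)
    direction="up":   estimate = true * (1 + bias)  (overestimation)
    """
    if direction == "down":
        candidates = (estimate / (1 - bias_low), estimate / (1 - bias_high))
    elif direction == "up":
        candidates = (estimate / (1 + bias_low), estimate / (1 + bias_high))
    else:
        raise ValueError("direction must be 'down' or 'up'")
    return min(candidates), max(candidates)


# Hypothetical estimates, chosen only to make the arithmetic easy to follow.
rho_hat, theta_hat = 20.0, 10.0

# Bottleneck scenario: rho underestimated by 35-45%.
bottleneck = implied_range(rho_hat, 0.35, 0.45, "down")
# Population growth scenario: rho overestimated by 10-30%.
growth = implied_range(rho_hat, 0.10, 0.30, "up")

print("rho/theta as estimated:", rho_hat / theta_hat)
print("true rho if bottlenecked:", bottleneck)
print("true rho if growing:", growth)
```

Under these assumptions a bottleneck would mean that the true ρ, and hence ρ/θ, is larger than estimated, whereas population growth would mean that it is smaller. This is one reason why the absolute values should be treated cautiously.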

Clearly, there is a need for progress in this area, and recently Wilson and McVean (2006) reported a new method for the simultaneous estimation of ω and ρ using a coalescent model implemented in a Bayesian framework. On the basis of its initial application to sequences of porB3, a protein thought to be important for pathogenesis in Neisseria meningitidis, this method appears to have strong potential for overcoming many of the issues raised here, but its general utility remains to be explored. Integrating the quantitative interpretation of these statistics with fluctuations in demography remains problematic, however.

General conclusions

Recombination is a powerful molecular mechanism that may, under balancing selection, generate genetic diversity in the MHC genes. The functional constraints within exon 2 are restricted to relatively few positions (e.g., cysteine residues at positions 15 and 79 for correct folding), so high mutation and recombination rates may only rarely have negative consequences for the molecule's function. In populations that have undergone bottlenecks, but which have retained some diversity, allelic variation may be restored much faster with recombination than by point mutations alone. Further comparative studies of recombination rates at MHC loci in diverse taxa will aid in assessing the general importance of recombination in generating MHC variation. These are likely to add to the growing number of theoretical and empirical studies indicating that recombination plays a key role in generating MHC diversity, which is then maintained by selection (Satta, 1997). Future improvements to theoretical frameworks for estimators of selection and recombination rates (e.g., Wilson and McVean, 2006) will allow more robust assessments of interactions between selection, recombination and demographic history, and of how together they shape patterns of allelic diversity at the MHC, or at other loci subject to strong balancing selection.

One important consequence of recombination in the DRB gene is its effect on the shape of inferred genealogies. Schierup and Hein (2000) showed that even very low rates of recombination significantly alter the shape of inferred phylogenetic trees. They concluded that ignoring recombination in phylogenetic-based analyses of sequence data from populations causes underestimation of both the time to the most recent common ancestor and the amount of recent divergence. Moreover, it leads to the loss of a molecular clock and generates more apparent ancient polymorphism (e.g., trans-specific evolution). Thus, the origin of ungulate MHC class II DRB alleles may be, in some cases, much more recent than previously assumed. Consequently, if levels of recombination similar to those observed in this study occur in the MHC genes of other taxa, traditional phylogenetic methods and the inferences made from them may be inadequate.