Standard models suggest that there are two different forms of homologous recombination, commonly called crossover (which may be accompanied by gene conversion) and gene conversion (without an associated crossover; ref. 1). Crossover rates vary tremendously across the human genome (ref. 2), by several orders of magnitude over distances as small as 1 kb (ref. 3). But there are few direct data on homologous gene conversion rates (ref. 4), and no information on the extent to which gene conversion rates vary across the genome. In the accompanying paper, Alec Jeffreys and Celia May (ref. 5) estimated gene conversion rates at three known crossover hot spots in humans by sperm typing. They found that all three regions were gene conversion hot spots as well and that in each case, the location of the peak of conversion activity coincided with the peak of crossover rates. The coincidence of these peaks suggests that the molecular mechanisms generating most crossovers and gene conversion events are related.

Recombination and LD

Recombination is one of the primary factors that affect LD (the nonrandom association of alleles at different sites). Standard population genetics models of recombination generally ignore gene conversion, even though crossovers and gene conversions have different effects on the structure of LD. Recombination events between pairs of markers that are far apart are almost exclusively crossovers, whereas pairs of markers that are close together are affected by both crossovers and gene conversion events. This is crucial for interpreting patterns of human sequence variation. For example, analyses of human data have found less LD than expected over short distances (e.g., <5 kb) given the LD observed over longer distances (e.g., >100 kb; ref. 6). This seemingly discordant observation is exactly what would be expected under a model of recombination that incorporates both crossovers and gene conversion. In addition, adding gene conversion to our models will make other population genetics applications, such as inference of population history from patterns of LD, both easier and more reliable.
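To make the definition of LD concrete, the sketch below computes two standard pairwise summaries, the coefficient D and the squared correlation r^2, from haplotype and allele frequencies at two biallelic sites. The function name and the example frequencies are illustrative assumptions, not values taken from any of the studies discussed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic sites.

    p_ab: frequency of haplotypes carrying allele A at site 1 and allele B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    (All three inputs are hypothetical example quantities.)
    """
    # D measures the departure of the haplotype frequency from the value
    # expected if alleles at the two sites were associated at random.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom


# Random association: the AB haplotype occurs at the product of allele frequencies.
print(ld_stats(0.25, 0.5, 0.5))
# Complete association: allele A is always found with allele B.
print(ld_stats(0.5, 0.5, 0.5))
```

Under random association both statistics are zero, whereas complete association gives r^2 = 1; recombination between the two sites moves a population from the second case toward the first over time.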

The effect of gene conversion on LD also matters for human genetics, a primary goal of which is to identify the genetic variants that affect susceptibility to complex diseases. Much recent work on the subject focuses on association mapping methods that use LD. Association mapping attempts to identify causal variants by typing many (e.g., thousands of) single-nucleotide polymorphisms (SNPs) in a sample of unrelated individuals and then determining whether any of the SNPs are associated with the disease phenotype of interest. The rationale is that even if the SNPs that were typed do not directly affect disease susceptibility, they will be in strong LD with markers that do. The optimal marker density and success of association studies depend on the fine-scale structure of LD, and in particular on the expected decay of LD with physical distance. Jeffreys and May (ref. 5) estimated that most recombination events are gene conversions (80–94% of events) and that mean tract lengths (i.e., size of the converted piece) are small (55–290 bp). For these parameter values, gene conversion would more than double the effective recombination rate between closely spaced markers (e.g., ones within 2 kb of each other) but would have little effect on pairs of distant markers.
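The distance dependence described above can be illustrated with a commonly used population genetics approximation, which is an assumption here and is not stated in the text: if crossovers occur at rate r per bp, conversion tracts initiate at rate g per bp, and tract lengths are geometrically distributed with mean L, the effective recombination rate between two sites d bp apart is roughly r·d + 2·g·L·(1 − exp(−d/L)). The sketch also assumes that the fraction of events that are conversions, p, maps onto the rate ratio as g/r = p/(1 − p). The function name is hypothetical, and the inputs are one illustrative choice from the reported ranges (80% conversions, 290-bp mean tracts).

```python
import math


def conversion_inflation(d_bp, conversion_fraction, mean_tract_bp):
    """Factor by which gene conversion inflates the effective recombination
    rate between two sites d_bp apart, relative to crossovers alone.

    Assumed model (not from the source text):
        rate(d) = r*d + 2*g*L*(1 - exp(-d/L)),  with g/r = p/(1 - p)
    Dividing by the crossover-only rate r*d gives the factor returned here.
    """
    f = conversion_fraction / (1.0 - conversion_fraction)  # assumed g/r
    # Expected contribution of conversion tracts that cover exactly one site.
    extra = 2.0 * f * mean_tract_bp * (1.0 - math.exp(-d_bp / mean_tract_bp))
    return 1.0 + extra / d_bp


# Illustrative inputs only: 80% of events are conversions, mean tract 290 bp.
for d in (500, 2_000, 100_000):
    print(d, round(conversion_inflation(d, 0.80, 290.0), 3))
```

With these assumed inputs the factor is above 2 for markers 2 kb apart and within a few percent of 1 for markers 100 kb apart. The result is sensitive to the choice of p and L, so other combinations within the reported ranges give smaller or larger factors at short distances.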

This might be important when trying to predict levels of LD between typed and untyped markers. Generally, when two nearby markers are in strong LD with each other, it is assumed that all markers in between are in strong LD with both end markers. This is the motivation for several recent definitions of 'haplotype blocks' (reviewed in ref. 7). But when the gene conversion rate is high and the marker density is low, it is possible that the intervening SNPs are not in strong LD with the end markers (ref. 7; Fig. 1). These SNPs can be thought of as 'holes' in haplotype blocks; these holes would reduce the efficacy of association studies. Further empirical and theoretical studies are needed to determine what practical effect, if any, this will have on future association studies.

Figure 1: Schematic of the possible effects of gene conversion and marker density on patterns of LD.

Horizontal lines represent chromosomes, and ovals represent markers. Ovals that are the same color are in strong LD with each other. (a) Three typed markers (labeled 1, 2 and 3) are in strong LD with each other. The patterns of LD of all the unobserved markers between these three are unknown. (b) One possibility, assumed in much of the discussion of haplotype blocks, is that all the unobserved markers are in strong LD with the end markers. (c) Another possibility, which is more probable if there are high rates of gene conversion or the original markers were far apart, is that many of the unobserved markers are in strong LD with the original three but some are not. Whether real data look more like b or c depends on both experimental parameters (e.g., sample size, marker density) and intrinsic parameters (e.g., gene conversion rate, crossover rate).

Recombination rate variation

A more detailed assessment of the potential relevance of gene conversion to patterns of LD and to questions in population and human genetics would require data on gene conversion rates and conversion tract lengths from many other regions of the genome. Because gene conversion parameters are difficult to estimate accurately from patterns of LD alone (ref. 8), more direct experimental data, for example from sperm typing, would be a welcome development.

Jeffreys and May (ref. 5) chose to study known crossover hot spots in the MHC in part because the high recombination rates and high marker densities made direct parameter estimation much easier. But because the MHC hot spot regions may be atypical, it is not clear whether the gene conversion parameter estimates from these regions are applicable to the rest of the genome. The fraction of recombination events that are gene conversions may be higher in regions of low crossover rate, and mean conversion tract lengths may vary with levels of heterozygosity (refs. 9–11). These empirical questions will eventually be answered. But comparable sperm typing studies in regions of the genome with average recombination rates (e.g., with crossover rates of 1.3 cM per Mb; ref. 12) will have to screen many more sperm to recover the same number of recombinants and, because of the lower marker densities, will be less informative about the distribution of conversion tract lengths.
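A rough sense of the screening effort involved comes from a back-of-the-envelope calculation. The sketch below assumes that, over short intervals, 1 cM corresponds to a crossover probability of about 1% per meiosis; the 2-kb interval and the target of 100 crossover recombinants are hypothetical values chosen only for illustration, and gene conversions are not counted.

```python
def sperm_needed(target_recombinants, rate_cm_per_mb, interval_bp):
    """Expected number of sperm to screen to recover a given number of
    crossover recombinants in an interval (hypothetical helper).

    Assumes 1 cM ~ 1% crossover probability per meiosis, which is a
    reasonable approximation only for short intervals.
    """
    # Crossover probability per sperm across the whole interval.
    per_sperm = (rate_cm_per_mb / 100.0) * (interval_bp / 1_000_000)
    return target_recombinants / per_sperm


# Genome-average rate from the text (1.3 cM per Mb); other inputs are illustrative.
print(round(sperm_needed(100, 1.3, 2_000)))
```

Under these assumptions, recovering 100 crossovers in a 2-kb interval of average recombination rate would require screening several million sperm; the number scales inversely with the local crossover rate, which is why hot spots are so much more tractable.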

For now, the results of Jeffreys and May (ref. 5), together with recent work from the Jeffreys laboratory (and other laboratories), raise a host of other questions about recombination rate variation. What is the scale over which crossover and gene conversion rates vary across the human genome? How common are recombination hot spots? Are there regions that are hot spots for either crossover or gene conversion activity but not both? How strongly are male recombination rates, which can be estimated by sperm typing, correlated with female recombination rates? Given that there can be large differences in recombination rates across individuals (refs. 2, 13), there is also the possibility that there are large differences in recombination rates between different human populations.