Introduction

Until recently, the term human population genetics was almost synonymous with studies of mitochondrial DNA (mtDNA) variation. Following from the seminal work of Cann et al. (1987) and Vigilant et al. (1991), there was an explosion of papers characterizing patterns of variability, largely in the noncoding control region, or D-loop, in human populations from across the world. In the first or second paragraph of almost all such papers can be found a statement explaining why the mitochondrial genome is the ideal molecule for such studies. One of the benefits cited is that it is nonrecombining, making it possible to construct well-resolved and meaningful phylogenies of the molecule and, by implication, of the history of human populations.

The world of human population genetics was shocked by the publication, in 1999, of three papers claiming to find evidence for recombination in the molecule (Awadalla et al., 1999; Eyre-Walker et al., 1999a; Hagelberg et al., 1999). In each case the evidence was indirect, coming from population genetic inference rather than direct observation of recombination in a pedigree. Eyre-Walker et al. (1999a) suggested that properties of the inferred tree of complete mtDNA sequences do not fit our understanding of mutational processes in the molecule and are more likely to be the result of recombination. Hagelberg et al. (1999) discovered what looked like a recombinant control region genotype among individuals from islands in the western Pacific. In addition, Awadalla et al. (1999) found that pairs of segregating mutations separated by large physical distances in the mtDNA molecule show less linkage disequilibrium than those close together, a pattern that is the fingerprint of recombination.

The presence of recombination in mtDNA would cause a revolution in human population genetics because it questions the validity of conclusions made under the assumption that there is no recombination. Historically, the lack of recombination has been assumed from the very beginnings of mtDNA analysis (Cann et al., 1987). To a large extent this is probably because the dogma of maternal transmission has been accepted without question, despite considerable evidence over many years for the presence of paternal mtDNA in the fertilized egg (Ankel-Simmons & Cummins, 1996). Another factor has undoubtedly been the general agreement between phylogenetic trees reconstructed from mtDNA and other types of evidence about human history (e.g. Vigilant et al., 1991). A low level of recombination may not have any great impact on the ability of phylogenetic methods to reconstruct trees, and the reconstructed ‘average’ tree topology (when there is recombination) may still agree with the general features of demographic history, while being misleading about details such as the timing of coalescent events (Schierup & Hein, 2000).

Criticisms of the analyses

Unsurprisingly, the evidence for recombination has been strongly disputed. Both the quality of the data (Macaulay et al., 1999; Kivisild & Villems, 2000) and the methods of analysis (Jorde & Bamshad, 2000; Kumar et al., 2000) have been questioned, and there are strong biological arguments as to why recombination in mtDNA must be, at best, extremely rare (Eyre-Walker & Awadalla, 2001). Furthermore, two recently published analyses of complete mtDNA genome variability among Europeans (Elson et al., 2001) and individuals chosen to represent linguistic diversity (Ingman et al., 2000) have failed to find any evidence for recombination using a similar battery of tests.

The biological feasibility of mitochondrial recombination is an important empirical issue and is unresolved (Eyre-Walker & Awadalla, 2001). Mitochondria are known to possess at least some of the enzymes required for recombination (Thyagarajan et al., 1996), and human cells contain low frequencies of mitochondrial genomes rearranged by within-lineage recombination (Kajander et al., 2000). The most plausible route through which mitochondria may recombine in a manner that influences genetic diversity is through leakage of paternal mtDNA at fertilization. Paternal inheritance of mitochondria has been detected in interspecific crosses in mice (Gyllensten et al., 1991; Kaneda et al., 1995), cattle (Sutovsky et al., 1999) and Drosophila (Kondo et al., 1990), but it appears to be rare and there are active mechanisms for elimination of sperm mitochondria (Sutovsky et al., 1999). Even if paternal mtDNA did persist, there is little evidence that different mitochondria can fuse and exchange genetic material (Howell, 1997). The one precedent for naturally occurring recombination between mitochondria inherited from both parents is from yeast, where mitochondrial inheritance is typically biparental (Saville et al., 1998).

While there is no direct experimental evidence for interparental mitochondrial recombination in humans, it may be argued that in terms of how recombination influences patterns of genetic diversity, it is the population rate of recombination (the product of the effective population size, Ne, and the rate of recombination per individual per generation, r) that is important, not the absolute rate. Consequently, a very low rate of recombination per individual may have a significant effect if the population size is large enough. The effective population size for the mitochondria in humans is estimated to be less than 5000 (Takahata, 1993). Assuming the standard Fisher-Wright population model and a sample of 50 sequences, a population recombination rate of 2Ner=1 is approximately the lower limit for which at least one recombination event is expected in a genealogy; most recombination events will have no effect on patterns of genetic diversity. Therefore, the predominant mitochondrial type in at least one in 10 000 individuals should be a recombinant between maternal and paternal mtDNAs, and a much higher frequency of early stage embryos should carry a low frequency of recombinant molecules. While these calculations are crude, they give some idea that direct tests for recombination, either in pedigrees or lab experiments using mammalian models, should be possible to resolve the issue.
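
The arithmetic behind the "one in 10 000" figure can be made explicit. The sketch below simply solves 2Ner = 1 for r using the upper bound on Ne quoted above; the function and variable names are illustrative and do not come from the cited studies. Because Ne is stated to be less than 5000, the resulting r is a lower bound, which is why the text says "at least" one in 10 000.

```python
def per_generation_rate(rho, n_e):
    """Solve rho = 2 * Ne * r for r, the per-individual, per-generation rate."""
    return rho / (2.0 * n_e)


N_E = 5000   # upper bound on mitochondrial Ne quoted in the text (Takahata, 1993)
RHO = 1.0    # population recombination rate 2*Ne*r taken as the lower limit

r = per_generation_rate(RHO, N_E)
individuals_per_recombinant = 1.0 / r

print(f"r = {r:.1e} per individual per generation")
print(f"roughly one recombinant per {individuals_per_recombinant:,.0f} individuals")
```

A smaller Ne would push r, and hence the expected frequency of recombinant individuals, higher for the same population rate.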

The quality of the data

The argument for recombination has not been helped by the demonstration that several features of the population genetic data indicative of recombination are the result of human error. The result of Hagelberg et al. (1999) was later realized to be caused by an alignment artefact (Hagelberg et al., 2000), and a number of sequencing errors in a subset of the complete genome sequences analysed by Eyre-Walker et al. (1999a) contributed to the apparent excess of multiple mutations in the data set (Macaulay et al., 1999). Claims have been made that sequencing errors were also present in one of the data sets analysed by Awadalla et al. (1999) (Kivisild & Villems, 2000), but this does not affect all the data sets for which a significant result was obtained (Awadalla et al., 2000). In addition, a further analysis by Eyre-Walker et al. (1999b), excluding the questionable data and incorporating some additional sequences, also reached the conclusion that there is an apparent excess of multiple mutations. In short, while problems with the data have had some impact on the strength of the conclusions, there are still patterns in the data that require explanation.

Substitution-rate variation and the Homoplasy Test

The statistical methods used to detect recombination have also been extensively criticised. The analysis of Eyre-Walker et al. (1999a) used a method for detecting recombination called the homoplasy test (Maynard Smith & Smith, 1998). In this test, the minimum number of homoplasies (repeated or back mutations at a specific nucleotide position) required to construct a single phylogenetic tree for the sampled sequences is compared to that expected under the best-fitting parameters from a chosen substitution model (Maynard Smith & Smith, 1998). If the observed number of homoplasies exceeds that expected for 95% of simulated sequences (assuming a star phylogeny, a conservative assumption) the model is rejected.

The critical problem with the method is that rejection of the model may simply mean rejection of the substitution model, and may have nothing to do with the presence of recombination. The most important single factor that is likely to bias the homoplasy test towards producing false-positives is variation in the substitution rate (Worobey, 2001). Any unacknowledged heterogeneity in the substitution rate will lead to an underestimate of the expected number of homoplasies, and evidence from between-species phylogenetic analyses indicates strong substitution rate variation in the mtDNA molecule (Pesole et al., 1999).

Eyre-Walker et al. (1999a) argued, however, that substitution rate variation cannot explain the levels of homoplasy observed in human mtDNA. They considered two models of how the rate of substitution may vary between sites in the mtDNA genome: conditional rate variation (at some sites one base mutates at a much higher rate than others), and unconditional rate variation (at some sites all bases mutate at an elevated rate). Conditional rate variation was excluded on the basis that levels of sequence divergence between distantly related primates approach levels expected at saturation from base composition alone, while unconditional rate variation was dismissed by the demonstration that sites polymorphic in humans were not significantly more likely to be polymorphic in chimps. Although superficially convincing, the power of these tests is unknown and both critically assume stationarity of base composition. It is therefore important to note that there is evidence of nonstationarity in both the data sets of Awadalla et al. (1999) and Ingman et al. (2000). In particular, at sites where the bases T and C are segregating, a significant excess have C as the rare allele: 45 : 25 in the data of Awadalla et al. (1999) and 152 : 94 in the data of Ingman et al. (2000). If base composition were stationary, equal numbers of the two types would be expected (assuming no selection). Consequently, the data suggest an excess of T → C mutations and a trend for an increase in the frequency of C residues. Given that there are many additional factors whose influences on patterns of substitution are almost completely uncharacterized (context-dependent mutation rates, epistatic selection constraints), it would seem desirable that any test for the presence of recombination should be robust to an incomplete characterization of the mutational and selective processes that determine patterns of substitution. Finally, a variant of the homoplasy test, designed to account for substitution rate variation, fails to detect any excess of homoplasies among complete coding region mtDNA sequences from humans (Worobey, 2001).
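
The claim that the T/C counts quoted above depart from the 1 : 1 expectation under stationarity can be checked with a simple test. The text does not say which test was used, so the sketch below assumes an exact two-sided binomial test with p = 0.5; the function name is illustrative.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n trials when p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the tail beyond the more extreme count (capped at 1).
    """
    k = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


# (sites with C as the rare allele, sites with T as the rare allele)
counts = {
    "Awadalla et al. (1999)": (45, 25),
    "Ingman et al. (2000)": (152, 94),
}

for label, (c_rare, t_rare) in counts.items():
    p = two_sided_binomial_p(c_rare, c_rare + t_rare)
    print(f"{label}: {c_rare} : {t_rare}, two-sided p = {p:.4g}")
```

Under this assumed test both ratios fall below the conventional 5% threshold, which is consistent with the description of the excess as significant.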

Decay or not decay?

Given the uncertainty about substitution models, the test used by Awadalla et al. (1999) is preferable, although less powerful under certain circumstances (Maynard Smith, 1999). Recombination breaks down associations between alleles. For autosomal loci, the rate of recombination per generation due to crossing over increases approximately linearly with the physical distance between sites. In a circular molecule, such as mtDNA, a gene-conversion type model for recombination in which the relationship between physical and genetical (recombination) distance is unlikely to be linear seems more appropriate. For a large class of models, however, some correlation between physical and genetical distance is expected (Wiuf, 2001). Consequently, while recombination predicts a negative relationship between the physical distance between pairs of segregating sites and the correlation of their evolutionary history as measured by linkage disequilibrium, there is no a priori reason to believe that mutational processes should generate the same result.

The approach used by Awadalla et al. (1999) was to compare the Pearson correlation coefficient between distance and the summary statistic

$$ r^2 = \frac{D^2}{p_1 q_1 p_2 q_2} $$

where D is the standard measure of linkage disequilibrium and p_i and q_i are the frequencies of the two alleles at locus i. This statistic is itself the square of the correlation coefficient between the two alleles (Hill & Robertson, 1968) and can range between 0 and 1; r2 is also directly proportional to the χ2 statistic in a contingency table analysis of independence. Significance levels are assessed through a permutation test in which the location of segregating sites is shuffled to generate an approximation to the null distribution of correlation coefficients. It should also be noted that Awadalla et al. (1999) considered only sites at which the rare allele is segregating at a frequency greater than 10%. Because sites at higher frequency will tend to be older, they will have had more opportunity for recombination, and offer greater power to detect recombination.
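
The permutation procedure described above can be sketched as follows. This is a minimal illustration rather than the published implementation: it assumes sites are coded 0/1 per sequence and are all segregating, that distance is the shorter path around the circular molecule, and that the p-value is one-sided with the usual +1 correction. All function names are hypothetical.

```python
import random
from itertools import combinations


def r_squared(site_a, site_b):
    """r^2 between two segregating biallelic sites coded 0/1 per sequence."""
    n = len(site_a)
    p1 = sum(site_a) / n
    p2 = sum(site_b) / n
    p11 = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p11 - p1 * p2
    return d * d / (p1 * (1 - p1) * p2 * (1 - p2))


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def circular_distance(i, j, length):
    """Shorter of the two paths between positions i and j on a circle."""
    d = abs(i - j)
    return min(d, length - d)


def ld_distance_test(positions, sites, length, n_perm=1000, seed=0):
    """Correlation between distance and r^2, with a one-sided permutation p-value."""
    pairs = list(combinations(range(len(sites)), 2))
    # Disequilibrium values do not change when site locations are shuffled.
    ld = [r_squared(sites[a], sites[b]) for a, b in pairs]

    def corr(pos):
        dist = [circular_distance(pos[a], pos[b], length) for a, b in pairs]
        return pearson(dist, ld)

    observed = corr(positions)
    rng = random.Random(seed)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if corr(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two tightly linked pairs of sites placed far apart on the molecule.
toy_sites = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
]
toy_positions = [100, 200, 8000, 8100]
observed, p_value = ld_distance_test(toy_positions, toy_sites, length=16569)
print(f"correlation = {observed:.3f}, permutation p = {p_value:.3f}")
```

Shuffling positions rather than alleles keeps the set of pairwise disequilibrium values fixed, so the null distribution reflects only the arrangement of sites along the molecule.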

The problem is that there is more than one way to measure linkage disequilibrium. Jorde & Bamshad (2000) criticised the choice of statistic and showed that if an alternative statistic, |D′| (Lewontin, 1964), is used, there is no evidence for a significant correlation. |D′| is the ratio of the standard measure of linkage disequilibrium to the maximal (minimal if D is negative) value it could take given the allele frequencies. For any pairwise comparison in which only three of the four possible haplotypes are observed, |D′| will be one. Ingman et al. (2000) used both r2 and |D′| and found no evidence for a decay in linkage disequilibrium with distance, although all segregating mutations were analysed. Elson et al. (2001) also investigated both measures of disequilibrium, and an additional one, δ (Devlin & Risch, 1995), that is supposed to be less sensitive to differences in allele frequencies. For the complete data set, none of the statistics gave a significant negative correlation between linkage disequilibrium and distance, although a subset of sequences gave a marginally significant result for r2 (Elson et al., 2001).

It is highly disconcerting that conclusions about recombination should be so sensitive to the way in which linkage disequilibrium is measured. Three critical questions must be answered. First, is there a preferable method for measuring linkage disequilibrium with respect to detecting recombination? Second, what patterns are observed if the preferred method is consistently applied to different samples? Finally, under what circumstances might we expect variation in the signal of recombination between data sets?

Measures of linkage disequilibrium and the detection of recombination

In the evolutionary sense, recombination reduces the correlation in underlying genealogy between different segments of the genome (Fig. 1). When there is no recombination, two sites will have identical histories. When there is free recombination, two sites will have independent histories, drawn from the distribution of possible histories given the underlying population parameters of demography and selection. Ideally, we would like to be able to measure directly the correlation in history between different sites; however, the underlying genealogy can only be inferred from patterns of genetic variability. Measures of linkage disequilibrium, such as r2 and |D′|, aim to summarize the underlying correlation in genealogical history by focusing on mutations segregating in the sampled sequences.

Fig. 1

Recombination events occurring during the coalescent history of a sample of gene sequences can be represented by the ancestral recombination graph (a). Sites either side of the recombination breakpoint will have different genealogies (b), although much of the ancestral history is shared (thick lines). This shared history creates correlations in the genealogical history that can be measured by statistics of linkage disequilibrium.

What should we desire from a measure of linkage disequilibrium? First, we are not generally interested in the mutations themselves (except when mapping a disease-causing mutation), but in the underlying genealogies, so the measure of association should be largely independent of the allele frequencies. Second, the measure should take a high value when the underlying correlation is high, and a low value when the underlying correlation is low. Finally, under some circumstances, particularly when allele frequencies are low, pairs of segregating sites are uninformative about the underlying genealogical correlation. We either wish to exclude such sites, or find a measure that can identify them.

The problem is that no single measure of linkage disequilibrium fulfils all these criteria. The r2 statistic is sensitive to allele frequencies and low values can either mean low correlation or low power, though high values are only likely to occur when the correlation is high. |D′| is often claimed to be superior (Hedrick, 1987; Jorde & Bamshad, 2000) because its range is insensitive to allele frequencies; however, both high and low values are likely for any degree of genealogical correlation. Other measures of disequilibrium also fail to meet one of the desired criteria (Hedrick, 1987). We can illustrate the problem by way of an example, a comparison of two samples, both of five sequences (Fig. 2). In both samples, the rare mutation at both loci is found as a singleton, but whereas the rare mutations are found on the same chromosome in the first sample, they are on different chromosomes in the second. In terms of the underlying genealogical correlation (as determined by the recombination rate) the first sample is considerably more likely when recombination is low, but the second sample is uninformative about the underlying recombination rate. As desired, both r2 and |D′| take a value of 1 for the first sample, but r2 is almost zero (suggesting the presence of recombination) for the second, while |D′| remains at one (suggesting the absence of recombination). The other measure used by Elson et al. (2001), the δ of Devlin & Risch (1995), is undefined for the first sample.
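
The two five-sequence samples in this example are small enough to work through directly. The sketch below computes r2 and |D′| from the standard definitions (D as the difference between the observed frequency of the double-rare haplotype and the product of the rare-allele frequencies); the haplotype coding and function name are illustrative choices, with 1 marking the rare allele at each locus.

```python
def ld_stats(haplotypes):
    """Return (r2, |D'|) for a list of two-locus haplotypes coded as (a, b)."""
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n   # rare-allele frequency, locus 1
    p2 = sum(b for _, b in haplotypes) / n   # rare-allele frequency, locus 2
    q1, q2 = 1 - p1, 1 - p2
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p11 - p1 * p2
    r2 = d * d / (p1 * q1 * p2 * q2)
    # D' scales D by the most extreme value it could take with the same sign.
    if d >= 0:
        d_limit = min(p1 * q2, q1 * p2)
    else:
        d_limit = min(p1 * p2, q1 * q2)
    return r2, abs(d) / d_limit


# First sample: both singletons on the same chromosome.
same_chromosome = [(1, 1), (0, 0), (0, 0), (0, 0), (0, 0)]
# Second sample: the singletons on different chromosomes.
different_chromosomes = [(1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

for label, sample in [("same", same_chromosome), ("different", different_chromosomes)]:
    r2, d_prime = ld_stats(sample)
    print(f"{label}: r2 = {r2:.4f}, |D'| = {d_prime:.4f}")
```

Up to floating-point rounding, the first sample gives r2 = 1 and |D′| = 1, while the second gives r2 = 0.0625 and |D′| = 1, matching the contrast drawn in the text.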

Fig. 2

The relationship between statistics of linkage disequilibrium and information about recombination. Two mutations, each found only once in the sample and on the same chromosome (a), lead to the maximal value of both r2 and |D′|, the two most commonly used summary statistics. In contrast, if the mutations are found on different chromosomes (b), r2 is close to zero, while |D′| remains at one. While the first configuration is more likely if the population recombination rate is low, the second configuration is almost equally likely for any recombination rate. Likelihoods are calculated by the method of Fearnhead & Donnelly (in press), with a two-allele, symmetrical mutation model and θ=0.01 per site.

Essentially, the problem is that statistics of linkage disequilibrium tend to confound either the absence (|D′|) or presence (r2) of recombination with a lack of power. Awadalla et al. (1999) reduce the problem by removing all segregating sites at a frequency lower than 10%. As we have seen, however, even singletons can be informative about recombination, and there are configurations in which both mutations are above 10% that are uninformative. One possible solution is to use only those pairs that are informative about recombination, a property that can be assessed by comparing the coalescent likelihood of observing each pairwise comparison under low and high recombination. This can be calculated exactly for the infinite-sites model (Hudson, in press) or estimated by Monte-Carlo simulation (Fearnhead & Donnelly, in press).

Figures 3 and 4 show the results of linkage disequilibrium analyses for the global human data set of Awadalla et al. (1999) and that of Ingman et al. (2000); all mutations, both synonymous and nonsynonymous, have been used to provide maximal power. The data of Elson et al. (2001) are not publicly available. Figure 3 shows the results obtained using sites for which the rare allele is at a frequency of at least 10%. Figure 4 shows the results when all pairwise comparisons that are informative about recombination are used (the criterion employed is that the absolute difference in log-likelihood of observing the data when there is free recombination, compared to when there is no recombination, must exceed 1). For the data of Awadalla et al. (1999), when all sites are used, there is a significant negative correlation between r2 and distance, but no such relationship for |D′|. This effect is most dramatic when just those sites segregating at a frequency of at least 0.1 are used (Fig. 3). If we use only those pairwise comparisons that are informative about recombination, however, both r2 and |D′| show significant (P < 0.05) negative correlations with distance (Fig. 4). In contrast, for the data of Ingman et al. (2000), neither r2 nor |D′| shows any significant correlation with distance, no matter how the data are analysed (Figs 3 and 4).

Fig. 3

The relationship between physical distance and linkage disequilibrium for the complete mtDNA data sets analysed by (a) Awadalla et al. (1999) and (b) Ingman et al. (2000). The panels show the plots of r2 and |D′| for all pairwise comparisons for which the rare allele at both loci is at a frequency of at least 10%. Correlation coefficients and significance values estimated by 10 000 permutations are displayed below each plot.

Fig. 4

The relationship between physical distance and linkage disequilibrium for the complete mtDNA data sets analysed by (a) Awadalla et al. (1999) and (b) Ingman et al. (2000). The panels show the plots of r2 and |D′| for all pairwise comparisons for which the absolute difference in log-likelihood between a model of free recombination and a model of complete linkage is greater than 1.0. Correlation coefficients and significance values estimated by 10 000 permutations are displayed below each plot.

Different data sets, different stories?

The important point about the above analysis is that for each data set, focusing on just those pairwise comparisons that are informative about recombination gives the same result, irrespective of which statistic of linkage disequilibrium is used. Furthermore, there really does appear to be a decay of linkage disequilibrium with distance in the data set of Awadalla et al. (1999) that is not sensitive to methodology, while no such pattern is discernible in the data of Ingman et al. (2000).

How can two data sets tell such different stories? There are two possibilities. Either patterns of linkage disequilibrium are misleading about the underlying evolutionary forces generating patterns of genetic variability in one of the data sets, or the genealogical histories of the sampled sequences differ in terms of the amount of recombination. There is good reason to suppose that the signal of recombination may vary considerably in strength between samples. Population structure, important if genes are sampled from admixed or multiple, isolated populations, may obscure the signal of recombination that would be present if the genes had been sampled from a single, randomly mating population. This prediction agrees with the evidence for recombination in Scandinavian, Siberian and Native American populations (Awadalla et al., 1999), and with the lack of evidence in the data of Ingman et al. (2000), whose sequences were sampled to reflect total human linguistic diversity. However, the story is more complicated, because the global data set analysed by Awadalla et al. (1999) does show evidence of recombination (Fig. 4), while the European data set of Elson et al. (2001) does not, although the analysis did include two African sequences. Elson et al. (2001) did find a particular subset of the sequences (58 of the 66) for which there was a marginally significant negative correlation for r2 (ρ=–0.31; P=0.08). It could be argued that rare recombination events may only be detectable under certain circumstances, particularly when other factors, especially demography, are also likely to have influenced patterns of linkage disequilibrium. However, the inconsistency between the population range of samples and the evidence for recombination, and the sensitivity of conclusions to specific sequences, significantly weaken the hypothesis of recombination. In addition, caution should be attached to the significance of the particular configuration from the Elson et al. (2001) data subset, given the number of different tests carried out.

Where next?

Currently, the balance of the evidence points away from recombination being an important determinant of patterns of genetic variability in human mtDNA sequences. But before relief spreads too far through the world of human population genetics, a number of questions remain to be addressed. Most importantly, if not recombination, what has generated the relationship between distance and statistics of linkage disequilibrium in the data sets analysed by Awadalla et al. (1999)?

There are a number of possibilities. It has been suggested that sequencing protocols may give rise to correlated errors and an apparent decay in linkage disequilibrium with distance (Hey, 2000). Sequencing errors are, however, most likely to occur as singletons, and as Hey (2000) noted, the results are insensitive to their removal. Another scenario is that there is some correlation in the mutation rate between adjacent regions of the mtDNA molecule, and that historically, the rate of mutation has varied over time and location, a pattern that appears to apply to mtDNA evolution in Drosophila (Ballard, 2000). Epistatic selection constraints between nearby mutations could also lead to correlated, compensatory mutations. In short, any factor that creates correlations between mutations will lead to increased linkage disequilibrium, and if the correlation is stronger for sites separated by short physical distances, this may lead to a negative relationship between disequilibrium and distance.

One important path by which correlations between mutations might occur is if genes experience periods of local adaptive evolution (Fig. 5). The accumulation of multiple mutations in a subset of lineages over a short period of time within a given gene will create mutations at similar frequencies that will show strong linkage disequilibrium (as measured by r2). In order for |D′| to show the same pattern, at least some of these sites must also experience mutation (substitution) in other lineages at other times.

Fig. 5

Local adaptive evolution can lead to a correlation between linkage disequilibrium and distance because a burst of substitution in gene 1 in a particular subset of lineages (a) will create correlations between mutations in the sequence (b) that can be detected by measures of linkage disequilibrium (c).

Can these types of models explain why different data sets show different patterns? Correlations between mutations caused by local adaptation, or geographical mutation (or substitution) rate variation, are more likely to be detected when genes are sampled from a small number of partially isolated populations than from a single population or a diverse global sample. Figure 6 shows the relationship between sequences analysed by Awadalla et al. (1999) and Ingman et al. (2000). The most noticeable feature is that the samples analysed by Awadalla et al. (1999) fall into three main clades, while the Ingman et al. (2000) sequences are more diverse (particularly with respect to African samples). Under these circumstances we might expect the data set of Awadalla et al. (1999) to be more likely to show the effect of spatially varying selection and mutation pressures.

Fig. 6

The phylogenetic distribution of samples analysed by Awadalla et al. (1999) (solid lines), and Ingman et al. (2000) (dashed lines). The neighbour-joining tree was generated from the entire mtDNA molecule alignment and is rooted by the chimpanzee sequence. Clades represented only by sequences from Awadalla et al. (1999) are shaded.

Many unusual patterns of mutation and substitution have been described in the mitochondrial genomes of mammals and Drosophila, including extreme base composition bias, asymmetric mutation rates, lineage-specific rates and patterns of substitution, large-scale heterogeneity in the substitution rate across the molecule, and adaptive evolution (Pesole et al., 1999; Ballard, 2000; Yang et al., 2000). It is a major challenge to modern population genetics to incorporate such biological realism into our understanding of the forces that have shaped patterns of linkage disequilibrium in the human genome.