Indirect measures of gene flow and migration: FST≠1/(4Nm+1)

Whitlock, Michael C; McCauley, David E

doi:10.1038/sj.hdy.6884960

Short Review
Published: 01 February 1999

Heredity volume 82, pages 117–125 (1999)

Abstract

The difficulty of directly measuring gene flow has led to the common use of indirect measures extrapolated from genetic frequency data. These measures are variants of F_ST, a standardized measure of the genetic variance among populations, and are used to solve for Nm, the number of migrants successfully entering a population per generation. Unfortunately, the mathematical model underlying this translation makes many biologically unrealistic assumptions; real populations are very likely to violate these assumptions, such that there is often limited quantitative information to be gained about dispersal from using gene frequency data. While studies of genetic structure per se are often worthwhile, and F_ST is an excellent measure of the extent of this population structure, it is rare that F_ST can be translated into an accurate estimate of Nm.

Everything should be made as simple as possible, but not simpler. – Albert Einstein.

Introduction

The movement of individuals and genes in space affects many important ecological and evolutionary properties of populations (Hanski & Gilpin, 1997). For example, it is well known that the extent of gene flow affects species integrity, because gene flow counters divergence which can lead to the evolution of reproductive isolation. The rate of movement of genes from one population to another helps to determine the possibility of local adaptation and of adaptive evolution on complex landscapes. Furthermore, dispersal affects the persistence of local populations, species extinction rates, the evolution of species ranges, synchrony of population size changes, and many other important ecological properties. These genetic and ecological issues have taken new urgency in the wake of the rapid loss of biodiversity, since developing effective species conservation strategies depends on knowing the genetic and ecological relationships among populations. Population biologists would very much like to be able to measure the rate at which migration among populations occurs and have collectively devoted a great deal of effort towards measuring gene flow, migration, and their consequences in a large number of species.

Unfortunately, direct measures of migration are fraught with difficulty. Marking and following individual organisms is at the least very time-consuming and expensive, and often technically very difficult. Mark and recapture techniques are prone to biases: long-distance dispersal may be very hard to observe but very important biologically. Estimates of migration are limited in time and do not accurately reflect rare but important events, such as the dramatic gene flow which may accompany storms or climatological shifts. Finally, direct measures of dispersal do not necessarily reflect the movement of genes, because the migrant must reproduce effectively in the new location for gene flow to have occurred.

As a result of these problems, methods have been developed that attempt to use gene frequency data to infer the extent of gene flow in natural populations indirectly (Slatkin, 1985, 1987). Most famously, Sewall Wright's island model of population structure predicts that, if a long list of assumptions is true, the variance in gene frequencies among different populations should be related to the number of migrants which come into each population each generation. With the advent of molecular biology, it has become easy to measure the distribution of alleles within and among populations and therefore tempting to use these data to study gene flow. A number of recent papers have addressed the estimation of gene flow (Milligan et al., 1994; Neigel, 1997; Bossart & Prowell, 1998a), but there is controversy about the usefulness of these estimates (see Bohonak et al., 1998; Bossart & Prowell, 1998b).

These indirect estimates of gene flow have the advantage that the data necessary to make such estimates are relatively easy to gather. Further, such estimates reflect migration rates averaged among numerous populations through time. However, indirect estimates of gene flow are not without their own problems. In particular, since those estimates rely on a mathematical relationship between genetic structure and the rate of gene flow, such estimates implicitly assume that the ecological properties of the populations from which the genetic data are taken match the often unrealistic assumptions of the theoretical model upon which that mathematical relationship is based. Even when such an estimate is warranted, the estimate is subject to sampling error, which can be very large. The central theses of this paper are that these real deviations from the artificial assumptions of the models undermine the reliability of indirect measures of gene flow and that these measures have a high degree of statistical uncertainty. We suggest that, for many applications, measures of genetic structure are valuable in their own right, but that transformations of these measures to quantitative estimates of gene flow or dispersal are at best not needed and, at worst, misleading.

Underlying theory

Wright's F-statistics are a set of hierarchical measures of the correlations of alleles within individuals and within populations. The F-statistic most relevant to the study of gene flow is F_ST, which has various interpretations; most famously it is the variance in allele frequencies among populations, σ²_p, standardized by the mean allele frequency (p̄) at that locus:

F_ST = σ²_p / [p̄(1 − p̄)].

See Slatkin (1985) for details concerning its derivation and Weir (1996) concerning its estimation. Wright (1931) introduced a simple model of population structure, called the island model, which predicts a simple relationship between the number of migrants a population receives per generation and F_ST (Fig. 1). Under the assumptions of the island model,

F_ST ≈ 1/(4Nm + 1),

where N is the effective population size of each population and m is the migration rate between populations. Since F_ST can be estimated readily from data gathered with molecular techniques, we would seem to have a way to quickly measure the number of migrants coming into a population per generation, Nm. The promise of such easy information has led to a minor cottage industry of estimating Nm from F_ST. For example, 13 papers in this journal did so in 1997 alone. (Note that there are several methods for deriving a measure of differentiation from genetic data, such as G_ST, Φ_ST, AMOVA, private alleles, etc., but the estimates of gene flow derived from each of these make fundamentally the same assumptions as F_ST, and we will be referring to these measures collectively in the following section.)
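The two relationships just described, the definition of F_ST as a standardized variance and the island-model link between F_ST and Nm, can be sketched in a few lines of Python. This is only an illustration: the function names and the example allele frequencies are hypothetical, and the first function computes the parametric quantity without any correction for sampling error (see Weir, 1996 for proper estimators).

```python
from statistics import mean, pvariance


def fst_from_frequencies(freqs):
    """F_ST = var(p) / (p_bar * (1 - p_bar)) for one biallelic locus.

    `freqs` holds the frequency of one allele in each population.
    No sampling correction is applied (illustrative only).
    """
    p_bar = mean(freqs)
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def fst_from_nm(nm):
    """Island-model expectation: F_ST is about 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst):
    """Invert the island-model expectation: Nm is about (1/F_ST - 1) / 4."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1] to be converted to Nm")
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical allele frequencies in three populations.
print(fst_from_frequencies([0.4, 0.5, 0.6]))
# One migrant per generation corresponds to F_ST = 0.2 under the model.
print(fst_from_nm(1.0), nm_from_fst(0.2))
```

The inversion is exactly the step whose validity the rest of this review questions: it is only as good as the island-model assumptions behind it.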

The island model, however, makes a large number of simplifying assumptions. It assumes an infinite number of populations, each always with N diploid individuals, and that each of these populations gives and receives a fraction m of its individuals into and from a migrant pool each generation. The individuals which do migrate are randomized and dispersed back to the populations without respect to any geographical structure, such that all populations are equally likely to give and receive migrants from all other populations. Furthermore the island model assumes that there is no selection or mutation and that each population persists indefinitely and has reached an equilibrium between migration and drift. Each of these assumptions is unlikely to be true in any particular case; sometimes this will not matter very much at all with regard to estimating Nm, but in some cases it will matter tremendously. One intention of this review is to investigate the common ways in which natural systems violate the assumptions of the island model and to explore the effects these deviations from the simple model will have on the quantitative and qualitative conclusions from indirect studies of gene flow.

The Fantasy Island model: violating the unrealistic assumptions of the island model

Violation of each of the assumptions of the island model can significantly affect the interpretation of the results. In this section we will discuss the likelihood of various deviations from the island model and their implications. We have organized our discussion of these assumptions into five categories.

1. No selection

One significant assumption made by all models of population structure used to infer gene flow from genetic patterns is that the different alleles at the loci being measured are selectively neutral and that none are linked to selected loci. In fact there is much evidence that many loci are under selection, including many of the loci studied as markers of gene flow themselves. The topic of the neutrality of allozymes and various other markers is far too large to review here; we merely wish to add a reminder that this can be an important source of error in the interpretation of population structure statistics. Selection can either increase or decrease F_ST relative to the neutral case.

Several studies have demonstrated significant differences in the F-statistics estimated from genetic markers derived from coding vs. noncoding DNA (e.g. Karl & Avise, 1992; Pogson et al., 1995; Bossart & Prowell, 1998). The important implication of these studies is that there is strong selection in some species affecting the pattern of genetic differentiation, especially at allozyme markers.

Theoretically, overdominance, underdominance, and local adaptation can change the expected value of F_ST, even with the same value of Nm (Charlesworth et al., 1997; Slatkin & Barton, 1989). Underdominance and local adaptation serve to inflate genetic differentiation; overdominance and spatially uniform selection tend to reduce the genetic variance among populations. Frequency-dependent selection should decrease F_ST if there is a single internal equilibrium, but increase it if there are multiple equilibria. Particularly when the migration rate is small, selection can easily be strong enough to dominate the pattern of genetic differentiation.

Furthermore, selection acting at other loci can affect the distribution of marker alleles. Charlesworth et al. (1997) have demonstrated that linkage to locally selected alleles will substantially increase F_ST. They have also shown that background selection (the constant selection against deleterious mutations at many loci in the genome) can result in a substantial increase in F_ST. This background selection is likely to be extremely common.

Finally, selection caused by inbreeding depression can act to inflate the effects of migration (Ingvarsson & Whitlock, unpublished; Berry et al., 1991). If local populations are inbred, then migrant individuals produce outbred offspring, which can have substantially higher fitness than individuals with two local parents. As a result, selection can substantially enhance the effective rate of gene flow. On the other hand, in cases where migrants come from a very long distance or a very distinct breeding pool, the offspring of migrants may suffer from outbreeding depression and therefore the effective migration rate would be diminished (Barton & Bengtsson, 1986).

2. No mutation

New mutation can also affect the pattern of genetic differentiation among populations, but unless the mutation rate is large relative to the migration rates, this will present little problem for interpreting genetic differentiation. With DNA-level genetic markers, such as microsatellites and mitochondrial DNA, the rates of mutation can be quite high relative to the migration rate, and merit special attention (see, e.g. Goldstein et al., 1995; Slatkin, 1995).

3. All populations are created equal, with a constant number of individuals and equal contributions to the migrant pool

This assumption is particularly unrealistic, since almost any species will have a great deal of variation in local population size and in immigration and emigration rates. In this section we will discuss the consequences of violating these assumptions.

If the migration rate is not very high (say, less than 10%), then the expected F_ST in an island model population depends approximately not on N and m separately, but only on their product Nm. As a result, even if N and m vary, there will be no effect on F_ST as long as Nm is constant. Often, however, the effect of variation in N is not counterbalanced by variation in m, and Nm is variable among demes.
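The claim that F_ST depends on the product Nm only when m is small can be checked numerically. The sketch below assumes a standard textbook recursion for the infinite island model, F' = (1−m)²[1/(2N) + (1 − 1/(2N))F], which is not written out in the text above, and compares its equilibrium with the 1/(4Nm+1) approximation for several hypothetical (N, m) pairs that all share Nm = 1.

```python
def equilibrium_fst(n, m):
    """Fixed point of F' = (1-m)^2 * [1/(2N) + (1 - 1/(2N)) * F]."""
    keep = (1.0 - m) ** 2  # probability that neither of two alleles is a migrant
    return keep / (2.0 * n - (2.0 * n - 1.0) * keep)


def approx_fst(n, m):
    """Small-m approximation used throughout the text: 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * n * m + 1.0)


# All four hypothetical demes have Nm = 1 but very different m.
for n, m in [(1000, 0.001), (100, 0.01), (10, 0.1), (2, 0.5)]:
    print(f"N={n:5d} m={m:5.3f} Nm={n * m:.1f} "
          f"exact={equilibrium_fst(n, m):.4f} approx={approx_fst(n, m):.4f}")
```

Under this recursion the exact and approximate values agree closely for m of 1% or less and drift apart as m grows, which is the sense in which "not very high" matters.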

It is easy to see why we must be concerned with spatial variation in migration rates and population sizes, especially if we consider a common variant of population structure, source-sink metapopulations (Harrison et al., 1988; Pulliam, 1988; Dias, 1996; Gaggiotti & Smouse, 1996; Whitlock & Ingvarsson, in prep.). Imagine the case where some populations (sinks) are not capable of sustaining themselves without immigration from other populations of higher reproductive capacity (sources). If these sinks are sufficiently poor that they produce almost no emigrants, they will contribute almost nothing to the evolutionary future of the species. Yet the differentiation among sink populations can be much greater than that among source populations, if there is a higher rate of population turnover among the sinks. In this case, the F_ST measured from the metapopulation without knowledge of whether a population is a source or sink would bias any estimate of the effective Nm of the metapopulation as a whole. Situations of asymmetric migration are also common, for example, in areas of varying habitat quality or with directional dispersal vectors, such as in an ocean or river current.

More generally, N and m vary across populations. Island biogeography theory predicts variance in migrant number for a variety of reasons (MacArthur & Wilson, 1967), as has often been observed in natural populations (Ebenhard, 1991). Furthermore, dispersal is often distance dependent, such that populations near many other populations receive a greater number of migrants, whereas more isolated populations receive fewer (Brown & Kodric-Brown, 1977). If N and m both vary, they could vary independently or they could covary. McCauley (1991) and Ingvarsson (personal communication) have found correlations between migration rate and population size in two species of insect. In most cases the net effect is that Nm (the number of individuals entering populations) varies among populations. In this case, the F_ST that we measure does not correspond to the average value of Nm in the metapopulation, but rather is extremely biased (see Whitlock, 1992b). When we measure F_ST by traditional techniques, we are measuring an average correlation of alleles within demes. The average of this correlation across demes is a nonlinear function of the Nm of a deme, so the ‘average’ Nm estimated from F_ST data becomes biased downwards. Figure 2 demonstrates this effect.
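The downward bias follows from the curvature of the F_ST to Nm relationship, and a small numerical sketch makes it concrete. The per-deme Nm values are invented for illustration, and the code assumes, as a simplification, that each deme sits at its own island-model expectation 1/(4Nm+1); Whitlock (1992b) gives the fuller treatment.

```python
from statistics import mean


def fst_from_nm(nm):
    """Island-model expectation: F_ST is about 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst):
    """Inverse of the island-model expectation."""
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical numbers of immigrants per generation in five demes.
deme_nm = [0.5, 1.0, 2.0, 10.0, 50.0]

# Averaging the within-deme correlations, then converting back to Nm.
mean_fst = mean(fst_from_nm(x) for x in deme_nm)
apparent_nm = nm_from_fst(mean_fst)
true_mean_nm = mean(deme_nm)

print(f"true mean Nm     = {true_mean_nm:.2f}")
print(f"Nm from mean FST = {apparent_nm:.2f}")
```

Because 1/(4Nm+1) is convex in Nm, the demes with few migrants dominate the average, so the back-calculated Nm falls well below the arithmetic mean of the per-deme values.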

An extreme, yet common, form of variation in population size is recurrent local extinction and colonization of populations. In many species, the turnover of populations is high enough to substantially affect (usually increase) F_ST (Slatkin, 1977; Wade & McCauley, 1988; McCauley, 1989; Whitlock & McCauley, 1990; Whitlock, 1992a; McCauley et al., 1995; Giles & Goudet, 1997). This is because founding events usually involve far fewer individuals than a habitat patch can eventually sustain and because the finite life of individual populations limits the time over which subsequent gene flow can ameliorate the effects of the initial founding events. In many cases (Whitlock, 1992a; Ingvarsson et al., 1997), the F_ST of the population is substantially different from that predicted by using the island model with the direct measures of N and m alone, although the direct measures of demographic parameters predict F_ST very well when they include the effects of extinction and recolonization (Whitlock & McCauley, 1990). The effects of extinction and recolonization can be particularly pronounced for extranuclear genomes that are inherited uniparentally, especially in many angiosperms in which maternally inherited chloroplast and mitochondrial DNA can disperse only in seeds (McCauley, 1995).

4. There is NO spatial structure: migration is completely random

One of the most obvious deviations in natural populations from the assumptions of the island model is that migration rates are correlated with the distance between populations (Wright, 1943, 1946; Neigel, 1997). Populations which are farther apart tend to exchange fewer migrants. A recent review by Neigel (1997) summarizes recent advances in measuring various parameters important in determining the amount of genetic differentiation in a metapopulation with isolation by distance; here we will address the biases which arise from interpreting a system with distance-biased dispersal as if it were an island model with free migration.

Kimura & Weiss (1964), in their classic paper introducing the stepping stone model (where discrete demes are most likely to exchange migrants with adjacent demes), showed that the correlation among demes in allele frequencies drops with increasing distance, and that this would happen more rapidly in a one-dimensional system than in two dimensions. While this correlation is a function of the migration rate, F_ST in this kind of system does not behave as in the island model. The genetic differentiation of stepping stone systems is substantially greater for the same number of migrants coming into a deme per generation. The same value of F_ST is consistent with much or little migration, depending on the geometry of migration.

Obviously, measures of dispersal rates can be made only at the spatial scale at which the samples were taken. A population may have much dispersal locally, but none at a larger scale, or even vice versa. Thus, even if island model assumptions hold at the scale at which the sample is taken, they may not at a larger or smaller spatial scale, and therefore results from that scale will not extrapolate. This problem could be particularly important when Nm values are compared between species without acknowledging that each species may have been sampled at a different spatial scale.

One method has been proposed to measure the pattern of migration among specific pairs of populations (Slatkin, 1993). This formulation might sometimes be informative, but it is perhaps worthwhile to point out a potential misunderstanding in its use. Slatkin shows that the isolation by distance between populations can be estimated by calculations of F_ST for each pair of populations. He defines M̂ as the value of Nm that would give the pair-wise F_ST. It is important to realize, however, that this M̂ does not reflect the actual dispersal between two populations, but instead is another measure of differentiation. A pair of populations in the same metapopulation which receive migrants from the same sources will have a low F_ST even if they exchange no migrants at all (even indirectly).

Similarly, many geographical features can restrict gene flow between sets of populations, such as rivers, highways, mountain ranges, etc. In many circumstances, the equal migration assumption of the island model is clearly not true. Any particular geographical feature which could potentially be a barrier to gene flow can be examined by using hierarchical F-statistics (see Weir, 1996). Ignoring this hierarchical structure can result in substantial biases in estimates of gene flow (Husband & Barrett, 1994).

5. Everything is at equilibrium, nothing is changing

Another major assumption of methods of estimating migration rates from gene frequency data is that the whole population has reached an equilibrium between the forces of migration and genetic drift. Yet in many cases this is clearly not true. Many species are naturally in new ecological contexts, such as high-latitude tree populations which have been in situ for mere tens of generations in many cases. A recent range expansion can cause migration and drift to have insufficient time to reach equilibrium and therefore give migration estimates biased towards the previous conditions. As an extreme case, populations that have recently been completely isolated will not yet necessarily reflect the equilibrium predicted by current levels of gene flow. Low F_ST values do not imply current gene flow.

This is a particular problem for attempts at estimating gene flow among ‘species’: the large population sizes and low migration rates expected among species mean that the equilibrium F_ST will take an extremely long time to be reached, perhaps longer than the history of the speciation event. F_ST cannot be used to infer the rate of gene flow among species.

We live in a time of rapid anthropogenic change. Many species are fragmented into smaller, more distant subpopulations than the species previously experienced; similarly, species which deal well with human disturbance have increased in number and are perhaps more connected by migration than they were historically. As a result, many species, especially those for which conservation biologists have particular concern, are not expected to be at an equilibrium between migration and drift, and therefore indirect dispersal estimators based on F_ST are likely to be in significant error.

The time (in generations) required for a metapopulation to reach equilibrium is increased by low migration rates and large population size (Crow & Aoki, 1984; Whitlock, 1992b). The time it takes for F_ST to reach halfway from an old value to a new equilibrium is ln(1/2)/ln[(1−m)²(1−1/2N)] (Whitlock, 1992b), which can be extremely long if population sizes are large and migration rates are low (Fig. 3).
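The half-time expression above is easy to evaluate directly. The sketch below implements it as written; the parameter combinations are hypothetical and chosen only to show how quickly the time to equilibrium grows as N increases and m decreases.

```python
import math


def half_time(n, m):
    """Generations for F_ST to move halfway from an old value to equilibrium.

    Implements ln(1/2) / ln[(1 - m)^2 * (1 - 1/(2N))] (Whitlock, 1992b).
    """
    decay = (1.0 - m) ** 2 * (1.0 - 1.0 / (2.0 * n))
    return math.log(0.5) / math.log(decay)


# Hypothetical demes: small and well connected to large and isolated.
for n, m in [(50, 0.05), (100, 0.01), (1000, 0.001), (10000, 0.0001)]:
    print(f"N={n:6d} m={m:7.4f} half-time={half_time(n, m):8.1f} generations")
```

For small m and large N the denominator is close to −(2m + 1/(2N)), so the half-time is roughly ln(2)/(2m + 1/(2N)); this approximation is an added observation, not part of the cited formula.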

The differences between dispersal and gene flow

Throughout we have attempted to maintain a subtle distinction between dispersal and gene flow. The differences can be very important for interpreting the results of an indirect measure of gene flow. Of course, in order for a dispersal event to effect gene flow, the migrant individual must successfully mate and breed, and at least some of its offspring must grow to adulthood. Nm in the island model refers to an ‘effective’ number of individuals in a population (N_e) and the effective proportion of breeding individuals that are migrants (m). In many ecological contexts, this effective Nm is not what is being sought; often we would prefer to know the number of individuals moving to a patch, consuming its resources and interacting with its residents in a variety of ways. Migrants that are reproductively unsuccessful may nevertheless significantly affect the ecology of their new deme. A good example of this would be the dynamics of a host/parasite or disease system, which is significantly affected by spatial structure and dispersal patterns, but where migrants can introduce disease to a new patch even without reproducing.

There are many reasons why migrants may have different fitness from resident individuals. Dispersers are often of different age classes, physical condition, social status, or genotype from nondispersers (Chepko-Sade & Halpin, 1987; Roff & Simons, 1997). The offspring of migrants are more likely to be outbred, and migrant genes may spread faster because of heterosis or more slowly because of outbreeding depression. Furthermore, migration may occur nonrandomly with respect to the life-cycle. If young individuals move, then most of their reproductive effort will be in the new patch and the usual assumptions of the island model are met. However, if individuals move after some of their reproductive life, then their contribution to the new deme is less than a full individual (Endler, 1979; McCauley, 1983).

One particular problem of estimating dispersal rates from genetic data is that the ‘Nm’ value from F_ST is actually N_em. The effective population size of a deme will be much less than the actual population size. The best estimates of N_e/N average about 10%, but are extremely variable among species (see Frankham, 1995 for a nice review). As a result, the actual number of migrants into a deme is likely to be 10-fold or more higher than that estimated by F_ST for this reason alone! Because of the variation in N_e/N ratios, however, this means that it is almost impossible to translate an N_em estimate into an estimate of the actual number of migrant individuals. Estimating N_e is extremely difficult; many applications of the F_ST approach are attempting to measure m rather than N_em, and without an accurate measure of N_e this becomes impossible.
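The arithmetic behind the "10-fold or more" statement can be sketched as follows. The function name and the example ratios are hypothetical, and the conversion assumes that the same migration rate m applies to census and effective numbers, which is itself a simplification.

```python
def census_migrants(ne_m, ne_over_n):
    """Convert an N_e*m estimate into census migrants, N*m = N_e*m / (N_e/N).

    Assumes the migration rate m is the same for census and effective
    numbers (an illustrative simplification).
    """
    if ne_over_n <= 0.0:
        raise ValueError("N_e/N must be positive")
    return ne_m / ne_over_n


# The same indirect estimate (N_e*m = 1) under three hypothetical N_e/N ratios.
for ratio in (0.05, 0.10, 0.50):
    print(f"N_e/N={ratio:.2f} -> about {census_migrants(1.0, ratio):.0f} migrants")
```

Since N_e/N is rarely known to better than an order of magnitude, the spread across these three lines is a fair picture of the uncertainty that this step alone introduces.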

With seed plants (and some other sedentary organisms as well), the differences between dispersal of individuals and gene flow can be even greater, because gene flow by gametes (i.e. pollen) may greatly outstrip movement by seeds (Ennos, 1994). For most ecological purposes, the movement of individual diploid organisms is much more relevant than gene flow alone.

Statistical issues

Even when the estimation of Nm from F_ST can be interpreted biologically as an unbiased estimate of the number of individuals moving between populations, that estimate of Nm comes with a great deal of uncertainty. This is because, for logistical reasons, estimates of F_ST are usually based on a small number of loci scored for a limited number of individuals taken from a few populations. Because F_ST is essentially the ratio of two variances, it is difficult to measure accurately without a large data set. Worse, because F_ST is a nonlinear function of Nm, estimates of Nm from F_ST will be especially inaccurate. Small differences in F_ST can result in large differences in estimates of Nm (Fig. 2); the error in estimating F_ST is therefore amplified when estimating Nm. As a result, the confidence interval for estimates of Nm can be enormous, particularly for F_ST values less than 0.1, as is often the case in nature.

To illustrate this we conducted a simple computer simulation in which F_ST estimates were created by randomly sampling five loci from each of 50 individuals from each of 10 demes (approximately the size of a typical data set used to estimate F_ST in nature) from an underlying set of island model demes with an F_ST of 0.005. This procedure was repeated 1000 times and the results are shown in Fig. 4. Estimates of F_ST ranged from slightly negative to greater than 0.02. The 95% confidence limits for values of Nm calculated from these F_ST estimates were 11–283 (Fig. 4), compared to the true value of 50. With a true value of F_ST=0.02, the confidence intervals are not as broad. These errors result not only from the sampling error caused by the finite samples of individuals and alleles from a finite number of subpopulations, but also from the random evolutionary history of any particular locus.
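A simulation in the same spirit can be sketched as below. It is not a reconstruction of the simulation reported here: it assumes biallelic loci with a mean allele frequency of 0.5, deme frequencies drawn from a Beta distribution whose variance matches the target F_ST, 100 independent allele copies per deme (50 diploids with random mating), and a simple moment estimator for equal sample sizes. All function names are hypothetical, and the numbers it produces need not match Fig. 4.

```python
import random
from statistics import mean


def simulate_fst_estimate(true_fst, rng, n_loci=5, n_demes=10,
                          n_alleles=100, p=0.5):
    """One multilocus F_ST estimate from a simulated sample."""
    # Beta parameters giving mean p and variance true_fst * p * (1 - p).
    a = p * (1.0 - true_fst) / true_fst
    b = (1.0 - p) * (1.0 - true_fst) / true_fst
    num = den = 0.0
    for _ in range(n_loci):
        p_hat = []
        for _ in range(n_demes):
            q = rng.betavariate(a, b)  # true frequency in this deme
            count = sum(rng.random() < q for _ in range(n_alleles))
            p_hat.append(count / n_alleles)
        p_bar = mean(p_hat)
        # Mean squares among demes and within demes (equal sample sizes).
        msp = n_alleles * sum((x - p_bar) ** 2 for x in p_hat) / (n_demes - 1)
        msg = (n_alleles * sum(x * (1.0 - x) for x in p_hat)
               / (n_demes * (n_alleles - 1)))
        num += msp - msg
        den += msp + (n_alleles - 1) * msg
    return num / den  # ratio of sums over loci


def nm_interval(true_fst=0.005, n_reps=1000, seed=1):
    """Central 95% range of F_ST estimates and the implied range of Nm."""
    rng = random.Random(seed)
    est = sorted(simulate_fst_estimate(true_fst, rng) for _ in range(n_reps))
    lo, hi = est[int(0.025 * n_reps)], est[int(0.975 * n_reps) - 1]

    def to_nm(f):
        # Zero or negative F_ST estimates imply no finite Nm.
        return (1.0 / f - 1.0) / 4.0 if f > 0 else float("inf")

    # High F_ST maps to low Nm, so the bounds swap.
    return (lo, hi), (to_nm(hi), to_nm(lo))


if __name__ == "__main__":
    print(nm_interval())
```

The estimator takes the ratio of sums over loci rather than the mean of per-locus ratios, a common choice for reducing bias; negative estimates are possible and are mapped to an infinite Nm rather than discarded.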

F_ST, as estimated from a typical size sample, is capable of estimating Nm only roughly, sometimes only within a couple of orders of magnitude. In many cases, this is a useful scale of resolution; more often, it is not (see below). In any event, estimates of Nm should always be accompanied by confidence limits. Recent advances in the sampling theory of F-statistics make this easier (Balding & Nichols, 1995).

Furthermore, as has been suggested by Jim Mallet (personal communication), errors in scoring of individual genotypes can potentially contribute strongly to the apparent F_ST measured in a population. These errors are often controlled for (by, say, running individuals from different populations at random on a particular gel), but when they are not, temporal variation in scoring of genotypes could easily contribute a strong systematic bias to the estimation of F_ST.

Non-diploid genetics

The expected F_ST for non-diploid genetic loci is of course much different from the F_ST for diploids. Haplodiploid or X-linked genes will depend critically on the sex-ratio of dispersers and in general give higher F_ST values (Kimura, 1963; Whitlock, 1995). Uniparentally inherited loci such as mitochondrial DNA or chloroplast DNA will have differentiation patterns which depend only on the population size and migration patterns of the sex of individuals which transmit these genes; therefore the level of differentiation tends to be much higher (McCauley, 1995). For haploid genomes, of course there are effectively half as many allele copies as there are for a diploid genome, so that the effects of drift are higher and the differentiation among populations is increased, such that the expected F_ST under the island model is ≈1/(2N_e,O m_O+1), where N_e,O is the effective size and m_O the migration rate of the sex transmitting the genome in question.
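The difference between the diploid and uniparental expectations can be illustrated numerically. The sketch below uses the two island-model approximations from the text; the example assumes, purely for illustration, an even sex ratio and equal migration of both sexes, so that the transmitting sex has half the effective size and the same migration rate.

```python
def fst_diploid(ne_m):
    """Island-model expectation for a diploid nuclear locus: 1 / (4*N_e*m + 1)."""
    return 1.0 / (4.0 * ne_m + 1.0)


def fst_uniparental(ne_m_sex):
    """Expectation for a uniparentally inherited genome: 1 / (2*N_e*m + 1),
    with N_e and m taken for the transmitting sex only."""
    return 1.0 / (2.0 * ne_m_sex + 1.0)


# Hypothetical deme with N_e*m = 2 for nuclear genes. With an even sex ratio
# and equal dispersal of the sexes (assumptions), the transmitting sex
# contributes N_e*m = 1.
nuclear = fst_diploid(2.0)
organelle = fst_uniparental(1.0)
print(f"nuclear F_ST   = {nuclear:.3f}")
print(f"organelle F_ST = {organelle:.3f}")
```

Sex-biased dispersal would move the organelle value further in either direction, which is why the sex ratio of dispersers matters so much for these markers.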

Conclusions

It is clear that the promise of easy estimates of dispersal by inference from genetic data must be viewed with caution. These methods make numerous assumptions about populations which are unlikely to be true, and there are other difficulties with the interpretation of these data. Although our central thesis is that great care should be taken in the interpretation of genetic data, we conclude with some more optimistic observations.

First, there is a great difference between studying genetic data to estimate dispersal and doing so to estimate genetic differentiation. For many purposes, F_ST is an excellent measure of the genetic differentiation among populations, and indeed studying the genetic structure of a population is essential to understanding its evolutionary properties. Most of the concerns we have brought up do not affect this conclusion. Often F_ST is truly intended to measure the genetic differences between populations, and in these cases we simply suggest not translating F_ST into a measure of Nm.

Second, the technology of direct estimation of dispersal has improved substantially. Smaller radio transmitters (Koenig et al., 1996), marker-assisted migration estimates (Devlin & Ellstrand, 1990; Broyles et al., 1994; Nason & Hamrick, 1997), and improved estimation techniques have allowed observation of dispersal events which were previously invisible to direct methods. These methods allow other important natural history information to be recorded, do not suffer most of the problems outlined above, and are not necessarily more expensive than indirect studies. A renewed emphasis on direct measures would spur even more development along these lines. Direct observations of dispersal through mark and recapture methods, or direct observations of gene flow through genetic methods, are labour intensive and are most useful when applied to a few focal populations. However, in addition to providing an estimate of gene flow and/or migration, they might provide some insight into the suitability of F_ST estimates for predicting Nm. If one cannot reconcile the direct and indirect estimates, it would seem profitable to explore which additional ecological features cause a departure from the assumptions of the island model (population turnover, recent range expansion, temporal variation in migration rates, etc.).

Third, recognition of the limitations of the island model may spur theoreticians to account for these issues in more realistic models which could allow for using genetic data in a more realistic way. At this point, the issue most obviously being addressed in this way is the problem of isolation by distance, for which there is a growing literature dating back to Wright's isolation by distance papers (Wright, 1943, 1946; Slatkin, 1993; Neigel, 1997). Genetic data have also been used to infer the importance of extinction/colonization (Whitlock, 1992a; McCauley et al., 1995; Giles & Goudet, 1997; Ingvarsson et al., 1997), source-sink dynamics (Dias et al., 1996), and others, but more formal models are required. Furthermore, we need to use genetic data to investigate a broader spectrum of demographic processes. A transition to hypothesis testing, comparing the genetic structure of separate elements of a metapopulation (i.e. young populations vs. old, near vs. distant, mainland vs. island, etc.), can give us more insight into the importance of various factors in creating genetic patterns.

Fourth, F_ST measures may give reasonable estimates of N_em when the spatial scale is small (so that migration may follow the island model and selection is less likely to cause strong patterns of genetic differences), the migration rate is relatively high (so that equilibrium conditions are quickly reached), sample sizes and the number of loci are large (to account for statistical issues), and the biological question truly calls for an estimate of the effective rate of gene flow expressed as N_em. These conditions do not always hold, and it could be argued that if we knew these conditions to be the case we would already know more about the populations than we will learn from this indirect measure.
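
For readers who want to see the arithmetic behind such an estimate, the following is a minimal sketch of the island-model translation named in the title, F_ST = 1/(4Nm + 1), solved for Nm. It assumes every condition listed above holds; the function names and the three F_ST values are hypothetical and chosen only for illustration, not taken from any data set.

```python
def nm_from_fst(fst: float) -> float:
    """Solve the island-model relation F_ST = 1 / (4*Nm + 1) for Nm.

    Assumes the idealized island model at equilibrium; the name and
    signature are illustrative, not from any published package.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(nm: float) -> float:
    """Forward relation: expected F_ST for a given Nm under the same model."""
    if nm < 0.0:
        raise ValueError("Nm cannot be negative")
    return 1.0 / (4.0 * nm + 1.0)


# Hypothetical F_ST values, e.g. a point estimate and two plausible bounds.
for fst in (0.02, 0.05, 0.10):
    print(f"F_ST = {fst:.2f} -> Nm = {nm_from_fst(fst):.2f}")
```

With these illustrative inputs, moving F_ST from 0.02 to 0.10 moves the implied Nm from roughly 12 to roughly 2, so even modest uncertainty in F_ST translates into wide bounds on Nm before any violation of the model's assumptions is considered.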

Finally, a note of cautious (and perhaps foolish) optimism. For the reasons we have discussed throughout this review, estimates of gene flow based on F_ST are unlikely to be very reliable. However, these estimates are likely to be correct within a few orders of magnitude. Comparisons of large groups of species are likely to be more informative, as many of the differences may average out. Estimates of dispersal from F_ST should be undertaken with great caution, and only if the biological question behind the attempt at estimating dispersal depends on knowing migration rates within very large bounds.

References

Balding, D. J. and Nichols, R. A. (1995). A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica, 96: 3–12.
Barton, N. H. and Bengtsson, B. O. (1986). The barrier to genetic exchange between hybridizing populations. Heredity, 57: 357–376.
Berry, R. J., Triggs, G. S., King, P., Nash, H. R. and Noble, L. R. (1991). Hybridization and gene flow in house mice introduced into an existing population on an island. J Zool, 225: 615–632.
Bohonak, A. J., Davies, N., Roderick, G. K. and Villablanca, F. X. (1998). Is population genetics mired in the past? Trends Ecol Evol, 13: 360.
Bossart, J. L. and Prowell, D. P. (1998a). Genetic estimates of population structure and gene flow: limitations, lessons and new directions. Trends Ecol Evol, 13: 202–206.
Bossart, J. L. and Prowell, D. P. (1998b). Reply from Bossart and D. Pashley Prowell. Trends Ecol Evol, 13: 360.
Brown, J. H. and Kodric-Brown, A. (1977). Turnover rates in insular biogeography: effect of immigration and extinction. Ecology, 58: 445–449.
Broyles, S. B., Schnable, A. and Wyatt, R. (1994). Evidence for long-distance pollen dispersal in milkweeds (Asclepias exaltata). Evolution, 48: 1032–1040.
Charlesworth, B., Nordborg, M. and Charlesworth, D. (1997). The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet Res, 70: 155–174.
Chepko-Sade, B. D. and Halpin, Z. T. (eds) (1987). Mammalian Dispersal Patterns: the Effects of Social Structure on Population Genetics. University of Chicago Press.
Crow, J. F. and Aoki, K. (1984). Group selection for a polygenic behavioural trait: Estimating the degree of population subdivisions. Proc Natl Acad Sci USA, 81: 6073–6077.
Devlin, B. and Ellstrand, N. C. (1990). The development and application of a refined method for estimating gene flow from angiosperm paternity analysis. Evolution, 44: 248–259.
Dias, P. C. (1996). Sources and sinks in population biology. Trends Ecol Evol, 11: 326–330.
Dias, P. C., Verheyen, G. R. and Raymond, M. (1996). Source-sink populations in Mediterranean Blue tits: evidence using single-locus microsatellite probes. J Evol Biol, 9: 965–978.
Ebenhard, T. (1991). Colonization in metapopulations - a review of theory and observations. Biol J Linn Soc, 42: 105–121.
Endler, J. A. (1979). Gene flow and life history patterns. Genetics, 93: 263–284.
Ennos, R. A. (1994). Estimating the relative rates of pollen and seed migration among plant populations. Heredity, 72: 250–259.
Frankham, R. (1995). Effective population size/adult population size ratios in wildlife: a review. Genet Res, 66: 95–107.
Gaggiotti, O. E. and Smouse, P. E. (1996). Stochastic migration and maintenance of genetic variation in sink populations. Am Nat, 147: 919–945.
Giles, B. E. and Goudet, J. (1997). A case study of genetic structure in a metapopulation. In: Hanski, I. A. and Gilpin, M. E. (eds) Metapopulation Biology: Ecology, Genetics and Evolution, pp. 429–454. Academic Press, New York.
Goldstein, D. B., Ruiz Linares, A., Cavalli-Sforza, L. L. and Feldman, M. W. (1995). An evaluation of genetic distances for use with microsatellite loci. Genetics, 139: 463–471.
Hanski, I. A. and Gilpin, M. E. (1997). Metapopulation Biology: Ecology, Genetics and Evolution. Academic Press, New York.
Harrison, S. J., Murphy, D. D. and Ehrlich, P. R. (1988). Distribution of the bay checkerspot butterfly Euphydryas editha bayensis: evidence for a metapopulation model. Am Nat, 132: 360–382.
Husband, B. C. and Barrett, S. C. H. (1994). Estimates of gene flow in Eichhornia paniculata (Pontederiaceae): effects of range substructure. Heredity, 75: 549–560.
Ingvarsson, P. K., Olsson, K. and Ericson, L. (1997). Extinction-recolonization dynamics in the mycophagous beetle Phalacrus substriatus. Evolution, 51: 187–195.
Karl, S. A. and Avise, J. C. (1992). Balancing selection at allozyme loci in oysters: implications from RFLPs. Science, 256: 100–102.
Kimura, M. (1963). A probability method for treating inbreeding systems especially with linked genes. Biometrics, 19: 1–17.
Kimura, M. and Weiss, G. H. (1964). The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics, 49: 561–576.
Koenig, W. D., van Vuren, D. and Hooge, P. N. (1996). Detectability, philopatry, and the distribution of dispersal distances in vertebrates. Trends Ecol Evol, 11: 514–517.
MacArthur, R. H. and Wilson, E. O. (1967). The Theory of Island Biogeography. Princeton University Press, New Jersey.
McCauley, D. E. (1983). Gene flow distances in natural populations of Tetraopes tetraophthalmus. Evolution, 37: 1239–1246.
McCauley, D. E. (1989). Extinction, colonization, and population structure: a study of a milkweed beetle, Tetraopes tetraophthalmus. Am Nat, 134: 365–376.
McCauley, D. E. (1991). The effect of host plant patch size variation on the population structure of a specialist herbivore insect, Tetraopes tetraophthalmus. Evolution, 45: 1675–1684.
McCauley, D. E. (1995). The use of chloroplast DNA polymorphism in studies of gene flow in plants. Trends Ecol Evol, 10: 198–202.
McCauley, D. E., Raveill, J. and Antonovics, J. (1995). Local founding events as determinants of genetic structure in a plant metapopulation. Heredity, 75: 630–636.
Milligan, B. G., Leebens-Mack, J. and Strand, A. E. (1994). Conservation genetics: beyond the maintenance of marker diversity. Mol Ecol, 3: 423–435.
Nason, J. D. and Hamrick, J. L. (1997). Reproductive and genetic consequences of forest fragmentation: two case studies of neotropical canopy trees. J Heredity, 88: 264–276.
Neigel, J. E. (1997). A comparison of alternative strategies for estimating gene flow from genetic markers. Ann Rev Ecol Syst, 28: 105–128.
Pogson, G. H., Mesa, K. A. and Boutilier, R. G. (1995). Genetic population structure and gene flow in the Atlantic cod Gadus morhua: A comparison of allozyme and nuclear RFLP loci. Genetics, 139: 375–385.
Pulliam, H. R. (1988). Sources, sinks, and population regulation. Am Nat, 132: 652–661.
Roff, D. A. and Simons, A. M. (1997). The quantitative genetics of wing dimorphism under laboratory and ‘field’ conditions in the cricket Gryllus pennsylvanicus. Heredity, 78: 235–240.
Slatkin, M. (1977). Gene flow and genetic drift in a species subject to frequent local extinctions. Theor Pop Biol, 12: 253–262.
Slatkin, M. (1985). Gene flow in natural populations. Ann Rev Ecol Syst, 16: 393–430.
Slatkin, M. (1987). Gene flow and the geographic structure of natural populations. Science, 236: 787–792.
Slatkin, M. (1993). Isolation by distance in equilibrium and non-equilibrium populations. Evolution, 47: 264–279.
Slatkin, M. (1995). A measure of population subdivision based on microsatellite allele frequencies. Genetics, 139: 457–462.
Slatkin, M. and Barton, N. H. (1989). A comparison of three indirect methods for estimating average levels of gene flow. Evolution, 43: 1349–1368.
Wade, M. J. and McCauley, D. E. (1988). Extinction and recolonization: their effects on the genetic differentiation of local populations. Evolution, 42: 995–1005.
Weir, B. S. (1996). Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.
Whitlock, M. C. (1992a). Nonequilibrium population structure in forked fungus beetles: Extinction, colonization, and the genetic variance among populations. Am Nat, 139: 952–970.
Whitlock, M. C. (1992b). Temporal fluctuations in demographic parameters and the genetic variance among populations. Evolution, 46: 608–615.
Whitlock, M. C. (1995). Two-locus drift with sex chromosomes: the partitioning and conversion of variance in subdivided populations. Theor Pop Biol, 48: 44–64.
Whitlock, M. C. and McCauley, D. E. (1990). Some population genetic consequences of colony formation and extinction: Genetic correlations within founding groups. Evolution, 44: 1717–1724.
Wright, S. (1931). Evolution in Mendelian populations. Genetics, 16: 97–159.
Wright, S. (1943). Isolation by distance. Genetics, 28: 114–138.
Wright, S. (1946). Isolation by distance under diverse systems of mating. Genetics, 31: 39–59.

Acknowledgements

Thanks to many people who have read previous versions of this manuscript and made many helpful suggestions: Sally Otto, Rick Taylor, Pelle Ingvarsson, Mike Stamford, Steve Latham, and Jim Leebens-Mack. In particular we would like to thank the reviewers of this manuscript for thoughtful and generous contributions which have greatly improved this paper: Nick Barton, Jim Mallet, and Joe Neigel. This work was supported by a Natural Sciences and Engineering Research Council (Canada) grant to M. C. W. and by National Science Foundation grant DEB-9610496 to D. E. M.

Author information

Authors and Affiliations

Department of Zoology, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada
Michael C Whitlock
Department of Biology, Vanderbilt University, Nashville, 37235, Tennessee, USA
David E McCauley

Corresponding author

Correspondence to Michael C Whitlock.

About this article

Cite this article

Whitlock, M., McCauley, D. Indirect measures of gene flow and migration: F_ST≠1/(4Nm+1). Heredity 82, 117–125 (1999). https://doi.org/10.1038/sj.hdy.6884960

Received: 08 October 1998
Accepted: 13 November 1998
Published: 01 February 1999
Issue Date: 01 February 1999
DOI: https://doi.org/10.1038/sj.hdy.6884960
