In nature, truly panmictic populations are probably rare; most populations are at least partially structured into subpopulations, which often occupy different environments. Migration allows subpopulations to share genetic variation, which contributes to the maintenance of genetic variation within each subpopulation. Migration also shapes adaptation to local environments, because some alleles may be adaptive in certain environments but not in others. Thus, to understand the evolutionary dynamics of a population, it is very important to quantify the level of migration. As it is challenging to estimate migration rates directly in wild populations, a common approach is to infer them indirectly from genetic variation data, such as microsatellites and single-nucleotide polymorphisms (SNPs) (Slatkin, 1985a; Neigel, 1997; Broquet and Petit, 2009).

Classically, Wright (1951) introduced FST, a summary statistic of population differentiation. FST measures the reduction in heterozygosity within subpopulations relative to the total population, and it can be easily computed for any kind of polymorphism data. It is well known that, in the island model with equal effective subpopulation sizes (N) and uniform migration rates among subpopulations (m), the expectation of FST is approximately 1/(1 + 4Nm). When Nm is large, FST is small because allele frequencies, and hence heterozygosities, differ little among subpopulations; when Nm is small, FST is large. Given this simple relationship between FST and Nm, FST is very frequently used to estimate Nm in various species (Holsinger and Weir, 2009; but see Whitlock and McCauley, 1999).
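
As a minimal sketch of the two formulas in this paragraph, the Python below computes FST for one biallelic locus as (H_T - H_S)/H_T from subpopulation allele frequencies and then inverts the island-model approximation, Nm ≈ (1/FST - 1)/4. The function names are hypothetical, subpopulations are weighted equally, and no correction for sample size is applied, so this is not a substitute for the sample-size-corrected estimators used in practice.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs holds the frequency of one allele in each subpopulation;
    all subpopulations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity if all subpopulations were pooled.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        raise ValueError("locus is monomorphic in the pooled population")
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_t - h_s) / h_t


def nm_from_fst(fst: float) -> float:
    """Invert the island-model approximation F_ST ~ 1 / (1 + 4Nm); requires F_ST > 0."""
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    f = fst_biallelic([0.2, 0.5, 0.8])
    print(f"F_ST = {f:.2f}, Nm = {nm_from_fst(f):.2f}")
```

For allele frequencies 0.2, 0.5 and 0.8 this prints F_ST = 0.24, Nm = 0.79. Note that FST = 0 corresponds to unbounded Nm, which is why the inversion is undefined there.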

Slatkin (1981, 1985b) proposed an alternative way to estimate Nm using private alleles, which are defined as alleles that appear in the sample from only one subpopulation (Neel, 1973). Figure 1 illustrates a hypothetical situation with three subpopulations, in which there are four private alleles (shown in black). In this highly cited gem from the Heredity archive, Barton and Slatkin (1986) derived an analytical relationship between the average frequency of private alleles and Nm, showing that their logarithms are roughly linearly related over a reasonable range of Nm. This simple relationship made the average frequency of private alleles a convenient estimator of Nm, which has been commonly used for decades.
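
The estimator can be sketched in Python as follows, assuming the data are lists of allele labels (gene copies) sampled from each subpopulation. The sketch finds alleles seen in only one sample, averages their within-sample frequencies to obtain the mean private-allele frequency p(1), and inverts a log-linear relationship ln p(1) ≈ a ln(Nm) + b. The default coefficients a ≈ -0.505 and b ≈ -2.44 are the values commonly quoted from Slatkin (1985b) for samples of about 25 individuals; treat them, and all function names here, as assumptions to check against the original papers. The sample-size correction usually applied in practice is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def private_allele_freqs(samples: Dict[str, List[str]]) -> List[float]:
    """Within-sample frequency of every private allele.

    samples maps a subpopulation name to the allele labels of its sampled
    gene copies. An allele is private if it occurs in exactly one sample.
    """
    counts = {pop: Counter(alleles) for pop, alleles in samples.items()}
    freqs = []
    for pop, c in counts.items():
        n = sum(c.values())
        for allele, x in c.items():
            seen_elsewhere = any(
                allele in other for name, other in counts.items() if name != pop
            )
            if not seen_elsewhere:
                freqs.append(x / n)
    return freqs


def nm_from_private(mean_p1: float, a: float = -0.505, b: float = -2.44) -> float:
    """Invert ln(p1) ~ a*ln(Nm) + b.

    The default a and b are ASSUMED values (commonly quoted from
    Slatkin 1985b for samples of ~25 individuals); no sample-size
    correction is applied.
    """
    return math.exp((math.log(mean_p1) - b) / a)


if __name__ == "__main__":
    samples = {
        "I": ["a", "a", "b", "c"],
        "II": ["a", "b", "b", "d"],
        "III": ["a", "a", "e", "e"],
    }
    p = private_allele_freqs(samples)  # alleles c, d and e are private
    mean_p1 = sum(p) / len(p)
    print(sorted(p), round(nm_from_private(mean_p1), 3))
```

Because a is negative, rarer private alleles translate into a larger estimate of Nm, matching the intuition that gene flow quickly spreads alleles out of the subpopulation where they arose.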

Figure 1. An illustration of the spatial distribution of shared (white) and private (black) alleles in a three-subpopulation model.

What is the difference between the two methods for estimating Nm? Because both are summary statistics, each captures only part of the information in the data. In the ideal situation (that is, error-free sampling from equilibrium populations under neutrality), the estimates of Nm from the two methods should have the same expectation, but when some assumptions are violated they can differ. The direction and extent of the bias caused by such violations have not been fully explored, but some intuition is available. For example, private allele-based estimates of Nm should be most sensitive to recent migration, because most private alleles are relatively rare (Slatkin and Takahata, 1985) and rare alleles are expected to be young. In contrast, FST is based on heterozygosity, which is largely determined by the frequencies of common alleles. Because common alleles are usually old, FST should reflect migration over a relatively longer time span. Because both estimators assume neutrality, any kind of selection will bias them, possibly more strongly for one measure than for the other. For example, FST should be more sensitive to local adaptation, because local adaptation can cause a major shift in the frequencies of common alleles. See Slatkin and Barton (1989) for a more technical discussion of the differences between the two estimators.
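
A small numerical illustration of why heterozygosity, and therefore FST, is governed mainly by common alleles: expected heterozygosity at a locus is H = 1 - Σ p_i², and because each allele contributes p_i², rare alleles add almost nothing. The allele frequencies and function names below are hypothetical.

```python
from typing import Sequence


def heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def rare_allele_share(freqs: Sequence[float], threshold: float = 0.05) -> float:
    """Fraction of sum(p_i^2) contributed by alleles with p_i < threshold."""
    total = sum(p * p for p in freqs)
    return sum(p * p for p in freqs if p < threshold) / total


if __name__ == "__main__":
    # Hypothetical locus: two common alleles and four rare ones.
    freqs = [0.70, 0.20, 0.04, 0.03, 0.02, 0.01]
    print(f"H = {heterozygosity(freqs):.3f}")
    print(f"share from rare alleles = {rare_allele_share(freqs):.1%}")
```

Here four of the six alleles are rare, yet together they contribute only about 0.6% of Σ p_i²; these are exactly the alleles most likely to be private, which is why the two estimators draw on largely different parts of the data.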

Given the obvious importance of understanding population dynamics and evolution, these two simple methods for estimating Nm have made significant contributions to ecology and evolution, especially since the 1990s. They have been applied to genetic variation data from a wide range of species, partly because both methods are implemented in the GENEPOP software (Raymond and Rousset, 1995). Thanks to dramatic improvements in computational power, the field is shifting toward computationally intensive methods, such as likelihood-based Markov chain Monte Carlo (MCMC) algorithms and likelihood-free approximate Bayesian computation (ABC) (Nielsen and Wakeley, 2001; Beaumont et al., 2002). Nevertheless, simple theoretical solutions for FST and private allele frequencies provide great intuitive understanding of migration and remain useful in various situations. One interesting example is comparing estimates of Nm across loci within a single genome, which can give significant insights into natural selection (Storz, 2005).

If migration is simply the movement of individuals between populations, different genomic regions should yield similar estimates of Nm, but this does not hold when selection is acting. Consider two subpopulations, I and II, between which migration is allowed. Selection acts on a particular biallelic locus with alleles A and B: A is favored in subpopulation I but disfavored in subpopulation II, and vice versa for B. Because B is selected against in subpopulation I and A is selected against in subpopulation II, migrants carrying the locally disfavored allele are less likely to contribute to genetic admixture between the two subpopulations. In this situation, the migration rate at the selected locus is ‘effectively’ reduced because admixture is less successful. Unlinked genomic regions are free from this selection, so their effective migration rate is not reduced, in clear contrast to the selected locus.
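
The reduction in effective migration at a selected locus can be illustrated with a toy deterministic model, sketched below under assumptions that are not from the original article: haploid selection with fitness 1 + s for the locally favored allele, symmetric migration at rate m between subpopulations I and II, and no genetic drift. At a neutral locus (s = 0), migration erases the initial allele-frequency difference, whereas at the selected locus a large difference persists, as if that locus experienced a lower migration rate. Function names and parameter values are hypothetical.

```python
from typing import Tuple


def migrate(p1: float, p2: float, m: float) -> Tuple[float, float]:
    """Symmetric migration: a fraction m of each subpopulation is replaced by migrants from the other."""
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1


def select(p: float, s: float, favored: str) -> float:
    """Haploid selection on allele A (frequency p); the favored allele has fitness 1 + s, the other 1."""
    if favored == "A":
        return p * (1 + s) / (1 + p * s)
    return p / (1 + (1 - p) * s)


def final_difference(s: float, m: float, generations: int = 2000,
                     p1: float = 0.9, p2: float = 0.1) -> float:
    """|p1 - p2| for allele A after iterating migration, then selection, each generation."""
    for _ in range(generations):
        p1, p2 = migrate(p1, p2, m)
        p1 = select(p1, s, favored="A")  # A favored in subpopulation I
        p2 = select(p2, s, favored="B")  # B favored in subpopulation II
    return abs(p1 - p2)


if __name__ == "__main__":
    print("neutral locus :", final_difference(s=0.0, m=0.01))
    print("selected locus:", final_difference(s=0.05, m=0.01))
```

With s = 0.05 and m = 0.01, a difference of about two-thirds persists at the selected locus, while the neutral difference decays toward zero; an FST-based estimate of Nm would therefore be much lower at the selected locus than at unlinked neutral loci.
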

In genome-wide polymorphism data, therefore, the ‘effective’ migration rate may vary among loci because of selection. Conversely, there are also cases where the effective migration rate is elevated at a selected gene. Suppose a new population emerges in which A is advantageous over B; A should then migrate preferentially into this new niche, increasing the effective migration rate at that locus. This idea has frequently been used to scan genomes for evidence of selection, with many successful demonstrations (for example, Akey, 2009). For this kind of large-scale polymorphism analysis, simple summary statistics are often very useful and powerful. Thus, Nm tells us not only about migration itself but also about natural selection acting on particular genomic regions, which makes it very important information in ecology and evolution.
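
A genome scan of this kind can be sketched as flagging loci whose per-locus FST falls in the extreme tails of the genome-wide distribution: unusually high FST suggests a locally reduced effective migration rate, and unusually low FST an elevated one. This empirical-tail rule, the 5% tail size, and the function names are illustrative assumptions; many published scans instead compare loci with a null distribution simulated under a demographic model.

```python
from typing import Dict, List, Sequence, Tuple


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Simple empirical quantile: the value at rank floor(q * n) in sorted order."""
    s = sorted(values)
    return s[min(int(q * len(s)), len(s) - 1)]


def fst_outliers(fst_by_locus: Dict[str, float],
                 tail: float = 0.05) -> Tuple[List[str], List[str]]:
    """Return (low, high): loci strictly below the lower or above the upper empirical tail."""
    values = list(fst_by_locus.values())
    lower = empirical_quantile(values, tail)
    upper = empirical_quantile(values, 1.0 - tail)
    low = [locus for locus, f in fst_by_locus.items() if f < lower]
    high = [locus for locus, f in fst_by_locus.items() if f > upper]
    return low, high


if __name__ == "__main__":
    # Hypothetical genome: 20 loci with similar F_ST plus two unusual ones.
    fst = {f"locus{i}": 0.10 + 0.001 * i for i in range(20)}
    fst["candidate_high"] = 0.60   # e.g. local adaptation: reduced effective migration
    fst["candidate_low"] = 0.005   # e.g. elevated effective migration
    print(fst_outliers(fst))
```

Per-locus FST values could equally be converted to per-locus Nm with the island-model inversion; the ranking of loci is the same, only reversed.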

Unfortunately, FST has long been the predominant summary statistic for describing the level of migration, but the amount of information we can obtain from FST alone is very limited. The proportion of private alleles is a useful second summary statistic. With dramatic improvements in computational power, the current trend is toward extracting as much information from the data as possible. An example is likelihood-based analysis under isolation-with-migration (IM) models (Nielsen and Wakeley, 2001), in which much of the focus is on the ratio of private to shared alleles. As more polymorphism data become available, computationally intensive methods of this kind, which do not rely solely on FST, are likely to play a central role.
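
For SNP data from two subpopulations, the private-versus-shared information mentioned above is often summarized by classifying segregating sites, as in the minimal sketch below. It assumes each site is described by the count of one allele in samples of n1 and n2 gene copies; the category names and the function are hypothetical, and a full IM analysis would use far more of the data than these counts.

```python
from typing import Dict, Sequence


def classify_sites(counts1: Sequence[int], counts2: Sequence[int],
                   n1: int, n2: int) -> Dict[str, int]:
    """Tally sites as private to sample 1, private to sample 2, shared, or fixed differences.

    counts1[i] and counts2[i] are the counts of the same allele at site i
    in samples of n1 and n2 gene copies. Sites monomorphic for the same
    allele in both samples are ignored.
    """
    tally = {"private1": 0, "private2": 0, "shared": 0, "fixed_diff": 0}
    for c1, c2 in zip(counts1, counts2):
        poly1 = 0 < c1 < n1
        poly2 = 0 < c2 < n2
        if poly1 and poly2:
            tally["shared"] += 1
        elif poly1:
            tally["private1"] += 1
        elif poly2:
            tally["private2"] += 1
        elif (c1 == n1) != (c2 == n2):
            # Both samples monomorphic, but for different alleles.
            tally["fixed_diff"] += 1
    return tally


if __name__ == "__main__":
    t = classify_sites([1, 0, 2, 4, 0, 3], [0, 2, 1, 0, 0, 4], n1=4, n2=4)
    private = t["private1"] + t["private2"]
    print(t, "private/shared =", private / t["shared"])
```

Under high migration most polymorphisms tend to be shared, whereas long isolation produces more private polymorphisms and fixed differences, which is what makes these counts informative about migration.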