When FST ceases being approximately equal to 1/(1+4Nm)

Genetic variation among populations, created by drift, mutation and selection, is counteracted by migration. In his seminal work, Wright (1931) derived a famous formula that approximately relates the amount of genetic variation among populations (as measured by Wright's F-statistic FST) to the migration rate m: FST≈1/(1+4Nem), where Ne is the common effective size of the populations. Together with the advent of molecular techniques, this formula opened the door to a simple and inexpensive way of quantifying gene flow among populations from estimated FST values. In a timely review, Whitlock and McCauley (1999) reminded us that an estimate is only as good as the model used to obtain it, and that the assumptions underlying Wright's model are likely to be violated in most natural populations. Indeed, Wright's model assumes an infinite number of populations with a common and constant effective size, random migration at a common and constant rate among populations, and no mutation or selection. Whitlock and McCauley (1999) noted that mutation might not play a crucial role in shaping population structure, and focused their comments on the effects of selection, of asymmetries between populations and of other departures from homogeneity in space and time. These aspects have been the object of active research over the last 10 years.
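As a minimal sketch of how the formula is used in practice, the two hypothetical helpers below evaluate Wright's approximation and invert it to read Nem off an FST estimate, Nem ≈ (1/FST - 1)/4. They inherit every island-model assumption listed above, so the inverted value is only as meaningful as that model.

```python
def fst_from_nem(nem: float) -> float:
    """Wright's island-model approximation: FST ~ 1 / (1 + 4 Ne m)."""
    if nem < 0:
        raise ValueError("Ne*m must be non-negative")
    return 1.0 / (1.0 + 4.0 * nem)


def nem_from_fst(fst: float) -> float:
    """Invert the approximation: Ne m ~ (1/FST - 1) / 4.

    Only meaningful under the island-model assumptions (infinite number of
    demes, equal constant sizes, uniform migration, no selection or mutation).
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # One migrant per generation (Ne m = 1) gives FST = 0.2 ...
    print(fst_from_nem(1.0))
    # ... and an estimated FST of 0.2 maps back to Ne m = 1.
    print(nem_from_fst(0.2))
```

For instance, one migrant per generation (Nem = 1) corresponds to FST = 0.2, and FST = 0.1 corresponds to Nem = 2.25.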

Islands with selection

Directional selection acting differentially among populations tends to increase FST values relative to the prediction of Wright's formula, whereas balancing selection tends to decrease them (Nielsen, 2005; Holsinger and Weir, 2009). In this context, all quantities involved in Wright's formula still make sense and, in principle, the formula could be modified to encompass the effect of selection. In practice, such a modification would be of little value for estimating migration rates, as allele fitnesses are typically unknown. However, the detection of loci under selection has recently become an active area of research, and it suggests another strategy. Increasingly sophisticated methods allow one to test the hypothesis of genetic neutrality for each locus (Vitalis et al., 2001; Beaumont and Balding, 2004; Foll and Gaggiotti, 2008; Riebler et al., 2008; Excoffier et al., 2009). These methods all share the same flavor: they compare estimated FST values with quantiles of the FST distribution expected under the null hypothesis of neutrality. See also Nielsen (2001) for alternative approaches. The recent work by Excoffier et al. (2009) stresses that an incorrect model (for example, one ignoring hierarchical structure) may lead to outlier loci being wrongly attributed to the effects of selection. Provided that the assumed model is correct, however, a valid approach to estimating the migration rate indirectly consists of first identifying genetically neutral markers with one of the above-mentioned methods and then applying Wright's formula to them.
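The shared logic of these outlier tests can be illustrated with a deliberately simplified sketch, which is not any of the cited methods. Assumptions, all hypothetical: neutral per-locus FST values are simulated under a Balding-Nichols beta model with a known differentiation level f (in practice f must itself be estimated), FST is computed naively from deme allele frequencies without correcting for sample size, and loci outside the central 95% of the null distribution are flagged. Wright's formula is then applied to the mean FST of the retained loci. Function names and example values are illustrative only.

```python
import random
import statistics


def naive_fst(freqs):
    """Per-locus FST: variance of deme allele frequencies over pbar * (1 - pbar)."""
    pbar = statistics.fmean(freqs)
    var = sum((p - pbar) ** 2 for p in freqs) / len(freqs)
    return var / (pbar * (1.0 - pbar))


def simulate_null_fst(f, n_demes, n_loci, rng):
    """Neutral FST values drawn under a Balding-Nichols beta model (assumption).

    Deme frequencies ~ Beta(p0 (1 - f) / f, (1 - p0) (1 - f) / f) around an
    ancestral frequency p0; this stands in for a proper neutral island model.
    """
    scale = (1.0 - f) / f
    null = []
    for _ in range(n_loci):
        p0 = rng.uniform(0.2, 0.8)  # assumed range of ancestral frequencies
        freqs = [rng.betavariate(p0 * scale, (1.0 - p0) * scale) for _ in range(n_demes)]
        null.append(naive_fst(freqs))
    return null


def flag_outliers(observed, null, alpha=0.05):
    """Flag loci whose FST lies outside the central (1 - alpha) range of the null."""
    ordered = sorted(null)
    last = len(ordered) - 1
    lower = ordered[round(alpha / 2 * last)]  # nearest-rank quantiles
    upper = ordered[round((1 - alpha / 2) * last)]
    return [x < lower or x > upper for x in observed]


if __name__ == "__main__":
    rng = random.Random(1)
    null = simulate_null_fst(f=0.1, n_demes=20, n_loci=2000, rng=rng)
    observed = [0.09, 0.11, 0.45, 0.01]  # hypothetical per-locus estimates
    flags = flag_outliers(observed, null)
    neutral = [x for x, is_outlier in zip(observed, flags) if not is_outlier]
    # Wright's formula applied only to loci retained as neutral.
    mean_fst = statistics.fmean(neutral)
    print(flags, (1.0 / mean_fst - 1.0) / 4.0)
```

As Excoffier et al. (2009) stress, such flags are only as trustworthy as the null model used to generate them.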

More complex scenarios

Whitlock and McCauley (1999) pointed out that Wright's formula also relies on the assumption that populations have equal and constant sizes and exchange migrants at the same rate; in particular, the migration rate does not depend on the geographical distance separating populations. When these assumptions are violated, using Wright's formula requires one to restrict the estimation of FST and m to pairs of populations, at the cost of an increased estimation variance. Narrowing the data set to make it fit a model is a brute-force strategy. A more efficient option consists of enriching the model to make it suitable for the whole data set. This has been made possible by the advent of computational statistical methods that allow one to estimate parameters under evolutionary scenarios more general than Wright's island model (Rousset, 1997; Beerli and Felsenstein, 2001; Lopes et al., 2009). The last of these, Lopes et al. (2009), is perhaps the most advanced method in this context. It aims to jointly estimate branching histories and population genetic parameters (effective population sizes, migration rates, mutation and recombination rates) for known panmictic populations under the isolation with migration model (Nielsen and Wakeley, 2001). The key idea is to assess the plausibility of a given scenario (that is, a set of parameter values) by looking at how often simulations under this scenario produce patterns similar to those displayed by the data at hand. This task was considered prohibitive only 15 years ago and is now feasible thanks to two ideas: simulating data under the coalescent and summarizing the information contained in the data by summary statistics (Beaumont et al., 2002; Cornuet et al., 2008).
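The simulate-and-compare idea can be sketched as rejection approximate Bayesian computation (ABC) for a single parameter, Nem. This illustrates the mechanics only, under loudly stated assumptions: the toy simulator below replaces coalescent simulation with Balding-Nichols beta draws whose F is set by Wright's formula (so it remains an island model), the only summary statistic is the mean per-locus FST, the prior on Nem is uniform on [0.5, 10], and the tolerance is arbitrary. Beaumont et al. (2002) further adjust accepted values by local regression, which is omitted here, and all function names are hypothetical.

```python
import random
import statistics


def simulate_mean_fst(nem, n_demes, n_loci, rng):
    """Toy simulator returning one summary statistic: mean per-locus FST.

    Stand-in for coalescent simulation (assumption): deme allele frequencies
    are drawn from a Balding-Nichols beta model whose F is set by Wright's
    formula, F = 1 / (1 + 4 Ne m).
    """
    f = 1.0 / (1.0 + 4.0 * nem)
    scale = (1.0 - f) / f
    total = 0.0
    for _ in range(n_loci):
        p0 = rng.uniform(0.2, 0.8)  # assumed range of ancestral frequencies
        freqs = [rng.betavariate(p0 * scale, (1.0 - p0) * scale) for _ in range(n_demes)]
        pbar = statistics.fmean(freqs)
        var = sum((p - pbar) ** 2 for p in freqs) / n_demes
        total += var / (pbar * (1.0 - pbar))
    return total / n_loci


def abc_rejection(observed, n_draws, tolerance, rng, n_demes=10, n_loci=30):
    """Rejection ABC: keep prior draws whose simulated summary is close to the data."""
    accepted = []
    for _ in range(n_draws):
        nem = rng.uniform(0.5, 10.0)  # assumed uniform prior on Ne m
        if abs(simulate_mean_fst(nem, n_demes, n_loci, rng) - observed) <= tolerance:
            accepted.append(nem)
    return accepted


if __name__ == "__main__":
    rng = random.Random(42)
    observed = 0.1  # hypothetical observed mean FST
    posterior = abc_rejection(observed, n_draws=1000, tolerance=0.01, rng=rng)
    if posterior:
        # Number of accepted draws and their mean (a crude posterior mean).
        print(len(posterior), statistics.fmean(posterior))
```

In a real analysis the simulator would encode the scenario of interest (for example, an isolation-with-migration history), and several summary statistics would be compared at once.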

… and θ̂ ≠ argmax_θ L(θ; x)

Whitlock and McCauley's (1999) report that FST≠1/(1+4Nem) sounded like bad news, in particular because it did not offer straightforward solutions to the issues it raised. Nevertheless, it became widely cited, presumably because of its cathartic power but also because it reflected a change in the way people look at evolutionary biology. Around the corner were new models, more realistic than Wright's island model. Moreover, detecting outlier FST values was the key to the first methods for detecting loci under selection, showing that the island model did have its uses. Ironically enough, the statistical methods that have addressed the issues reported by Whitlock and McCauley have mostly been Bayesian (Beaumont and Balding, 2004), striking a blow to the likelihood dogma promoted almost 100 years ago by Fisher (1922), the other giant of evolutionary biology.