Introduction

Identifying the action of positive selection from genomic patterns of variation has remained a central focus in population genetics. This owes both to the importance of specific applications in fields ranging from ecology to medicine and to the desire to address more general evolutionary questions concerning the mode and tempo of adaptation. In this vein, the notion of a soft selective sweep has grown in popularity in the recent literature, and with this increasing usage the definition of the term itself has grown increasingly vague. A soft sweep does not reference a particular population genetic model per se, but rather a set of very different models that may result in similar genomic patterns of variation. Further, it is a term commonly used in juxtaposition with the notion of a hard selective sweep, the classic model in which a single novel beneficial mutation arises in a population and rises in frequency quickly to fixation. Patterns expected under the hard sweep model have been well described in the literature (see reviews of refs 1, 2; Box 1), and consist of a reduction in variation surrounding the beneficial mutation owing to the fixation of the single haplotype carrying the beneficial mutation, with resulting well-described skews in the frequency spectrum3,4,5 and in patterns of linkage disequilibrium6,7,8. Indeed, a part of the recent popularity of soft sweeps comes from the seeming rarity of these expected hard sweep patterns in many natural populations (for example, see refs 9, 10, 11).

In terms of patterns of variation, the primary difference between soft and hard selective sweeps lies in the expected number of different haplotypes carrying the beneficial mutation or mutations, and thus in the expected number of haplotypes that hitchhike to appreciable frequency during the selective sweep, and which remain in the population at the time of fixation. This key difference results in different expectations in both the site frequency spectrum and in linkage disequilibrium, and thus in the many test statistics based on these patterns (see Box 1). Owing to this ambiguous definition, a number of models have been associated with producing a soft sweep pattern—including selection acting on previously segregating mutations, and multiple beneficials arising via mutation in quick succession (see review ref. 11 and Box 1).

However, apart from shared expected patterns of variation, these two population genetic models are very different. Selection on standing variation requires that the beneficial mutation segregate at appreciable frequency in the preselection environment, whereas the multiple beneficial model requires a high mutation rate to the beneficial genotype. One important point that will be returned to throughout is the distinction between the relevance of these models themselves and the likelihood of these models resulting in a hard (that is, single haplotype) versus a soft (that is, multiple haplotype) selective sweep at the time of fixation. Below, I will discuss what is known from theory regarding these models, and what is known from experimental evolution and empirical population genetic studies regarding the values of the key parameters dictating their relevance. I conclude by arguing that the recent enthusiasm for invoking soft sweeps to explain observed patterns of variation is likely to be largely unfounded in many cases.

Selection on standing variation

As described in Box 2, understanding the likelihood of a model of selection on standing variation requires knowing the frequency and fitness of beneficial mutations segregating in the population before becoming beneficial. Below, I will briefly review what is known from both experimental and empirical studies regarding these parameters in the handful of instances in which we have good inference.

Rare standing variants appear to contribute to adaptation

Orr and Betancourt12 previously considered a model of selection on standing variation and reached a result similar to that of Hermisson and Pennings13: namely, a soft sweep from standing variation only becomes feasible when the mutation has a non-zero probability of segregating at an appreciable frequency at the time of the selective shift (for example, the beneficial mutation was previously neutral or slightly deleterious and segregating under drift at relatively high frequency, or it was maintained at appreciable frequency by balancing selection before the selective shift). Indeed, they provide a direct calculation for the probability that multiple copies of the beneficial allele (X) exist, conditional on fixation of the beneficial mutation:

Pr(X>1 | fix) = 1 − (4Nusb/sd) / (exp(4Nusb/sd) − 1)

where sd is the selection coefficient before the shift, sb is the selection coefficient after the shift, and Nu is the population mutation rate.

With this, Orr and Betancourt made a notable observation that even if selection is acting on standing variation instead of a new mutation, a single copy is nonetheless surprisingly likely to sweep to fixation (that is, producing a hard, rather than a soft, sweep from standing variation). Indeed, they demonstrate that multiple-copy fixations become more likely than single-copy fixations from standing variation only when 4Nusb/sd>1. For reasonable parameter estimates, they calculate that the allele must be present in many copies in the population before obtaining an appreciable probability of sweeping multiple copies of the beneficial mutation to fixation. For example, from Orr and Betancourt, for N=1 × 10^4, sd=0.05, sb=0.01, u=10^−5 and h=0.2, 96% of the time a single copy will fix in the population, despite 20 copies segregating at mutation–selection balance before the shift in selection pressure. For these parameters, the population size must be in excess of N=1.5 × 10^5 (thus more than 300 copies segregating at mutation–selection balance) before multiple copies are more likely on fixation than a single copy.
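
The figures quoted in this example can be checked with a short calculation. The sketch below is not taken from Orr and Betancourt; it assumes that the number of standing copies destined to fix is Poisson distributed with mean 4Nusb/sd, so that the probability of a single-copy fixation, given that fixation occurs, is λ/(exp(λ) − 1). It also assumes the standard mutation–selection balance expectation of 2Nu/(h·sd) segregating copies. The function names are illustrative only.

```python
import math

def expected_copies(N, u, s_d, h):
    """Expected number of copies at mutation-selection balance.

    Assumes the standard diploid expectation 2N * u / (h * s_d).
    """
    return 2 * N * u / (h * s_d)

def prob_single_copy_fixation(N, u, s_b, s_d):
    """P(exactly one copy fixes | fixation), under a Poisson assumption.

    lam = 4 * N * u * s_b / s_d is the assumed mean number of standing
    copies that escape loss and contribute to fixation.
    """
    lam = 4 * N * u * s_b / s_d
    return lam / math.expm1(lam)

# Parameter values quoted in the text
copies = expected_copies(N=1e4, u=1e-5, s_d=0.05, h=0.2)
p_single = prob_single_copy_fixation(N=1e4, u=1e-5, s_b=0.01, s_d=0.05)
print(f"copies at balance: {copies:.0f}")
print(f"P(single copy | fixation): {p_single:.2f}")
```

Under these assumptions the dominance coefficient h cancels out of λ, which is why it affects the number of segregating copies but not the single-copy probability.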

Revisiting this model, Przeworski et al.14 more explicitly examined the frequency at which a mutation must be segregating before the shift in selection pressure for multiple haplotypes to be likely involved in the selective sweep. They found that a hard sweep is likely when x<1/2Nesb (consistent with the simulated example from Orr and Betancourt above). Thus, taking the mutation–selection balance frequency given above, we may conclude that a hard sweep (that is, involving a single haplotype) is likely from standing variation when Θμ/2hαd<1/2Nesb (see Fig. 1). An important distinction is again necessary here. While the parameter requirement mentioned above concerns the likelihood of a soft sweep from standing variation, it further suggests that we are unlikely to have statistical resolution when attempting to distinguish between a hard sweep on a new mutation versus a hard sweep on a rare previously standing variant.

Figure 1: The conditions under which a soft sweep from standing variation becomes possible.

The y axis represents the selection coefficient before the selective shift (that is, given by negative selection coefficients) and the x axis is the selection coefficient after the shift in selective pressure (that is, given by positive selection coefficients). The area under each line represents the parameter space for which such a soft sweep is feasible for two different effective population sizes: one human like (10^4, given by pink shading) and one Drosophila like (10^6, given by vertical blue lines). As shown, the effect before the selective shift must be nearly neutral or weakly deleterious in order for the allele to segregate at an appreciable frequency, and the effect post selective shift must be strongly beneficial. As described in the text, this inference rests on the argument that Θμ/2hαd must be greater than 1/2Nesb for a soft selective sweep from standing variation to become likely, where here Θμ=10^−8 and h′=1.
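
The frequency condition behind this figure can be sketched as a simple comparison. The code below assumes that the mutation–selection balance frequency is u/(h·sd), with sd taken as the magnitude of the deleterious coefficient, and it reads the caption's 10^−8 as a per-site mutation rate; both are assumptions of this sketch, as are the function names and the example coefficients.

```python
def balance_frequency(u, h, s_d):
    """Mutation-selection balance frequency, assumed to be u / (h * s_d)."""
    return u / (h * s_d)

def hard_sweep_threshold(N_e, s_b):
    """Frequency 1 / (2 * N_e * s_b) below which a hard sweep is likely."""
    return 1.0 / (2 * N_e * s_b)

def soft_sweep_feasible(u, h, s_d, N_e, s_b):
    """True when the standing frequency exceeds the hard-sweep threshold."""
    return balance_frequency(u, h, s_d) > hard_sweep_threshold(N_e, s_b)

# Hypothetical coefficients: weakly deleterious before the shift (s_d = 0.001),
# strongly beneficial afterwards (s_b = 0.1).
for label, N_e in (("human-like", 1e4), ("Drosophila-like", 1e6)):
    feasible = soft_sweep_feasible(u=1e-8, h=1.0, s_d=1e-3, N_e=N_e, s_b=0.1)
    print(f"{label} (N_e={N_e:.0e}): soft sweep feasible = {feasible}")
```

Rearranging the inequality gives sd < 2·Ne·sb·u/h, which illustrates why the feasible region grows with effective population size.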

As an empirical example of the above point, one of the most widely cited and discussed examples of selection on standing variation surrounds the Eda locus in Sticklebacks15. With evidence for selection reducing armour plating in freshwater populations compared with the ancestral heavily plated marine populations, the authors sequenced marine individuals to estimate the allele frequency of the freshwater adaptive low plate morphs, with estimates ranging from 0.2 to 3.8%. While the low plate morph is likely deleterious in marine populations (potentially suggesting that it is at mutation–selection balance), migration from the marine environment may indeed serve as an important source of variation for local freshwater hard sweeps. However, as noted by the authors, it is difficult to separate this hypothesis from that of local freshwater adaptation on new mutations, followed by back migration of locally adapted alleles into the marine population. Similarly, related arguments have been made for rare standing variation being responsible for the quick and persistent response of phenotypic traits to selection in the quantitative genetics literature (for a helpful review, see ref. 16). However, as with the above example, distinguishing rare standing ancestral variation from newly accumulating mutations has also been a topic of note17. Regardless of these caveats, hard sweeps from rare standing variants segregating at mutation–selection balance in ancestral populations, rather than on de novo mutations alone, appear to be an important and viable model of adaptation.

Quantifying the cost of beneficial mutations

On the basis of the simple and enlightening result of Orr and Betancourt, it is reasonable to ask, for cases in which we have reasonably strong functional evidence of adaptation, what we know about the value of 4Nusb/sd, as this will dictate the likelihood of a hard versus a soft sweep from standing variation. There are two fields from which we may obtain insights—experimental-evolution studies in which the selective effects of mutations may be precisely measured under controlled environmental conditions and empirical population genetic studies in which inference can be drawn about the selective effect of functionally validated mutations in the presence and absence of a given selective pressure.

First, there is a rich literature in experimental evolution from which we can draw. In a recent evaluation of the distribution of fitness effects (DFE) in both the presence and absence of antibiotic in the bacterium Pseudomonas fluorescens, Kassen and Bataillon18 found that of the 665 resistance mutations isolated, greater than 95% were deleterious in the absence of the antibiotic treatment. In populations of yeast raised in both standard and challenging environments (in this case, high temperature and high salinity), Hietpas et al.19 identified a handful of beneficial mutations in each of the challenge environments, all of which were deleterious under standard conditions, with some even being lethal in the absence of the selective pressure. Foll et al.20, in investigating the evolution of oseltamivir resistance mutations in the influenza A virus, identified 11 candidate resistance mutations, with the one functionally validated mutation (H274Y) having been demonstrated to be deleterious in the absence of drug pressure (see also refs 21, 22).

Second, there are a small but increasing number of examples from natural populations where we have both a functionally validated beneficial mutation for which we understand the genotype–phenotype connection, as well as inference on the selection coefficient both in the presence and absence of a given selective pressure. One such example is the evolution of cryptic colouration in wild populations of deer mice23. In the Nebraska Sand Hills population, population genetic and functional evidence has been found for positive selection acting on a small number of mutations modifying different aspects of the cryptic phenotype, all contained within the Agouti gene region24. Three lines of population genetic evidence suggest that selection began acting on these mutations when they arose (that is, selection on a de novo mutation): (1) the beneficial mutations appear to be carried on single haplotypes (though, as discussed above, selection on standing variation may indeed often only result in a single haplotype fixation), (2) the beneficial mutations have not been sampled off of the Sand Hills region (that is, the mutation is unlikely to have been segregating at appreciable frequency in the ancestral population before the formation of the Sand Hills) and (3) using an approximate Bayesian approach, the age of the selected mutation has been inferred to be younger than the geological age of the Sand Hills23. In addition, ecological information pertaining to this phenotype exists as well. Performing a predation experiment with clay models, Linnen et al.24 demonstrated a strong selective advantage of crypsis—with conspicuous models being subject to avian predation significantly more than cryptic models. This result suggests that if the beneficial phenotype currently present in the Sand Hills was indeed present in the ancestral population, it was likely to be strongly deleterious.

Other notable examples exist in the empirical literature as well. For example, Agren and Schemske25 mapped quantitative trait loci for 398 recombinant inbred lines of Arabidopsis derived from crossing locally adapted lines from Sweden and Italy. Their results suggest a small number of locally adaptive genomic regions, and that in many cases the locally adaptive change was deleterious in the alternate environment. Performing a meta-analysis on a wide range of antibiotic-resistance mutations in pathogenic microbial populations, Melnyk et al.26 found that across eight species and 15 drug treatments, resistance mutations were widely found to be deleterious in the absence of treatment (that is, in 19/21 examined studies). At the Ace locus of D. melanogaster, four variants conferring varying degrees of pesticide resistance have been described, all of which are strongly deleterious in the absence of this pressure (with deleterious selection coefficients ranging from −5 to −20%; see ref. 27).

Thus, given the required preselection frequency necessary to result in a soft rather than a hard sweep, it is fair to say that this combination of results provides poor support for the relevance of soft sweeps from standing variation in the populations examined. However, it is worth noting that such studies likely represent an ascertainment bias towards traits that strongly affect the phenotype, thus making them amenable for ecological and laboratory study. Assuming a relationship between the observed phenotypic and underlying selective effects, it may well be that beneficial mutations of small effects (which are more difficult to study and thus under-represented in the literature) may be those more likely to have only weakly deleterious effects in the absence of a given selective pressure.

Multiple competing beneficials

Box 3 describes the key parameters for understanding the likelihood of a model of competing beneficials—namely, the mutation rate to the beneficial genotype and the size of the mutational target available for creating an identical beneficial mutation. Below, I will briefly review what is known from both experimental and empirical studies regarding these parameters in the handful of instances in which we have good inference.

On the proportion of beneficial mutations

To understand the beneficial mutation rate is fundamentally to understand the DFE—that is, the proportions of newly arising mutations that are beneficial, neutral and deleterious. Characterizing this distribution has spawned a long and rich literature among both theoreticians and experimentalists. Fisher28 had already considered the probability that a random mutation of a given phenotypic size would be beneficial, concluding that adaptations must consist primarily of small-effect mutations. Kimura29 recognized one difficulty with this conclusion, noting that while small-effect mutations may be more likely to be adaptive, large-effect mutations have a higher probability of fixation. Thus, Kimura argued that, in fact, the intermediate-effect mutations may be most common in the adaptive process. Orr30 gained an important additional insight—given any distribution of mutational effects, the distribution of factors fixed during an adaptive walk (that is, the sequential accumulation of beneficial mutations) is roughly exponential. An important by-product of this result is the notion that the first step of an adaptive walk may indeed be quite large (in agreement with Fisher’s Geometric Model).

Efforts to quantify the shape of the DFE and characterize the beneficial mutation rate have come largely from the experimental evolution literature. One common feature amongst this work is the use of extreme value theory (see review ref. 31). Because the DFE of new mutations is generally considered to be bimodal32—consisting of a strongly deleterious mode and a nearly neutral mode—beneficial mutations represent the extreme tail of the mode centred around neutrality. One particular type of extreme value distribution—the Gumbel type (which contains a number of common distributions including normal, lognormal, gamma, and exponential)—has been of particular focus beginning with Gillespie33.

Recently, experimental efforts have begun to better characterize the shape of the true underlying distribution in lab populations experiencing adaptive challenges (see review ref. 34). Though the fraction of beneficial mutations relative to the total mutation rate is indeed small, providing good support for the assumptions of extreme value theory, the exact shape of the beneficial distribution varies by study. Sanjuan et al.35 found support for a gamma distribution using site-directed mutagenesis in vesicular stomatitis virus. Kassen and Bataillon18 found support for an exponential distribution assessing antibiotic-resistance mutations in Pseudomonas. Rokyta et al.36 found support not for the Gumbel domain but rather for a distribution with a right-truncated tail (that is, suggesting that there is an upper bound on potential fitness effects), using two viral populations. MacLean and Buckling37, again using Pseudomonas, argued that an exponential distribution well explained the data when the population was near optimum, but not when the population was far from optimum, owing to a long tail of strongly beneficial mutations. Schoustra et al.38, working on the fungus Aspergillus, demonstrated that adaptive walks tend to be short, and characterized by an ever-decreasing number of available beneficial mutations with each mutational step taken. One important caveat in such experiments, however, is that they commonly begin from homogeneous populations. Thus, while providing a good deal of insight into the underlying DFE, they are far from direct assessments of the relative role of single de novo beneficial mutations in adaptation.

Whole-genome time-sampled sequencing is also shedding light on the fraction of adaptive mutations. Examining resistance mutations in the influenza virus both in the presence and absence of oseltamivir (a common drug treatment), Foll et al.20 identified the single and previously described resistance mutation (that is, H274Y (ref. 39)) as well as 10 additional putatively beneficial mutations based on duplicated experiments and population sequencing, suggesting a fraction of 11/13,588 potentially beneficial genomic sites in the presence of drug treatment, or 0.08% of the genome. But perhaps the most specific information currently available regarding beneficial mutation rates comes from experiments in which all mutations may be generated individually (as opposed to mutation-accumulation studies) and directly evaluated across different environmental conditions (see refs 19, 40). Within this framework, Bank et al.41 recently evaluated all possible 560 individual mutations in a subregion of a yeast heat shock protein across six different environmental conditions (standard, as well as temperature and salinity variants), identifying few beneficials in the standard environment, and multiple beneficials associated with high salinity. To quantify this shift, the authors fit a Generalized Pareto Distribution, using the shape parameter (K) to summarize the changing DFE—with the Weibull domain fitting the less-challenging environments (that is, demonstrating that the DFE is right-bounded, suggesting that the populations are near optimum), and the Frechet domain fitting the challenging environment (that is, a heavy-tailed distribution owing to the presence of strongly beneficial mutations, potentially suggesting that the population is more distant from optimum).
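
The three tail domains mentioned here follow from the sign of the Generalized Pareto shape parameter. The sketch below uses the standard textbook parameterization (shape `kappa`, corresponding to K in the text, and scale `tau`); it is an illustration of that parameterization, not the fitting procedure of Bank et al., and the names and example values are assumptions.

```python
import math

def gpd_tail_domain(kappa, tol=1e-9):
    """Classify the tail domain implied by the shape parameter kappa."""
    if kappa < -tol:
        return "Weibull"   # right-bounded tail
    if kappa > tol:
        return "Frechet"   # heavy right tail
    return "Gumbel"        # exponential-like tail

def gpd_survival(x, kappa, tau=1.0):
    """P(effect > x) for a Generalized Pareto with shape kappa, scale tau."""
    if x <= 0:
        return 1.0
    if abs(kappa) < 1e-9:
        return math.exp(-x / tau)
    base = 1.0 + kappa * x / tau
    if base <= 0.0:
        return 0.0  # x lies beyond the upper bound -tau / kappa (kappa < 0)
    return base ** (-1.0 / kappa)

for kappa in (-0.5, 0.0, 0.5):
    print(kappa, gpd_tail_domain(kappa), gpd_survival(5.0, kappa))
```

With a negative shape the survival probability reaches zero at the bound −tau/kappa, which corresponds to the right-bounded DFE described for the less-challenging environments.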

Thus, despite some small but important differences between these conclusions, there is general support for a model in which newly arising mutations take a bimodal distribution, with the extreme right tail of this distribution representing putatively beneficial mutations. In other words, the beneficial mutation rate is likely a very small fraction of the total mutation rate. Given the requirements of the multiple competing beneficials model (that is, Θb>0.04), this would seemingly make the model only of relevance to populations of extremely large Neμ, as in perhaps certain viral populations (see ref. 42). Indeed, in an attempt to argue for the relevance of these models in Drosophila, Karasov et al.27 claim an effective population size in Drosophila that is orders of magnitude larger than commonly believed (that is, Ne>10^8), despite the great majority of empirical evidence to the contrary (see review ref. 43).
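
To make the scale of this requirement concrete, the sketch below computes the effective population size needed to reach Θb=0.04. It assumes the diploid definition Θb = 4·Ne·ub, with ub equal to a per-site beneficial mutation rate multiplied by the number of sites in the mutational target; that definition, the function name and the example rates are assumptions of the sketch rather than values given in the text.

```python
def required_ne(u_per_site, target_sites, theta_threshold=0.04, ploidy_factor=4):
    """Effective population size at which Theta_b reaches the threshold.

    Assumes Theta_b = ploidy_factor * N_e * u_per_site * target_sites.
    """
    return theta_threshold / (ploidy_factor * u_per_site * target_sites)

# Hypothetical per-site beneficial mutation rate of 1e-9
for sites in (1, 10, 100):
    print(f"target of {sites} site(s): N_e >= {required_ne(1e-9, sites):.1e}")
```

Under these assumptions the required Ne scales inversely with both the beneficial mutation rate and the target size, so a tenfold larger target lowers the requirement tenfold.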

Small adaptive target size in natural populations

However, despite the above conclusion, if the mutational target size is not a single site, but rather a large collection of sites, this value may become more attainable for a wider array of species. As with the above section, the most abundant and reliably validated information comes from experimental evolution. However, these data are of course imperfect, as these studies do not necessarily reflect all potential available beneficial solutions (that is, mutation-accumulation studies can only draw inference on the mutations which happen to occur during the course of the experiment, and studies using direct-mutagenesis have thus far only evaluated sub-genomic regions). Returning to the examples given in the section above, we may ask what are the functional requirements of the identified beneficial mutations. In studying adaptation to the antibiotic rifampicin in the pathogen Pseudomonas, MacLean and Buckling37 demonstrated that the beneficial mutations identified are consistent with known molecular interactions between rifampicin and RNA polymerase, as the antibiotic binds to a small pocket of the β-subunit of RNA polymerase, in which only 12 amino acid residues are involved in direct interaction. Wong et al.44 investigated the genetics of adaptation to cystic fibrosis-like conditions in Pseudomonas both in the presence and absence of fluoroquinolone antibiotics, describing a small number of stereotypical resistance mutations in DNA gyrase. Examining the evolution of oseltamivir resistance in the influenza A virus, Foll et al.20 described a similar story, in which a small handful of putatively beneficial resistance mutations are concentrated in haemagglutinin and neuraminidase, with the single-characterized resistance mutation being shown to alter the hydrophobic pocket of the neuraminidase active site, thus reducing affinity for drug.

Again considering the natural populations for which we have solid genotype–phenotype information and about which we understand something about the nature of adaptation acting on these mutations, let us consider a few examples. Describing widespread parallel evolution on armour plating in wild threespine sticklebacks, Colosimo et al.15 demonstrated the Ectodysplasin signalling pathways to be repeatedly targeted for modifications to this phenotype with a high degree of site-specific parallel evolution. Looking across 14 insect species that feed on cardenolide-producing plants, Zhen et al.45 also noted repeated bouts of parallel evolution for dealing with this toxicity not only confined to the same alpha subunit of the sodium pump (ATPα), but in the great majority of cases to the same two amino acid positions. Examining adaptation to pesticide resistance in Drosophila, four specific point mutations in the Ace gene have been identified, which result in resistance to organophosphates and carbamates (see ref. 27). Cryptic colouration has also been a fruitful area, with specific mutations in the Mc1r and Agouti gene regions having been described as the underlying cause of adaptation for crypsis in mice of the Arizona/New Mexico lava flows46, Nebraska Sand Hills23 and the Atlantic coast47, as well as in organisms ranging from the Siberian mammoth48 to multiple species of lizards on the White Sands of New Mexico49,50 (and see review ref. 51 for further examples).

Thus, for the handful of convincing genotype-to-phenotype examples in the literature, the adaptive mutational target size appears small, a result which would appear to be biologically quite reasonable.

The effects of selection on linked sites

However, even if the mutational target size is sufficiently large such that a model of competing beneficials becomes feasible, it becomes necessary to consider interference between these segregating selected sites. It is helpful to consider three relevant areas of the parameter space: (a) beneficial mutations of identical selective effects arising in a low recombination rate region, potentially allowing for a soft sweep; (b) beneficial mutations of differing selective effects arising in a low recombination rate region, where the most strongly beneficial likely outcompetes the others producing a hard sweep; or (c) multiple beneficial mutations occurring in a high recombination rate region, allowing for a hard sweep of the most beneficial haplotype (that is, the recombinant carrying the most beneficial mutations).

Hill and Robertson52 explicitly considered the probability of fixation for two segregating beneficial mutations. Confirming the arguments of Fisher28, they demonstrated that selection at one locus indeed interferes with selection at the alternate locus, reducing the probability of fixation at both sites—with the conclusion being that simultaneous selection at more than one site reduces the overall efficacy of selection (see also refs 53, 54).

This effect is clearly a function of the amount of recombination between the selected sites. If the sites are independent there is no such effect, and if they are tightly linked the effect will be very strong (Fig. 2). While it is difficult to generalize this information, for the current empirical data available discussed above, it appears both likely and biologically reasonable to consider that mutations conferring identical selective effects may indeed be occurring within a narrow genomic region (for example, mutations within the drug-binding pocket haemagglutinin in influenza virus in response to drug, within RNA polymerase in Pseudomonas in response to antibiotic, at the Agouti/Mc1r locus in vertebrates for colour modifications, at the Ace locus of Drosophila for pesticide resistance, at the Eda locus in Sticklebacks for armour modifications and so on).

Figure 2: Probability of fixation of a first beneficial mutation under a scenario of two competing beneficial mutations.

Representation of the results of Hill and Robertson52 (modified from their Fig. 2). On the y axis is the probability of fixation of the first beneficial mutation, and on the x axis is the selection coefficient multiplied by the phenotypic effect of the second (and thus competing) beneficial mutation. Here the selective effect of mutation #1 is 8, on the scale given on the x axis, and each beneficial mutation is assumed to be at a 10% frequency in the population. The four lines represent four different recombination rates between the two sites. As shown, if the effect of the second site is weaker than the first (that is,<8), the first beneficial retains a high probability of fixation and the second site will be lost unless it can recombine on to a common haplotype. However, as the effect of the second beneficial mutation becomes stronger, the probability of fixation of the first beneficial mutation decreases rapidly—a phenomenon that is increasingly strong with tighter genetic linkage between the sites. Thus, for rare to moderate recombination, and a strong beneficial effect of the second site, the probability of fixation of the first beneficial mutation is halved even for this simple case of only two competing beneficials.

Examining the extent of this effect by simulation, Comeron et al.55 found that the effect becomes stronger as (1) the sites become more weakly beneficial, (2) the recombination rate is decreased and (3) the number of selected sites increases (consistent with the results of refs 56, 57). However, as long as there is linkage between the sites, the probability of fixation decreases rapidly as the number of selected sites grows, even for very strong selection. Examining the effect of two competing beneficial mutations in the presence of recombination analytically, Yu and Etheridge58 further demonstrate the relative likelihood of an ultimate single haplotype fixation.
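
The interference effect described above can be explored with a small forward simulation. The sketch below is a deliberately simplified haploid two-locus Wright–Fisher model with multiplicative fitness; it is not the method of Hill and Robertson or Comeron et al., and the population size, selection coefficients, starting frequency and replicate count are all hypothetical choices made for illustration.

```python
import random

def first_mutation_fixes(N, s1, s2, r, p0, rng):
    """Run one replicate; return True if allele A (locus 1) fixes."""
    # Haplotype order: AB, Ab, aB, ab. Both beneficial alleles start at
    # frequency p0 in approximate linkage equilibrium (an assumption).
    n_AB = round(N * p0 * p0)
    n_Ab = round(N * p0) - n_AB
    n_aB = round(N * p0) - n_AB
    counts = [n_AB, n_Ab, n_aB, N - n_AB - n_Ab - n_aB]
    fitness = [(1 + s1) * (1 + s2), 1 + s1, 1 + s2, 1.0]
    while 0 < counts[0] + counts[1] < N:
        # Selection: reweight haplotype frequencies by fitness.
        weighted = [c * w for c, w in zip(counts, fitness)]
        total = sum(weighted)
        x = [v / total for v in weighted]
        # Recombination: shift frequencies by r times the disequilibrium D.
        D = x[0] * x[3] - x[1] * x[2]
        x = [x[0] - r * D, x[1] + r * D, x[2] + r * D, x[3] - r * D]
        # Drift: sample N offspring haplotypes.
        sample = rng.choices(range(4), weights=x, k=N)
        counts = [sample.count(i) for i in range(4)]
    return counts[0] + counts[1] == N

def fixation_probability(N, s1, s2, r, p0=0.1, reps=300, seed=1):
    """Estimate the fixation probability of allele A over many replicates."""
    rng = random.Random(seed)
    wins = sum(first_mutation_fixes(N, s1, s2, r, p0, rng) for _ in range(reps))
    return wins / reps

for r in (0.0, 0.01, 0.5):
    print(r, fixation_probability(N=100, s1=0.08, s2=0.32, r=r))
```

Setting `s2=0.0` gives a single-locus baseline against which the estimates with a stronger, linked competitor can be compared; estimates from a few hundred replicates are noisy and should be read qualitatively.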

Thus, while a large mutational target size may, in principle, increase the relevance of this model, it results in a scenario still requiring a large beneficial mutation rate, necessitates that these beneficial mutations escape initial stochastic loss and finally, owing to interference, results in a decreased probability of fixation for each competing beneficial relative to independence. Again invoking results from experimental evolution, in a highly informative recent study by Lee and Marx59, the authors demonstrate the strong effects of clonal interference in replicated populations of Methylobacterium extorquens—identifying as many as 17 simultaneous beneficial mutations existing in a population which may rise in frequency initially, only to be lost owing to competition with an alternate and ultimately successful single beneficial mutation, in what they termed repeated ‘failed soft sweeps.’

As a natural population example, Hedrick60 discusses the multiple malaria resistance variants identified in humans, and makes the case that in the continued presence of malaria, single variants are highly likely to ultimately fix at the cost of losing other competing and currently segregating beneficial resistance mutations, owing to measured selection differentials. Similarly, at the previously discussed Ace locus of D. melanogaster, a single one of the four identified resistance mutations was found to confer 75% resistance to pesticide, two mutations confer 80% resistance and three mutations confer full resistance, again suggesting that a single haplotype carrying multiple beneficial mutations will likely ultimately result in a hard selective sweep. Both of these observations, along with the results of Lee and Marx59, suggest that multiple competing beneficial mutations may indeed be a likely model, but a soft sweep from multiple beneficials is unlikely owing to non-equivalent selective effects between the mutations (or the haplotypes carrying these mutations). Thus, as with the model of selection on standing variation above, the model of competing beneficial mutations itself has good empirical and experimental support for being relevant, but a hard sweep rather than a soft sweep appears as the more likely outcome given our current understanding of the parameters of relevance.

Perspective and future directions

Apart from the considerations discussed in the sections above, and conditional on the unsubstantiated assumption that adaptive fixations are common, the absence of hard sweep patterns in many natural populations has led some to conclude that soft sweeps must be the primary mechanism of adaptation, with a recent popularity for invoking these models in the human and Drosophila literature. However, as argued above, this assumption is poorly supported, and theoretical and experimental insights to date suggest that soft sweeps from standing variation or from multiple beneficial mutations for populations of this size are unlikely. This argument itself is of course somewhat circular, as quantifying the fraction of adaptively fixed mutations, and the proportion of newly arising beneficial mutations, is indeed one of the central focal points of population genetics, and is far from resolved as discussed. Thus, assuming a very large fraction of adaptive fixations to quantify the fraction of adaptive fixations is rather self-defeating.

A quite separate point has also been neglected in this literature, namely the power of existing tests of hard selective sweeps to identify these patterns within demographically complex populations (a category that certainly includes humans and Drosophila). Biswas and Akey61 examined the consistency between methods used for conducting genomic scans for beneficial mutations in humans. Results differed dramatically, ranging from 1,799 genes identified by Wang et al.62 to 27 genes identified by Altshuler et al.63 Perhaps even more striking, of the six studies examined, there was virtually no overlap in the genes identified. For example, of the 1,799 genes identified by Wang et al.62, 125 overlap with the scan of Voight et al.64, 47 with Carlson et al.65, 5 with Altshuler et al.63, 4 with Nielsen et al.66 and 40 with Bustamante et al.67 In addition, the recent review by Crisci et al.2, summarizing estimates of the rate of adaptive fixation in Drosophila, noted that the inferred genomic rate differs by two orders of magnitude between studies (from λ=1.0E−12 (ref. 68) to λ=1.0E−11 (refs 69, 70) to λ=1.0E−10 (ref. 71), where 2Neλ is the rate of beneficial fixation per base pair per 2Ne generations).
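To make the scale of this disagreement concrete, the following short calculation expresses each reported overlap as a fraction of the 1,799 genes of Wang et al., and confirms the span of the λ estimates. The counts and rates are taken directly from the text above; the variable names are introduced here for illustration only.

```python
import math

# Genes identified by Wang et al., and the number shared with each other scan
# (counts as reported in the text above).
wang_total = 1799
overlap_with_wang = {
    "Voight et al.": 125,
    "Carlson et al.": 47,
    "Altshuler et al.": 5,
    "Nielsen et al.": 4,
    "Bustamante et al.": 40,
}

# Fraction of the Wang et al. gene set recovered by each other scan.
fractions = {study: n / wang_total for study, n in overlap_with_wang.items()}
for study, frac in fractions.items():
    print(f"{study}: {frac:.1%} of the Wang et al. genes")

# Span of the inferred rate of adaptive fixation, in orders of magnitude.
lambda_low, lambda_high = 1.0e-12, 1.0e-10
orders = math.log10(lambda_high / lambda_low)
print(f"lambda estimates span {orders:.0f} orders of magnitude")
```

Even the largest overlap, with Voight et al., accounts for less than 7% of the Wang et al. candidates.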

Evaluating the performance of these statistics has thus remained an important question, and over the past decade numerous researchers have demonstrated low power under a wide range of neutral non-equilibrium models72,73,74,75,76. More recently, Crisci et al.77 specifically evaluated, via simulation, the ability of the most widely used and sophisticated tests of selection (Sweepfinder66, SweeD76 and OmegaPlus78) to identify both complete and incomplete hard selective sweeps under a variety of demographic models of relevance for human and Drosophila populations. The results are troubling, with the true positive rate rarely exceeding 50% even under equilibrium models, and being considerably worse for models of moderate and severe population size reductions (Fig. 3). Furthermore, the false positive rate was often in excess of power, particularly for models of population bottlenecks. Though not conclusive, this indeed suggests a troubling potential interpretation for the lack of overlap between the above mentioned genomic scan studies.
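One way to see why low power and high false positive rates would produce little overlap between scans is the following sketch. It rests on two assumptions that are not taken from Crisci et al.: that different scans detect a given true sweep independently of one another, and that a hypothetical 1% of tested loci are true sweep targets. The function names and the example values are illustrative only.

```python
def shared_detection(power_a, power_b):
    """Probability that two scans both detect the same true sweep,
    assuming (hypothetically) that their detections are independent."""
    return power_a * power_b


def positive_predictive_value(power, false_positive_rate, prior):
    """Fraction of significant loci that are true sweeps (Bayes' rule).

    `prior` is the assumed fraction of tested loci that are true sweeps.
    """
    true_hits = power * prior
    false_hits = false_positive_rate * (1.0 - prior)
    return true_hits / (true_hits + false_hits)


prior = 0.01  # hypothetical fraction of loci under recent positive selection

# Two scans, each with 50% power: chance that both recover a given true sweep.
both = shared_detection(0.5, 0.5)

# False positive rate equal to power: candidates are no richer in true
# sweeps than randomly chosen loci.
ppv_equal = positive_predictive_value(0.5, 0.5, prior)

# False positive rate in excess of power: candidates are depleted.
ppv_worse = positive_predictive_value(0.3, 0.5, prior)

print(both, ppv_equal, ppv_worse)
```

Under these assumptions, whenever the false positive rate equals or exceeds power, the candidate list is no more enriched for true sweeps than the genome as a whole, so two such lists would be expected to share few genes.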

Figure 3: Statistical power to detect hard selective sweeps.

Representation of the results of Crisci et al.77 examining the power of three of the most commonly used approaches for detecting hard selective sweeps: Sweepfinder (blue), SweeD (red) and OmegaPlus (black). On the y axis is the power of the test statistic and on the x axis is the strength of selection. Two models are plotted: (1) a selective sweep in an equilibrium population (given by the solid lines) and (2) a selective sweep in a moderately bottlenecked population, in which the population is reduced to 10% of its former size 0.2 × 4N generations in the past (given by the dashed lines). In all cases, Ne=10^4 and the selective event occurred 0.01 × 4N generations in the past. As shown, even in equilibrium populations, the power scarcely reaches 50% even for strong selection, while in mildly bottlenecked populations the power approaches nominal levels for frequency spectrum-based statistics (that is, SweeD and Sweepfinder) and drops to 30% for the better performing linkage disequilibrium-based approaches (that is, OmegaPlus).

If nothing else, these results demonstrate that the absence of evidence is not evidence of absence for the hard sweep model, implying that we have only minimal power to detect even very recent and very strong hard selective sweeps in these populations, and essentially no power for the great majority of the parameter space. However concerning these results may be, it is important that the field has made the effort to quantify the performance of the test statistics designed for detecting hard sweeps, defining Type-I and Type-II error and examining performance in demographic models both with and without selection. This scrutiny has yet to be brought to soft sweep expectations and statistics. Before these models can be reasonably invoked as explanations for observations in natural populations, we need to similarly understand the ability of neutral demographic models to replicate soft sweep patterns, quantify our ability to identify soft sweeps from standing variation and from multiple beneficial mutations in non-equilibrium populations, and understand the effects of relaxing current assumptions involving linkage and epistasis (that is, for selection on standing variation, the assumption is made that a single beneficial mutation will have the same selective effects on all genetic backgrounds, and the multiple beneficial model assumes that there are no epistatic interactions between co-segregating mutations). Early efforts have been made in some of these areas, with recent work examining basic expectations of these models under fluctuating effective population sizes, resulting in a further description of how population size changes may result in the ultimate fixation of a single beneficial mutation79.

In conclusion, the wide array of genomic patterns of variation that may be accounted for by models associated with soft selective sweeps has allowed adaptive explanations to proliferate in the literature, and be invoked for a larger subset of genomic data. However appealing this may be, these models in fact carry with them very specific and well-understood parameter requirements. Further, the ability of alternate models to produce these patterns needs to be more carefully weighed in future studies, particularly given preliminary findings concerning similar patterns produced under both neutral demographic models77 and models of background selection80. Indeed, alternative models of positive selection have also been suggested to produce qualitatively similar patterns—including hard selective sweeps in subdivided populations exchanging migrants81,82,83 and polygenic adaptation84.

Finally, while examples in the literature are accumulating in support of the models themselves (for example, selection on standing variation at the Eda locus of sticklebacks or selection on multiple beneficials at the Ace locus of Drosophila), there is very little evidence of soft sweep fixations, with the best empirical and experimental examples to date almost universally pointing to hard sweep fixations under these models. This appears to be owing primarily to the low preselection allele frequency of the standing variants (which are seemingly often deleterious before the shift in selective pressure), and to the selective differential between competing beneficial mutations (or between the haplotypes carrying the beneficial mutations) resulting in the ultimate fixation of only a single haplotype. Thus, while the models themselves certainly deserve further attention, theoretical, empirical and experimental results to date suggest that the field ought to take greater caution when invoking soft sweep fixations, as hard sweep fixations (be it from models of selection on new mutations, standing variation or competing beneficial mutations) seem to remain the most likely outcome across a wide parameter space relevant for many current populations of interest.

Additional information

How to cite this article: Jensen, J. D. On the unfounded enthusiasm for soft selective sweeps. Nat. Commun. 5:5281 doi: 10.1038/ncomms6281 (2014).