
A major focus of evolutionary genetic research has been to decipher causes of speciation from patterns of nucleotide polymorphism and divergence. In particular, researchers infer gene flow between related species and use these results to reject models of species formation wherein complete barriers to gene flow evolved during periods of geographic isolation (allopatry). Because ‘the number of [recent] studies focusing on testing hybridization between species has increased by orders of magnitude’ (Stevison, 2008), expressions such as ‘speciation with gene flow’ have become commonplace in the literature to describe cases of gene flow putatively occurring during initial species divergence and/or after secondary contact.

However, shared variation predating speciation (‘lineage sorting’) creates patterns often mistaken for gene flow between diverging species (see Hey, 2006 for review). To address this complication, several statistical models of DNA sequence evolution apply coalescent principles or other approaches to distinguish these possibilities (Wakeley and Hey, 1997; Machado et al., 2002; Hey and Nielsen, 2004; Becquet and Przeworski, 2007; Joly et al., 2009). Although these models are used extensively, known deviations from their assumptions in particular systems or inappropriate data sets (for example, microsatellite polymorphism rather than DNA sequence) cause investigators to resort to more basic predictions in testing for interspecies introgression. Perhaps the most common test for gene exchange is to determine whether some regions are significantly more differentiated between species than putatively ‘neutral’ regions or relative to the overall distribution of divergences observed. Although identified by divergence alone, such regions may bear alleles conferring adaptation or reproductive isolation between species. Relative divergence measures like FST in particular have been advocated and used to test the importance of such regions in promoting adaptation or speciation in the face of gene flow (Beaumont, 2005).
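A minimal Python sketch of this kind of genome scan follows. It assumes biallelic SNP allele frequencies are already available for each species in fixed genomic windows. It computes a Hudson-style FST (1 − Hw/Hb, taken as a ratio of sums across SNPs, without sample-size corrections) and flags windows above an empirical percentile. The function names and the 95th-percentile cutoff are illustrative choices, not a published procedure. A flagged window is only unusually differentiated. As argued below, that alone is not evidence of gene flow elsewhere in the genome.

```python
import statistics


def hudson_fst(freqs1, freqs2):
    """Hudson-style FST for one window: 1 - Hw/Hb, summed over SNPs.

    freqs1, freqs2: allele frequencies at the same biallelic SNPs in the two species.
    Sample-size corrections are omitted to keep the sketch short.
    """
    # Within-species terms p(1 - p), summed over both species and all SNPs.
    hw = sum(p1 * (1 - p1) + p2 * (1 - p2) for p1, p2 in zip(freqs1, freqs2))
    # Between-species terms: chance that alleles drawn one from each species differ.
    hb = sum(p1 * (1 - p2) + p2 * (1 - p1) for p1, p2 in zip(freqs1, freqs2))
    if hb == 0:
        raise ValueError("window has no between-species variation")
    return 1 - hw / hb


def flag_outlier_windows(window_fst, percentile=95):
    """Indices of windows whose FST exceeds the empirical percentile of all windows."""
    cutoff = statistics.quantiles(window_fst, n=100)[percentile - 1]
    return [i for i, f in enumerate(window_fst) if f > cutoff]


# Hypothetical example: 19 weakly differentiated windows and one strongly differentiated one.
windows = [hudson_fst([0.5, 0.4], [0.45, 0.5]) for _ in range(19)]
windows.append(hudson_fst([0.95, 0.9], [0.05, 0.1]))
print(flag_outlier_windows(windows))  # prints [19]
```

Because FST is relative, a window can land in the upper tail because within-species diversity is low, not because between-species divergence is high. The sections below turn on this point.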

One hypothesis that has received particular attention in the past decade is that chromosomal rearrangements, or other regions of the genome in which recombination is rare or absent in species hybrids, are associated with creating or maintaining young species despite gene flow (Butlin, 2005; Hoffmann and Rieseberg, 2008). Theoretical models predict that regions of restricted recombination may facilitate species formation or persistence by creating linkage disequilibrium along large swaths of the genome, including alleles conferring adaptation or barriers to gene flow (Noor et al., 2001c; Rieseberg, 2001; Navarro and Barton, 2003). Various lines of empirical data also support this idea. Rearrangements are detected at lower genetic divergence in co-occurring species than in allopatric species (Noor et al., 2001c; Ayala and Coluzzi, 2005; Kandul et al., 2007). Traits that prevent gene flow between species (such as habitat choice, mate preference or hybrid sterility) preferentially map to rearranged regions of the genome (Noor et al., 2001b; Feder et al., 2003). And, most commonly, inverted regions tend to show greater nucleotide differentiation between species than regions that are not inverted (see below).

Here, we review several problems associated with using patterns of nucleotide differentiation (especially relative measures such as FST or Da) to test the role of restricted recombination in maintaining species. Restricted recombination can create regions of low intraspecific variation. When these are compared with regions of normal recombination, researchers may infer differential gene flow among segments of the genome even if the species have never hybridized. The expression 'islands of speciation' (Turner et al., 2005) was coined to liken genetic material exchanged between species to ocean water flowing around islands of differentiation. We conclude that the water (gene flow) itself may at times be a 'mirage'.

Chromosomal rearrangements

Studies of various taxa have shown higher divergence in rearranged than in collinear regions between diverging species. Examples include Drosophila species (Noor et al., 2007; Machado et al., 2007a, 2007b), shrews (Basset et al., 2006, 2008; Yannic et al., 2009), Anopheles mosquito races (Michel et al., 2006) and Rhagoletis fruit flies (Feder et al., 2003). Early evidence also supported this model in Helianthus sunflowers (Rieseberg et al., 1999), though later studies suggested that this effect may be localized to regions immediately adjacent to the rearrangement break points (Yatabe et al., 2007; Strasburg et al., 2009). However, support has not been universal. Some species clearly hybridize extensively and persist without rearrangements (for example, Llopart et al., 2005). Some studies also report regions of high differentiation distributed widely across the genome rather than clustered in specific rearrangements (see review in Nosil et al., 2009). Nonetheless, this prediction has been upheld in many of the systems tested. It has been interpreted as evidence for a role of regions of restricted recombination in maintaining species despite ancient or recent hybridization.

However, rearranged regions may exhibit higher nucleotide divergence between species than collinear regions even if the species do not hybridize at all (Table 1). As such, this observation does not necessarily support a role of restricted recombination in allowing species to persist. First, multiple chromosomal rearrangements such as inversions segregate within many species (for example, Lewontin et al., 1981; Powell et al., 1999; Singh, 2001). Such inversions reduce recombination (and hence homogenization) between arrangements from the time they arise, especially for short inversions and near the inversion break points. Suppose the alternative arrangements (for example, 'inverted' vs 'uninverted') persist within the ancestral species for some time and eventually become fixed in different subpopulations. Regions inverted between the resulting species will then show higher divergence. This higher divergence reflects the more ancient coalescence of the inverted regions relative to the collinear regions in the ancestor, not 'speciation with gene flow.' Given how common segregating chromosomal rearrangements are within species, this pattern is likely to arise by chance. Inverted regions would then show greater nucleotide differentiation between species than collinear regions, even in nonhybridizing species.
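The argument can be made concrete with a textbook neutral-coalescent expectation. The Python sketch below rests on simplifying assumptions of our own, not taken from the studies cited: a constant ancestral population of N diploids, no gene flow after the split, and no exchange between arrangements. Two lineages sampled from different species cannot coalesce before the split T generations ago. Once both are in the ancestor, they coalesce in about 2N generations, giving E[Dxy] ≈ 2μ(T + 2N). Now suppose the species fixed alternative arrangements that had already been segregating for A generations at the split. The two lineages then cannot coalesce until they trace back past the inversion's origin, which adds roughly A generations. All parameter values are hypothetical.

```python
def expected_dxy(mu, split_generations, ancestral_n, extra_generations=0.0):
    """Expected per-site Dxy under a simple neutral model with no gene flow.

    mu: mutation rate per site per generation.
    split_generations: generations since the species split (T).
    ancestral_n: diploid ancestral population size (N); once both lineages are in
        the ancestor they take about 2N generations to coalesce.
    extra_generations: additional time the lineages are kept apart, e.g. the age (A)
        of an inversion polymorphism at the moment of the split.
    """
    return 2 * mu * (split_generations + extra_generations + 2 * ancestral_n)


# Hypothetical parameters: same split time, no hybridization in either case.
collinear = expected_dxy(mu=1e-9, split_generations=1_000_000, ancestral_n=1_000_000)
inverted = expected_dxy(mu=1e-9, split_generations=1_000_000, ancestral_n=1_000_000,
                        extra_generations=2_000_000)
print(round(collinear, 4), round(inverted, 4))  # prints 0.006 0.01
```

The inverted region is more divergent purely because its ancestral coalescence is older. Gene flow plays no part.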

Table 1 Biases that may lead to empirical observations mimicking a role for restricted recombination in maintaining species

Second, chromosomal rearrangements introduce another bias, one more directly tied to their recombination-reducing effect. Such rearrangements may often spread through directional selection (for example, Hoffmann and Rieseberg, 2008; Kirkpatrick and Barton, 2006). As with the spread of any adaptive variant, linked sites will 'hitchhike,' and nucleotide diversity will be reduced near the selected site (Maynard Smith and Haigh, 1974). A new chromosomal rearrangement, however, eliminates nucleotide diversity across a much wider swath of the genome as it spreads. This is because the entire rearranged segment (potentially megabases long) is inherited as a single unit. The temporary reduction in nucleotide diversity within the subpopulation bearing the rearrangement will artifactually increase relative divergence measures such as FST or Da. Da subtracts within-species diversity from between-species divergence, and FST expresses the between-species component as a fraction of total diversity. A reduction in within-species diversity will therefore necessarily inflate either measure, whether or not any interspecies gene flow has occurred.
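A small numerical sketch of this effect uses Nei's net divergence, Da = Dxy − (πx + πy)/2. For illustration only, it assumes that the sweep of a new arrangement in species y lowers πy across the rearranged segment while leaving the between-species average Dxy unchanged. All values are hypothetical.

```python
def net_divergence(dxy, pi_x, pi_y):
    """Nei's net divergence: Da = Dxy - (pi_x + pi_y) / 2."""
    return dxy - (pi_x + pi_y) / 2


# Hypothetical per-site values for a rearranged segment, before and after the new
# arrangement sweeps through species y. Dxy is held fixed: no gene flow is modelled.
dxy = 0.02
before = net_divergence(dxy, pi_x=0.01, pi_y=0.01)
after = net_divergence(dxy, pi_x=0.01, pi_y=0.001)
print(f"Da before sweep: {before:.4f}, after sweep: {after:.4f}")
# Da rises from 0.0100 to 0.0145 although between-species exchange is unchanged.
```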

Centromeric regions

Other recent studies have observed greater differentiation between diverging taxa near centromeres, potentially associated with their highly reduced recombination rates. This pattern has been documented repeatedly in Anopheles mosquito races (Stump et al., 2005; Turner et al., 2005; Slotman et al., 2006), but also in rabbits (Geraldes et al., 2008) and house mice (Panithanarak et al., 2004). Although conceptually similar to the observations of high divergence in rearranged regions, this pattern is distinct because centromeric regions exhibit low recombination rates both within species and in species hybrids.

However, each of the empirical studies cited above documented this pattern at least in part using relative divergence measures such as FST and Da. Each interpreted it as evidence that regions of low recombination facilitate species divergence in the presence of gene flow. Regions of low recombination generally possess low nucleotide diversity within species (Nachman, 2002), resulting from recurrent hitchhiking (Maynard Smith and Haigh, 1974) or background selection (Charlesworth et al., 1993). In this context, Charlesworth (1998) elegantly described how low nucleotide diversity increases relative divergence measures, concluding that 'FST is strongly influenced by the level of within-population diversity [and] several published cases of differences in FST among regions of high and low recombination in Drosophila may be caused in this way.' Such regions would sustain an artificially high relative divergence even longer than the temporary artifact, discussed above, that results from the spread of new chromosomal arrangements. Overall, higher relative divergence in regions of low recombination may (1) be artifactual, (2) exist even in species that do not hybridize and (3) fail to support a role of restricted recombination in allowing species to persist in the absence of other data (Table 1).
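Charlesworth's point can be seen directly in the algebra. As a sketch, take two equally weighted populations with mean within-population diversity πS = (πx + πy)/2 and between-population divergence Dxy. Approximate total diversity as πT = (πS + Dxy)/2, since half of randomly drawn pairs come from within a population and half from between. Then, holding Dxy fixed:

```latex
F_{ST} = 1 - \frac{\pi_S}{\pi_T}
       = 1 - \frac{2\pi_S}{\pi_S + D_{xy}}
       = \frac{D_{xy} - \pi_S}{D_{xy} + \pi_S}
       = \frac{D_a}{D_{xy} + \pi_S},
\qquad
\frac{\partial F_{ST}}{\partial \pi_S} = -\frac{2 D_{xy}}{(D_{xy} + \pi_S)^2} < 0 .
```

Any process that lowers within-species diversity, such as recurrent hitchhiking or background selection, therefore raises FST even when absolute divergence is unchanged. With the hypothetical value Dxy = 0.02, lowering πS from 0.01 to 0.001 raises FST from 1/3 to 19/21 (about 0.90).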

Differentiating water from mirages

Divergence measures

Our strongest recommendation is that researchers consider the inherent bias of relative divergence measures when testing the role of restricted recombination in maintaining species. As an illustration, we compared Da (average divergence corrected for within-species diversity; Nei, 1987) with Dxy (absolute average divergence) for the M and S races of Anopheles gambiae using the data from Stump et al. (2005) (Figure 1). Da shows a highly significant difference between low- and high-recombination regions, but Dxy shows no significant difference. In fact, Dxy shows a nonsignificant difference in the opposite direction: high-recombination regions are more differentiated on average than low-recombination regions. This result does not disprove the conclusions of the many studies of these races (Stump et al., 2005; Turner et al., 2005; Slotman et al., 2006), but it illustrates a problem with relative divergence measures. In this case, there is direct evidence of current hybridization between these races (Tripet et al., 2001), and recent introgression may have occurred.

Figure 1

Da and Dxy between the M and S races of A. gambiae in low- and high-recombination regions of the X chromosome. The same loci are data points in both the Da and Dxy plots, and error bars indicate standard errors.

Further, we emphasize that absolute measures of divergence are no panacea. Relative measures were used in those studies specifically to factor out biases associated with within-race diversity. Relying only on absolute measures may be overly conservative. Within-race diversity is higher in regions of high recombination, and shared ancestral polymorphism can make divergence appear higher in those regions, consistent with the Anopheles data in Figure 1. When the two types of measures agree, one can have some confidence in the interpretation. When they disagree, a bias is likely affecting one of them: either Da is overly deflated, or Dxy is high but does not reflect divergence accumulated since the species split. The difficulty in that case is determining which measure is biased or misinterpreted.
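The comparison in Figure 1 depends on reporting both measures for the same loci. A minimal Python sketch of computing them from aligned sequences may therefore be useful. It uses uncorrected p-distances, assumes gap-free alignments of equal length, and summarizes each recombination class by its mean and standard error. The function names and toy sequences are hypothetical. Real analyses would typically apply a substitution-model correction and handle missing data.

```python
import math
import statistics
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (no correction)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pi_within(seqs):
    """Average pairwise p-distance within one population (needs >= 2 sequences)."""
    return statistics.fmean(p_distance(a, b) for a, b in combinations(seqs, 2))


def dxy(seqs_x, seqs_y):
    """Average p-distance over all between-population pairs (absolute divergence)."""
    return statistics.fmean(p_distance(a, b) for a, b in product(seqs_x, seqs_y))


def da(seqs_x, seqs_y):
    """Nei's net divergence: Dxy minus the mean within-population diversity."""
    return dxy(seqs_x, seqs_y) - (pi_within(seqs_x) + pi_within(seqs_y)) / 2


def mean_se(values):
    """Mean and standard error, e.g. of per-locus Da or Dxy in one recombination class."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


# Toy locus with two sequences per race.
race_x = ["AAAA", "AAAT"]
race_y = ["CAAA", "CAAT"]
print(dxy(race_x, race_y), da(race_x, race_y))  # prints 0.375 0.125
```

Running `da` and `dxy` over every locus, then `mean_se` separately for low- and high-recombination loci, yields the paired summaries needed to check whether the two measures agree.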

Mapping traits differentiating the species to such regions

Models of restricted recombination maintaining species predict that trait differences between diverging races or species should map disproportionately to regions of low recombination. This pattern has been documented in the Drosophila pseudoobscura system (Noor et al., 2001b, 2001c) and in Helianthus sunflowers (Lai et al., 2005). Such mapping lends further support to studies showing higher DNA sequence differentiation in these regions. However, mapping studies can be biased by very similar phenomena: associations between markers and traits will, on average, be much stronger in regions of low recombination than in regions of high recombination (Noor et al., 2001a: see Table 1; Feder and Nosil, 2009). This bias can be partially alleviated by using a higher marker density in regions of high recombination. It can also be addressed by showing that the low-recombination regions alone contribute effects sufficient to explain the full interspecies difference.
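This Python sketch shows one reason marker-trait associations are stronger in low-recombination regions. For a marker at a fixed physical distance from a causal locus, the recombination fraction between them depends on the local recombination rate. The sketch converts physical distance to map distance under an assumed uniform local rate and applies Haldane's map function. It then uses the standard backcross result that, for a locus of effect a at recombination fraction r, the difference in trait means between marker genotype classes is a(1 − 2r). The rates and distances are hypothetical. The sketch also ignores the further possibility that a single low-recombination block harbors several linked causal loci.

```python
import math


def haldane_r(cm):
    """Recombination fraction for a map distance in centiMorgans (Haldane's map function)."""
    return 0.5 * (1 - math.exp(-2 * cm / 100))


def marker_effect(effect, distance_mb, cm_per_mb):
    """Expected backcross marker-class mean difference a * (1 - 2r) for a causal locus
    distance_mb away, assuming a uniform local recombination rate of cm_per_mb."""
    r = haldane_r(distance_mb * cm_per_mb)
    return effect * (1 - 2 * r)


def spacing_for_equal_map_distance(target_cm, cm_per_mb):
    """Marker spacing (Mb) giving the same map distance in regions with different rates."""
    return target_cm / cm_per_mb


# Hypothetical: a causal locus of effect 1.0 sits 5 Mb from the nearest marker.
low = marker_effect(1.0, distance_mb=5, cm_per_mb=0.1)   # e.g. inside an inversion
high = marker_effect(1.0, distance_mb=5, cm_per_mb=5.0)  # freely recombining region
print(f"low-recombination: {low:.3f}, high-recombination: {high:.3f}")
# prints low-recombination: 0.990, high-recombination: 0.607
```

`spacing_for_equal_map_distance` expresses the remedy mentioned above. In these hypothetical numbers, matching the 0.5 cM interval of the low-recombination region would require markers every 0.1 Mb in the high-recombination region.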

Similarity in divergence across multiple rearrangements

Species often differ by multiple rearrangements rather than a single one, and such systems offer an additional way to test the importance of regions of restricted recombination. One approach applies here. Ancient arrangements that sorted differentially into the diverging species (the first problem in Table 1) can be distinguished from inversions that arose after the lineages split by asking whether absolute divergence (corrected for mutation rate) is similar across multiple arrangements and consistently greater than in collinear regions (Kulathinal et al., 2009). If the rearrangements arose at different times, similar between-species divergence across all of them would suggest that the species diverged in isolation and later exchanged genes, which homogenized the collinear regions. However, if the rearrangements actually arose close in time to each other, the test is uninformative. In addition, the test is conservative in that it assumes allopatric divergence and secondary contact. If the rearrangements arose sequentially and contributed to speciation in the face of gene flow, they could exhibit quite different divergence times.
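A minimal Python sketch of this comparison follows. It assumes per-region Dxy values are available. As a stand-in for mutation-rate correction, it divides each region's Dxy by that region's divergence to an outgroup. This relative-rate normalization is chosen here for illustration and is not necessarily the correction used by Kulathinal et al. (2009). The sketch reports the coefficient of variation (CV) of corrected divergence across rearranged regions. It also reports whether every rearranged region exceeds the most divergent collinear region. What CV counts as 'similar' is left to the analyst. The result is informative only under the caveats above.

```python
import statistics


def rate_corrected(dxy, d_outgroup):
    """Scale between-species divergence by divergence to an outgroup (mutation-rate proxy)."""
    return dxy / d_outgroup


def compare_arrangements(rearranged, collinear):
    """rearranged, collinear: lists of (dxy, d_outgroup) pairs, one per region.

    Returns (coefficient of variation across rearranged regions,
             True if every rearranged region exceeds every collinear region).
    """
    inv = [rate_corrected(d, o) for d, o in rearranged]
    col = [rate_corrected(d, o) for d, o in collinear]
    cv = statistics.stdev(inv) / statistics.fmean(inv)  # needs >= 2 rearranged regions
    return cv, min(inv) > max(col)


# Hypothetical values for three rearrangements and two collinear regions.
cv, all_higher = compare_arrangements(
    rearranged=[(0.020, 0.10), (0.022, 0.11), (0.019, 0.10)],
    collinear=[(0.012, 0.10), (0.013, 0.11)],
)
print(round(cv, 3), all_higher)
```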

Comparing co-occurring and allopatric populations

Perhaps the most direct test for interspecies gene flow is to identify greater genetic similarity between species in populations that co-occur compared to allopatric populations, particularly in collinear regions of the genome. The restricted recombination model predicts that hybridizing species exchange genetic material in regions of normal recombination, but regions of low recombination remain differentiated because of their stronger associations with adaptive variants or barriers to gene flow such as hybrid sterility. Co-occurring populations receive this exchanged genetic material from the other species directly, and only later might foreign alleles spread to allopatric populations (for example, Nosil et al., 2003; Grant et al., 2005).

The difficulty with this test is that it requires populations within each species to exchange genetic material with each other at a rate comparable to or lower than that of interspecies gene flow. If intraspecies gene flow is high, any genetic material obtained from the other species will quickly spread to allopatric populations, and the 'signature' of introgression will not be detectable. As an illustration of this difficulty, Kulathinal and Singh (2000) found that Drosophila pseudoobscura populations co-occurring with vs allopatric to D. persimilis were similarly divergent from D. persimilis. However, a recent next-generation sequencing study identified a slight but marginally significant difference in divergence from D. persimilis between co-occurring and allopatric D. pseudoobscura subspecies in collinear regions (Kulathinal et al., 2009). Inverted regions exhibited no difference in divergence, consistent with restricted recombination maintaining the co-occurring subspecies. Genetic mapping results further support these sequence data: hybrid sterility maps only to inverted regions in the co-occurring subspecies but to both inverted and collinear regions in the allopatric subspecies (Brown et al., 2004; Chang and Noor, 2007).
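One way such a comparison might be tested across loci is sketched below in Python. For each locus, take the allopatric population's divergence from the other species minus the co-occurring population's divergence. A one-sided sign-flip permutation test then asks whether these differences are positive on average, that is, whether co-occurring populations are less divergent. The test treats loci as independent, an assumption that linked loci violate. The inputs are hypothetical. This is one generic choice, not the test used in the studies cited.

```python
import random
import statistics


def sign_flip_test(diffs, n_perm=10_000, seed=1):
    """One-sided paired sign-flip permutation test of mean(diffs) > 0.

    diffs: per-locus Dxy(allopatric, other species) - Dxy(co-occurring, other species).
    Returns (observed mean difference, permutation p-value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(diffs)
    hits = 0
    for _ in range(n_perm):
        # Under the null of no difference, each locus difference is equally likely
        # to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if statistics.fmean(flipped) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-locus differences for ten collinear loci.
diffs = [0.002, 0.001, 0.003, -0.001, 0.002, 0.001, 0.000, 0.002, 0.001, 0.002]
mean_diff, p = sign_flip_test(diffs)
print(f"mean difference {mean_diff:.4f}, one-sided p = {p:.4f}")
```

As the paragraph above explains, a nonsignificant result is hard to interpret: high gene flow within species can erase the signal even when interspecies introgression has occurred.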

More complex models

The discussion above focused on simple approaches for detecting gene exchange between closely related species, as these have been used heavily in the context described. However, several models use Markov chain Monte Carlo or other coalescent approaches to distinguish shared variation due to interspecies gene flow from ancestral polymorphism (Hey and Nielsen, 2004; Becquet and Przeworski, 2007). These models have also been used to infer gene exchange between species specifically when testing the role of restricted recombination in maintaining species. Although certainly more rigorous than the simple approaches described previously, these models may also rest on assumptions not met in specific systems. A recent study showed that many realistic departures from the models' assumptions can lead to erroneous inference (Becquet and Przeworski, 2009). Tests that infer introgression from the lengths of contiguous introgressed DNA segments ('migrant tracts') may help alleviate this problem (Davison et al., 2009; Pool and Nielsen, 2009).
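The logic behind migrant-tract methods can be sketched with a standard approximation. Consider a single pulse of introgression T generations ago in which a small fraction m of the genome is introgressed. Introgressed tracts are then approximately exponentially distributed in length, with a mean of about 1/((1 − m)T) Morgans. The Python sketch below converts this mean to base pairs under an assumed uniform local recombination rate. It is not the method of Davison et al. (2009) or Pool and Nielsen (2009), which model the process more fully, and all parameter values are hypothetical. The sketch also shows why local recombination rate matters here: for the same event, tracts are far longer in low-recombination regions.

```python
def mean_tract_bp(generations, cm_per_mb, introgressed_fraction=0.0):
    """Approximate mean introgressed tract length in base pairs after a single pulse.

    Tracts are treated as exponential with mean 1 / ((1 - m) * T) Morgans, converted
    to bp with a uniform local rate (1 cM/Mb = 1e-8 Morgans per bp).
    """
    morgans_per_bp = cm_per_mb * 1e-8
    return 1 / ((1 - introgressed_fraction) * generations * morgans_per_bp)


for rate in (2.0, 0.1):  # hypothetical typical vs low-recombination region, cM/Mb
    # Mean tract length for introgression 1,000 vs 100,000 generations ago.
    print(rate, round(mean_tract_bp(1_000, rate)), round(mean_tract_bp(100_000, rate)))
```

In these hypothetical numbers, tracts from recent introgression average about 50 kb at 2 cM/Mb but about 1 Mb at 0.1 cM/Mb. A tract-length signal must therefore be interpreted against the local recombination rate.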

Outlook

Despite the bleak picture painted here, there are compelling reasons to expect that regions of restricted recombination (such as those created by chromosomal rearrangements) can facilitate the formation or maintenance of good species, and diverse data support this contention. However, this model still requires careful evaluation. Recent theoretical results suggest that differences in divergence between rearranged and collinear regions of hybridizing species may persist for only a few thousand generations (Feder and Nosil, 2009), because differences in rearranged regions decay through rare gene conversion or double crossovers. This recent study presents an unusual situation: at first glance, many results from nature do not appear consistent with theoretical predictions. Further work is needed to identify the sources of this inconsistency.
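A deliberately crude Python illustration shows why such differences can be transient. Its assumptions are ours rather than Feder and Nosil's (2009). At a site inside a rearrangement, two hybridizing populations are treated as exchanging an effective fraction φ of alleles per generation. Here φ stands for the hybridization rate multiplied by the rate at which gene conversion or double crossovers move that site between arrangements. In a symmetric two-population model, the allele-frequency difference then shrinks by a factor of (1 − 2φ) each generation, so its half-life is ln(1/2)/ln(1 − 2φ). The value of φ used below is hypothetical.

```python
import math


def remaining_difference(d0, phi, generations):
    """Allele-frequency difference left after symmetric exchange at rate phi per generation."""
    return d0 * (1 - 2 * phi) ** generations


def half_life(phi):
    """Generations for the difference to halve: ln(1/2) / ln(1 - 2*phi)."""
    return math.log(0.5) / math.log(1 - 2 * phi)


phi = 5e-5  # hypothetical effective per-generation exchange at the site
print(round(half_life(phi)))                             # generations to halve
print(round(remaining_difference(1.0, phi, 20_000), 3))  # fraction left after 20,000 generations
```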

Further, at some level, some of what we call 'biases' in the restricted recombination model may be considered 'real'. Any genomic region that a lack of recombination in heterozygotes 'isolates' from alternate alleles is effectively a 'genotypic cluster' as considered in some species concepts (Mallet, 1995). This would be true for the first bias listed in Table 1: in that region, all individuals carrying a new arrangement are recombinationally isolated from individuals carrying the progenitor arrangement. In practice, however, no one would argue that every new chromosomal rearrangement that restricts recombination with its progenitor makes its carriers a new species.

That said, inferring the role of restricted recombination in species persistence for a particular system warrants extra caution, particularly because intraspecific processes create a signature similar to the one predicted by this model (Table 1). Our intention here is not to attack particular proposed cases or studies but to draw attention to this concern for future work and inferences. Indeed, many of the studies cited have applied multiple lines of evidence, rather than a single line, to test the hypothesis that regions of restricted recombination fail to cross species boundaries. Testing this hypothesis critically requires establishing unambiguously both that interspecies gene flow has occurred and that it occurs disproportionately in regions of higher recombination. We urge caution in future studies and awareness of the likely biases, thereby reducing the chance that we will be misled by 'mirages' while seeking water.

Conflict of interest

The authors declare no conflict of interest.