Introduction

Understanding the evolution of asexual vertebrates requires knowledge of their population genetic structure and dynamics. To gain this knowledge, it is necessary to understand the influence of biotype composition, reproductive mechanisms and historical processes in shaping current population patterns. Among the vertebrates, approximately 90 species of fish, amphibians and reptiles are asexual (Janko et al., 2007 and references therein) and many show hybridization and polyploidy (Schultz, 1969). Asexual species have difficulty adapting to environmental change and may become extinct because of a decline in fitness, related to the accumulation of slightly deleterious mutations, the lack of recombination (Muller's ratchet) and/or ecological processes, such as Red-Queen dynamics (Muller, 1932; Kondrashov, 1988; Butlin, 2002; Loewe and Lamatsch, 2008). At least in the short term, some asexual species combat these debilitating effects through (i) multiple hybridization events, (ii) high heterozygosity buffering the effect of gene redundancy on mutations, (iii) recurrent hybridization involving parental genotypes that increases the genetic diversity of the new species and (iv) recombination (Vrijenhoek, 1994; Comai, 2005; Bi et al., 2008). Moreover, tolerance to increases or decreases in ploidy level (Leitch and Leitch, 2008) allows easy incorporation of an additional dose of proteins provided by a third set of chromosomes (Loewe and Lamatsch, 2008). These features, combined with several types of reproductive mechanisms, may explain the persistence of asexual organisms over time (Bogart et al., 2007; Lampert et al., 2008). However, the existence of these compensatory mechanisms to generate the variation needed for adaptation to changing environments does not mean that asexual taxa will avoid extinction in all cases (Loewe and Lamatsch, 2008).

The Squalius alburnoides complex displays a number of characteristics that make it an extraordinary example of viability and evolutionary success among polyploid hybrid taxa.

This Iberian cyprinid fish has descended from an interspecific hybridization between Squalius pyrenaicus females (P genome) and males of an extinct species (A genome) that has been placed in a sister taxon of Anaecypris hispanica (for example, Crespo-López et al., 2007). Sympatric bisexual Squalius species act as sperm donors (S. pyrenaicus in the southern basins and S. carolitertii in the northern ones—C genome), contributing new genetic material (Alves et al., 2001). In contrast to other asexual species, the S. alburnoides complex is not strictly asexual (Cunha et al., 2008 and references therein). In fact, fertile females and males are found, and often there are functional pathways towards sexual reproduction involving recombination, meiosis and syngamy.

Diploid (for example, CA, PA), triploid (for example, CAA, PAA) and tetraploid (for example, CCAA, PPAA) biotypes are found in S. alburnoides, including both females and males that are fertile, with a combination of several reproductive modes. In southern populations, triploid females show meiotic hybridogenesis with random segregation and recombination between non-excluded genomes, producing haploid and, more rarely, diploid gametes. The diploid females always produce diploid hybrid eggs, although a few may develop by gynogenesis (3%). The diploid eggs yield triploid progeny after fertilization (Alves et al., 2001). As in other asexual complexes (Ambystoma: Bogart and Licht, 1986; Bogart, 1989; Rana: Hotz et al., 1992; Phoxinus: Goddard and Schultz, 1993), the S. alburnoides complex has regenerated and maintained the extinct parental species genotype (AA, all males) through the fertilization of A ovocytes (produced by PAA females) by reduced A sperm (produced by AA males) (Alves et al., 2002). This AA genotype is apparently absent from the northern populations. The reproductive mechanisms described in the northern populations include hybridogenesis in hybrid diploids and triploids from the Douro basin (Carmona et al., 1997) and meiotic hybridogenesis in triploids from the Mondego basin (Pala and Coelho, 2005).

Most analyses of mitochondrial DNA sequences and allozyme markers have indicated multiple, unidirectional, independent hybridization origins of the S. alburnoides complex (Carmona et al., 1997; Cunha et al., 2004; Alves et al., 1997a, 1997b); however, results differ regarding the number of hybridization origins. Alves et al. (1997b) suggest two hybridization origins (Guadiana-Tagus and Sado), whereas Cunha et al. (2004) suggest five historically independent hybridization events in the following areas: the Alagón Tagus tributary-Douro, Mondego-Tagus, Guadiana-Guadalquivir, Sado, and Quarteira, with subsequent dispersal by colonization at the end of the Pleistocene. Recently, however, a study based on partial cytochrome b sequences (Sousa-Santos et al., 2007) interpreted haplotype sharing among some populations of S. alburnoides as a consequence of a recent single origin for the complex (700 000 years ago) in the Upper Tagus-Guadiana area, followed by dispersion. Therefore, the precise number and timing of hybridization origins of the S. alburnoides complex are still debated.

Previous studies used microsatellites to analyse patterns of genetic diversity and similarity between S. alburnoides biotypes with different ploidy levels and their parental species (Pala and Coelho, 2005; Crespo-López et al., 2007). These studies revealed the high genetic diversity of southern Portuguese populations from the Tagus and Guadiana basins and confirmed the intricate pathways of genetic exchange that contribute to the continuous shifting and maintenance of biotype diversity. However, S. alburnoides populations from the Mondego River, a small, independent Portuguese basin situated in the northern range of this complex, showed much lower genetic variability than southern populations. AA males do not exist in the Mondego populations, limiting the chance of introducing new genetic combinations in the A genome. The loss of genetic diversity in these populations may threaten their survival (Pala and Coelho, 2005).

The distinct reproductive mechanisms, biotype population compositions and potentially different hybridization origins within the distribution range of this complex provide an excellent system with which to infer the relative influence of contemporary (for example, genetic drift and genetic flow) and historical factors (for example, glaciations and past fragmentation) on the long-term persistence of S. alburnoides in the Iberian Peninsula, as well as on the population structure of polyploid hybrid complexes. Moreover, freshwater systems provide exceptional opportunities to study the distribution of genetic diversity, in part because drainage structure can restrict gene flow among populations in different basins and preserve historical ‘prints’ of colonization events (Suk and Neff, 2009).

Here, for the first time, we use microsatellite DNA markers to examine the biotype composition, genetic diversity and population structure of S. alburnoides from the Douro River basin. The Douro is the largest basin in the Iberian Peninsula and the longest river in the northern range of S. alburnoides, where it lives in sympatry with S. carolitertii. We also analysed other northern and southern S. alburnoides populations, as well as their sperm donors (S. carolitertii and S. pyrenaicus), to investigate the evolutionary history of polyploid hybrid S. alburnoides populations. We aim to understand the effect of contemporary and historical factors on the population genetics of this asexual complex and to tentatively explore its hybrid origins and history of colonization. To achieve these goals, our data were compared with previous analyses based on mitochondrial DNA and microsatellites. For instance, if S. alburnoides from the south colonized the northern Iberian Peninsula, we would expect a loss of genetic variability during this process. Moreover, if the colonization of the north were relatively recent, as proposed by some authors, less population structure in the north would be predicted.

Materials and methods

Sample collection and laboratory procedures

A total of 701 specimens were obtained from the locations indicated in Figure 1: 83 S. carolitertii (9 Manzanas, 24 Rabaçal, 15 Tâmega, 24 Lodeiro, 11 Paiva) and 248 S. alburnoides (36 Manzanas, 54 Rabaçal, 39 Tâmega, 96 Lodeiro, 23 Paiva) from the Douro basin; 41 S. carolitertii (29 Ceira, 12 Alva) and 157 S. alburnoides (92 Ceira, 65 Alva) from the Mondego basin; 20 S. pyrenaicus and 43 S. alburnoides from the Tagus basin (Almonte); and 32 S. pyrenaicus and 48 S. alburnoides from the Guadiana basin (Zujar). We also examined a population of A. hispanica (29), the closest relation to the paternal ancestor of S. alburnoides, from the only basin where this species still exists, Guadiana (Zujar).

Figure 1

Map of the Iberian Peninsula showing the collection sites and distribution area of the S. alburnoides complex. The textured area represents the distributional range of the complex; light grey indicates the distribution of the northern S. carolitertii (sperm donor); dark grey indicates the distribution of the southern S. pyrenaicus (sperm donor). The pies illustrate the percentage of each S. alburnoides biotype in the populations sampled: (1) Manzanas River, (2) Rabaçal River, (3) Tamega River, (4) Lodeiro River, (5) Paiva River, (6) Ceira River, (7) Alva River, (8) Almonte River (Tagus) and (9) Zujar River (Guadiana).

The ploidy level determination of all specimens was made using flow cytometry of blood cells as described by Collares-Pereira and Moreira da Costa (1999). Blood samples were stabilized in buffer and immediately frozen at −80 °C. Chicken erythrocytes were used as both an internal and external standard. The sex of each fish was determined by examining the gonads under a dissecting microscope.

For microsatellite analyses, total DNA was extracted from fins preserved in absolute ethanol or from frozen muscle tissue using the standard procedures outlined by Sambrook et al. (1989). Eight microsatellites were investigated for S. alburnoides: n7k4, n7j4, e2f8, e1g6 (Mesquita et al., 2003; Pala and Coelho, 2005), lco1, lco3, lco4 and lco5 (Turner et al., 2004). PCR reactions were performed as described by Pala and Coelho (2005) except for LCO1, which was amplified in 10 μl volumes containing 25–50 ng of template DNA, 0.2 mM of each dNTP, 2 mM MgCl2, 1 U Taq DNA polymerase (Invitrogen, Paisley, UK) and 0.2 μM of each primer. PCR products were electrophoresed in a CEQ 2000XL automatic sequencer (Beckman Coulter, Brea, CA, USA) and the molecular weights of alleles determined using CEQ fragment analysis software.

The information obtained on microsatellite loci and ploidy levels allowed us to identify the genome copy number of the intergeneric hybrid complex (for example, CAA or CCA, PAA or PPA) through the identification of specific alleles from the A, C and P genomes. Previous work has shown this method to be accurate for S. alburnoides (Pala and Coelho, 2005; Crespo-López et al., 2007; Cunha et al., 2008) and for other asexual complexes (Christiansen, 2005; Lampert et al., 2006; Ramsden et al., 2006). The assignment of alleles to each species was confirmed with the analysis of hybrids, especially diploids, in which the alleles attributed to each genome (C, P and A) should be present. Data sets from Paiva, Lodeiro and Ceira, Alva populations were compiled from Cunha et al. (2008) and Pala and Coelho (2005), respectively. The LCO1 locus was not available for the Mondego populations and was thus excluded from further analyses. A contingency χ2-test was performed using SPSS Statistics 17.0 software (SPSS Inc., Chicago, IL, USA) to investigate whether biotype frequencies differed among basins.
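As an illustration of the contingency χ2-test mentioned above, the following minimal Python sketch computes the Pearson statistic and its degrees of freedom for a table of biotype counts by basin. The function name and the counts are hypothetical and are not the study's data; the analysis itself was run in SPSS.

```python
def chi_square_contingency(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of basin and biotype
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts (NOT the study's data): rows are basins, columns are biotypes
counts = [
    [10, 30, 60],  # basin X
    [20, 30, 50],  # basin Y
]
chi2, dof = chi_square_contingency(counts)
print(f"chi2 = {chi2:.3f}, dof = {dof}")
```

The statistic is then compared with the χ2 distribution with the stated degrees of freedom to obtain a P value.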

Statistical analyses

The genetic diversity of the S. alburnoides biotypes was measured with FDASH (Obbard et al., 2006), which is suitable for the analysis of allopolyploids in which allele dosage cannot always be determined. The observed heterozygosity (Ho), number of phenotypes (NoPhen), average number of unshared alleles between each pair of individuals within populations (H′S) and diversity of allelic phenotypes (HPhen) within samples, averaged across samples (Obbard et al., 2006) were estimated.

Possible scoring errors due to stuttering, large allele dropout or the occurrence of null alleles were assessed using Micro-Checker 2.2 (Oosterhout et al., 2004) for S. carolitertii, S. pyrenaicus, A. hispanica and for the reconstituted parental species (AA males).

The following analyses were performed for all progenitor genomes of S. alburnoides, the A, C and P genomes. In the allotetraploids, the two genomes were analysed separately because they contain genetic information arising from the two progenitor species. In triploids, only the homospecific genome was analysed. To eliminate the effect of sample size, unbiased allelic richness and private allelic richness were estimated using the rarefaction method implemented in HP-RARE 1.0 (Kalinowski, 2005) for the C, P and A genomes. Intra-population genetic diversity of the three nuclear genomes of S. alburnoides was evaluated by estimating gene diversity and observed heterozygosity using GENEPOP 3.4 (Raymond and Rousset, 1995). This software was also used to test for deviations from Hardy–Weinberg equilibrium, applying the Exact test with the default settings of the Markov chain Monte Carlo procedure.
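The rarefaction idea behind unbiased allelic richness can be sketched as follows. This is a minimal illustration of the standard rarefaction formula (the expected number of distinct alleles in a random subsample of g gene copies), not a reimplementation of HP-RARE; the function name and the allele counts are hypothetical.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: number of copies observed for each allele at one locus.
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size g exceeds the number of sampled gene copies")
    total = comb(n, g)
    # For each allele, 1 minus the probability that it is absent from the subsample
    return sum(1 - comb(n - c, g) / total for c in allele_counts)


# Hypothetical locus with three alleles observed 5, 3 and 2 times
richness_at_2 = rarefied_allelic_richness([5, 3, 2], 2)
print(round(richness_at_2, 4))
```

Rarefying every population to the same g makes richness values comparable across samples of different size.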

Kruskal–Wallis, Mann–Whitney U and Wilcoxon tests were used to determine whether the measures of genetic diversity differed between subpopulations. These analyses were performed with SPSS Statistics 17.0 software (SPSS Inc., Chicago, IL, USA). The modified false discovery rate procedure by Benjamini and Yekutieli (2001) was used to correct multiple comparisons.
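For readers unfamiliar with the Benjamini and Yekutieli (2001) correction, a minimal sketch of the step-up procedure is given below. It assumes the standard formulation in which the i-th smallest of m P values is compared with i·α/(m·Σ1/j); the function name and the example P values are hypothetical, and the exact variant applied in the study may differ.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Harmonic-sum penalty that makes the procedure valid under dependence
    c_m = sum(1.0 / j for j in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank whose P value passes its threshold
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P values from four pairwise comparisons
decisions = benjamini_yekutieli([0.001, 0.02, 0.04, 0.3])
print(decisions)
```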

To examine the extent of genetic differentiation and structuring, multiple approaches were applied to the A genome because it is the only genome present across the entire distribution of S. alburnoides. The A genome has been maintained since the formation of the complex, allowing comparisons between all populations and inferences about historical processes to be made. Genetic differentiation between pairs of populations was quantified through pairwise FST estimates based on variance in allelic frequencies using FSTAT 2.9.3 (Goudet, 2001). The interpretation of FST values from multi-allelic data is problematic because their maximum value depends on the amount of within-population variation and, even in the absence of any shared alleles, often fails to reach the theoretical maximum of 1 (Hedrick, 2005; Meirmans, 2006). We therefore calculated pairwise standardized Theta′ (F′ST) following Meirmans (2006), using the program RecodeData (Meirmans, 2006); this estimator is highly comparable with G′ST. This standardized measure ranges from 0 to 1 and therefore makes interpretation of the degree of subdivision much easier and facilitates comparing results among studies. A Mantel test for matrix correspondence was used to test for correlation between the pairwise FST and Theta′ estimates (similar to G′ST).
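The standardization can be illustrated with a small sketch. For simplicity it uses Nei's GST computed from allele frequencies rather than the Weir and Cockerham estimator used by FSTAT, and it mimics the recoding idea by relabelling alleles so that none is shared between populations, which gives the maximum attainable value for the observed within-population diversity. All function names and frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one {allele: frequency} dict."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(pops):
    """Nei's G_ST for equally weighted populations at a single locus."""
    k = len(pops)
    hs = sum(gene_diversity(p) for p in pops) / k
    alleles = set().union(*pops)
    mean_freqs = {a: sum(p.get(a, 0.0) for p in pops) / k for a in alleles}
    ht = gene_diversity(mean_freqs)
    return (ht - hs) / ht


def standardized_gst(pops):
    """Observed G_ST divided by its maximum given the within-population diversity."""
    # Recode alleles as (population index, allele) so that no allele is shared
    recoded = [{(i, a): f for a, f in p.items()} for i, p in enumerate(pops)]
    return gst(pops) / gst(recoded)


# Hypothetical populations with no shared alleles
pop_a = {150: 0.5, 152: 0.5}
pop_b = {154: 0.5, 156: 0.5}
raw = gst([pop_a, pop_b])
standardized = standardized_gst([pop_a, pop_b])
print(round(raw, 3), round(standardized, 3))
```

In this example the two populations share no alleles, yet the unstandardized value is well below 1, whereas the standardized value reaches 1.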

Genetic differentiation was also estimated using mutational differences among alleles, assuming the stepwise mutation model by computing pairwise standardized RST estimates with RSTCALC (Goodman, 1997). We examined the effect of allele sizes on the levels of genetic structuring following Hardy et al. (2003). This test compares observed estimates of RST with simulated values of ρRST, in which allele size classes have been randomized, resulting in measures analogous to FST. The tests were performed with 10 000 permutations using Spagedi 1.2 (Hardy and Vekemans, 2002). The value of RST is mainly determined by extra mutations that accumulate within each population. The opportunity for such mutations will largely depend on the time since population divergence. The RST estimates will generally be larger than FST estimates when divergence time has been sufficient for mutations to accumulate. In contrast, both parameters should yield relatively similar values when the divergence time has been too short for neutral mutations to rise to high frequencies. In such cases, genetic drift and gene flow are likely to be more important than mutation in determining the extent of population divergence.

The population structure was further analysed without a priori clustering of individuals into populations using the Bayesian program Structure (Pritchard et al., 2000), by varying the number of clusters in the data set (K) and assigning admixture proportions of each individual to these clusters. Ten independent runs were carried out for each value of K (1 to 9). The Markov chain Monte Carlo procedure was run for 1 000 000 iterations after burn-in periods of 100 000 iterations. The best value of K was inferred from ln Pr(X|K) (Pritchard et al., 2000) and the modal value of ΔK (Evanno et al., 2005). A Bayesian assignment test was performed using Geneclass 2.0 (Piry et al., 2004) software, which gives the probability of each individual belonging to the reference population. The computation followed the Bayesian method of Rannala and Mountain (1997), on a simulation of 10 000 individuals.
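The ΔK statistic of Evanno et al. (2005) can be sketched as follows. This minimal version takes the absolute second-order difference of the mean log probability across runs and divides it by the standard deviation of the runs at K, which is one common reading of the statistic; it assumes consecutive K values and the same number of runs for each. The function name and the log probabilities are hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """log_probs: {K: [log probability of the data for each run]}.

    Returns {K: delta K} for every K that has both a lower and an upper neighbour.
    """
    ks = sorted(log_probs)
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        second_diff = abs(
            mean(log_probs[next_k]) - 2 * mean(log_probs[k]) + mean(log_probs[prev_k])
        )
        result[k] = second_diff / stdev(log_probs[k])
    return result


# Hypothetical log probabilities from three runs per K (NOT the study's output)
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-888, -880, -896],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that ΔK is undefined for the smallest and largest K tested, so it cannot identify K=1 as the best model.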

To understand the hierarchical partitioning of genetic structure, an analysis of molecular variance was performed using Arlequin 3.0 (Excoffier et al., 2005). Groups were defined to assess population configurations and their geographical distribution.

To estimate isolation by distance, the genetic distances of RST/(1−RST) and FST/(1−FST) were regressed against the minimum stream distance between populations. Significance of the relationship was evaluated by a Mantel test with 10 000 permutations performed in the Isolde program included in Genepop.
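A minimal sketch of such a Mantel permutation test is given below. It uses the Pearson correlation between the two distance matrices and permutes population labels of the genetic matrix; the statistic used by Isolde may differ, and the function names, stream positions and FST values are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=10000, seed=0):
    """One-sided Mantel test; returns (observed r, permutation P value)."""
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo = [geographic[i][j] for i, j in pairs]
    observed = pearson([genetic[i][j] for i, j in pairs], geo)
    rng = random.Random(seed)
    labels = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Shuffle population labels, keeping rows and columns together
        rng.shuffle(labels)
        permuted = [genetic[labels[i]][labels[j]] for i, j in pairs]
        if pearson(permuted, geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: four populations along one river (positions in km)
positions = [0.0, 10.0, 20.0, 30.0]
geo = [[abs(a - b) for b in positions] for a in positions]
fst = [[d / (d + 100.0) for d in row] for row in geo]       # hypothetical FST values
linearized = [[f / (1.0 - f) for f in row] for row in fst]  # FST/(1-FST)
r, p = mantel(linearized, geo, permutations=999)
print(round(r, 3), round(p, 3))
```

Whole labels are permuted rather than individual matrix cells because pairwise distances that share a population are not independent.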

Genetic relationships among polyploid populations and sperm donors, avoiding the effect of different ploidy levels, were determined as DTL distances (Tomiuk and Loeschcke, 1991) estimated using PopDist (Guldbrandtsen et al., 2000). Genetic relationships among populations of diploid and triploid biotypes and the A genome were inferred from the Cavalli-Sforza and Edwards chord distance, DCE (Cavalli-Sforza and Edwards, 1967), and from the shared allele distance, DAS (Chakraborty and Jin, 1993). These distances were calculated with the Populations 1.2.28 software (Langella, 2002). The neighbour-joining clustering method was used to construct the phylogenetic trees and the tree-topology support was assessed by bootstrapping (1000 iterations).
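As an illustration of the chord distance, the sketch below computes it from allele frequencies and averages over loci. It assumes the commonly used form (2/π)·sqrt(2(1 − Σ sqrt(x·y))) per locus; the exact scaling implemented in the Populations software may differ, and the function name and frequencies are hypothetical.

```python
from math import pi, sqrt


def chord_distance(pop_x, pop_y):
    """Chord distance between two populations.

    pop_x, pop_y: lists with one {allele: frequency} dict per locus, in the same order.
    """
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        # Only alleles present in both populations contribute to the sum
        cos_theta = sum(sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())
        # max() guards against tiny negative values caused by rounding
        total += sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return 2.0 / (pi * len(pop_x)) * total


# Hypothetical single-locus populations
pop_1 = [{150: 0.5, 152: 0.5}]
pop_2 = [{154: 0.5, 156: 0.5}]
d_same = chord_distance(pop_1, pop_1)
d_disjoint = chord_distance(pop_1, pop_2)
print(d_same, round(d_disjoint, 4))
```

The distance is 0 for identical frequencies and reaches its maximum, 2·sqrt(2)/π, when the two populations share no alleles.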

Recent population bottlenecks were inferred using the program Bottleneck (Piry et al., 1999). As is recommended when analysing a small number of microsatellites, the Wilcoxon signed-rank test was used to test the significance of heterozygote excess under three different models of microsatellite evolution: the infinite allele model, the stepwise mutation model and a two-phase model.

Results

Biotype composition

Microsatellites proved to be a powerful tool to identify the various S. alburnoides biotypes (Figure 1). Genotypes could be determined for 94% of the individuals. The loci LCO3, LCO4, E1G6 and LCO1 were extremely useful for discriminating the complete genotypes in diploid and polyploid biotypes.

The populations distributed across the geographical ranges of the two sperm donors were generally composed of the same biotypes. However, there were no diploid nuclear non-hybrid males (AA males) in the northern populations, and no tetraploid or diploid hybrid males were found in the Guadiana basin (see Figure 1). Differences in the frequency and distribution of biotypes were also observed between river basins (Figure 1).

According to flow cytometer and microsatellite data, Douro populations (except Lodeiro and Paiva) and Mondego populations were mainly composed of CAA triploid females (>70%, Figure 1). However, the Mondego and Douro basins showed significant differences in population composition (Figure 1) (χ2=90.7, P<0.001). The Douro region had a significantly lower (P<0.05) percentage of diploid forms (mainly males) with only 7%, while the Mondego region had around 20% of diploids (mainly males). The proportion of CCA biotypes in the Douro basin (13%) was significantly larger (P<0.05) than observed in the Mondego basin (6%). However, the most marked difference between the Douro and Mondego basins was the absence of tetraploids from the Mondego, whereas in the Douro, the Lodeiro and Paiva populations were mainly composed of symmetric tetraploid individuals (71 and 86%, respectively).

Sex ratio

A sex ratio deviation was observed among S. alburnoides triploids: CAA/PAA were mostly females and CCA were mostly males (Supplementary Table S1). This finding suggests a relation between genome dosage and sex. However, it is not known how sex is determined in S. alburnoides. Previous studies indicated that the mechanism of sex determination cannot be fully explained by either female or male heterogamety (Alves et al., 1998, 1999) and that it is apparently a strictly genetic mechanism (Pala et al., 2009). It seems that sex ratio in triploid forms is influenced by other Squalius genomes (C and P), either as a consequence of lower fitness of the AA, CAA/PAA males or CCA/PPA females and/or by the mechanism of sex determination.

Genetic diversity and population structure

Micro-Checker analyses revealed no evidence of scoring errors in any loci except E2F8, which was thus omitted from the analysis. Therefore, a set of six microsatellites was used for further analysis. Overall, the triploids showed significantly higher phenotype diversity than the hybrid diploids and tetraploids (Mann–Whitney's test, P<0.025). Significantly higher diversity of phenotypes was observed in triploids from Rabaçal (Mann–Whitney test, P<0.041) and Manzanas (Mann–Whitney test, P<0.033) in contrast to hybrid diploids (Supplementary Table S1). In most sampled populations with CAA, this biotype showed the highest HPhen and H′S values (Supplementary Table S1).

The observed heterozygosity value of 1.0 per locus indicated a hybrid origin of the biotypes. Values lower than 1.0 were occasionally observed. Since no null alleles were detected and recombination between genomes was unlikely because of the observed reproductive modes, the values below 1.0 were probably because of the presence of shared alleles between genomes.

In all, 110 alleles were found for the A genome: 48 alleles in the northern populations and 79 alleles in the southern populations (with sharing of alleles between populations). Significant differences were found among river basins for all genetic diversity parameters analysed (Kruskal–Wallis test for allelic richness, private allelic richness and gene diversity, P<0.011 in all cases). The southern river basins showed higher genetic diversity than the northern ones in all parameters analysed (Table 1) and the differences between these regions were significant (Mann–Whitney U tests, all P<0.002). For the C and P genomes, 46 and 61 alleles were found in northern and southern populations, respectively. Among the Douro populations, Rabaçal showed significantly higher allelic richness, private alleles and gene diversity in the A genome than all other populations (Wilcoxon signed-rank tests, all P<0.05). Rabaçal also showed significantly higher allelic richness, private alleles and gene diversity in the A genome than the two Mondego populations (Wilcoxon signed-rank tests, both P<0.046), whereas no significant differences were found in the Tagus and Guadiana basins. For the C genome, the genetic diversity of the Rabaçal was significantly higher than the Lodeiro and Paiva (Wilcoxon signed-rank tests, P<0.046), whereas there were no significant differences in the Mondego populations. The Paiva population showed significantly lower genetic diversity values than the other populations for the C and A genomes (Mann–Whitney U tests, all P<0.035).

Table 1 Intra-population genetic diversity of the biotypes in the S. alburnoides complex

After false discovery rate correction a significant deviation from Hardy–Weinberg equilibrium in the C genome was noted for Alva and Tâmega because of a heterozygote deficit across all loci (P<0.041). For the A genome, Tâmega, Rabaçal, Tagus and Guadiana showed significant deviation from Hardy–Weinberg equilibrium across all loci (P<0.022) as the result of a heterozygote deficit. The heterozygote deficit found in the A genome was probably a consequence of asexual reproductive mechanisms (Menken et al., 1995). The deficit detected across all loci (P<0.001) of the C genome in the Alva and Tâmega populations could have been the consequence of a Wahlund effect.

The pairwise standardized genetic differentiation values (highly comparable to G′ST) were in general higher than the FST ones; however, they showed the same pattern (Supplementary Table S2). In fact, a significant correlation was found between these two genetic differentiation measures (r=0.925, P<0.001) and thus only FST values were used subsequently.

The null hypothesis of RST=FST over all loci was rejected (RST=0.894>ρRST=0.595, 95% confidence interval 0.267–0.863, P<0.001), suggesting that mutations contributed to genetic differentiation and that the mutation process at least partially followed a stepwise mutation model. The significant global RST (P<0.0001) over all loci indicated population structure in S. alburnoides. The RST estimates were significantly higher than the FST estimates among basins (Wilcoxon signed-rank tests, P<0.014) and lower within the Douro and Mondego basins (Table 2). All pairwise comparisons among Douro populations, except that between Lodeiro and Paiva, showed significant FST values. Within the Douro basin, significant FST values ranged between 0.089 and 0.657 (P<0.001). The highest FST value was between Lodeiro and Manzanas, whereas the lowest differentiation was observed between Manzanas and Tâmega. Lodeiro and Paiva showed no significant differentiation (FST=0.000, P>0.05).

Table 2 Pairwise genetic distance matrices for the A genome of S. alburnoides complex populations based on FST (below diagonal) and RST(above diagonal) estimators

Bayesian clustering analyses detected the highest likelihood and a modal value of ΔK for the model K=6 (Figure 2), and the probability of individuals’ assignment was high (P>0.8). At values of K=6, most individuals from a location were clustered into their a priori population of origin. The exceptions were three single clusters of Lodeiro and Paiva, Tâmega and Manzanas, and Ceira and Alva. The Bayesian assignment test assigned 67.6% of the individuals to their own source locations and no individuals were assigned to river basins outside their sources.

Figure 2

Bayesian clustering of S. alburnoides samples (A genome). (a) Log probability of data, L(K). (b) Ad hoc statistic ΔK based on the rate of change in the log probability, as a function of K over 20 runs for each K value (putative different clusters tested).

The hierarchical analysis of molecular variance revealed that the proportion of among-group variance was higher when populations were divided into six and four groups: (1) Rabaçal, (2) Manzanas-Tâmega, (3) Lodeiro-Paiva, (4) Ceira-Alva, (5) Tagus and (6) Guadiana basins; and (1) Douro, (2) Mondego, (3) Tagus and (4) Guadiana basins (Table 3). For the above groups, 51.4 and 47.4% of the variance was distributed within populations, respectively, and only 5.7 and 14.3% among populations, respectively.

Table 3 Hierarchical analysis of molecular variance (AMOVA) for four alternative groupings of populations, based on the A genome

The neighbour-joining tree built using the genetic distance DTL clearly revealed clusters with a geographical pattern for S. carolitertii, S. pyrenaicus and S. alburnoides biotypes. The reconstituted parental species (AA) are in a cluster close to the A. hispanica cluster, as was expected, because A. hispanica is a sister taxon to the extinct paternal ancestor of S. alburnoides (Figure 3).

Figure 3

Neighbour-joining tree based on the DTL distance for diploids and polyploids, their two sympatric sperm donors (S. carolitertii, S. pyrenaicus) and A. hispanica, a sister taxon of the extinct paternal ancestor of S. alburnoides.

Neighbour-joining trees constructed using DAS (not shown) and DCE genetic distances revealed identical topologies for the A genome, indicating a geographical structure similar to the one obtained above (Figure 4). The exceptions were trees obtained for triploids, which showed a clear effect of the copy number of each genome instead of a geographical pattern. For instance, CCA biotypes from the Douro and Mondego basins were more closely related than the CCA and CAA biotypes within each basin (Supplementary Figure S2c).

Figure 4

Neighbour-joining tree, based on DCE microsatellite genetic distances for the A genome, among populations of S. alburnoides from the Douro, Mondego, Tagus and Guadiana River basins. Branch lengths are proportional to the genetic distances between populations. Numbers indicate nodes with bootstrap support higher than 50% in 1000 replications.

The Mantel test showed no significant correlation between genetic distances (RST and FST) and geographical distances [RST (r=0.140, R2=0.019, P=0.075); FST (r=0.046, R2=0.002, P=0.196)] (Supplementary Figure S4). Bottleneck tests showed no evidence of recent population bottlenecks in any of the populations.

Discussion

Asexual taxa are supposed to accumulate deleterious mutations, lose uncorrupted genotypes by chance and have limited ability to adapt to rapid environmental changes. However, ‘little sex’ seems to be sufficient for avoiding the accumulation of mutations and for maintaining genetic variability (Hurst and Peck, 1996). As such, strictly asexual taxa are very rare and most do not completely avoid sexual reproduction, which could be the key to their long persistence. There are many examples of hybrid–polyploid complexes that are not strictly asexual. For instance, Poecilia formosa shows paternal leakage of undamaged DNA from its sexual sister species (Loewe and Lamatsch, 2008), and the Ambystoma complex maintains various intergenomic exchanges between A. laterale and A. jeffersonianum genomes in unisexual individuals (Bi and Bogart, 2006; Bi et al., 2007). In the allopolyploid S. alburnoides complex there are several reproductive mechanisms that, in association with different ploidy levels, can generate new genetic combinations.

The breeding dynamics of S. alburnoides allow bidirectional movement of alleles between different polyploid levels, which avoids the fatal accumulation of deleterious mutations and allows the incorporation of novel genetic variability, making this complex an example of viability and evolutionary success among polyploid hybrid taxa. There are, however, geographical variations in the mechanisms shown by S. alburnoides to overcome the disadvantages of asexuality (Alves et al., 1998; Pala and Coelho, 2005; Crespo-López et al., 2006; Cunha et al., 2008). An uncommonly high number of pathways involving recombination, meiosis and gamete syngamy generate new genetic material in the southern populations (reviewed by Alves et al., 2002). Our results confirmed high genetic diversity in the southern populations because of the incorporation of P alleles from the bisexual S. pyrenaicus. Also, meiotic hybridogenesis and gametogenesis in AA males, that allow for recombination of A alleles, are highly relevant for the maintenance of genetic variability. The meiotic hybridogenesis has also been observed in other asexual taxa, such as Rana esculenta (Graf and Pelaz, 1989) and Misgurnus anguillicaudatus (Morishima et al., 2008). It causes the diploidization of triploids, which can explain their persistence in nature (Otto, 2007; Pala et al., 2008).

The success of triploids in nature is mainly related to the effect of heterosis. The maintenance of a permanent heterozygote condition results from the absence of inter-genomic recombination and gene redundancy (Comai, 2005), which may be a crucial intermediate step in the formation of even-ploidy sexual lineages (Mable, 2003). The S. alburnoides complex, like other hybrid complexes such as Ambystoma, Rana esculenta and Phoxinus (Berger, 1973; Bogart et al., 1985; Goddard et al., 1989), has a prevalence of triploid females in most populations, accompanied by the loss of normal sexual reproduction. Triploidy in S. alburnoides was shown to be an intermediate step for the formation of tetraploid lineages (Alves et al., 2004; Cunha et al., 2008) and, consequently, it has evolutionary importance. In fact, the Douro basin is the first place where established populations mainly composed of symmetric tetraploids of both sexes (1:1 sex ratio) were observed (Lodeiro and Paiva, Figure 1, Supplementary Table 1) in a process of speciation adapted to specific habitat conditions (Cunha et al., 2008, 2009). In other Douro populations, such as in Tâmega, the occurrence of symmetrical tetraploid individuals may also indicate the stabilization of new bisexual polyploid lineages. The return to a balanced genome may allow adaptation and long-term evolution of these populations. Tetraploidization has apparently also arisen in the Tagus River basin, either through triploidy or diploidy (Alves et al., 1999, 2004). However, there was no evidence of self-sustaining populations of tetraploids in this southern basin. This is probably because of the environmental conditions that characterize this region, namely a heterogeneous annual hydrological cycle characteristic of the circum-Mediterranean region, together with habitat loss and hydrological disturbance, which increase competition among the S. alburnoides biotypes for the sympatric sperm donor, S. pyrenaicus.
The pathways for evolution and speciation by tetraploidization are also apparently unfeasible in the Mondego and Guadiana Rivers because this ploidy was not found in any individuals from these basins. Such findings and future studies will help to further explore the persistence of triploid forms in nature and their importance in the maintenance of polyploid hybrid complexes.

The genetic diversity of the northern populations of S. alburnoides results from the incorporation of C alleles arising from the sexual S. carolitertii in each generation and the diversity of reproductive pathways. However, the low frequency of hybrid males and the absence of AA males make the northern populations more dependent on the sympatric sexual species, S. carolitertii. Thus, the northern hybrid populations face more competition with the sexual species, which probably reduces their dispersal potential and settlement, similar to what has been proposed for the spiny loaches (Janko et al., 2005). The absence of AA males and of symmetrical tetraploids from the Mondego populations (Figure 1) reduces the incorporation of new genetic material in the A genome, resulting in decreased genetic diversity compared with the southern populations and perhaps restricting their long-term persistence (Pala and Coelho, 2005). In the Douro populations the only way to recruit new genetic combinations for the A genome is through gametogenesis in symmetric tetraploids because the diploid and the triploid hybrid females do not show meiotic hybridogenesis (Supplementary Figure S3) and AA males are absent. Therefore, much lower genetic diversity would be expected than in the southern and Mondego populations, resulting in reduced adaptive potential and viability. However, this trend was not observed, indicating that other processes such as historical factors can also influence the viability of northern populations.

The tertiary orogeny of the Iberian Peninsula and subsequent glaciations have led to important changes in climate and topography (Taberlet et al., 1998; Loidi, 1999; Thompson, 1999). These factors had a significant impact on contemporary distribution patterns and genetic variation of a great number of species (for example, Lacerta schreiberi: Paulo et al., 2002; Godinho et al., 2008; Chioglossa lusitanica: Alexandrino et al., 2000, 2007). Low mitochondrial DNA diversity has been found in fish populations from the northern Iberian Peninsula, such as Pseudochondrostoma duriensis and S. carolitertii, and in other northern fish populations that formerly inhabited glaciated regions (Brito et al., 1997; Bernatchez and Wilson, 1998; Anger and Schlosser, 2007; Mesquita et al., 2007; Aboim et al., 2009). Microsatellite data indicate lower genetic variability in the northern S. carolitertii (C genome) compared with S. pyrenaicus (P genome) from the south, which may reflect the climatic fluctuations felt in the northern Iberian region during the Pleistocene glaciations (Williams et al., 1988; Pérez-Alberti et al., 2004). When temperatures decreased, bottleneck events may have led to high mortality and a drastic decrease of genetic diversity in the S. carolitertii populations.

The generation of genetic diversity in S. alburnoides can result from the accumulation of genetic differences among hybrids that derived from a common ancestor, or from multiple origins of hybrids arising from the sexual ancestors. Alves et al. (1997b) and Cunha et al. (2004) suggested multiple hybridization origins for the S. alburnoides complex, with subsequent dispersal by colonization at the end of the Pleistocene (Cunha et al., 2004). However, a scenario of a recent single origin (700 000 years ago) in the Upper Tagus-Guadiana area followed by dispersion has been recently suggested (Sousa-Santos et al., 2007). These authors also inferred that the Mondego and Douro Rivers could have been colonized from the Tagus River around 0.05 and 0.01 million years ago, respectively. If S. alburnoides colonized the north of the Iberian Peninsula from the south, a loss of genetic variability would be expected during this process. Moreover, if the colonization of the north was relatively recent, less population structuring in the north would be expected because of the effects of shared polymorphisms. The geographical distribution of microsatellite allelic richness in S. alburnoides found in this study did not entirely fit a simple south–north colonization scenario, because allelic richness was not significantly lower in populations theoretically colonized more recently, such as those of the Douro basin. Moreover, we observed genetic structuring of river basins that was highly supported by overall RST and corroborated by Bayesian assignment tests, STRUCTURE and analysis of molecular variance results. Neighbour-joining phylograms confirmed the phylogeographical structure of S. alburnoides obtained from the complete mitochondrial cytochrome b (Alves et al., 1997b; Cunha et al., 2004).
The Mondego populations fit the trend of lower genetic diversity in the A genome when compared with the southern populations, possibly as a consequence of their colonization from the Tagus, supporting the hybridization origin scenarios suggested by Cunha et al. (2004) and Sousa-Santos et al. (2007). These populations also display fewer mechanisms for incorporating new genetic combinations in the paternal genome (A genome) than the southern populations. Colonization from the Tagus River would have caused founder effects and genetic drift, which could explain the reduced genetic variability of the Mondego basin populations and their high genetic differentiation from the other basins.
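
The expected loss of allelic richness under repeated founder events can be illustrated with a toy simulation. The following is only a minimal sketch under stated assumptions, not the analysis used in this study: the function name serial_founding, the pool of 20 alleles, the founder size of 10 gene copies and the number of colonization steps are all hypothetical. Each founding step draws gene copies at random, with replacement, from the previous pool, so alleles can be lost by chance but never gained.

```python
import random


def serial_founding(source, founders, steps, seed=1):
    """Track the number of distinct alleles across successive founder events.

    source   -- list of allele labels (one entry per gene copy) in the source pool
    founders -- number of gene copies drawn at each colonization step (hypothetical)
    steps    -- number of successive colonization events
    """
    rng = random.Random(seed)
    pool = list(source)
    richness = [len(set(pool))]
    for _ in range(steps):
        # Each new population is founded by a random draw from the previous one.
        pool = [rng.choice(pool) for _ in range(founders)]
        richness.append(len(set(pool)))
    return richness


# Hypothetical source pool: 20 alleles, each present in 5 copies.
source_pool = list(range(20)) * 5
print(serial_founding(source_pool, founders=10, steps=4))
```

Because no mutation or migration is modelled, the number of distinct alleles can only stay constant or decline along the colonization route, which is the pattern a simple south–north colonization scenario would predict.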

However, our findings at the edge of the northern distribution of the S. alburnoides complex (Douro basin) contradict the general trend observed in many species (for example, Hewitt, 1996; Paulo et al., 2001; Horn et al., 2006) of decreased genetic variability towards the north. The unusually high levels of genetic differentiation of Douro populations (except the Rabaçal) from the other basins, when combined with low levels of genetic diversity, are consistent with the idea that these populations were subject to harsh genetic founder effects followed by genetic drift. Nevertheless, it is worth noting that a correlation between genetic differentiation and geographical distance was not observed. This result does not imply, however, that isolation by distance would not be the best model to describe the genetic differentiation associated with the spatial distribution of these populations. The lack of correlation could be an artefact caused by the high genetic differentiation found within the Douro River basin. However, when we tested isolation by distance between the Rabaçal River (the most genetically diverse population in the Douro basin) and the other basins, this model was again not supported (data not shown). Thus, the results suggest that differences in genetic variation found in S. alburnoides derived not only from genetic drift and mutation associated with their population dynamics, but also from historical events associated with range shifts and Pleistocene climatic fluctuations.
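
A correlation between genetic differentiation and geographical distance is commonly assessed with a Mantel-type permutation test. The sketch below assumes that approach for illustration only; the procedure actually used here is not described in this section, and the function names, the matrices and the number of permutations are hypothetical. It correlates the pairwise entries of two symmetric distance matrices and obtains a one-sided P-value by jointly permuting the population labels of one matrix.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise(matrix, order):
    """Upper-triangle entries of a symmetric matrix after relabelling by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel(genetic, geographic, n_perm=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation P-value)."""
    rng = random.Random(seed)
    labels = list(range(len(genetic)))
    geo_vec = pairwise(geographic, labels)
    r_obs = pearson(pairwise(genetic, labels), geo_vec)
    hits = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)  # permute rows and columns together
        if pearson(pairwise(genetic, perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical example: five populations along a line, with differentiation
# strictly proportional to distance (perfect isolation by distance).
positions = [0, 1, 2, 3, 4]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.1 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(gen, geo)
print(r, p)
```

Under isolation by distance the observed correlation is positive and rarely matched by permuted data; a non-significant result, as reported above, indicates that distance alone does not account for the pattern of differentiation.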

The populations within the Douro basin varied the most in allele frequencies (that is, they showed higher FST values than RST), and they displayed a paucity of private alleles with increasing distance from Rabaçal. The north-western Rabaçal River population showed high genetic diversity (A, private allelic richness and He) (Table 1), similar to that shown by the southern populations, and reduced genetic variability was observed with increasing distance from Rabaçal. The evolutionary divergence of populations can be influenced by the opposing effects of migration, which tends to homogenize populations, and genetic drift, which leads to increased population differentiation (Allendorf and Luikart, 2007). Our data seem to indicate that the genetic diversity of populations adjacent to Rabaçal, such as Tâmega and Manzanas, derives from contributions from its gene pool that diminished during successive founder events. The tetraploid populations of Lodeiro and Paiva showed strong evidence of founder events, including a reduction in the number of alleles and high frequencies of other alleles (Supplementary Figure S1). These populations seem to have been colonized from the neighbouring population of Tâmega, but this colonization was not recent because bottleneck tests did not detect recent founder events. Indeed, despite the high degree of genetic differentiation of these populations from the neighbouring population of Tâmega, the assignment tests indicated that individuals from these two populations could belong to Tâmega.
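
The diversity measures referred to above can be made concrete with a small calculation at a single locus. This is a minimal sketch only: the population names and allele sizes are invented, the functions are hypothetical helpers, He is computed as the uncorrected gene diversity 1 - sum(p_i^2), and private alleles are counted directly rather than rarefied to a common sample size as richness estimates usually are.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Uncorrected gene diversity, He = 1 - sum(p_i^2), for one locus."""
    counts = Counter(alleles)
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def private_alleles(populations):
    """Alleles found in one population and in no other."""
    result = {}
    for name, alleles in populations.items():
        elsewhere = set()
        for other, other_alleles in populations.items():
            if other != name:
                elsewhere.update(other_alleles)
        result[name] = set(alleles) - elsewhere
    return result


# Hypothetical allele sizes (in base pairs) sampled at one microsatellite locus.
populations = {
    "pop_a": [100, 100, 102, 104],
    "pop_b": [100, 100, 100, 102],
}
for name, alleles in populations.items():
    print(name, len(set(alleles)), expected_heterozygosity(alleles))
print(private_alleles(populations))
```

In this invented example pop_a carries three alleles, one of them private, and has the higher He, which is the kind of contrast described between Rabaçal and the populations further from it.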

Higher levels of heterozygosity and a wider range of allele sizes in Rabaçal (Supplementary Figure S1) suggest more stable demographic conditions. Higher RST values (compared with the FST ones) were recorded for Rabaçal than for the southern populations, indicating a sufficiently long period for the divergence of accumulated mutations. Therefore, our results suggest that Rabaçal constituted a northern Pleistocene refugium whose population re-colonized the western populations of the Douro basin after glacial retreat. The possible existence of a Douro refugium is also indicated by Podarcis bocagei (Sá-Sousa, 2001), Iberolacerta spp. (Godinho, 2003), other freshwater fish (Gómez and Lunt, 2006) and the presence of thermophilic plants in this region (Costa et al., 1999). The Douro region was intensely affected by Pleistocene glaciations, while the Rabaçal region was able to maintain warm conditions in some valleys and tributaries during the Würm glaciation (Costa et al., 1999). We therefore propose that S. alburnoides populations arose in the Douro before or during the Pleistocene glacial cycles and that during glacial times they remained in a single refugium within the Rabaçal region.

Historical events have not only contributed to clear geographical patterns of genetic variability among and within the northern and southern populations of S. alburnoides, but they have also contributed to differential patterns of genome composition. Our findings indicated the capacity of microsatellite analyses to identify recent historical re-colonization events by an ancestral population from at least one glacial refugium in the northern range of the S. alburnoides complex. As in other asexual taxa, the long-term persistence of S. alburnoides depends mostly on genetic diversity, and new hybridization events can help stop the decay of genome variability in S. alburnoides populations. However, not all populations have the same potential for survival because long-term viability of southern and northern populations depends not only on their genetic legacy, but also on population dynamics that may allow the introduction of fresh genes that can eventually lead to new speciation events. The combination of these processes may allow long-term survival of S. alburnoides, and therefore, understanding these contributions is central to our knowledge of this and other hybrid complexes.