Introduction

While speciation is often defined as the evolution of reproductive isolation (Mayr 1942; Coyne and Orr 2004), recent empirical and theoretical work has demonstrated that distinct species can persist in the presence of gene flow. This may occur through sympatric divergence with gene flow (primary gene flow; Niemiller et al. 2008; Martin et al. 2013) or, more commonly, through allopatric speciation followed by secondary contact (secondary gene flow; Tarroso et al. 2014; Grossen et al. 2016). Many species are maintained in nature despite some level of gene exchange, either coexisting in sympatry due to niche and/or phenotypic divergence (e.g., Whittemore and Schaal 1991; Milne et al. 1999; Neaves et al. 2010; Hochkirch and Lemke 2011) or replacing each other abruptly at narrow contact zones (e.g., Szymura and Barton 1986; Irwin et al. 2009; Tarroso et al. 2014; Grossen et al. 2016). Thus, it has become increasingly accepted that speciation can occur without complete reproductive isolation (Mallet 2008; Pinho and Hey 2010), and complete reproductive isolation is not a prerequisite under most current species definitions, such as the Evolutionary Species Concept (Simpson 1961), or even under most current versions of the Biological Species Concept originally proposed by Mayr (1940, 1942). This perspective calls for a new view of gene flow in fields such as evolutionary biology, ecology and conservation biology (VonHoldt et al. 2018).

The ultimate fate of two lineages that come into secondary contact depends on the strength of the barriers to gene flow. In empirical studies, three key questions arise when closely related lineages meet: (i) do they hybridize (i.e., is there interspecific mating producing F1 hybrids)? (ii) does this lead to introgression (i.e., do backcrosses occur, allowing gene flow to persist)? (iii) are there intrinsic or extrinsic barriers to gene flow that “protect” each genome from complete “invasion” by the other gene pool? Answering these questions informs predictions about the future evolutionary trajectories of the lineages involved (Wu 2001), i.e., whether they are likely to persist as distinct lineages or to fuse in the future. In this context, the amount of gene flow depends, among other factors, on the existence and efficiency of pre-mating barriers and mate choice, the genomic architecture of differentiation, the strength of selection on combinations of loci in each genomic background, and whether such combinations are involved in epistatic interactions (Wu 2001).

These questions are particularly relevant for conservation biology. Gene flow and introgression have recently gained importance as human-mediated species translocations and range modifications bring into contact previously allopatric species or lineages that can still hybridize (e.g., Rubidge and Taylor 2005; Ayres et al. 2008; Senn and Pemberton 2009; VonHoldt et al. 2018). Views on the evolutionary consequences of hybridization often conflict (Butlin and Ritchie 2013; Sætre 2013), which may hamper management decisions (Allendorf et al. 2001; Wayne and Shaffer 2016; VonHoldt et al. 2018). On the one hand, hybridization is often regarded as detrimental to the conservation of endangered species because introgression may decrease divergence (Seehausen et al. 2008), impede reproductive isolation (Owens and Samuk 2019) and ultimately lead to genetic swamping (e.g., Rhymer et al. 1994; Roberts et al. 2010). Other negative effects of hybridization include wasted reproductive effort on inviable or maladapted hybrid offspring (Lepais et al. 2009; Beatty et al. 2010) and the loss of locally adapted alleles (Bourret et al. 2011). In extreme cases, these processes may contribute to extinction (Todesco et al. 2016), particularly in species that are already rare or affected by other threats. On the other hand, hybridization and introgression have increasingly been recognized as a source of genetic novelty: they may increase diversity and facilitate the acquisition of advantageous alleles through adaptive introgression (Anderson et al. 2009; Whitney et al. 2010; Becker et al. 2013; Leroy et al. 2019) or prevent inbreeding depression (Johnson et al. 2010), thus enhancing the resilience of endangered populations (Tompkins et al. 2006) and even triggering biological diversification (Lamichhaney et al. 2016). This suggests that, in conservation frameworks, the effects of hybridization on endangered species may sometimes be positive.

Carbonell’s wall lizard (Podarcis carbonelli) is a small lacertid lizard belonging to the Podarcis hispanicus species complex, a monophyletic group of genetically divergent but ecologically and morphologically similar species inhabiting the Iberian Peninsula and North Africa (Harris and Sá-Sousa 2002; Kaliontzopoulou et al. 2011; Kaliontzopoulou et al. 2012b). Natural hybridization and introgression have been documented in Podarcis (Capula 1993, 2002), and the P. hispanicus complex is no exception (Pinho et al. 2009; Renoult et al. 2009), despite ancient diversification, with speciation events within the complex dated to between ~2.5 and 10 million years ago (Kaliontzopoulou et al. 2011). Most species of the P. hispanicus complex are parapatric and replace each other abruptly, a pattern that is likely mediated by the interaction between suitable ecological conditions and competitive exclusion (Caeiro-Dias et al. 2018). In contrast, P. carbonelli is sympatric with at least one other species of the P. hispanicus complex (Podarcis vaucheri, Podarcis guadarramae or Podarcis virescens) in most parts of its range, and meets Podarcis bocagei across a narrow contact zone (Carretero et al. 2002; Sá-Sousa and Harris 2002; Pinho et al. 2009; Caeiro-Dias et al. 2018; see Fig. 1). At this contact zone the two species are known to hybridize, but nothing is known about the level of reproductive isolation between P. carbonelli and the other co-occurring species.

Fig. 1: Distribution of each Podarcis species in Iberian Peninsula, the location of sampled contact zones and reference populations analyzed.

a Map of Europe with location of the study area boxed; b Distribution of the four Podarcis species (including both Podarcis guadarramae subspecies). Black dots represent the location of the contact zones; c Map highlighting P. carbonelli distribution; Zoom-in of the Northwest (d) and Southern (e) Iberian Peninsula showing the detailed locations of the contact zones and the reference populations analyzed (black triangles).

Remarkably, P. carbonelli has a highly fragmented distribution throughout western Portugal, west-central Spain and very small areas of southwestern Spain (Sá-Sousa 2008; see Fig. 1c), which has likely been shaped by a significant range reduction due to climatic changes after the Last Glacial Maximum (Sá-Sousa 2001; Sillero and Carretero 2013), resulting in moderate levels of population substructure (Pinho et al. 2007, 2011). Several field studies have detected declining population sizes in recent years (Sillero et al. 2012, 2014), which, together with P. carbonelli’s limited geographic range and population fragmentation, led to the species being categorized as Endangered (EN) by the IUCN (Sá-Sousa et al. 2009). Hybridization could, in theory, add to the existing threats to the species, and the worrisome conservation status of P. carbonelli calls for an evaluation of reproductive isolation, as well as of the amount and consequences of hybridization, between the species and its congeners. To do so, we collected genetic samples from areas of syntopy and analyzed patterns of hybridization and introgression between P. carbonelli and four other co-distributed wall lizard species, using single nucleotide polymorphisms (SNPs) obtained with double digest RAD sequencing (ddRAD; Peterson et al. 2012). We sought to answer the following questions: (1) does P. carbonelli hybridize with co-occurring congeneric species? (2) if hybridization occurs, does it lead to interspecific gene flow or is it restricted to F1 hybrids? (3) is the strength of reproductive barriers similar across species pairs? and (4) considering that P. carbonelli is in contact with at least one other Podarcis species across most of its range, can hybridization threaten its persistence, and should it be taken into account when devising conservation plans?

Material and methods

Sampling

Samples were collected between spring and autumn of 2013 in four contact zones between P. carbonelli and each of four other Podarcis species. In all contact zones the two species were found in strict syntopy. The sampling scheme aimed to capture all individuals encountered, avoiding bias by age, sex or species. Lizards were captured with a harmless noose and kept in individual cloth bags until processed. All adults were identified to species level in the field based on a combination of coloration, head shape and behavior, but most juveniles could not be identified. Each sample was georeferenced and photographed. A small tail tip was collected and immediately stored in 96% ethanol for subsequent DNA extraction. Animals were released the same day at the place of capture. We added reference individuals from nearby populations outside the contact zones (Fig. 1d, e), retrieved from the tissue collections of CIBIO-InBio, Portugal, and EPHE-CEFE, France (BEV collection). Because of the scarcity of P. carbonelli in inland locations, we could not obtain a separate reference population for each of the three contact zones in northwestern Iberia, and we therefore used the same P. carbonelli population (II in Fig. 1d) as reference for contact zones 1 (with P. bocagei), 2 (with P. virescens) and 3 (with P. g. lusitanicus). These three contact zones are geographically close (Fig. 1d) and, given the low level of population differentiation reported for this part of the range (Pinho et al. 2011), sharing a reference population should not be an issue. Reference population II comes from an area where P. virescens is present but where we did not detect any syntopy (pers. obs.). Similarly, the southern reference population (V in Fig. 1e), used for contact zone 4 (with P. vaucheri), comes from an area of sympatry with this species but consists of individuals collected far from the actual contact zone and in habitats where P. vaucheri has never been observed (pers. obs.; see also Discussion section). Sampling information for each contact zone and reference population is provided in Table 1, and detailed information about each sample is available in Supplementary Information Table S1.

Table 1 Number of individuals analyzed in each contact zone (CZ) and detailed information about each dataset.

The syntopy area between P. bocagei and P. carbonelli (contact zone 1) is located in a narrow coastal dune strip about 450 m wide with dune scrub vegetation north of the locality of Espinho (Aveiro District, Portugal; see Carretero et al. 2002). No obvious ecological segregation was identified across the area where both species were sampled in syntopy. Podarcis virescens and P. carbonelli (contact zone 2) were found on the same walls of the castle of Santa Maria da Feira (Aveiro District, Portugal), occurring in exactly the same places. Although P. carbonelli is found in sympatry with both currently recognized subspecies of P. guadarramae (P. g. guadarramae and P. g. lusitanicus; Geniez et al. 2014), only P. g. lusitanicus was used in this study (see Fig. 1 for details on geographic distributions). Specimens of P. g. lusitanicus syntopic with P. carbonelli (contact zone 3) come from a 500-m-long area in Vale do Rossim (Serra da Estrela Natural Park, Guarda District, Portugal), an area dominated by sparse pine trees with relatively dense scrub cover, open patches and rocky outcrops. While P. g. lusitanicus was collected mostly on rocks, P. carbonelli was always found on the ground. P. vaucheri specimens syntopic with P. carbonelli (contact zone 4) were found along a very narrow strip of a few tens of meters at the far northwestern edge of the coastal village of Matalascañas (Huelva Province, Spain). The two species occupied distinct microhabitats: P. vaucheri human-made structures, and P. carbonelli only semi-natural dune environments with pine trees and scrub vegetation. Individuals were collected along a 500-m strip comprising both habitat types.

RAD sequencing, data filtering and SNP calling

We obtained ddRAD sequence data using modifications to protocols from Parchman et al. (2012), Peterson et al. (2012) and Purcell et al. (2014). The complete protocol is described by Brelsford et al. (2016). The main steps were digestion of genomic DNA with the restriction enzymes SbfI and MseI, ligation of barcoded adapters to restriction sites, amplification of each individual sample in four independent PCR reactions, pooling of all PCR products, and size selection of fragments between 400 and 500 bp on a 2.5% agarose gel. Samples used in this work were included in two separate libraries constructed following the same protocol, both of which also contained samples not used in this study. The first library had a total of 329 samples, including those from contact zone 1, and was sequenced on eight Illumina® (San Diego, CA, USA) HiSeq 2000 lanes at the Lausanne Genetic Technology Facility (Lausanne, Switzerland), with single-end 100 bp reads. The datasets for contact zones 2, 3 and 4 were obtained from a library containing a total of 665 samples that was sequenced on two Illumina® HiSeq 2000 lanes at the Lausanne Genetic Technology Facility (Lausanne, Switzerland) and on four Illumina® HiSeq 1500 lanes at the CIBIO Next Generation Sequencing Platform (Vairão, Portugal), also with single-end 100 bp reads.

We demultiplexed individual raw reads using the process_radtags module of Stacks version 2.2 (Catchen 2013), allowing one mismatch per barcode and removing reads containing adapter sequence, reads with uncalled bases and reads that failed the Illumina® ‘chastity’ filter. We then tested the optimal de novo assembly parameters for our data set following the protocol described in Rochette and Catchen (2017), adapted to Stacks version 2.2, prior to the final de novo read alignment. For this test we used the samples from only one contact zone (P. carbonelli × P. vaucheri), running consecutively the ustacks (build loci), cstacks (create a catalog of loci), sstacks (match individual samples against the catalog), tsv2bam (transpose data), gstacks (align each read to a locus and call SNPs) and populations (SNP filtering and data output) programs. We performed separate runs varying the number of mismatches allowed between reads within individuals (M) and between individuals (n). As suggested by Rochette and Catchen (2017), we varied M and n between 1 and 9, keeping M = n, while the minimum depth of coverage required to create a stack (m) was kept constant at 3, the ustacks default. Both in these preliminary tests and in the final analyses we used a bounded SNP model (--model_type bounded option) in ustacks, with an upper bound for the error rate of 0.1. Apart from these parameter changes, we used default settings for all other steps of the pipeline. After completing the pipeline for the nine parameter sets, we compared the numbers of loci shared by at least 80% of the samples and the distribution of variable sites within loci across the tested values. Based on this analysis we chose M = n = 5 for the final de novo read alignment and SNP calling, which was conducted separately for each contact zone and its respective reference populations.
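Each run of the parameter sweep is summarized by the number of loci recovered in at least 80% of the samples (often called r80 loci). The following is a minimal Python sketch of that count. It assumes each run has already been reduced to a per-locus presence/absence table; this input format and the function names are hypothetical, not a Stacks output or the authors' code.

```python
def count_r80_loci(presence, min_fraction=0.8):
    """Count loci recovered in at least `min_fraction` of the samples.

    presence: one list per locus, with one boolean per sample
    (True = the locus was assembled for that sample). Hypothetical format.
    """
    count = 0
    for row in presence:
        if row and sum(row) / len(row) >= min_fraction:
            count += 1
    return count


def summarize_sweep(runs, min_fraction=0.8):
    """Map each tested M (= n) value to its number of r80 loci."""
    return {m: count_r80_loci(table, min_fraction) for m, table in sorted(runs.items())}


# Toy input for two hypothetical runs (M = n = 1 and M = n = 2), five samples each.
toy_runs = {
    1: [[True] * 5, [True] * 4 + [False], [True] * 3 + [False] * 2],
    2: [[True] * 5, [True] * 5, [True] * 5],
}
print(summarize_sweep(toy_runs))  # {1: 2, 2: 3}
```

In practice the curve of r80 counts across M = n is inspected together with the distribution of SNPs per locus; the sketch covers only the first of those two summaries.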

The populations module from Stacks was used to filter out potentially paralogous loci by removing variants with maximum observed heterozygosity higher than 0.7 across all samples. Subsequently, vcftools version 0.1.15 (Danecek et al. 2011) was used to discard SNPs with depth of coverage lower than 8, SNPs with minor allele frequency lower than 0.05, and SNPs genotyped in less than 80% of the individuals of each dataset. We then performed two additional filtering steps using a custom Python script (available at https://github.com/catpinho/filter_RADseq_data): i) we removed loci with more than 8 SNPs per RAD tag; ii) we kept only one SNP per locus, choosing the SNP that maximized allele frequency differences between species. For each contact zone we then kept two separate datasets: the complete dataset, with all filtered SNPs; and the diagnostic dataset, including only loci with fixed differences between reference populations but excluding loci with private alleles, i.e., loci with alleles that were not found in the contact zone. In each dataset and subset (see details below), individuals with more than 35% missing data across loci were discarded. Library construction, sequencing, demultiplexing and data filtering (except data subsets) were repeated independently for ~6% of the samples to evaluate replicability.
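The filtering chain above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' script at the GitHub URL. We assume genotypes are coded as alternative-allele counts (0/1/2, with -1 for missing), that the depth threshold masks individual genotype calls rather than whole sites, and that SNPs per RAD locus are counted after the SNP-level filters. All function names (`alt_freq`, `filter_snps`, `keep_individuals`) are hypothetical.

```python
import numpy as np

MISSING = -1  # genotype code for a missing call; genotypes are alt-allele counts 0/1/2


def alt_freq(g):
    """Alternative-allele frequency per SNP (rows), ignoring missing calls."""
    called = g != MISSING
    n_called = called.sum(axis=1)
    alt = np.where(called, g, 0).sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return alt / (2.0 * n_called)  # NaN where a SNP has no calls


def filter_snps(geno, depth, locus, ref_a, ref_b, max_het=0.7, min_dp=8,
                min_maf=0.05, min_call=0.8, max_snps_per_locus=8):
    """Return (indices of retained SNPs, depth-masked genotype matrix).

    geno, depth: (n_snps, n_individuals) arrays; locus: RAD locus id per SNP;
    ref_a, ref_b: column indices of the two reference populations.
    """
    geno = np.asarray(geno)
    called0 = geno != MISSING
    het = (geno == 1).sum(axis=1) / np.maximum(called0.sum(axis=1), 1)

    g = geno.copy()
    g[np.asarray(depth) < min_dp] = MISSING  # assumption: threshold applied per call
    call_rate = (g != MISSING).mean(axis=1)
    p = alt_freq(g)
    maf = np.minimum(p, 1.0 - p)
    keep = (het <= max_het) & (call_rate >= min_call) & (maf >= min_maf)
    kept = np.flatnonzero(keep)

    # Count retained SNPs per RAD locus so that loci with too many SNPs can be dropped.
    snps_per_locus = {}
    for i in kept:
        snps_per_locus[locus[i]] = snps_per_locus.get(locus[i], 0) + 1

    # Keep, per locus, the SNP with the largest allele-frequency difference between references.
    diff = np.nan_to_num(np.abs(alt_freq(g[:, ref_a]) - alt_freq(g[:, ref_b])), nan=-1.0)
    best = {}
    for i in kept:
        loc = locus[i]
        if snps_per_locus[loc] > max_snps_per_locus:
            continue
        if loc not in best or diff[i] > diff[best[loc]]:
            best[loc] = i
    return np.array(sorted(best.values()), dtype=int), g


def keep_individuals(g, max_missing=0.35):
    """Indices of individuals (columns) with at most `max_missing` missing calls."""
    missing = (np.asarray(g) == MISSING).mean(axis=0)
    return np.flatnonzero(missing <= max_missing)


# Toy data (hypothetical): 4 SNPs x 6 individuals; columns 0-2 and 3-5 are the references.
geno = np.array([[0, 0, 0, 2, 2, 2],
                 [0, 0, 1, 1, 2, 2],
                 [1, 1, 1, 1, 1, 1],   # all heterozygous: flagged as a likely paralog
                 [0, 0, 0, 0, 0, 1]])
depth = np.full(geno.shape, 20)
depth[1, 0] = 3                        # one low-depth call, masked as missing
locus = ["L1", "L1", "L2", "L3"]
selected, masked = filter_snps(geno, depth, locus, [0, 1, 2], [3, 4, 5])
print(selected.tolist())               # [0, 3]
```

Here the heterozygosity filter is computed before depth masking, which mirrors the order described in the text (Stacks first, then vcftools).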

Genetic characterization of the contact zones and admixture analysis

The genomic variability among individuals was visualized by performing principal component analyses (PCAs) on the complete dataset of each contact zone separately, using the adegenet R package version 2.0.1 (Jombart 2008; Jombart and Ahmed 2011). We then used Structure version 2.3.4 (Pritchard et al. 2000), including the reference populations, to estimate the proportion (Q) of each individual’s genome originating from each of the parental species and its 90% posterior probability interval (CI). For clarity, we refer to the proportion of P. carbonelli ancestry as QC in every contact zone, as this species is common to all of them; the proportion of assignment of each individual to the other species in a given contact zone is therefore 1 – QC. We also report the distribution of QC scores in 10 classes of equal width for each contact zone.
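As an illustration of the two summaries described above, the following NumPy sketch runs a PCA on an allele-count matrix and bins QC scores into 10 equal-width classes. It is not a reimplementation of adegenet: we assume missing calls are replaced by the SNP mean and SNPs are centred but not scaled, and adegenet's defaults may differ. The function names are hypothetical.

```python
import numpy as np


def genotype_pca(geno, n_components=2):
    """PCA of an (n_individuals, n_snps) allele-count matrix (0/1/2, NaN = missing).

    Missing calls are replaced by the SNP mean, each SNP is centred, and the
    principal components are obtained by singular value decomposition.
    """
    x = np.asarray(geno, dtype=float)
    col_mean = np.nanmean(x, axis=0)
    x = np.where(np.isnan(x), col_mean, x) - col_mean
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # share of total variance per axis
    return scores, explained[:n_components]


def qc_classes(qc, n_classes=10):
    """Counts of QC values in equal-width classes on [0, 1] (the last class includes 1)."""
    counts, _ = np.histogram(qc, bins=n_classes, range=(0.0, 1.0))
    return counts


# Toy matrix (hypothetical): three individuals homozygous for one allele at every SNP,
# three homozygous for the other, so PC1 should separate the two groups completely.
toy = np.array([[0, 0, 0, 0]] * 3 + [[2, 2, 2, 2]] * 3, dtype=float)
scores, explained = genotype_pca(toy)
print(explained.round(3))
print(qc_classes([0.0, 0.05, 0.55, 0.99, 1.0]).tolist())  # [2, 0, 0, 0, 0, 1, 0, 0, 0, 2]
```

The sign of each principal axis is arbitrary, so only the separation between groups along PC1, and not its direction, carries meaning.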

The number of loci varies between contact zones, and this influences the ability of Structure to estimate the CI (as confirmed by comparing preliminary Structure results with different numbers of loci; results not shown). Thus, the proportions of admixed and parental individuals are not comparable across contact zones based on QC CIs if largely different numbers of SNPs are used. To make the analysis more comparable across contact zones, we ran the final Structure analysis with the same number of SNPs for all contact zones, selecting a random subset of 727 SNPs, the number of SNPs in the contact zone with the fewest markers (contact zone 3). Preliminary runs with five different random subsets for each of the other three contact zones showed no substantial disparities (results not shown), so we randomly chose one of the five subsets to present the results. We ran Structure with K = 2 for each contact zone using the complete datasets, as we were only interested in detecting admixture between each pair of species. Five independent runs were performed, each with one million repetitions and a burn-in of 250 000. We used the admixture model with independent allele frequencies and a prior of individual ancestry (α) of 0.5, as recommended for unbalanced sample sizes by Wang (2017), and estimated the 90% CIs of QC. Structure Harvester web version 0.6.94 (Earl and VonHoldt 2012) was used to visualize the likelihood of the data for each contact zone, and the run that maximized the likelihood was retained and is presented in the Results. We chose this approach, rather than combining the five runs for each dataset, because the CI estimates are used in further analyses.
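The subsampling and run-selection steps can be sketched with Python's standard library. This is a minimal sketch, assuming SNPs are identified by labels and that the estimated ln P(D) of each Structure run has already been extracted into a dictionary. The function names, SNP labels and likelihood values below are hypothetical.

```python
import random


def random_snp_subset(snp_ids, n=727, seed=None):
    """Draw n SNPs without replacement, keeping their original order."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(snp_ids)), n))
    return [snp_ids[i] for i in chosen]


def best_run(ln_likelihoods):
    """Return the label of the run with the highest estimated ln P(D)."""
    return max(ln_likelihoods, key=ln_likelihoods.get)


# Five random 727-SNP subsets from a hypothetical larger dataset.
ids = [f"snp{i}" for i in range(3000)]
subsets = [random_snp_subset(ids, seed=s) for s in range(5)]

# Hypothetical ln P(D) values for replicate runs of one subset.
print(best_run({"run1": -52000.4, "run2": -51980.1, "run3": -51995.0}))  # run2
```

Fixing the seed makes each subset reproducible, which is convenient when the same subset must be re-exported for several downstream tools.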

Individuals from reference populations were then used to calculate the hybrid index (HI) of each individual from the admixed population using the R package Introgress version 1.2.3 (Gompert and Buerkle 2010). In each contact zone, P. carbonelli parental individuals were set to an HI of 1 and those of the other species to an HI of 0. The proportion of loci in an admixed individual’s genome with alleles inherited from both parental species, i.e., interspecific heterozygosity (He), was also calculated for each admixed individual using Introgress. This method for calculating interspecific heterozygosity assumes that parental allele frequencies are known; therefore, the same individuals used as parentals for HI estimation were also used to calculate He. The relationship between HI and He can be represented in a triangle plot (Fitzpatrick 2012). He ranges from 0 to 1: a value of 1 (at HI = 0.5) is interpreted as a “perfect” F1 hybrid, whereas values lower than 1 indicate later-generation hybrids that have backcrossed with parentals (falling on the edges of the triangle plot) and/or with other hybrids (falling below the edges; Fitzpatrick 2012). This analysis requires loci that are fully diagnostic between species or that at least exhibit large allele frequency differences, for a good approximation. We therefore restricted this analysis to loci with fixed differences between reference populations, excluding loci with private alleles (diagnostic dataset).
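Below is a NumPy sketch of the diagnostic-locus selection and of a simple HI/He calculation. It is hedged on two counts. First, we read "excluding loci with private alleles" as keeping only fixed loci at which both alleles are observed in the contact zone. Second, Introgress estimates HI by maximum likelihood, whereas the sketch uses the plain proportion of P. carbonelli alleles, which should be close for fully diagnostic loci but is our simplification. Function names are hypothetical.

```python
import numpy as np


def diagnostic_loci(geno, ref_c, ref_o, cz):
    """Loci fixed for different alleles in the two reference sets and with both
    alleles present in the contact zone (our reading of 'no private alleles').

    geno: (n_individuals, n_loci) alternative-allele counts, NaN = missing.
    ref_c, ref_o, cz: row indices of P. carbonelli references, other-species
    references and contact-zone individuals.
    """
    g = np.asarray(geno, dtype=float)
    p_c = np.nanmean(g[ref_c], axis=0) / 2.0
    p_o = np.nanmean(g[ref_o], axis=0) / 2.0
    p_z = np.nanmean(g[cz], axis=0) / 2.0
    fixed = ((p_c == 1) & (p_o == 0)) | ((p_c == 0) & (p_o == 1))
    return np.flatnonzero(fixed & (p_z > 0) & (p_z < 1))


def hi_and_he(geno, ref_c, loci, rows):
    """Simple hybrid index (1 = P. carbonelli, 0 = other species) and
    interspecific heterozygosity for `rows`, plus the triangle-plot maximum He."""
    g = np.asarray(geno, dtype=float)[:, loci]
    p_c = np.nanmean(g[ref_c], axis=0) / 2.0
    g_c = np.where(p_c == 1, g, 2.0 - g)[rows]   # number of P. carbonelli alleles
    n_called = (~np.isnan(g_c)).sum(axis=1)
    hi = np.nansum(g_c, axis=1) / (2.0 * n_called)
    he = (g_c == 1).sum(axis=1) / n_called
    max_he = 2.0 * np.minimum(hi, 1.0 - hi)       # edges of the triangle plot
    return hi, he, max_he


# Toy data (hypothetical): rows 0-1 P. carbonelli references, rows 2-3 other-species
# references, rows 4-6 contact-zone individuals (F1-like, parental, backcross-like).
toy = np.array([[2, 2, 2, 2, 2],
                [2, 2, 2, 2, 1],
                [0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0],
                [1, 1, 1, 2, 1],
                [2, 2, 2, 2, 2],
                [1, 2, 1, 2, 0]], dtype=float)
loci = diagnostic_loci(toy, [0, 1], [2, 3], [4, 5, 6])
hi, he, max_he = hi_and_he(toy, [0, 1], loci, [4, 5, 6])
print(loci.tolist(), hi.round(3), he.round(3))
```

In this toy example the F1-like individual sits at the apex of the triangle (HI = 0.5, He = 1), while the backcross-like individual falls on the edge (He equal to 2 × min(HI, 1 − HI)), matching the interpretation described in the text.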

To assess whether the strength of reproductive isolation is similar across species pairs, we tested whether the overall genotypic composition differed among contact zones using Fisher’s exact tests. We tested the null hypothesis that the proportion of admixed genotypes is similar across the four contact zones (i.e., independence between genotype category and contact zone), based on the frequencies of two categories: “parental genotypes” (individuals whose 90% CI of QC overlaps 0 or 1) and “admixed genotypes” (individuals whose 90% CI of QC overlaps neither 0 nor 1). Additionally, we tested whether recent hybridization events contribute differently to each contact zone. To do so, we performed a similar global Fisher’s exact test comparing the frequencies of recently admixed genotypes (0.4 < QC < 0.6) with those of parental genotypes and later-generation backcrosses (QC ≥ 0.6 or QC ≤ 0.4). When the global test was statistically significant, we further performed pairwise Fisher’s exact tests to identify the pairs of contact zones that differ in genotype composition. All tests were performed with the R package stats version 3.3.3 (R Core Team 2017), applying a Bonferroni correction for multiple comparisons to the pairwise tests.
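The classification rules and the pairwise, Bonferroni-corrected tests can be sketched in plain Python. The two-sided p-value sums the probabilities of all 2 × 2 tables with the observed margins that are no more likely than the observed table. To our understanding, this is also how R's `fisher.test` defines it for 2 × 2 tables, but the sketch is not the authors' code. The global 2 × 4 test is not reimplemented. Function names and the counts in the usage lines are hypothetical.

```python
from itertools import combinations
from math import comb


def is_admixed(ci_low, ci_high):
    """'Admixed genotype': the 90% CI of QC overlaps neither 0 nor 1."""
    return ci_low > 0.0 and ci_high < 1.0


def is_recent(qc):
    """Recently admixed genotype: 0.4 < QC < 0.6."""
    return 0.4 < qc < 0.6


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Small relative tolerance so that tables tied with the observed one are included.
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def pairwise_fisher(counts, alpha=0.05):
    """counts: {zone: (n_admixed, n_parental)}. Bonferroni-corrected pairwise tests."""
    pairs = list(combinations(sorted(counts), 2))
    threshold = alpha / len(pairs)
    results = {}
    for z1, z2 in pairs:
        (a, b), (c, d) = counts[z1], counts[z2]
        p = fisher_exact_2x2(a, b, c, d)
        results[(z1, z2)] = (p, p < threshold)
    return results, threshold


# Hypothetical counts, for illustration only (not the data of this study).
res, thr = pairwise_fisher({"CZ1": (5, 20), "CZ2": (0, 30), "CZ3": (3, 25), "CZ4": (20, 28)})
print(round(thr, 4))  # 0.0083
```

With four contact zones there are six pairwise comparisons, so the corrected threshold is 0.05 / 6 ≈ 0.008, the significance level used in the Results.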

Results

Data filtering and SNP calling

Characteristics of the final datasets (complete and diagnostic) after removing loci with depth of coverage <8 and >20% missing data, and removing individuals with more than 35% missing data, are summarized in Table 1. Across complete and diagnostic datasets, mean coverage per individual ranged from 28× to 47× and per locus from 28× to 44× (see detailed results in Supplementary Information Table S2). Replicate samples showed high (>99%) multilocus genotype replicability.
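The text does not define exactly how replicability was computed. One common measure is genotype concordance: the share of loci genotyped in both replicates of a sample that carry identical genotypes. The following minimal sketch rests on that assumption, with both replicates listed in the same locus order and -1 marking a missing call. The function name is hypothetical.

```python
def genotype_concordance(rep1, rep2, missing=-1):
    """Share of loci genotyped in both replicates that have identical genotypes.

    rep1, rep2: genotype lists in the same locus order (0/1/2, `missing` = no call).
    """
    shared = [(a, b) for a, b in zip(rep1, rep2) if a != missing and b != missing]
    if not shared:
        return float("nan")
    return sum(a == b for a, b in shared) / len(shared)


# Hypothetical replicate pair: one locus missing in the first replicate, one mismatch.
print(genotype_concordance([0, 1, 2, -1, 2], [0, 1, 1, 2, 2]))  # 0.75
```

Loci missing in either replicate are excluded rather than counted as mismatches, so missingness and genotyping errors are assessed separately.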

Podarcis bocagei × Podarcis carbonelli

The PCA based on 19233 SNPs separated two groups: PC1 explained 46.1% of the variance and separated P. bocagei from P. carbonelli, broadly matching the field identification of each individual but with several individuals in between (Fig. 2a). PC2 explained 2.8% of the variance and reflected variation among P. carbonelli populations. According to the Structure CI criterion (Fig. 3a), 7.8% of the individuals were P. bocagei and 73.9% P. carbonelli “parental genotypes”, while 18.2% were “admixed genotypes”. Most individuals had QC values close to the extremes (Supplementary Information Fig. S1a), as expected in bimodal hybrid zones. The plot of He against HI (Fig. 3b) was concordant with the Structure and PCA results: most individuals showed at least some degree of admixture (He between 0.03 and 0.82), and several parental individuals had residual levels of admixture, indicating gene flow between the two species.

Fig. 2: Principal Component Analysis (PCA) of SNP variation.

PCA for a P. bocagei (red)×P. carbonelli (blue); b P. virescens (yellow)×P. carbonelli; c P. guadarramae lusitanicus (dark green)×P. carbonelli; and d P. vaucheri (light green)×P. carbonelli contact zones; the variation explained by each axis (PC) is represented as percentage; circles represent individuals in syntopy with Structure 90% CI overlapping 0 or 1, crosses denote individuals with Structure 90% CI non-overlapping 0 or 1, squares the P. carbonelli reference population and triangles the reference population of the other species; individuals were colored after Structure results; e schematic representation of phylogenetic relationships of all Podarcis hispanicus complex lineages highlighting the divergence times between species from contact zones analyzed in this study (see Kaliontzopoulou et al. 2011).

Fig. 3: Analysis of genetic variability and admixture performed with SNP data.

Results for a and e P. bocagei (red)×P. carbonelli (blue); b and f P. virescens (yellow)×P. carbonelli; c and g P. g. lusitanicus (dark green)×P. carbonelli; and d and h P. vaucheri (light green)×P. carbonelli contact zones. Plots on the left (a–d) show the results of the individual multilocus genotype clustering analysis performed with Structure, based on the runs with the 727-locus datasets; each individual is represented as a horizontal line partitioned into 2 colored segments (K = 2), with each segment length proportional to the assignment to one of the two species. Horizontal black lines separate individuals in the contact zone (in the center) from the reference populations, and the individuals whose 90% CIs (black segments in each individual) do not overlap 0 or 1 lie between the horizontal dashed lines. Plots on the right (e–h) show the distribution of individual hybrid index (HI) and interspecific heterozygosity (He) in the contact zones based on loci diagnostic between reference populations; in each contact zone P. carbonelli reference individuals were set to an HI of 1, and the other species to an HI of 0.

Podarcis virescens × Podarcis carbonelli

The PCA based on the 5207 SNPs analyzed showed two clusters of individuals (Fig. 2b). PC1 explained 51.8% of the variation and separated the two species, while PC2 explained 2% of the variation and captured mostly intraspecific variability in P. virescens. The assignment analysis (Fig. 3c) and the distribution of QC values (Supplementary Information Fig. S1b) were concordant with the PCA results. None of the individuals fell into the “admixed genotypes” category, and only 10.5% of the individuals had QC values slightly deviating from 0 or 1, suggesting residual levels of admixture. The HI distributions were concordant with these results, and He estimates were close to zero (Fig. 3d). Therefore, no recent hybridization was detected in this contact zone.

Podarcis guadarramae lusitanicus × Podarcis carbonelli

The PCA based on 727 SNPs revealed two main groups of samples (48.9% of the variance explained by PC1), showing that the two species in the contact zone are clearly separable. PC2 explained 2.9% of the variation and captured mostly intraspecific variability within P. g. lusitanicus. Several individuals occupied intermediate positions between the two groups along PC1, indicating recent hybridization (Fig. 2c). Based on Structure results, 11.5% of the individuals, with QC between 0.07 and 0.93, were considered “admixed genotypes”, confirming some degree of admixture, while 67.2% were assigned to P. g. lusitanicus and 21.3% to P. carbonelli (Fig. 3e); most individuals therefore had QC values close to zero or one (Supplementary Information Fig. S1c). He estimates were low for most individuals with HI close to 0 and 1 (Fig. 3f). Two individuals perfectly matched the predictions for F1 hybrids, and one individual with QC = 0.52 had HI = 0.6 and He = 0.79, suggesting a backcross between an individual of mixed ancestry and a P. carbonelli “parental genotype”.

Podarcis vaucheri × Podarcis carbonelli

The first axis (PC1) of the PCA based on 3549 SNPs explained 24.6% of the variance and separated P. vaucheri from P. carbonelli, but with several individuals between the two groups (Fig. 2d). A large amount of variation among P. vaucheri individuals is apparent along PC2 (3.7% of the variance). Structure assigned 7.9% of the individuals to P. carbonelli and 50.8% to P. vaucheri “parental genotypes” (Fig. 3g), while a large proportion of individuals (41.3%) showed some degree of admixture (Supplementary Information Fig. S1d), highlighting the permeability of these genomes to gene flow. Furthermore, the presence of individuals with high He revealed contemporary admixture (Fig. 3h). Because we used parental individuals from outside the contact zone (virtually “pure”) to estimate heterozygosity, and several parental individuals in the contact zone had residual levels of admixture (Fig. 3g), “perfect” F1 hybrids were not expected, as in contact zone 1. Surprisingly, two individuals in contact zone 4 had He = 0 but HI ~ 0.35, and another individual had He = 0.02 with HI = 0.54 (Fig. 3h). These results may be explained by poor heterozygosity estimates due to the lower average coverage of these individuals compared with the others (data not shown): only four individuals in this contact zone, including these three, had a depth of coverage of 20× or less.

Comparison between contact zones

The total proportion of admixed genotypes, i.e., genotypes whose QC 90% CIs do not overlap 0 or 1, ranged from 0 to 41% across contact zones (Fig. 4a). In contact zone 4, admixed individuals (41%) made up almost half of all genotypes. The global test revealed statistically significant differences among contact zones (Supplementary Information Table S3). In pairwise comparisons of the proportions of admixed genotypes, five of the six comparisons were significant (p < 0.008, the Bonferroni-corrected significance level), allowing us to reject the null hypothesis that the proportions are similar. Only contact zones 1 and 3 (with 25% and 13% admixed genotypes, respectively) did not differ significantly from each other (Supplementary Information Table S3).

Fig. 4: Overall genotypic composition for each contact zone.

a Proportion of parental genotypes (90% CI of QC overlapping 0 or 1), represented in gray, and hybrid genotypes (90% CI of QC not overlapping 0 or 1), represented in black, for each contact zone. QC results from the clustering analysis, as in Fig. 2. Fisher exact test statistically significant (p = 8.29 × 10⁻⁹). b Proportion of parental and later-generation hybrid genotypes resulting from recurrent backcrosses (QC ≥ 0.6 or QC ≤ 0.4; gray) and earlier-generation hybrid genotypes (0.4 < QC < 0.6; black) in each contact zone. Fisher exact test not statistically significant (p = 0.12). CZ1: P. bocagei×P. carbonelli, CZ2: P. virescens×P. carbonelli, CZ3: P. guadarramae lusitanicus×P. carbonelli, CZ4: P. vaucheri×P. carbonelli.

Our results show that recent hybridization events (identified by individuals with QC between 0.4 and 0.6 and high interspecific heterozygosity) are relatively rare in three of the four contact zones studied, and that evidence for recent admixture is completely absent in the remaining contact zone (Fig. 4b). Recently admixed genotypes (QC between 0.4 and 0.6) made up between 0 and 8% of the admixed genotypes across the analyzed contact zones (Fig. 4b). When we compared the number of recent hybrid genotypes with that of parental and later-generation admixed genotypes, the differences across contact zones were not significant, and thus we could not reject the null hypothesis that the proportions of recently admixed genotypes are similar among contact zones (Supplementary Information Table S3).

Discussion

Our results provide evidence that P. carbonelli, a species that co-occurs with at least one other congener across most of its distribution range, can maintain its genetic identity in extensive sympatry despite incomplete reproductive isolation from the species with which it coexists: in three of the four contact zones investigated, we found that recent admixture has occurred and that it is not restricted to F1 hybrids. However, the markedly bimodal distribution of genotypes in all contact zones suggests strong barriers to gene flow, even though the strength of reproductive barriers varies between species pairs. Interspecific gene flow is thus an important feature of the evolutionary history of P. carbonelli.

Species validity and persistence in the presence of gene flow

The four species studied here are unambiguously distinct, valid species under all current species concepts, with mtDNA-based divergence times ranging from 3.9 Mya (P. virescens and P. carbonelli) to 10.1 Mya (P. vaucheri and P. carbonelli; Fig. 2e; see also Kaliontzopoulou et al. 2011); most of them also differ in morphology (Kaliontzopoulou et al. 2012b) and realized climatic niches (Caeiro-Dias et al. 2018). They also maintain highly distinct nuclear genomes in spite of multiple opportunities for hybridization, as evidenced by PCAs of the SNP data, which reveal much higher levels of between- than within-species variation. Nevertheless, we found evidence of recent hybridization in three contact zones, and our data suggest the presence of later-generation hybrids and/or signal from older introgression events in all four contact zones (Fig. 3), indicating interspecific gene flow and incomplete reproductive isolation. As P. carbonelli is sympatric with other Podarcis species across all of its distribution range, these results suggest that hybridization is likely to occur in numerous populations of this species.

The overall deep divergence between our species (Kaliontzopoulou et al. 2011) and the strong bimodality of their hybrid zones, despite the presence of introgressed individuals other than first-generation hybrids, are consistent with the notion that these species are in the late stages of speciation, where speciation is already well advanced but reproductive isolation is still incomplete. The bimodality observed in the hybrid zones is likely not the result of a recent establishment of these contact zones (i.e., on the scale of a few generations), as evidenced by the large proportion of individuals with some degree of admixture in contact zones 1 and 4 (Fig. 3b, h) and by the presence of individuals with residual levels of admixture in all four contact zones. In addition, hybridization between P. bocagei and P. carbonelli (contact zone 1) is known to have occurred for several generations (Pinho et al. 2009), and all the syntopic populations sampled for this study were known prior to sampling, some of them for many years (pers. obs.).

Regular gene flow between sympatric species at levels that do not reverse species divergence has been increasingly reported (Steeves et al. 2010; Palma‐Silva et al. 2011; McIntosh et al. 2014). Our results support the notion that introgression may be a regular component of the evolutionary history of the species of the P. hispanicus complex (Pinho et al. 2008; Renoult et al. 2009) and an integral part of the speciation process in general, not only in its initial stages but also in later ones.

Intrinsic and extrinsic factors potentially influencing hybridization dynamics

Regular interspecific gene flow, even when rare, leads to extensive admixture over time: in the absence of post-zygotic isolation, several generations of rare introgression would quickly remove all parental genotypes (Arnold et al. 1999). Therefore, post-zygotic barriers, such as high mortality or reduced fertility of highly admixed individuals, must be acting in these contact zones. In contact zone 1, hybrids do not have lower fertility than either parental species (Pinho et al. 2009), but the fitness of their progeny has not been evaluated. The rarity of F1 hybrids relative to pure genotypes may also result from pre-zygotic barriers to gene flow. For example, in half of the contact zones studied here, syntopy is accompanied by some degree of ecological segregation, as previously documented between P. bocagei and P. g. lusitanicus (Kaliontzopoulou et al. 2012a; Gomes et al. 2016). Candidate chemically mediated species-recognition systems have also been found between several species of the complex (Barbosa et al. 2005, 2006; Gabirot et al. 2010, 2012), but their effectiveness under natural conditions has never been validated. Deciphering the relative contribution of these or other pre- and post-zygotic barriers would require finer experimental dissection of each contact zone, but it is clear that post-zygotic barriers are crucial in maintaining P. carbonelli as a distinct species.
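Why rare introgression erodes parental genotypes so quickly can be illustrated with a deliberately simplified, neutral toy model. This is an assumption for illustration, not an analysis performed in this study: each generation a fraction m of the breeders are heterospecific or admixed, mating is random, there is no selection against hybrids, and an offspring counts as parental only if both parents are parental residents. The parental fraction then follows pi(t+1) = ((1 − m) · pi(t))², which declines faster than exponentially. The function name is hypothetical. Pedigree purity is stricter than what SNP-based clustering can detect, so the model overstates the speed at which detectably parental genotypes disappear; it only conveys the direction and order of magnitude of the effect.

```python
def parental_fraction(m, generations, pi0=1.0):
    """Fraction of individuals with no heterospecific ancestors, per generation.

    Toy neutral model (assumption): a fraction m of each generation's breeders
    are heterospecific or admixed, mating is random, there is no selection,
    and an offspring is parental only if both parents are parental residents.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be a proportion between 0 and 1")
    trajectory = [pi0]
    pi = pi0
    for _ in range(generations):
        # Both parents must be drawn from the parental residents.
        pi = ((1.0 - m) * pi) ** 2
        trajectory.append(pi)
    return trajectory


for t, pi in enumerate(parental_fraction(0.01, 8)):
    print(f"generation {t}: parental fraction {pi:.3f}")
```

With m = 0.01, the parental fraction falls from about 0.98 after one generation to below 0.01 after eight generations, which is why the persistence of a bimodal genotype distribution points to selection against admixed individuals.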

Why do levels of interspecific gene flow vary across contact zones? One would expect higher levels of hybridization between more closely related species, as genetic divergence is positively associated with reproductive isolation (Pereira et al. 2011; Sánchez‐Guillén et al. 2014; Harvey et al. 2017). Remarkably, the highest level of admixture with P. carbonelli was found for the most phylogenetically divergent species (P. vaucheri), whereas we found almost no admixture with its most closely related species (P. virescens; Figs. 2 and 3; see also Kaliontzopoulou et al. 2011 for phylogenetic relationships). Consequently, in this system the level of reproductive isolation between P. carbonelli and the species it coexists with does not seem to be explained by genetic divergence and is likely to depend on other factors. In other systems, ecology (Funk et al. 2006) or phenotype (Stelkens and Seehausen 2009) has been found to influence reproductive isolation.

While genetic divergence does not seem to determine reproductive isolation, the extent and duration of sympatry appear to be associated with the level of admixture in our system. The species pair with the most restricted introgression, P. virescens and P. carbonelli (contact zone 2), is also the pair with the largest area of overlap across their distributions. By contrast, P. bocagei and P. carbonelli (contact zone 1) are allopatric and only come into contact at a very narrow hybrid zone, where admixture is high. The two species with the highest level of admixture, P. vaucheri and P. carbonelli (contact zone 4), are only marginally sympatric and are naturally segregated by their distinct use of microhabitats. In contact zone 4, P. vaucheri was observed exclusively on human-made structures, while P. carbonelli inhabited semi-natural environments outside the village; syntopy between them was restricted to the edge of the village. In the Doñana area (where Matalascañas is located), P. vaucheri seems to be restricted to man-made structures such as buildings, bridges and concrete walls, and was never observed in the sandy semi-natural habitats inhabited by P. carbonelli (pers. obs.). Given that tourism development in Matalascañas is relatively recent, occurring mostly after 1972 (IECA 2014), and that we did not detect syntopy elsewhere in the Doñana area, the two species may have been in contact for only a short time (several tens of generations). Human-mediated habitat modifications have been widely identified as factors that promote geographic contact between species and thus contribute to hybridization when pre-mating barriers are incomplete (Levin et al. 1996; Allendorf et al. 2001).

Given the apparent negative correlation between the geographic extent and duration of sympatry, on the one hand, and the level of hybridization, on the other, our results are in line with the idea that reinforcement is important for incompletely isolated species to persist in sympatry (Bímová et al. 2011; Yukilevich 2012; Hudson and Price 2014). This hypothesis remains to be formally tested.

Hybridization may increase when species frequencies are imbalanced, as the rarity of one species increases the opportunities for interspecific mating (Burgess et al. 2005; Lepais et al. 2009; Beatty et al. 2010). We detected hybridization in the three contact zones where the relative abundances of the two species are highly imbalanced, whereas the contact zone with the lowest interspecific gene flow (contact zone 2) apparently had more balanced abundances of the two species in contact (Fig. 2b). This hypothesis remains to be tested, but if it holds, the declining trend of P. carbonelli (Sillero et al. 2012, 2014) may make the species more susceptible to hybridization when it comes into contact with more abundant species.
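The expected effect of frequency imbalance can be made explicit with a simple mating model. The sketch below is an illustrative assumption, not an analysis performed in this study: an individual of a species at relative frequency p encounters potential mates in proportion to their frequencies, always ends up mating, and accepts a heterospecific partner with relative probability w (w = 1 is random mating, w = 0 complete pre-mating isolation). The probability that its mating is heterospecific is then w(1 − p) / (p + w(1 − p)), which rises as the focal species becomes rarer. The function name and the parameter w are hypothetical.

```python
def heterospecific_mating_prob(p, w):
    """Probability that a mating by an individual of the focal species is heterospecific.

    p: relative frequency of the focal species among potential mates (0 < p <= 1).
    w: relative acceptance of heterospecific partners (assumption; 1 = random mating).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if w < 0.0:
        raise ValueError("w must be non-negative")
    heterospecific = w * (1.0 - p)
    # Conspecific encounters are weighted by p, heterospecific ones by w * (1 - p).
    return heterospecific / (p + heterospecific)


for p in (0.5, 0.2, 0.05):
    print(f"focal frequency {p}: {heterospecific_mating_prob(p, w=0.1):.3f}")
```

Under these assumptions, even a tenfold preference for conspecifics (w = 0.1) does not prevent heterospecific matings when a species is rare: the probability rises from about 0.09 at equal frequencies to about 0.66 when the focal species makes up 5% of the potential mates.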

Outcomes of hybridization for P. carbonelli

Hybridization has long been emphasized as a threat to species, whether endangered (Milián-García et al. 2015; Vuillaume et al. 2015) or not (Rhymer and Simberloff 1996; Allendorf et al. 2001; Wolf et al. 2001; Wayne and Shaffer 2016). In our system, the level of reproductive isolation between P. carbonelli and its sympatric congeners seems strong enough to prevent global genetic swamping (see Steeves et al. 2010 for a similar conclusion in the critically endangered bird Himantopus novaezelandiae). However, interspecific mating per se may have negative effects, such as wasted reproductive effort in small populations that coexist with more abundant congeneric species (Burgess et al. 2005; Lepais et al. 2009; Beatty et al. 2010). This is particularly relevant for the southernmost P. carbonelli population, which came into contact with P. vaucheri more recently (contact zone 4), and for some of the small, fragmented populations in the north that come into contact with P. g. lusitanicus. Alternatively, interspecific gene flow may be a source of favorable alleles or allelic combinations (Anderson et al. 2009; Whitney et al. 2010; Becker et al. 2013; Leroy et al. 2019) and a potentially important contributor to the origin of evolutionary novelty. Our data do not reveal whether alleles have crossed species boundaries beyond the contact zones, but given the variable levels of incomplete reproductive isolation between P. carbonelli and other co-occurring Podarcis species, this topic is worth studying in the near future.

At this stage, making detailed conservation recommendations would be premature, but monitoring the amount of hybridization should certainly be part of any forthcoming management plan, particularly for the Andalusian P. carbonelli population, which has recently been affected by human-mediated habitat changes. Most management plans now include a research program; given our findings, we highlight the need for these research programs to: (1) extend the evaluation of hybridization and interspecific gene flow across the distribution of P. carbonelli; (2) investigate the extent of geographic and genomic introgression; (3) correlate the levels of interspecific gene flow with the intrinsic and extrinsic factors that can modulate them; and (4) look for traces of adaptive introgression from other members of the P. hispanicus complex. This will allow us to better understand which factors are involved in reproductive isolation, as well as the evolutionary and conservation consequences of hybridization.