Landscape resistance constrains hybridization across contact zones in a reproductively and morphologically polymorphic salamander

Explicitly accounting for phenotypic differentiation together with environmental heterogeneity is crucial to understand the evolutionary dynamics in hybrid zones. Species showing intra-specific variation in phenotypic traits that meet across environmentally heterogeneous regions constitute excellent natural settings to study the role of phenotypic differentiation and environmental factors in shaping the spatial extent and patterns of admixture in hybrid zones. We studied three environmentally distinct contact zones where morphologically and reproductively divergent subspecies of Salamandra salamandra co-occur: the pueriparous S. s. bernardezi that is mostly parapatric to its three larviparous subspecies neighbours. We used a landscape genetics framework to: (i) characterise the spatial location and extent of each contact zone; (ii) assess patterns of introgression and hybridization between subspecies pairs; and (iii) examine the role of environmental heterogeneity in the evolutionary dynamics of hybrid zones. We found high levels of introgression between parity modes, and between distinct phenotypes, thus demonstrating the evolution to pueriparity alone or morphological differentiation do not lead to reproductive isolation between these highly divergent S. salamandra morphotypes. However, we detected substantial variation in patterns of hybridization across contact zones, being lower in the contact zone located on a topographically complex area. We highlight the importance of accounting for spatial environmental heterogeneity when studying evolutionary dynamics of hybrid zones.


Material and methods
Study system. Salamandra  www.nature.com/scientificreports/ Despite being phenotypically different in northern Spain, these subspecies inhabit similar habitats, particularly, deciduous woodlands (Quercus spp. and Fagus spp.) with nearby aquatic systems (streams or ponds) where larviparous females can give birth to aquatic larvae 27,39 . They can also be found in other terrestrial habitats, such as scrublands and coniferous forests, alpine scrublands, and more rarely in eucalyptus (Eucalyptus spp.) plantations 39,40 . Previous studies showed agricultural and urban areas comprise strong barriers to dispersal and gene flow in both larviparous and pueriparous fire salamanders 38,41,42 . However, pueriparous salamanders can also survive in harsh environments where water bodies are absent 34,42 , despite water courses (streams and rivers) comprising semi-permeable obstacles to dispersal due to their fully terrestrial life style 38 . Study areas and sampling design. We carried out individual-based sampling (1-3 samples per locality) within each contact zone because this sampling approach was shown to provide greater statistical power in detecting landscape genetic patterns compared to population-based sampling 43 . We selected three heterogeneous study areas (i.e. different levels of environmental and habitat variability) where different subspecies, mor- Upper left: distribution of the fire salamander, Salamandra salamandra, with the study area denoted with an inset (A). Upper right: mtDNA and nuDNA trees to show subspecies relationships and approximate dates of major splits. The mtDNA tree was generated with cytochrome b sequences from this study while the nuDNA tree is adapted from the topology obtained with phylogenomic data in Burgon et al. 30 . Bottom: sampling sites across each of the studied contact zone: North Galicia (NG), Cantabrian Mountains (CM) and Asturias-Cantabria border (AC). Adult salamander images represent the two morphotypes, while the reproductive mode of each studied subspecies (larviparous, pueriparous or both) is represented on the polygons depicting each subspecies' distribution. The figure was created with QGIS software (version 3.2; https:// qgis. org/ en/ site/). The satellite image was retrieved from Google Earth using the QuickMapServices plugin.
photypes and the two distinct reproductive modes meet, thus covering a large portion of the secondary contact zones between S. s. bernardezi and its neighbouring lineages (Fig. 1). We performed sampling during rainy nights from November 2013 until May 2015 (Supplementary Table S1). We sampled mostly adult individuals (toe clip), but on a few occasions we also sampled larvae (tail clip). Tissue samples were stored in absolute ethanol and all individuals were released in their sampling points. At the western range of S. s. bernardezi, we collected samples of 129 salamanders from 86 localities following a west-east axis along northern Galicia (NG hereafter), where intermixed phenotypes between S. s. bernardezi and S. s. gallaica were previously observed 32,35 (Fig. 2). The landscape in NG is relatively flat (elevation ranges from 0 m up to 1016 m a.s.l. using the minimum extent of the samples, Supplementary Fig. S1) and is extensively fragmented by agricultural practices and Eucalyptus spp. plantations. Remnant populations of fire salamanders in NG occur in suitable forested and scrubland patches in lowland and hilly areas. At the southern range of S. s. bernardezi, we also followed a longitudinal sampling along the Cantabrian Mountain range (CM hereafter), where we collected samples of 73 individuals from 38 localities on both sides of the mountains, representing S. s. bernardezi and S. s. bejarae from the northern and southern slopes, respectively (Fig. 3). The landscape in this study plot is characterized by high elevation, steep terrains, and . Samples in the barplots are arranged from west to east. Genetic assignment to S. s. bernardezi or S. s. gallaica is colour-coded in blue and red, respectively. Black bars between structure plots indicate individuals with mitonuclear discordance (19%). The figure was created with QGIS software (version 3.2; https:// qgis. org/ en/ site/). www.nature.com/scientificreports/ high topographic complexity (elevation ranging from 179 m up to 2343 m above sea level (a.s.l.), Supplementary  Fig. S1), and is highly suitable for the species due to the native forests and scrublands that dominate the region with sparse agricultural patches. At the eastern range of S. s. bernardezi, we obtained samples of 123 individuals from 58 localities across the Asturias-Cantabria border (AC hereafter, Fig. 4). We did not extend the sampling further south to avoid sampling S. s. bejarae (Fig. 1). Despite the high elevation in this region (elevation ranging from 0 m up to 2394 m a.s.l., Supplementary Fig. S1), the mountains follow a west-east direction and thus, no major topographic barriers exist between these two subspecies, which frequently occur along coastal valleys. The AC area is composed of natural forests and scrublands, with eucalypt plantations, agricultural areas and human infrastructures (e.g. villages, roads) being predominant on the coast. Fieldwork for obtaining tissue samples was done with the corresponding permits from the regional administrations ( www.nature.com/scientificreports/ Molecular markers and laboratory procedures. We extracted genomic DNA using the Genomic DNA Tissue Kit (EasySpin), following the manufacturer's protocol, and verified both the quantity and quality of extracted products in a 0.8% agarose gel. We used the DNA extract as template for polymerase chain reactions (PCR) to amplify and sequence one mitochondrial (mtDNA) fragment (ca. 1100 bp) of the cytochrome b and adjacent tRNAs (hereafter cytb). We used the primers Glu14100L and Pro15500H 44 to amplify cytb following a previously described protocol 28 . We outsourced DNA sequencing to Genewiz Inc. (Leipzig, Germany), and inspected and aligned resulting chromatograms using GENEIOUS version 11.1.4 (http:// www. genei ous. com). Extracted products were also used to amplify and genotype 15 microsatellite loci (SST-A6-I, SST-A6-II, SST-B11, SST-C3, and SST-G9 45 ; SalE14, Sal29, SalE12, SalE7, SalE5, SalE2, SalE06, Sal3, and SalE8 46 ) distributed in five optimized multiplexes (see 47 for details about the protocol optimisation and genotyping). Our individual-based approach, together with some degree of continuous phenotypic variation in the contact zones, prevent us from grouping samples into a priori populations or genetic units, and thus from assessing accurately Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). However, previous phylogeographic and population genetic studies using the same set of markers 33,38,41,42,[48][49][50] on Iberian populations of S. salamandra support the validity of these markers for population and landscape genetic analyses.
Phylogenetic lineages. We used the cytb sequences to perform coalescent-based Bayesian analyses in BEAST version 1.8.4 51 and identify the mitochondrial lineage (mtDNA) of each sample, which was then used to estimate the extent of cyto-nuclear discordance across each contact zone. However, we note that mtDNA phylogenetic tree does not represent the true phylogenetic relationships among S. salamandra subspecies. In particular, S. s. fastuosa (with embedded S. s. gigliolii) was placed as the sister lineage to S. s. bernardezi and formed a well differentiated clade within S. salamandra according to phylogenomic data 30 . Yet, both S. s. gallaica and S. s. bejarae fall within the second clade and are sister lineages in both mtDNA and phylogenomic trees. We selected the optimal nucleotide substitution model (TrN) with JMODELTEST version 2.1.4 52 , under the Bayesian information criterion (BIC). To estimate the time to the most recent common ancestor (TMRCA) of mtDNA lineages we used both a prior substitution rate and fossil calibrations. We set the divergence rate for cytb estimated in a 2.7% Myr for Mediterranean amphibians 53 . We used a normal distribution with a mean of 0.0134 and SD of 0.0009. We also used two calibration points (divergence date estimates) based on the split between Chioglossa and Mertensiella (mean: 19.4; lower and upper 95% confidence intervals (CIs) of 14.2 and 27.5, respectively), and the split between Lyciasalamandra and Salamandra (mean: 17.59; 95% CIs 11.24 and 24.82). These estimates were obtained from a two-step protocol based on a combination of primary fossil calibrations and secondary calibrations from a multilocus nuclear tree on a multilocus mitochondrial tree 53 . However, we avoided the use of internal nodes provided within Salamandra due to some incorrect phylogenetic relationships at both interspecific (e.g. S. algira) and intra-subspecific (e.g. S. s. morenica) levels 30 . We obtained cytb sequences from Gen-Bank for Chioglossa lustitanica (AF329314), Mertensiella caucasica (EU880319), Lyciasalamandra flavimembris (EU880318) and Lyciasalamandra atifi (AF154053), which were added to our cytb alignment. We performed three independent runs using an uncorrelated relaxed clock and a constant population size model as the coalescent tree prior, with a total of 100 million generations (burn-in: 10%). We combined tree files of all runs using the software LOGCOMBINER version 1.7.5, and verified parameter convergence by examining the effective sample sizes (ESSs) in TRACER version 1.6. We obtained a maximum clade credibility summary tree with Bayesian posterior probabilities (BPP) for each node using TREEANNOTATOR version 1.8.4, and edited the resulted tree in FIGTREE version 1.4.3 (http:// tree. bio. ed. ac. uk/ softw are/ figtr ee). We ran BEAST analyses in the CIPRES Science Gateway 54 . We estimated genetic distances (uncorrected p-distances) between subspecies in GENEIOUS 11.1.4 (www. genei ous. com).
Genetic structure and hybridization analysis. We first used STRU CTU RE 2.3.4 55 to assess the individual genetic membership coefficient (q-value) for each subspecies in each contact zone. Because we were not interested in determining the genetic structure of each study area but rather assigning individuals to one of the two co-occurring subspecies, we ran STRU CTU RE for K = 2 (two clusters). We ran 10 replicates and used the correlated allele frequency and admixture models, without location prior, and with a burn-in period of 100,000 iterations followed by 1,000,000 iterations. We combined the results from the 10 replicate runs using CLUMPP 56 . To complement STRU CTU RE, we used the spatially explicit method of spatial Principal Component Analyses (sPCA) 57 , which does not assume HWE or LE expectations. To identify if the sPCA scores were able to assess global (patches and clines where neighbours tend to be similar) and local (strong genetic differences between neighbours) structure, we used a Delaunay triangulation connection network and two separate Monte Carlo tests with 100 permutations.
Because each individual has two q-values (one for each of K=2 clusters), we considered individuals as admixed when showing low levels of coefficient membership for either of the two clusters (0.1 < q-value < 0.9). Then, we used these potential hybrids to estimate the probability of assignment (Z) of early-generation hybrid or backcrossed individuals between subspecies within each contact zone in NEWHYBRIDS v.1.1 58,59 . We used six classes in each analysis: pure individuals of each subspecies (parental classes), first generation hybrid (F1), second-generation hybrid (F2), and first-generation backcrosses to each subspecies. To increase the power to detect hybrids, we also assigned all individuals with > 90% (q-value > 0.9) estimated ancestry in STRU CTU RE from a single subspecies to the appropriate pure parental category using the 'z' option. We performed three independent runs per contact zone and we set 1 million generations with a burn-in of 10% per run. We used two random starting values in each NEWHYBRIDS run, and the "Jeffreys-like" prior for the estimates of θ (i.e., allele frequencies) and π (i.e., mixing proportions). We used a posterior probability threshold of P ≥ 0.7 60 to assign potential hybrids identified in STRU CTU RE to each of the six classes. www.nature.com/scientificreports/ Landscape genetics analysis. The workflow for the landscape genetic analyses is summarised in Fig. 5 and detailed as follows.
Genetic distance matrices. The use of individual-based distances is recommended to avoid potential violations of the assumption of discrete populations that can lead to erroneous estimates of genetic parameters 61 . We used COANCESTRY 62 to calculate pairwise relatedness matrices between individuals in each studied contact zone. First, we used the relatedness coefficient from Queller & Goodnight (1989) (R QG ) 63 , which does not assume Hardy-Weinberg equilibrium. We also used the triadic likelihood relatedness (R TRI ) estimate 64 , which calculates pairwise relatedness between two individuals while using a third one as control, thus reducing the identical by descent error rate. The matrices were linearly scaled to the range between 0 and 1, where higher values indicate higher similarity. To obtain an indication of dissimilarity, we found the complement of each matrix element by subtracting to 1.
Spatial data. We evaluated the three main environmental features that can influence spatial connectivity and gene flow of fire salamanders across the study region: elevation, habitat and climate. Elevation data was obtained from the Spanish Instituto Geográfico Nacional (https:// www. ign. es) at a 5 m resolution and aggregated to a 250 m resolution for downstream analyses. We produced two variables using a different aggregating function: (1) the average to represent mean elevation at each 250 m pixel, and (2) the standard deviation as a representation of the topographic heterogeneity. This collection of two variables is hereafter referred to as elevation data. We used the Normalized Difference Vegetation Index (NDVI) as a continuous surrogate of habitat availability. The NDVI imagery was obtained with a 16-day period covering a range of 10 years (from January 2009 to December 2018) that includes the field work period. The MOD13Q1 product was downloaded from the Earth-Data website (https:// earth data. nasa. gov/) with a spatial resolution of 250 m and 16-day temporal resolution, resulting in a time-series with 230 measurement layers. To provide summaries of the 16-day NDVI for a few variables, we processed the data with a harmonic regression and extracted three spatial coefficients representing the periodic behaviour in each pixel as continuous surrogates of local habitat 65 . The collection of three variables is hereafter referred as NDVI data. All data was processed in R with packages rgdal (v.1.5.8), raster (v.3.3.13), Figure 5. Landscape genetics analyses workflow used to derive optimized variables representing the different isolation models (IBD, IBR, and IBE). Two parameters of a logistic curve (curvature and inflection point) were applied to the isolation by distance matrix and optimized with genetic algorithms. The objective of optimization is the maximization of the logLikelihood from a GLS regression. Note that this workflow is for a single genetic distance type, thus, it was executed twice for the R QG and R TRI genetic distances. The figure was created with Inkscape software (version 1.0.2; https:// inksc ape. org/). www.nature.com/scientificreports/ TSA (v.1.2.1) and zoo (v. 1.8.8). Climate data was obtained from the CHELSA database (https:// chelsa-clima te. org/ 66 ) as 19 bioclimate variables at 30" (~ 1 km) spatial resolution. We further summarized the climate data by means of a PCA after centering and scaling. Resulting spatial components from PCA are hereafter referred to as the climate data. We projected all spatial data to the WGS84 coordinate system.
Isolation by distance, environment and resistance matrices. We generated geographic, environmental and resistance distance matrices to test for the dominant model of isolation by distance (IBD), environment (IBE), or resistance (IBR) in each contact zone. We produced IBD matrices by calculating Euclidean geographic distances between genotyped individuals in each contact zone. We transformed coordinates to the ETRS89 Lambert Azimuthal Equal-Area projection before calculating the straight line distance between sampling localities. To account for environmental differentiation between sampling locations, we calculated one IBE matrix based on climate (IBE CLIM ) and another based on NDVI (IBE NDVI ) for each contact zone (six IBE matrices in total; see Fig. 5). The IBE matrices were calculated as the Euclidean distance in the environmental space 6,67 . As such, we extracted at the locations of samples the value of the PCA components from climate data and calculated the distances between them. We followed the same procedure with NDVI data.
We generated IBR matrices in relation to descriptive categories: elevation (IBR ALT ), climate (IBR CLIM ), and NDVI (IBR NDVI ). We used a common dataset combining the presence locations from the three contact zones to determine the usage of each variable in each category. Pixel duplicates were removed independently in each category due to the different spatial resolution. We extracted variable data at the presence locations and calculated a vector of average values per variable and the covariance matrix. We used this information to combine the different variables by means of a multivariate normal distribution, generating a map representing the likelihood of finding a presence for each variable category. Hence, this value is related to the usage of the multidimensional space of each combination of variables by the species. We used these maps as a resistance surfaces, after a minmax scaling, in order to generate resistance distances with the 'commuteDistance' function in 'gdistance' R package (v.1.3.6).
Optimizations and model selection. Parameter optimization procedures are common in landscape genetic studies [68][69][70][71] . These optimizations aim at finding find the best combination of parameters describing a species' environmental preferences to a resistance surface that has the best fit with genetic data. Often the aims of the optimization procedure are the parameters of a linear combination of habitat and other niche descriptors by maximizing a value relating the derived resistance distances to the genetic distances 68,70 . Optimizations algorithms have also been used to find the best parameters of a transformation function applied to the resistance distances 69,71 . Usually only the matrix referring to IBR is optimized, hence, it has an advantage over other competing models under a test of the isolation mechanism that better explains genetic divergence. Here we optimized the parameters of a logistic function relating each matrix produced (IBD, IBE and IBR) to the genetic distance matrices in each contact zone. It is expected that genetic differentiation increases with geographical distance. Yet, due to limited movement and finite population sizes and genomes, this increase cannot occur indefinitely and tends to plateau 72 . The same reasoning can be applied to the other models, including resistance and environmental distances. We optimized a logistic function with two parameters (curvature and inflection point), allowing for a near linear arrangement with original distances to a full logistic relation with steep transition. In the latter case, larger distances tend to plateau after the inflection point thus allowing a scenario where genetic differentiation accumulates until a maximum limit. We performed optimizations with genetic algorithms using the R package GA (v.3.2) to maximize the logLikelihood of a Generalized Least Squares (GLS) regression using the genetic matrix as a dependent variable and an isolation matrix as an independent variable. Data in pairwise distance matrices show high degrees of non-independence which hampers the direct application of regressionbased algorithms. The GLS regression incorporates correlation structures informing about the lack of independence between samples. We used a maximum-likelihood population effects (MLPE) model that generates such correlation structure from pairwise data 71 . We used the package nlme (v.3.1.140) for GLS with corMLPE (v.0.0.3, https:// github. com/ nspope/ corML PE). All optimizations were done with a maximum of 1000 iterations or 200 runs without change of the optimization score. All isolation model matrices were min-max scaled to the range 0-1 in order to facilitate the comparison of the optimized parameters.
We used the Akaike Information Criterion (AIC) to rank univariate models and to eliminate those isolation models that were contributing less to the overall fit of the regression ( Table 2). Models with ΔAIC lower or equal to 2 were assumed to be equivalent to best model. By comparing only univariate models and the null model, we do not use competing scenarios of resistance in the same model.

Results
Phylogenetic analysis. We obtained a total of 302 sequences after discarding sequences of poor quality and trimming some sequences (from ca. 1070 bp to 350 bp). Phylogenetic analyses identified two main monophyletic (Posterior Probability > 0.95) clusters within the three studied contact zones: one includes seven well supported mtDNA lineages of putative pueriparous S. s. bernardezi, while the second clade includes three monophyletic lineages of the three focal larviparous subspecies (S. s. gallaica, S. s. bejarae and S. s. fastuosa) (Fig. 1). According to the cytb dating approach, the split between S. s. bernardezi and the monophyletic clade including S. s. gallaica, S. s. bejarae and S. s. fastuosa is estimated at 1. Nuclear genetic structure, cyto-nuclear discordances and hybridization. After discarding individuals with missing data > 10% in the microsatellite dataset, we obtained a fully comparative mtDNA-nuDNA datasets for a total of 302 samples (127, 68 and 107 from NG, CM and AC, respectively). www.nature.com/scientificreports/ STRU CTU RE recovered two well defined genetic groups in each contact zone that largely match the distribution of each mitochondrial lineage (Figs. 2, 3, 4). The first eigenvalue of sPCA analyses confirmed the genetic structure found in STRU CTU RE, which divided the data into two groups for each contact zone (Supplementary Figs. S2-S4). We found a significant global structure in each dataset but not significant support for a local structure in sPCA.
We found relatively similar patterns of hybridization in the contact zones of NG and AC (Table 1), in which, relatively high proportions of admixed individuals and cyto-nuclear discordances were inferred. Specifically, for NG, from the 127 analysed salamanders, a total of 92 individuals (72%) were assigned either to the subspecies S. s. gallaica (49 salamanders; 38%) or S. s. bernardezi (43 salamanders; 34%), while the remaining 35 salamanders (28%) were inferred as potential hybrids (0.1 ≤ q-value ≤ 0.9 in STRU CTU RE). Among these potential hybrids, 29 salamanders were labelled as F2 hybrids in NEWHYBRIDS. Similarly, in AC, from the 107 genotypes, 57 salamanders (53%) were assigned either to the subspecies S. s. fastuosa (16 individuals; 15%) or S. s. bernardezi (41 salamanders; 38%). The remaining 50 salamanders comprised potential hybrids (47%), from which, 38 individuals were categorised as F2 hybrids, two salamanders as pure S. s. bernardezi, and two others as S. s. fastuosa in NEWHYBRIDS. Unlike the latter two contact zones, CM was characterized by a very low proportion of admixed individuals. From the 68 studied salamanders, a total of 65 individuals (96%) were assigned either to the subspecies S. s. bejarae (29 individuals; 43%) or S. s. bernardezi (36 salamanders; 53%), while the remaining three salamanders (4%) that were inferred as potential hybrids were later assigned as pure S. s. bejarae in NEWHYBRIDS. Furthermore, the number of cyto-nuclear discordances was higher for NG and AC (19% and 16%, respectively) than for CM (6%).

Landscape genetics analyses.
The PCA of climate data showed the first three components accounted for 91.4% of total variation. Optimizations of IBD and IBE models showed a high degree of success, with the ΔAIC of the optimized model showing values higher than 15 compared to the non-optimized version (see Supplementary Table S2). The IBR model optimization showed a differential success depending on the contact zone. In the contact zone NG, all IBR models achieved a ΔAIC lower than 4.5, revealing a lack of effect of the optimization. A single IBD univariate model was selected as the best model based on ΔAIC values for the contact zones of NG and AC, independently of the genetic matrix used ( Table 2). The contact zone of CM comprises a more complex scenario because univariate models indicate a dominant action of IBR in addition to IBD, particularly IBR CLIM and IBR NDVI for R QG genetic distances, and IBR CLIM with IBD for R TRI ( Table 2). The combinations of the best models suggest not only a strong effect of IBD in all contact zones, but also a strong effect of habitat and climate resistance across the CM contact zone. The relationship between pairwise genetic distance (for both R QG and R TRI ) and pairwise geographic and environmental distances for each contact zone are represented in Supplementary Figs. S5 and S6.

Discussion
The degree of reproductive isolation is expected to be larger between lineages showing differentiation in parity modes 73 , morphology 74 or behaviour 75 , though time since divergence also enhances reproductive barriers 76 . Our fine-scale sampling and molecular approach contributes to an accurate characterisation of the location and spatial extent of the three studied contact zones, where the morphologically 29 and phylogenetically 30 divergent pueriparous S. s. bernardezi meets three larviparous subspecies (S. s. gallaica, S. s. bejarae, and S. s. fastuosa). We detected clear signs of gene flow and introgression in two out of three regions, thus supporting the hypothesis of incomplete reproductive isolation between these highly divergent S. salamandra morphotypes. Despite the extensive patterns of introgression and admixture in two contact zones (NG and AC), the Cantabrian Mountains (CM) likely reduce gene flow between the larviparous S. s. bejarae and pueriparous S. s. bernardezi. This study suggests the combined effect of landscape and climate barriers shape patterns of hybridization, thus highlighting the need of genetic studies across multiple and heterogeneous contact zones to better delineate the spatial patterns of gene flow and hibridisation between non-reproductively isolated taxa.
Geographic range and spatial extent of Salamandra salamandra contact zones. Previous phylogeographic studies suggested the eastern Galician mountains comprise a barrier to gene flow between S. s. gallaica and S. s. bernardezi 31,32 . The high proportion of admixed individuals inferred from our genetic analyses Table 1. Genetic assignments based on STRU CTU RE (Pure 1, Pure 2 and Hybrids; with both total numbers and percentages by class), and NEWHYBRIDS (F1, F2, backcrosses; total numbers by class). www.nature.com/scientificreports/ demonstrates the existence of a contact zone between western S. s. bernardezi and northern S. s. gallaica populations within a relatively homogenous landscape in northern Galicia (NG), thus indicating previous works were likely biased due to an insufficient sampling in this region. We detected a north-south contact zone between S. s. bernardezi and S. s. gallaica located ca. 10-30 kms from the northern Galician coast (Fig. 2). The sampled salamanders in NG were largely characterised by a mixture of phenotypes typical of S. s. gallaica and S. s. bernardezi, as previously documented by morphological studies in this region 35,77 . The coloration pattern across this area varied considerable. Most admixed individuals showed a disrupted striped colouration, resembling S. s. bernardezi phenotype, but also showed intermediate colouration patterns with both stripes and blotches. These admixed individuals also showed intermediate body sizes, and variable snout shapes, thus hindering subspecies assignment and supporting the existence of a hybrid zone. Additionally, the presence of females with a mixed reproductive mode (i.e. females give birth to both metamorphosed juveniles and aquatic larvae 35 ; unpublished data) corroborates the presence of a hybrid zone in NG. Patterns of genetic structure confirmed a negligible hybrid zone between southern S. s. bernardezi and northern S. s. bejarae populations across the Cantabrian Mountains (CM; Fig. 3). Contrary to NG, the contact zone in CM shows an abrupt change of phenotypes between populations inhabiting southern slopes, where fire salamanders are larviparous, large, blotched, with a relatively pointed snout, and populations located on northern slopes, in which individuals exhibit pueriparity, are small, striped and often present rounded snouts 29,31,77 .
Similar to NG, we also found a high proportion of admixed individuals along the Asturias-Cantabria border (AC), where eastern populations of S. s. bernardezi hybridize with S. s. fastuosa along lowlands (west-east corridors) located between mountain chains (Sierra de Cuera and Picos de Europa; Fig. 4). This area coincides with a secondary contact between amphibian lineages with Atlantic climate affinity (Lissotriton helveticus 78 ; Rana parvipalmata-R. temporaria 79 ). The deep divergence found in Rana points to an old split (4 Mya) and a narrow hybrid zone (25 km), which led the authors to conclude these two lineages represent distinct species. Salamandra, however, shows a more recent split (late Pleistocene). Despite this study showing the existence of a contact zone between these subspecies, previous phylogeograhic work placed this hybrid zone farther east to the Pyrenees 30,31 . The identification of the contact zone between S. s. bernardezi and S. s. fastuosa is not possible at the morphological level because S. s. fastuosa appears to be phenotypically indistinguishable from S. s. bernardezi. We have also detected females with mixed reproductive modes across AC, and also cases of larviparity and pueriparity within the S. s. bernardezi and S. s. fastuosa ranges, respectively, which again supports the presence of a relatively broad hybrid zone in this area. Nevertheless, future studies should be conducted to accurately delimit the range of S. s. fastuosa and the level of introgression across its range. Table 2. Results of the GLS regression with MLPE correlation relating the genetic distance with different models of isolation (IBD, IBE and IBR). Predictors showing at least one best model are highlighted in bold. The log Likelihood (logL) ratio compares each model with an intercept-only model. The AIC differences (∆AIC) is in respect to the best model (lowest AIC) in each combination of genetic distance matrix and contact zone. Models with ΔAIC lower or equal to 2 (best model) are highlighted in bold.  81 , and between subspecies in its sister species, S. algira 82 . Similarly, we found cyto-nuclear discordances across the three studied contact zones, though these were higher in NG and AC than in CM. Despite the marked genetic 30 and phenotypic (i.e., colour patterns 35 , morphology 29 , and reproductive modes 27,83 ) differences between S. s. gallaica and S. s. bernardezi, we found extensive gene flow across the entire NG, thus confirming the lack of reproductive isolation between these divergent taxa, and the widespread presence of admixed phenotypes in this contact zone. We found an asymmetric pattern of cyto-nuclear discordances between both lineages. The majority of introgressed individuals exhibited a S. s. bernardezi mtDNA, which could suggest a northward expansion by S. s. gallaica followed by a mitochondrial capture of S. s. bernardezi. This process is potentially in agreement with the male-biased dispersal hypothesis, in which, through population expansions, fire salamander males may disperse longer distances than females 50,84 and move into their neighbouring subspecies range, thus capturing the local mitochondrial lineage after a few generations. Such a process was suggested as a potential driver of cyto-nuclear discordances among other S. salamandra subspecies 31,48,81 . However, S. s. gallaica mtDNA reached the northern Galician coast and it is known that S. s. gallaica mtDNA is present at low frequencies within the range of S. s. bernardezi 38 . Whether this is the result of a recent expansion of both males and females of S. s. gallaica into the range of S. s. bernardezi or a male-biased colonization of the latter into a historical and wider range of the former remains an open question, with important implications for the mid-and long-term viability of S. s. bernardezi, which warrants future research.

CZ
In CM, we found very few cases of genetic introgression at both windward and leeward sides of the Cantabrian Mountains, but no signs of hybridization. The negligible signs of admixture and hybridization between the larviparous and large S. s. bejarae and the pueriparous and small S. s. bernardezi across the CM are likely due to unfavourable environmental conditions of high-elevation areas (see next section). These conditions likely preclude the presence or reduce population densities and activity of fire salamanders, which in turn hinder dispersal across complex topographic areas. However, valleys (1400-1700 m a.s.l.) oriented in a north-south axis might act as mountain passages allowing sporadic dispersal events of fire salamander across the CM.
Like NG, we found high levels of gene flow and hybridization across AC and, consequently, absence of endogenous reproductive barriers between S. s. bernardezi and S. s. fastuosa. This is an expected result because of their phenotypic similarity and shared recent evolutionary histories. The relatively high proportion of cyto-nuclear discordances found for both subspecies comprises another pattern in common with NG, although the underlying drivers responsible for this genetic pattern are likely different. Selection of traits encoded in the nuclear genome could have boosted such discordances in AC along the expansion wave. Nuclear phylogenomic analyses place S. s. fastuosa as the sister taxon of S. s. bernardezi, yet S. s. fastuosa mtDNA falls within the S. s. gallaica-bejarae clade 30 . This genetic discordance fits a demographic scenario where selected traits (i.e. colouration pattern, snout shape and reproductive strategy) encoded by nuclear genes in S. s. bernardezi spread eastwards into the S. s. fastuosa area during range expansions of favourable climatic cycles in the Pleistocene 31 . It is possible that only the original mitochondrial legacy of S. s. fastuosa was maintained though historical mitochondrial capture when both lineages introgressed, a process that has been suggested for Rana temporaria in this region 79 . Alternatively, S. s. fastuosa could first diverge from S. s. bernardezi, and then followed by a process of historical mitochondrial capture of S. s. gallaica-bejarae. Further sampling across this region and the use of genomic data and demographic models will allow testing the above contrasting hypotheses.
Environmental barriers shape gene flow between S. salamandra populations. Landscape genetic analyses revealed the IBD model is a strong predictor of genetic differentiation in the NG and AC contact zones and, to a lesser extent, in CM. The marked genetic structure and absence of hybrids across the Cantabrian Mountains suggests this mountain chain is a major barrier to dispersal and gene flow between larviparous and pueriparous lineages, with mountain valleys probably allowing sporadic dispersal events. Interestingly, the observed pattern of genetic differentiation in CM could not be explained by the topographic complexity across this area, as we previously hypothesised, but it was influenced by the climate (IBRclim) and landcover (IBRndvi), albeit with some disagreement between the two genetic distance metrics used here. There is a strong environmental gradient acting between the southern and northern slopes of the Cantabrian Mountains, following the sharp transition between Atlantic and Mediterranean bioclimates, which t is reinforcing the effect that prevents rampant gene flow between S. s. bernardezi and S. s. bejarae along this CZ (see Supplementary Fig. S1). In addition, the climatic heterogeneity associated with the mountain ridges (e.g. higher wind exposure, colder climate, shorting of the growing season, or rarefaction of surface water bodies) often act as impermeable barriers to fire salamanders 38,85 , other salamander species [86][87][88][89] , and anurans 90 . On the other hand, the relatively smooth topography across NG and consequent softer climate transitions in the west-east corridors (i.e. lowlands) across AC likely facilitate salamander dispersal.
Other landscape barriers are likely affecting gene flow across the three studied contact zones. For instance, the reduction and conversion of woodlands to mosaics of agricultural fields, urban settlements, and plantations of exotic trees have been shown to hinder dispersal and enhance genetic differentiation among fire salamander populations 38,41 , though the degree of fragmentation observed in NG and AC appears to be insufficient to prevent hybridization at the spatial scale of this study. Water courses are also known to affect gene flow in S. Salamandra, particularly, in S. s. bernardezi due to its terrestrial life cycle 91 . Two main rivers (Deva and Nansa) cross the AC hybrid zone in a north-south axis and their locations coincide with a phylogeographic break found in other amphibians 78 www.nature.com/scientificreports/ fastuosa, the observed patterns of admixture between these lineages suggest they could act as semi-permeable barriers to gene flow. Also, larviparous individuals would have a greater advantage to overcome these geographic barriers during the larval stage 91 , but our data prevent us from making robust inferences about crossing rates per reproductive mode. Besides the aforementioned environmental factors, we cannot discount that other phenomena may be at play here. Adaptation to ecological environments has been proposed to explain behavioural (pond vs. stream reproduction 92 , but see 93 ) and morphological (colour polymorphism 94 ) differentiation in fire salamanders, and it may be a common underlying mechanism explaining the elevated intra-specific phenotypic variation that characterizes this species 27,29,95 . Our study did not identify divergent natural selection (i.e. support for an IBE scenario) as a driver of the distinct morphological and reproductive types across northern Iberian S. salamandra populations. Although environmental differentiation observed between southern and northern slopes of the Cantabrian Mountains (i.e. climate and habitat, see Supplementary Fig. S1) resulted in a negligible effect from the IBE model, we cannot discount possible latent adaptive divergence to the different ecological environments found at each side of this mountain chain. We note that the other contact zones also have strong gradients, yet, they occur along the genetic transition. This implies that both subspecies are in contact with the same environmental gradient available in each contact zone. In the Cantabrian Mountains, however, the contact zone coincides with a marked change of the environmental gradient, producing favourable conditions for putative adaptive divergence. These mountains constitute the clearest contact zone where subspecies occupy different environmental niches. While NG and AC are located in a heterogeneous landscape (mosaic of eucalypts plantations, and agricultural bocage in lowland areas, and native forests and scrublands in mid-elevation areas), with a soft change in climatic conditions, in the CM, the wetter northern slopes dominated by Atlantic deciduous forests strongly contrast with the dryer southern slopes exhibiting vegetation typical of Mediterranean ecosystems 96 . Hence, fire salamanders at both sides of the CM likely experience different environmental regimes that may influence their evolutionary trajectories.
The barrier effect imposed by the Cantabrian Mountains, together with a potential ecological adaptation to the local habitat and climate present in the northern and southern slopes of this mountain chain, may also contribute to the reduced levels of admixture and introgression observed between S. s. bernardezi and S. s. bejarae, which ultimately can lead to speciation events. However, considering the ongoing gene flow between S. s. bernardezi and other S. salamandra subspecies in environmentally homogeneous contact zones (e.g. NG and AC), our study reinforces the intra-specific polymorphic nature of this species at various phenotypic levels. Finally, we cannot neglect potential historical differences underlying the formation and maintenance of these contact zones. These differences may involve an older and well established hybrid zone in northern Galicia (S. s. gallaica-S. s. bernardezi), and a recent secondary contact zone where population edges are still expanding across the CM (S. s. bejarae-S. s. bernardezi).

Conclusions
Our study shows that patterns of hybridization between S. salamandra subspecies are highly heterogeneous across space, thus emphasizing the role of environmental factors in shaping patterns of hybridization and reproductive isolation. This is best illustrated by the Cantabrian Mountains, which likely comprise an effective barrier to gene flow between the pueriparous S. s. bernardezi and larviparous S. s. bejarae. Despite the fact that prezygotic and postzygotic barriers were not detected in our study, it is unclear whether hybridization among these subspecies negatively affects survival and fertility of hybrids. This study also has important conservation and taxonomic implications, as the elevation of intra-specific lineages to species level is often based on the reduction of gene flow found between lineages across a small area of their entire contact zone, thus neglecting the environmental heterogeneity, and putative different outcomes of hybridization. Future multidisciplinary studies (integrating phenotypic, behavioural, ecological, and genomic-wide data) are needed to evaluate the potential genomic consequences of hybridization in the fire salamander, the spatio-temporal dynamics of these hybrid zones (i.e. location and spatial extent through time), and the potential role of ecological divergence in the Cantabrian Mountains 9,10,48 .

Data availability
The dataset containing microsatellite genotype data, environmental layers, input files, and the R code used to perform optimisation analyses are available upon request. Sequences generated for the mtDNA cytb fragment have been submitted to GenBank.