Hybridization and extensive mitochondrial introgression among fire salamanders in peninsular Italy

Discordance between mitochondrial and nuclear patterns of population genetic structure is providing key insights into the eco-evolutionary dynamics between and within species, and their assessment is highly relevant to biodiversity monitoring practices based on DNA barcoding approaches. Here, we investigate the population genetic structure of the fire salamander Salamandra salamandra in peninsular Italy. Both mitochondrial and nuclear markers clearly identified two main population groups. However, nuclear and mitochondrial zones of geographic transition between groups were located 600 km from one another. Recent population declines in central Italy partially erased the genetic imprints of past hybridization dynamics. However, the overall pattern of genetic variation, together with morphological and fossil data, suggest that a rampant mitochondrial introgression triggered the observed mitonuclear discordance, following a post-glacial secondary contact between lineages. Our results clearly show the major role played by reticulate evolution in shaping the structure of Salamandra salamandra populations and, together with similar findings in other regions of the species’ range, contribute to identify the fire salamander as a particularly intriguing case to investigate the complexity of mechanisms triggering patterns of mitonuclear discordance in animals.

In principle, concordance between distinct phenotypic traits and/or genetic markers is a plausible expectation when analysing geographic patterns of biological diversity, above and below the species level. Derived from the shared history of individual characters of organisms, this expectation forms the basis for the extensive use of single markers (notably mitochondrial DNA) to draw inferences about the evolutionary history of populations and species 1 , as well as to assess their identity and geographic distribution using barcoding approaches 2,3 . In practice, however, discordant patterns of variation among characters and among markers are not uncommon (e.g. [4][5][6] among others). Several processes have been involved with the formation of such discordant patterns of variation, including various forms of selection, adaptive introgression, demographic disparities, hybrid zone movements, and sex-biased processes [7][8][9][10] . Consequently, discordance per se is emerging as a tremendous source of insights into the evolutionary process.
The drawbacks of using mitochondrial DNA (mtDNA) as the sole marker of genetic variation among and within populations are now widely acknowledged [11][12][13] , to the extent that a comparative approach using both mitochondrial and nuclear genetic data is becoming customary in phylogeographic investigations 14 . At the same time, however, after more than three decades when most phylogeographic studies adopted a single-marker approach (since 15 ), there is now a huge amount of mtDNA data which needs to be complemented, in order to reach reliable conclusions about the evolutionary history and current genetic structure of populations. Furthermore, integrating mtDNA data with those drawn from the nuclear genome also has major practical implications. Indeed, mtDNA has been widely employed to revise classical taxonomy 16 , and to define conservation strategies for species and populations threatened by human impacts on the natural environment 17 . While in most cases such integration will hopefully lead to only slight revisions of previous conclusions based on mtDNA data, there are compelling examples of the far reaching implications of a more integrative experimental approach to the study of population structure and history 7 .
The fire salamander Salamandra salamandra (Linnaeus, 1758) is a temperate amphibian largely distributed in Europe 18 . Owing to its remarkable phenotypic variation in traits encompassing external morphology, colour of discordance observed between phenotypic traits and mtDNA data, and (3) shed light on its possible causes in these salamanders.

Results
Nuclear genetic variation was investigated at level of 23 allozyme loci (Table 1). Five loci (Mdhp-2, Mpi, Sod-1, Aat-2, and Ldh) were found monomorphic for the same allele in all the samples studied, while further three loci (Mdh-1, Mdh-2, Icdh-2) were found polymorphic at level of a single private allele, observed at low frequency (≤0.05). Allele frequencies observed at the remaining 15 loci are shown in Table 2. No statistically significant deviations were detected (at the 5% nominal level) from the expected Hardy-Weinberg (HW) equilibrium at each locus within populations, and from the expected genotypic linkage equilibria between pairs of loci within populations.
The sample-based Bayesian clustering method implemented in BAPS, yielded results fully consistent with those obtained with TESS, for all values of K between 2 and 5 ( Fig. 2C-F). Nevertheless, it suggested K = 7 as the best clustering option. On the other hand, both with K = 6 and K = 7 single populations were assigned to distinct clusters. Identical results were obtained running this analysis either with or without using geographic location of samples as prior information.
Estimates of genetic diversity were computed for sample pools consistent with K = 5, that is, the highest level of population structure identified by both TESS and BAPS analyses. As shown in Table 2, the lowest level of diversity was observed at all parameters for the southernmost group (pooling samples 1-5), whereas the highest values were observed for the north-western group (pooling samples [16][17][18]. The final mtDNA sequence alignment comprised 1220 bp for all the 158 S. salamandra individuals analysed. The cytochrome B gene fragment analysed (hereon CYTB) was 582 bp in length, with 23 variable positions of  G3pdh           which 21 were parsimony informative and the cytochrome oxidase subunit I gene fragment (hereon COI) was 638 bp in length, and showed 29 variable positions, of which 24 were parsimony informative. Twenty-five unique haplotypes were identified in the concatenated dataset, whose distribution among the sampled populations is shown in Table 1.
The mtDNA haplotype genealogy estimated using HaplotypeViewer is presented in Fig. 2A. Two main groups of closely related haplotypes were observed, separated from one another by 25 mutational steps. One haplogroup (green in Fig. 2A; haplotype series S) was geographically restricted to the south of the Italian peninsula (samples 1-8), whereas the other haplogroup (red in Fig. 2A; haplotype series N) was widespread from the alpine arc to samples in south-central Italy (samples [8][9][10][11][12][13][14][15][16][17][18][19][20][21]. Syntopy between both haplogroups was only found at the geographically intermediate sample 8. Finally, the two sub-groups found within the southern lineage at the level of the nuclear dataset (Fig. 2E), have also been observed with the mtDNA (dark green and light green sub-clades in Fig. 2A). However, syntopy between both sub-clades was not observed at the mitochondrial level.

Discussion
Our analyses of mitochondrial and nuclear genetic variation concordantly, and not unexpectedly (see Introduction), show that two main lineages of S. salamandra occur along the Italian peninsula. However, while populations marking the geographic transition between groups have been observed both with mitochondrial and with nuclear markers (samples 8 and 16), their respective geographic locations were more than 600 km apart (see Fig. 2A,C).
Mitonuclear discordances are not uncommon in amphibians (e.g. 4,27-32 ), but the extent of discordance observed in the present study is conspicuous. The geographic displacement between mitochondrial and nuclear contact zones among the two main lineages, largely exceeds the extent of discordances previously noted among fire salamander lineages from different geographic regions [21][22][23] , and has no parallels among different taxa from peninsular Italy. Interestingly, both mtDNA and nuclear contact zones are located within well-known suture zones (sensu 33 ), where interspecific and intraspecific hybrid zones and range edges have been previously reported for various taxa, including amphibians (e.g. 4,[34][35][36][37]. Therefore, lines of cross-taxon concordance cannot be used here as inferential clues in the attempt to reconcile such discordance into a population history. The close affinity among mtDNA sequences of individuals sampled in northern Balkans, northern Europe, southern Alps, and north-western Apennines (this study and 24 ), suggests that the northern lineage (namely S. s. salamandra) colonized the Apennines as the last step of a large-scale range expansion out of the Balkans into Western Europe 24 . Here, it underwent a secondary contact with the southern, peninsular endemic lineage (namely S. s. gigliolii), most likely during the post-glacial epoch 24 . But where did the two lineages first meet and mate? And what processes might best explain the observed mito-nuclear discordance? At least two conflicting scenarios could be delineated in this respect. The first that the secondary contact between both lineages first occurred in the north-western Apennines, as depicted by nuclear data (Fig. 2C), and that this event was followed by southward introgression of the S. s. salamandra mitochondrial lineage into the S. s. gigliolii populations in northern and central Apennines. The second that the secondary contact occurred in south-central Apennines, as depicted by the mtDNA transition, and that the hybrid zone between nuclear genomes has moved northward to its current location, not followed by a movement of the mtDNA contact zone. Both rampant mitochondrial introgression triggered by positive selection and hybrid zone movement have been repeatedly invoked to explain instances of mitonuclear discordances 7 , and several underlying mechanisms have been proposed 7,8,10,38,39 . However, in the absence of replicated temporal samples or of replicated contact zones at disparate sites, disentangling these competing scenarios is a challenging task, since most of the genetic patterns of variation expected under one scenario do not entirely exclude the other 40 . In the present case, however, two main lines of evidence lead us to tentatively favour southward mtDNA introgression over northward hybrid zone movement.
A first useful indication in this regard comes from the spatial genetic structure observed with allozymes. Indeed, in a moving hybrid zone scenario, a unidirectional clinal tail of admixture is expected, as a consequence of the movement 40,41 (and references therein). For example, in a similar case concerning a Triturus newts' hybrid zone with extensive mito-nuclear discordance, Wielstra and colleagues 32 found extensive nuclear clines -and introgression of multiple distinct and geographically structured mtDNA clades -consistent with the genomic footprint of a hybrid zone movement. The highly fragmented geographic distribution of the fire salamander through north-central and central Apennines, prevented us from carrying out a formal and straightforward cline analysis. Nonetheless, the strongly structured pattern of genetic variation that emerged with the Bayesian clustering analyses of nuclear variation, does not conform with this expectation. In fact, within the range of the southern lineage, it showed the occurrence of genetic sub-clusters along the north-south axis (Fig. 2D,E), and a lack of clines in admixture coefficients within the putative area of hybrid zone movement, at all the hierarchical levels of the analysis.  A second indication comes from the Upper Pleistocene fossil record. A scenario of post-glacial colonization of northern and central Apennines by S. s. salamandra, followed by the establishment of a hybrid zone in south-central Apennine and its subsequent displacement northward to its current location, would imply the absence of fire salamanders in the northern Apennines until the post-glacial arrival of S. s. salamandra. However, this scenario is in contrast with a fire salamander fossil record found in north-western Apennines (Grotta di Equi 42 ), close to our sampling site 14, and dated back to the last glacial epoch (around 45 000 years bp). The occurrence of this fossil record is instead consistent with a scenario of a secondary contact zone primarily located in the north-western Apennines, and a mtDNA introgression southward. Under this scenario, the genetic cluster emerging at K = 3, grouping samples 12-14, would reflect the presence of S. s. gigliolii in this area during the last glaciation, and the occurrence of a glacial sub-refugium within the region, as already suggested for several temperate species (see 37 and references therein). Alternatively, the fossil record might suggest a glacial refugium for the northern lineage in the northern Apennines. As glacial conditions alleviated the northern lineage could have moved south to meet the southern lineage there. Subsequently, the southern lineage may have expanded to the north, at the expense of the northern one, with hybridization, resulting in mtDNA introgression. However, a plausible expectation under this scenario would be some level of mtDNA sequence divergence between populations S. s. salamandra in Italy, achieved during the prolonged isolation in distinct refugia (i.e. Balkan and north-western Apennines). Instead, such an instance of divergence was not observed.
Under a scenario of S. s. salamandra mtDNA moving southward, then the main question to answer is, why did S. s. salamandra mtDNA introgressed so deeply into S. s. gigliolii populations? Providing a definitive answer goes behind the scope of this study, and would require specific experimental designs. However, it is worth noting that several of the commonly invoked causes of mito-nuclear discordance could already be dismissed in the present case. First, sex-biased processes, such as female-biased dispersal 43 or disassortative preferences in female mate choice 30 , are unlikely to have played a role. In fact, although extensive surveys are still lacking, the available evidence does not support such female-biased processes in S. salamandra 43 . Second, purely stochastic processes related to demographic disparities between interacting lineages upon secondary contact 8,38 , are unlikely to have been implicated either. Literature surveys and simulation-based works 7,8,38 strongly suggest that differential introgression triggered by these processes is more likely to occur at the uniparentally inherited markers, and from the resident toward the expanding lineage. In the present case, this pattern would imply massive introgression of mtDNA from east to west of the contact zone (i.e. from S. s. gigliolii into S. s. salamandra), that is, in the opposite direction actually observed.
Noteworthy, recent evidence 44 suggests that, under a wide range of conditions, sex-biased and purely demographic processes are unlikely to yield strong mito-nuclear discordances. In contrast, adaptive mitochondrial introgression can easily generate such discordances 44 , particularly in low-dispersal organisms, as is the case of S. salamandra. Of course, given the many organismal functions in which the mitochondrial genome is implicated [45][46][47][48] , and the unsuitability of our data in this respect, we refrain from even hypothesizing which adaptive advantage might have promoted a massive mtDNA introgression. In light of the frequently observed mito-nuclear discordances in S. salamandra, at various levels of population structure 21-23 , we see this issue as a major research question, amenable to future experimental efforts.
Our results clearly show the major role played by reticulate evolution in shaping the structure of Salamandra salamandra populations and, together with similar findings in other regions of the species' range, contribute to identify the fire salamander as a particularly intriguing case to investigate the complexity of mechanisms triggering patterns of mitonuclear discordance in animals.

Sampling.
We collected 188 individuals of S. salamandra from 21 localities, ranging from the southern slope of the Alps to the southern tip of the Italian peninsula ( Fig. 1 and Table 1). Each individual was anaesthetized in the field by submersion in a 0.02% solution of MS222 (3-aminobenzoic acid ethyl ester), and tissue samples were taken through a toe-clipping procedure, then each individual was released in its sampling point. Collected tissues were brought to laboratory in liquid nitrogen, and then stored at −80 °C. Field works and collection of tissues were approved by the Italian Ministry of Environment (permit numbers: DPN-2009-0026530) and were performed in accordance with the relevant guidelines and regulations.
Laboratory procedures. Standard horizontal starch gel (10%) electrophoresis was used to investigate the genetic variation of 23 putative allozyme loci. Enzyme systems analyzed, putative loci, and buffer systems used in electrophoretic procedures are listed in Table 3. Alleles were called by their mobility (in mm) with respect to the most common (100) in a reference population (Campone).
Genomic DNA was extracted using proteinase K digestion followed by a standard phenol-chloroform protocol 49 . Polymerase chain reactions (PCR) were carried out to amplify portions of the two mitochondrial gene fragments CYTB and COI. After preliminary PCR amplifications using generic primers drawn from the literature 50,51 , more specific primer pairs were designed and used to amplify and sequence all the individuals analysed. Primers designed and used to amplify the CYTB gene fragment were 494-salamod-CATCAACATCTCCTACTGATGAAA and CYTB-salamod-GGAGTAAGCAGTGAGATTACC, whereas VF1d-TTCTCAACCAACCACAARGAYATYGG and VR1d-TAGACTTCTGGGTGGCCRAARAAYCA 52 were used to screen variation at the COI gene fragment. Allozymes data analysis. Allele frequencies and genetic diversity estimates, including allelic richness, observed heterozygosity, and unbiased expected heterozygosity 53 , were estimated using the programs GENETIX 4.05.2 software 54 . The occurrence of Hardy-Weinberg and genotypic linkage equilibria were addressed for each locus and locus-pair, respectively, in each sample by means of exact tests as implemented in FSTAT 2.9.3 55 .
We investigated the genetic structure of S. salamandra populations along peninsular Italy, using two Bayesian clustering methods: the individual-based and spatially explicit approach implemented by TESS 2.3.1 56 , and the sample-based approach implemented in BAPS v 6 57 .
The analysis with TESS was carried using the admixture model, the option to update the spatial interaction parameter activated, and all other settings left to the default options. A preliminary analysis was conducted to restrict the range of plausible K values (i.e. the number of clusters). Values of K between 1 and 21 were tested, with 10 replicates per K, each of 5000 iterations following 3000 iterations discarded as burn-in. The settings for the final run were: K between 2 and 10, 100 replicates per K, and a burn-in of 30 000 iterations followed by 50 000 iterations. For each value of K, the 10 replicates with the lowest Deviance Information Criterion (DIC) were permuted in the software CLUMPP 1.1.2 58 , and the resulting clustering was visualized with DISTRUCT 59 .
The Bayesian clustering analysis with BAPS was carried out using population samples as the units of the analysis. With this method we run two sets of analyses, with and without using geographic coordinates as prior information to infer the best number of clusters (K). In both cases, we explored values of maximal K between 2 and 10, carrying out 5 replicates for each maximal K to check for consistency among runs.
When interpreting results from both TESS and BAPS analyses, we followed recommendations by Meirmans 60 , that is, we presented and discussed results of each clustering option deserving biological interpretation, irrespective of the K-value that is deemed 'optimal' according to the summary statistics.
Mitochondrial DNA data analysis. Sequence electropherograms were controlled by eye using the software FinchTv1.4.0 (Geospiza Inc.), and the sequence alignments were obtained using CLUSTALX 2.0 61 . Sequences of the two gene fragments analysed were concatenated using the software CONCATENATOR 1.1.0 62 . The best partitioning strategy for the concatenated mtDNA dataset was assessed by means of the software PartitionFinder v1.0.1 63 , using the Bayesian information (BI) criterion and the 'greedy' search method. This analysis suggested that the HKY substitution model best fitted all the data partition considered (1 st , 2 nd , and 3 rd position for either COI or CYTB fragments).
The phylogenetic relationships between haplotypes were inferred by means of the maximum likelihood algorithm as implemented in PhyML 3.10 64 , using default settings for all parameters but the substitution model (HKY) and the type of tree improvement (SPR and NNI). The estimated tree was then converted into an haplotype network using the software HaplotypeViewer 65 .
The overall pattern of genetic variation observed at level of the entire dataset, suggested that the mtDNA variation might be not entirely reflective of the population structure and history of S. salamandra in Italy (see Results). Consequently, we refrained from using mtDNA variation to estimating historical demographic processes, phylogeographic dynamics, and their time line.