Hybridization and rampant mitochondrial introgression among fire salamanders in peninsular Italy

Discordance between mitochondrial and nuclear patterns of population genetic structure is providing key insights into the eco-evolutionary dynamics between and within species, and their assessment is highly relevant to biodiversity monitoring practices based on DNA barcoding approaches. Here, we investigate the population genetic structure of the fire salamander Salamandra salamandra in peninsular Italy. Both mitochondrial and nuclear markers clearly identified two main population groups. However, nuclear and mitochondrial zones of geographic transition between groups were located 600 km from one another. The overall pattern of genetic variation, together with morphological and fossil data, suggest that a rampant mitochondrial introgression triggered the observed mitonuclear discordance, following a post-glacial secondary contact between lineages. Moreover, at a shallower level of population structure, we observed evidence of asymmetric introgression of nuclear genes between two sub-groups in southern Italy. Our results clearly show the major role played by reticulate evolution in shaping the structure of Salamandra salamandra populations and, together with similar findings in other regions of the species’ range, contribute to identify the fire salamander as a particularly intriguing case to investigate the complexity of mechanisms triggering patterns of mitonuclear discordance in animals.


Introduction
In principle, concordance between distinct phenotypic traits and/or genetic markers is a plausible expectation when analysing geographic patterns of biological diversity, above and below the species level. Derived from the shared history of individual characters of organisms, this expectation forms the basis for the extensive use of single markers (notably mitochondrial DNA) to draw inferences about the evolutionary history of populations and species 1 , as well as to assess their identity and geographic distribution using barcoding approaches 2,3 . In practice, however, discordant patterns of variation among characters and among markers are not uncommon (e.g. 4-6 among others). Several processes have been involved with the formation of such discordant patterns of variation, including various forms of selection, adaptive introgression, demographic disparities, hybrid zone movements, and sex-biased processes [7][8][9][10] . Consequently, discordance per se is emerging as a tremendous source of insights into the evolutionary process.
The drawbacks of using mitochondrial DNA (mtDNA) as the sole marker of genetic variation among and within populations are now widely acknowledged [11][12][13] , to the extent that a comparative approach using both mitochondrial (mtDNA) and nuclear (nuDNA) genetic data is becoming customary in phylogeographic investigations 14 . At the same time, however, after more than three decades when most phylogeographic studies adopted a single-marker approach (since 15 ), there is now a huge amount of mtDNA data which needs to be complemented, in order to reach reliable conclusions about the evolutionary history and current genetic structure of populations.
Furthermore, integrating mtDNA data with those drawn from the nuclear genome also has major practical implications. Indeed, mtDNA has been widely employed to revise classical taxonomy 16 , and to define conservation strategies for species and populations threatened by human impacts on the natural environment 17 . While in most cases such integration will hopefully lead to only slightly revisions of previous conclusions based on mtDNA data, there are compelling examples of the far reaching implications of a more integrative experimental approach to the study of population structure and history 7 .
The fire salamander Salamandra salamandra (Linnaeus, 1758) is a temperate amphibian largely distributed in Europe 18 . Owing to its remarkable phenotypic variation in traits encompassing external morphology, colour patterns, and reproductive strategy, the geographic variation and population structure of S. salamandra have been investigated in several portions of its range, using various combinations of phenotypic and genetic traits 19,20 . As a consequence, the taxonomy of fire salamanders has been long discussed and repeatedly revised. Once regarded as a single highly polytypic species, the fire salamander group is now recognized as four distinct species , with more than 10 sub-species within S. salamandra alone 19 . At the same time, conflicting patterns of variation between traits have been frequently observed, and their analysis has formed the basis for intriguing insights into processes of reticulate evolution, range-change dynamics, and life-history traits evolution [21][22][23] . Consequently, the fire salamander is emerging as a compelling system for studying the contribution of multiple processes to the formation of intraspecific patterns of biological diversity 20 .
In this paper we investigate the population genetic structure of the fire salamander S. salamandra in peninsular Italy, a geographic area where instances of discordance between morphological traits and preliminary mtDNA data were previously observed 24 , without being reconciled into a population history, and whose underlying causes remain unclear. Based mostly on differences in colour patterns and body shape, fire salamander populations occurring along the Italian peninsula have long been attributed to the endemic subspecies S. s. giglioli, whereas populations from the pre-alpine and alpine areas have been attributed to the nominal subspecies S. s. salamandra 19,25,26 , with some intergradation between the two subspecies through the Liguria region (see Figure 1). On the other hand, a phylogeographic study of the mtDNA variation throughout the species range 24 found mtDNA haplotypes typical of the northern subspecies S. s. salamandra occurring in south-central Italy, that is, several hundred kilometres to the south of the area of intergradation between the two subspecies, as outlined by phenotypic trait variation.
Here, using markers of both nuclear and mitochondrial genetic variation, in combination with phylogeographic and Bayesian population structure analytical tools, we aim to 1) assess the population genetic structure of the fire salamander throughout the Italian peninsula, 2) get a comprehensive understanding of the pattern of discordance observed between phenotypic traits and mtDNA data, and 3) shed light on its possible causes in these salamanders.

Results
Nuclear genetic variation was investigated at level of 23 allozyme loci. Five loci (Mdhp-2, Mpi, Sod-1, Aat-2, and Ldh) were found monomorphic for the same allele in all the samples studied, while further three loci (Mdh-1, Mdh-2, Icdh-2) were found polymorphic at level of a single private allele, observed at low frequency (≤0.05). Allele frequencies observed at the remaining 15 loci are shown in Table 2. No statistically significant deviations were detected (at the 5% nominal level) from the expected Hardy-Weinberg (HW) equilibrium at each locus within populations, and from the expected genotypic linkage equilibria between pairs of loci within populations.
The spatially explicit Bayesian clustering analyses conducted with TESS based on allozyme variation, indicated K=5 as the best grouping option for this dataset. showing substantial evidence of admixture between both groups. When the best clustering option for K=3 was considered ( Figure 2D), peninsular samples were further subdivided into two groups with negligible evidence for admixture (see sample 9 in Figure 2D). One group clustered southern samples (1-10), while the other clustered northern peninsular samples (12)(13)(14). With the best clustering option for K=4 ( Figure 2E), southern samples were further allocated to two distinct groups arranged along the north-south axis. In this case, evidence for admixture were substantial and appeared mostly asymmetric, from south to north. Finally, with K=5 ( Figure 2F) samples drawn from the alpine arc (19)(20)(21) were assigned to a distinct cluster. Among them, sample19 (i.e. the one in closer geographic contiguity to the remaining samples), was the only one showing evidence of mixed ancestry.
The sample-based Bayesian clustering method implemented in BAPS, yielded results fully consistent with those obtained with TESS, for all values of K between 2 and 5 ( Figure 2C-2F).
Nevertheless, it suggested K=7 as the best clustering option. On the other hand, both with K=6 and K=7 single populations were assigned to distinct clusters. Identical results were obtained running this analysis either with or without using geographic location of samples as prior information.
Estimates of genetic diversity were computed for sample pools consistent with K=5, that is, the highest level of population structure identified by both TESS and BAPS analyses. As shown in Table 2,the lowest level of diversity was observed at all parameters for the southernmost group (pooling samples [1][2][3][4][5], whereas the highest values were observed for the north-western group (pooling samples [16][17][18]. The final mtDNA sequence alignment comprised 1220 bp for all the 158 S. salamandra individuals analysed. The cytochrome B gene fragment analysed (hereon CYTB) was 582 bp in length, with 23 variable positions of which 21 were parsimony informative and the cytochrome oxidase subunit I gene fragment (hereon COI) was 638 bpin length, and showed29 variable positions, of which 24 were parsimony informative. Twenty-five unique haplotypes were identified in the concatenated dataset, whose distribution among the sampled populations is shown in Table 1.
The mtDNA haplotype genealogy estimated using HaplotypeViewer is presented in Figure   2A. Two main groups of closely related haplotypes were observed, separated from one another by 25 mutational steps. One haplogroup(green in Figure 2A; haplotype series S)was geographically restricted to the south of the Italian peninsula (samples 1-8), whereas the other haplogroup (red in Figure 2A; haplotype series N) was widespread from the alpine arc to samples in south-central Italy (samples [8][9][10][11][12][13][14][15][16][17][18][19][20][21]. Syntopy between both haplogroups was only found at the geographically intermediate sample 8. Within the southern haplogroup, to sub-groups of haplotypes were found, one restricted to samples from the Calabria region, and the other restricted to more northern samples (dark and light green in Fig. 2A, respectively). Syntopy between both sub-groups was not observed.

Discussion
Our analyses of mitochondrial and nuclear genetic variation concordantly, and not unexpectedly (see Introduction), show that two main lineages of S. salamandra occur along the Italian peninsula.
However, this is the only line of concordance observed among the patterns of variation of both marker classes. First, populations marking the geographic transition between groups have been observed both with mitochondrial and with nuclear markers (samples 8 and 16), but the respective geographic locations were more than 600 km apart (see Fig. 2A and 2C). Secondly, mitonuclear discordances also emerged at shallower levels of population structure. The highest number of discernible geographic sub-groups, with nuclear and mitochondrial markers respectively, was five and three ( Fig. 2A and 2F). Thirdly, an imprint of asymmetric gene flow (from south to north) was observed between two sub-groups of the southern lineage at the nuclear dataset (green and purple sub-groups in Fig. 2E and 2F), whereas no imprints of admixture was observed between them at the mitochondrial dataset (light green and dark green in Fig. 2A).
Mitonuclear discordances are not uncommon in amphibians (e.g. 4,[27][28][29][30] ), but the extent of discordance observed in the present study is conspicuous. The geographic displacement between mitochondrial and nuclear contact zones among the two main lineages, largely exceeds the extent of discordances previously noted among fire salamander lineages from different geographic regions [21][22][23] , and has no parallels among different taxa from peninsular Italy. Interestingly, both mtDNA and nuclear contact zones are located within well known suture zones (sensu 31 ), where interspecific and intraspecific hybrid zones and range edges have been previously reported for various taxa, including amphibians (e.g. 4,[32][33][34][35]. Therefore, lines of cross-taxon concordance cannot be used here as inferential clues in the attempt to reconcile such discordance into a population history. Even a comparison of genetic and morphological patterns of variation appears uselessin this regard. In fact, the closer concordance of morphological variation with allozyme variation, rather than with mtDNA, might be accounted for either by a primary location of the contact zone in the northwestern Apennines, or by a northward migration of the contact zone, with displacement of northern traits by the advancing southern lineage. The close affinity among mtDNA sequences of individuals sampled in northern Balkans, northern Europe, southern Alps, and north-western Apennines (this study and 24 ), suggests that the northern lineage (namely S. s. salamandra) colonized the Apennines as the last step of a large-scale range expansion out of the Balkans into Western Europe 24 . Here, it underwent a secondary contact with the southern, peninsular endemic lineage (namely S. s. gigliolii), most likely during the postglacial epoch 24 . But where did the two lineages first meet and mate? And what processes might best explain the observed mito-nuclear discordance? At least two conflicting scenarios could be delineated in this respect. The first that the secondary contact between both lineages first occurred in the north-western Apennines, as depicted by nuclear data (Fig. 2C), and that this event was followed by rampant introgression of the S. s. salamandra mitochondrial lineage into the S. s. gigliolii populations in northern and central Apennines. The second that the secondary contact occurred in south-central Apennines, as depicted by the mtDNA transition, and that the hybrid zone between nuclear genomes has moved northward to its current location, not followed by a movement of the mtDNA contact zone. Both rampant mitochondrial introgression and hybrid zone movement have been repeatedly invoked to explain instances of mitonuclear discordances 7 , and several underlying mechanisms have been proposed 7,8,10,36,37 , but in the absence of replicated temporal samples or of replicated contact zones at disparate sites, disentangling these competing scenarios is a challenging task, since most of the genetic patterns of variation expected under one scenario do not entirely exclude the other 38 . In the present case, however, two main lines of evidence lead us to favour rampant mtDNA introgression over a hybrid zone movement from south to north.
A first useful indication in this regard comes from the spatial genetic structure observed with allozymes. Indeed, in a moving hybrid zone scenario, a unidirectional clinal tail of admixture is expected, as a consequence of the movement ( 38 and references therein). Our scattered sampling and, more generally, the highly fragmented geographic distribution of the fire salamander through north- Under the scenario of rampant mtDNA introgression, then the main question to answer is, why did S. s. salamandra mtDNA introgressed so deeply into S. s. gigliolii populations? Providing a definitive answer goes behind the scope of this study, and would require specific experimental designs. However, it is worth noting that several of the most commonly invoked causes of mitonuclear discordance can already be dismissed in the present case, allowing us to substantially narrow future research efforts around a restricted set of putative causes. First, sex-biased processes, such as female-biased dispersal 40 or disassortative preferences in female mate choice 30 , are unlikely to have played a role. In fact, given the modest dispersal abilities of S. salamandra 41 , these processes would have had take active for many generations in order to allow the uniparentally inherited mtDNA wake to travel more than 600 km to the south, without leaving imprints at the nuclear genome. Moreover, although extensive surveys are still lacking, the available evidence does not support female biased dispersal in S. salamandra 41 . Second, purely stochastic processes related to demographic disparities between interactive lineages upon secondary contact 8,36 , are unlikely to have been implicated either. Literature surveys and simulation-based works 7, 8, 36 strongly suggest that differential introgression triggered by these processes is more likely to occur at the uniparentally inherited markers, and from the resident toward the expanding lineage. In the present case, this pattern would imply massive introgression of mtDNA from east to west of the contact zone, that is, in the opposite direction actually observed. In this regard, it is also worth mentioning that our analyses cast doubts on the hypothesized long-distance range expansion of S. s. salamandra, from the Balkan into the Apennine chain. Indeed, at the shallower level of the hierarchical population structure, clustering analyses suggest the occurrence of a differentiated cluster in north-western Italy (Fig. 2F), whereas haplotype sharing between populations sampled in north-west (samples [17][18][19] and the Alps (samples 20-21) was not observed (see Table 1). In turn, this indication of a shallow genetic structure within the S. s. salamandra lineage in western Italy suggests that its eastward range expansion might have begun close to the most north-eastern cluster of S. s. gigliolii, as already observed in the alpine newt and several other taxa within the same area (see 35 and references therein). In turn, this scenario would further weaken the plausibility of a major demographic disparity between lineages at the time of formation of the secondary contact zone, since (loosely speaking) the secondary contact would have occurred between two "residents", rather than between one resident and one expanding lineage. Consequently, an adaptive introgression of the S. s. salamandra mtDNA into S. s. gigliolii appears the most plausible explanation for the observed mitonuclear discordance. Of course, given the many organismal functions in which the mitochondrial genome is implicated [42][43][44][45] , and the unsuitability of our data in this respect, we refrain from even hypothesizing which adaptive advantage promoted the massive mtDNA introgression. In light of the frequently observed mitonuclear discordances in S. salamandra, at various levels of population structure 21-23 , we see this issue as a major research question, amenable to future experimental efforts.
Finally, in addition to the major mitonuclear discordance discussed above, another instance of discordant patterns of gene exchange, emerged also at the shallower level of population structure represented by the two sub-gropus identified within the southern lineage (Fig. 2). In this case, the structure of the discordance does not match the major one discussed above, as mtDNA and allozyme transition zones overlap (see Fig. 2A and 2E). However, while no evidence of admixture was apparent at the mtDNA, substantial and asymmetrical introgression (from south to north) was observed with allozymes. This geographic pattern of variation closely mirror those observed within several contact zones between differentiated lineages of fire salamander in the Iberian peninsula 21,23 . Nuclear mediated gene-flow, driven by male-biased dispersal across the contact zone, or by the competitive advantage of males of one lineage over the other, might comfortably explain these geographic patterns (see also 30 ). However, as already mentioned above, sex-biased dispersal has not been observed in S. salamandra populations 41 and, as pointed out by Pereira, Martínez Solano & Buckley 23 , we still lack fundamental knowledge of the mechanisms underlying the observed (and recurrent) patterns. Besides going deeper into the genomic patterns of geographic variation, future studies aimed at shedding light on these mechanisms, will have to gather and conjugate information across the secondary contact zones about variation of life-history traits and the multivariate phenotype, hopefully taking advantage of a comparative trans-geographic experimental approach.

Sampling
We collected 188 individuals of S. salamandra from 21 localities, ranging from the southern slope of the Alps to the southern tip of the Italian peninsula ( Figure 1 and Table 1

Laboratory procedures
Standard horizontal starch gel (10%) electrophoresis was used to investigate the genetic variation of 23 putative allozyme loci. Enzyme systems analyzed, putative loci, and buffer systems used in electrophoretic procedures are listed in Table 3. Alleles were called by their mobility (in mm) with respect to the to the most common (100) in a reference population (Campone).
Genomic DNA was extracted using proteinase K digestion followed by a standard phenolchloroform protocol 46  (http://www.macrogen.com). All sequences were deposited in GenBank (accession numbers: XXXX-XXXX).

Allozymes data analysis
Allele frequencies and genetic diversity estimates, including allelic richness, observed heterozygosity, and unbiased expected heterozygosity 50 , were estimated using the programs The Bayesian clustering analysis with BAPS was carried out using population samples as the units of the analysis. With this method we run two sets of analyses, with and without using geographic coordinates as prior information to infer the best number of clusters (K). In both cases, we explored values of maximal K between 2 and 10, carrying out 5 replicates for each maximal K to check for consistency among runs.
When interpreting results from both TESS and BAPS analyses, we followed recommendations by Meirmans 57 , that is, we presented and discussed results of each clustering option deserving biological interpretation, irrespective of the K-value that is deemed 'optimal' according to the summary statistics.

Mitochondrial DNA data analysis
Sequence electropherograms were controlled by eye using the software FinchTv1 The phylogenetic relationships between haplotypes were inferred by means of the maximum likelihood algorithm as implemented in PhyML 3.10 61 , using default settings for all parameters but the substitution model (HKY) and the type of tree improvement (SPR and NNI). The estimated tree was then converted into an haplotype network using the software HaplotypeViewer 62 .
The overall pattern of genetic variation observed at level of the entire dataset, suggested that the mtDNA variation might be not entirely reflective of the population structure and history of S. salamandra in Italy (see Results). Consequently, we refrained from using mtDNA variation to estimating historical demographic processes, phylogeographic dynamics, and their time line.