Introduction

In principle, concordance between distinct phenotypic traits and/or genetic markers is a plausible expectation when analysing geographic patterns of biological diversity, above and below the species level. Derived from the shared history of individual characters of organisms, this expectation forms the basis for the extensive use of single markers (notably mitochondrial DNA) to draw inferences about the evolutionary history of populations and species1, as well as to assess their identity and geographic distribution using barcoding approaches2,3. In practice, however, discordant patterns of variation among characters and among markers are not uncommon (e.g.4,5,6 among others). Several processes have been involved with the formation of such discordant patterns of variation, including various forms of selection, adaptive introgression, demographic disparities, hybrid zone movements, and sex-biased processes7,8,9,10. Consequently, discordance per se is emerging as a tremendous source of insights into the evolutionary process.

The drawbacks of using mitochondrial DNA (mtDNA) as the sole marker of genetic variation among and within populations are now widely acknowledged11,12,13, to the extent that a comparative approach using both mitochondrial and nuclear genetic data is becoming customary in phylogeographic investigations14. At the same time, however, after more than three decades when most phylogeographic studies adopted a single-marker approach (since15), there is now a huge amount of mtDNA data which needs to be complemented, in order to reach reliable conclusions about the evolutionary history and current genetic structure of populations. Furthermore, integrating mtDNA data with those drawn from the nuclear genome also has major practical implications. Indeed, mtDNA has been widely employed to revise classical taxonomy16, and to define conservation strategies for species and populations threatened by human impacts on the natural environment17. While in most cases such integration will hopefully lead to only slight revisions of previous conclusions based on mtDNA data, there are compelling examples of the far reaching implications of a more integrative experimental approach to the study of population structure and history7.

The fire salamander Salamandra salamandra (Linnaeus, 1758) is a temperate amphibian largely distributed in Europe18. Owing to its remarkable phenotypic variation in traits encompassing external morphology, colour patterns, and reproductive strategy, the geographic variation and population structure of S. salamandra have been investigated in several portions of its range, using various combinations of phenotypic and genetic traits19,20. As a consequence, the taxonomy of fire salamanders has been long discussed and repeatedly revised. Once regarded as a single highly polytypic species, the fire salamander group is now recognized as made up of four distinct species, with more than 10 sub-species within S. salamandra alone19. At the same time, conflicting patterns of variation between traits have been frequently observed, and their analysis has formed the basis for intriguing insights into processes of reticulate evolution, range-change dynamics, and life-history traits evolution21,22,23. Consequently, the fire salamander is emerging as a compelling system for studying the contribution of multiple processes to the formation of intraspecific patterns of biological diversity20.

In this paper we investigate the population genetic structure of the fire salamander S. salamandra in peninsular Italy, a geographic area where instances of discordance between morphological traits and preliminary mtDNA data were previously observed24, without being reconciled into a population history, and whose underlying causes remain unclear. Based mostly on differences in colour patterns and body shape, fire salamander populations occurring along the Italian peninsula have long been attributed to the endemic subspecies S. s. giglioli, whereas populations from the pre-alpine and alpine areas have been attributed to the nominal subspecies S. s. salamandra19,25,26, with some intergradation between the two subspecies through the Liguria region (see Fig. 1). On the other hand, a phylogeographic study of the mtDNA variation throughout the species range24 found mtDNA haplotypes typical of the northern subspecies S. s. salamandra occurring in south-central Italy, that is, several hundred kilometres to the south of the area of intergradation between the two subspecies, as outlined by phenotypic trait variation.

Figure 1
figure 1

Geographic location of the 21 sampled populations of S. salamandra, and its approximate species’ distribution in Italy (light green; following66). Maps and diagrams were drawn using the software Canvas 11 (ACD Systems of America, Inc.). Photo: D. Canestrelli.

Here, using markers of both nuclear and mitochondrial genetic variation, in combination with phylogeographic and Bayesian population structure analytical tools, we aim to (1) assess the population genetic structure of the fire salamander throughout the Italian peninsula, (2) get a comprehensive understanding of the pattern of discordance observed between phenotypic traits and mtDNA data, and (3) shed light on its possible causes in these salamanders.

Results

Nuclear genetic variation was investigated at level of 23 allozyme loci (Table 1). Five loci (Mdhp-2, Mpi, Sod-1, Aat-2, and Ldh) were found monomorphic for the same allele in all the samples studied, while further three loci (Mdh-1, Mdh-2, Icdh-2) were found polymorphic at level of a single private allele, observed at low frequency (≤0.05). Allele frequencies observed at the remaining 15 loci are shown in Table 2. No statistically significant deviations were detected (at the 5% nominal level) from the expected Hardy-Weinberg (HW) equilibrium at each locus within populations, and from the expected genotypic linkage equilibria between pairs of loci within populations.

Table 1 Geographic location and sample size (n) of the 21 populations of Salamandra salamandra investigated in this study.
Table 2 Allele frequencies of the 15 allozyme loci found polymorphic among the 21 sampled populations of Salamandra salamandra.

The spatially explicit Bayesian clustering analyses conducted with TESS based on allozyme variation, indicated K = 5 as the best grouping option for this dataset. Indeed, higher values of K yielded only minimal variation in values of the DIC statistic (Fig. 2B), and did not turn into biologically interpretable results (not shown).With K = 2 (Fig. 2C), individuals were grouped into a peninsular cluster (samples 1–14), and an alpine cluster (samples 18–21), thus fully matching the geographic distribution of the Italian endemic subspecies S. s. gigliolii and the nominal subspecies S. s. salamandra, respectively. The geographically intermediate sample 16 was the only one showing substantial evidence of admixture between both groups. When the best clustering option for K = 3 was considered (Fig. 2D), peninsular samples were further subdivided into two groups with negligible evidence for admixture (see sample 9 in Fig. 2D). One group clustered southern samples (1–10), while the other clustered northern peninsular samples (12–14). With the best clustering option for K = 4 (Fig. 2E), southern samples were further allocated to two distinct groups arranged along the north-south axis. In this case, evidence for admixture were substantial and appeared mostly asymmetric, from south to north. Finally, with K = 5 (Fig. 2F) samples drawn from the alpine arc (19–21) were assigned to a distinct cluster. Among them, sample 19 (i.e. the one in closer geographic contiguity to the remaining samples), was the only one showing evidence of mixed ancestry.

Figure 2
figure 2

Population genetic structure of S. salamandra in Italy. (A) mtDNA haplotype genealogy generated using HaplotypeViewer, based on the ML phylogenetic tree, and geographic distribution of the main haplotype groups. Circle sizes are proportional to haplotype frequency (see inset, lower right), whereas missing intermediate haplotypes are shown as dots. Population samples are coloured as pie diagrams, showing the geographic distribution of the main haplogroups. (B) Mean values of the DIC statistic (averaged over 10 runs) estimated for models with K ranging from 2 to 9. (C–F) Results of the Bayesian clustering analyses carried out with TESS and BAPS for values of K between 2 and 5. Bar-plots show individual admixture proportions for the genetic clusters inferred using TESS. Populations assigned to the same cluster by BAPS are marked by distinct colours on the maps. Maps and diagrams were drawn using the software Canvas 11 (ACD Systems of America, Inc.).

The sample-based Bayesian clustering method implemented in BAPS, yielded results fully consistent with those obtained with TESS, for all values of K between 2 and 5 (Fig. 2C–F). Nevertheless, it suggested K = 7 as the best clustering option. On the other hand, both with K = 6 and K = 7 single populations were assigned to distinct clusters. Identical results were obtained running this analysis either with or without using geographic location of samples as prior information.

Estimates of genetic diversity were computed for sample pools consistent with K = 5, that is, the highest level of population structure identified by both TESS and BAPS analyses. As shown in Table 2, the lowest level of diversity was observed at all parameters for the southernmost group (pooling samples 1–5), whereas the highest values were observed for the north-western group (pooling samples 16–18).

The final mtDNA sequence alignment comprised 1220 bp for all the 158 S. salamandra individuals analysed. The cytochrome B gene fragment analysed (hereon CYTB) was 582 bp in length, with 23 variable positions of which 21 were parsimony informative and the cytochrome oxidase subunit I gene fragment (hereon COI) was 638 bp in length, and showed 29 variable positions, of which 24 were parsimony informative. Twenty-five unique haplotypes were identified in the concatenated dataset, whose distribution among the sampled populations is shown in Table 1.

The mtDNA haplotype genealogy estimated using HaplotypeViewer is presented in Fig. 2A. Two main groups of closely related haplotypes were observed, separated from one another by 25 mutational steps. One haplogroup (green in Fig. 2A; haplotype series S) was geographically restricted to the south of the Italian peninsula (samples 1–8), whereas the other haplogroup (red in Fig. 2A; haplotype series N) was widespread from the alpine arc to samples in south-central Italy (samples 8–21). Syntopy between both haplogroups was only found at the geographically intermediate sample 8.

Finally, the two sub-groups found within the southern lineage at the level of the nuclear dataset (Fig. 2E), have also been observed with the mtDNA (dark green and light green sub-clades in Fig. 2A). However, syntopy between both sub-clades was not observed at the mitochondrial level.

Discussion

Our analyses of mitochondrial and nuclear genetic variation concordantly, and not unexpectedly (see Introduction), show that two main lineages of S. salamandra occur along the Italian peninsula. However, while populations marking the geographic transition between groups have been observed both with mitochondrial and with nuclear markers (samples 8 and 16), their respective geographic locations were more than 600 km apart (see Fig. 2A,C).

Mitonuclear discordances are not uncommon in amphibians (e.g.4,27,28,29,30,31,32), but the extent of discordance observed in the present study is conspicuous. The geographic displacement between mitochondrial and nuclear contact zones among the two main lineages, largely exceeds the extent of discordances previously noted among fire salamander lineages from different geographic regions21,22,23, and has no parallels among different taxa from peninsular Italy. Interestingly, both mtDNA and nuclear contact zones are located within well-known suture zones (sensu33), where interspecific and intraspecific hybrid zones and range edges have been previously reported for various taxa, including amphibians (e.g.4,34,35,36,37). Therefore, lines of cross-taxon concordance cannot be used here as inferential clues in the attempt to reconcile such discordance into a population history.

The close affinity among mtDNA sequences of individuals sampled in northern Balkans, northern Europe, southern Alps, and north-western Apennines (this study and24), suggests that the northern lineage (namely S. s. salamandra) colonized the Apennines as the last step of a large-scale range expansion out of the Balkans into Western Europe24. Here, it underwent a secondary contact with the southern, peninsular endemic lineage (namely S. s. gigliolii), most likely during the post-glacial epoch24. But where did the two lineages first meet and mate? And what processes might best explain the observed mito-nuclear discordance? At least two conflicting scenarios could be delineated in this respect. The first that the secondary contact between both lineages first occurred in the north-western Apennines, as depicted by nuclear data (Fig. 2C), and that this event was followed by southward introgression of the S. s. salamandra mitochondrial lineage into the S. s. gigliolii populations in northern and central Apennines. The second that the secondary contact occurred in south-central Apennines, as depicted by the mtDNA transition, and that the hybrid zone between nuclear genomes has moved northward to its current location, not followed by a movement of the mtDNA contact zone. Both rampant mitochondrial introgression triggered by positive selection and hybrid zone movement have been repeatedly invoked to explain instances of mitonuclear discordances7, and several underlying mechanisms have been proposed7,8,10,38,39. However, in the absence of replicated temporal samples or of replicated contact zones at disparate sites, disentangling these competing scenarios is a challenging task, since most of the genetic patterns of variation expected under one scenario do not entirely exclude the other40. In the present case, however, two main lines of evidence lead us to tentatively favour southward mtDNA introgression over northward hybrid zone movement.

A first useful indication in this regard comes from the spatial genetic structure observed with allozymes. Indeed, in a moving hybrid zone scenario, a unidirectional clinal tail of admixture is expected, as a consequence of the movement40,41 (and references therein). For example, in a similar case concerning a Triturus newts’ hybrid zone with extensive mito-nuclear discordance, Wielstra and colleagues32 found extensive nuclear clines - and introgression of multiple distinct and geographically structured mtDNA clades - consistent with the genomic footprint of a hybrid zone movement. The highly fragmented geographic distribution of the fire salamander through north-central and central Apennines, prevented us from carrying out a formal and straightforward cline analysis. Nonetheless, the strongly structured pattern of genetic variation that emerged with the Bayesian clustering analyses of nuclear variation, does not conform with this expectation. In fact, within the range of the southern lineage, it showed the occurrence of genetic sub-clusters along the north-south axis (Fig. 2D,E), and a lack of clines in admixture coefficients within the putative area of hybrid zone movement, at all the hierarchical levels of the analysis.

A second indication comes from the Upper Pleistocene fossil record. A scenario of post-glacial colonization of northern and central Apennines by S. s. salamandra, followed by the establishment of a hybrid zone in south-central Apennine and its subsequent displacement northward to its current location, would imply the absence of fire salamanders in the northern Apennines until the post-glacial arrival of S. s. salamandra. However, this scenario is in contrast with a fire salamander fossil record found in north-western Apennines (Grotta di Equi42), close to our sampling site 14, and dated back to the last glacial epoch (around 45 000 years bp). The occurrence of this fossil record is instead consistent with a scenario of a secondary contact zone primarily located in the north-western Apennines, and a mtDNA introgression southward. Under this scenario, the genetic cluster emerging at K = 3, grouping samples 12–14, would reflect the presence of S. s. gigliolii in this area during the last glaciation, and the occurrence of a glacial sub-refugium within the region, as already suggested for several temperate species (see37 and references therein). Alternatively, the fossil record might suggest a glacial refugium for the northern lineage in the northern Apennines. As glacial conditions alleviated the northern lineage could have moved south to meet the southern lineage there. Subsequently, the southern lineage may have expanded to the north, at the expense of the northern one, with hybridization, resulting in mtDNA introgression. However, a plausible expectation under this scenario would be some level of mtDNA sequence divergence between populations S. s. salamandra in Italy, achieved during the prolonged isolation in distinct refugia (i.e. Balkan and north-western Apennines). Instead, such an instance of divergence was not observed.

Under a scenario of S. s. salamandra mtDNA moving southward, then the main question to answer is, why did S. s. salamandra mtDNA introgressed so deeply into S. s. gigliolii populations? Providing a definitive answer goes behind the scope of this study, and would require specific experimental designs. However, it is worth noting that several of the commonly invoked causes of mito-nuclear discordance could already be dismissed in the present case. First, sex-biased processes, such as female-biased dispersal43 or disassortative preferences in female mate choice30, are unlikely to have played a role. In fact, although extensive surveys are still lacking, the available evidence does not support such female-biased processes in S. salamandra43. Second, purely stochastic processes related to demographic disparities between interacting lineages upon secondary contact8,38, are unlikely to have been implicated either. Literature surveys and simulation-based works7,8,38 strongly suggest that differential introgression triggered by these processes is more likely to occur at the uniparentally inherited markers, and from the resident toward the expanding lineage. In the present case, this pattern would imply massive introgression of mtDNA from east to west of the contact zone (i.e. from S. s. gigliolii into S. s. salamandra), that is, in the opposite direction actually observed.

Noteworthy, recent evidence44 suggests that, under a wide range of conditions, sex-biased and purely demographic processes are unlikely to yield strong mito-nuclear discordances. In contrast, adaptive mitochondrial introgression can easily generate such discordances44, particularly in low-dispersal organisms, as is the case of S. salamandra. Of course, given the many organismal functions in which the mitochondrial genome is implicated45,46,47,48, and the unsuitability of our data in this respect, we refrain from even hypothesizing which adaptive advantage might have promoted a massive mtDNA introgression. In light of the frequently observed mito-nuclear discordances in S. salamandra, at various levels of population structure21,22,23, we see this issue as a major research question, amenable to future experimental efforts.

Our results clearly show the major role played by reticulate evolution in shaping the structure of Salamandra salamandra populations and, together with similar findings in other regions of the species’ range, contribute to identify the fire salamander as a particularly intriguing case to investigate the complexity of mechanisms triggering patterns of mitonuclear discordance in animals.

Methods

Sampling

We collected 188 individuals of S. salamandra from 21 localities, ranging from the southern slope of the Alps to the southern tip of the Italian peninsula (Fig. 1 and Table 1). Each individual was anaesthetized in the field by submersion in a 0.02% solution of MS222 (3-aminobenzoic acid ethyl ester), and tissue samples were taken through a toe-clipping procedure, then each individual was released in its sampling point. Collected tissues were brought to laboratory in liquid nitrogen, and then stored at −80 °C. Field works and collection of tissues were approved by the Italian Ministry of Environment (permit numbers: DPN-2009–0026530) and were performed in accordance with the relevant guidelines and regulations.

Laboratory procedures

Standard horizontal starch gel (10%) electrophoresis was used to investigate the genetic variation of 23 putative allozyme loci. Enzyme systems analyzed, putative loci, and buffer systems used in electrophoretic procedures are listed in Table 3. Alleles were called by their mobility (in mm) with respect to the most common (100) in a reference population (Campone).

Table 3 Enzymes systems analysed in S. salamandra (EC: Enzyme Commission number), encoding loci, and buffer systems used for the allozyme electrophoresis procedure.

Genomic DNA was extracted using proteinase K digestion followed by a standard phenol-chloroform protocol49. Polymerase chain reactions (PCR) were carried out to amplify portions of the two mitochondrial gene fragments CYTB and COI. After preliminary PCR amplifications using generic primers drawn from the literature50,51, more specific primer pairs were designed and used to amplify and sequence all the individuals analysed. Primers designed and used to amplify the CYTB gene fragment were 494-salamod-CATCAACATCTCCTACTGATGAAA and CYTB-salamod-GGAGTAAGCAGTGAGATTACC, whereas VF1d-TTCTCAACCAACCACAARGAYATYGG and VR1d-TAGACTTCTGGGTGGCCRAARAAYCA52 were used to screen variation at the COI gene fragment. Amplifications were carried out in a final volume of 15 μl containing: MgCl2 (2 mM), four dNTPs (0.2 mM each), primers (0.2 μM each), the enzyme Taq polymerase (0.5 U, Promega), reaction buffer (1×, Promega) and 2 μl of 20 ng/μl DNA template. PCR cycling conditions were the same for both the genes: a step at 94 °C for 5 min followed by 36 cycles of 45 sec at 94 °C, 1 min at 55 °C (CYTB) or 51 °C (COI), and 1.5 min at 72 °C, followed by a single final step at 72 °C lasting 10 min. The PCR products were purified and sequenced by Macrogen Inc. (http://www.macrogen.com). All sequences were deposited in GenBank (accession numbers: MH668103-MH668152).

Allozymes data analysis

Allele frequencies and genetic diversity estimates, including allelic richness, observed heterozygosity, and unbiased expected heterozygosity53, were estimated using the programs GENETIX 4.05.2 software54. The occurrence of Hardy-Weinberg and genotypic linkage equilibria were addressed for each locus and locus-pair, respectively, in each sample by means of exact tests as implemented in FSTAT 2.9.355.

We investigated the genetic structure of S. salamandra populations along peninsular Italy, using two Bayesian clustering methods: the individual-based and spatially explicit approach implemented by TESS 2.3.156, and the sample-based approach implemented in BAPS v 657.

The analysis with TESS was carried using the admixture model, the option to update the spatial interaction parameter activated, and all other settings left to the default options. A preliminary analysis was conducted to restrict the range of plausible K values (i.e. the number of clusters). Values of K between 1 and 21 were tested, with 10 replicates per K, each of 5000 iterations following 3000 iterations discarded as burn-in. The settings for the final run were: K between 2 and 10, 100 replicates per K, and a burn-in of 30 000 iterations followed by 50 000 iterations. For each value of K, the 10 replicates with the lowest Deviance Information Criterion (DIC) were permuted in the software CLUMPP 1.1.258, and the resulting clustering was visualized with DISTRUCT59.

The Bayesian clustering analysis with BAPS was carried out using population samples as the units of the analysis. With this method we run two sets of analyses, with and without using geographic coordinates as prior information to infer the best number of clusters (K). In both cases, we explored values of maximal K between 2 and 10, carrying out 5 replicates for each maximal K to check for consistency among runs.

When interpreting results from both TESS and BAPS analyses, we followed recommendations by Meirmans60, that is, we presented and discussed results of each clustering option deserving biological interpretation, irrespective of the K-value that is deemed ‘optimal’ according to the summary statistics.

Mitochondrial DNA data analysis

Sequence electropherograms were controlled by eye using the software FinchTv1.4.0 (Geospiza Inc.), and the sequence alignments were obtained using CLUSTALX 2.061. Sequences of the two gene fragments analysed were concatenated using the software CONCATENATOR 1.1.062. The best partitioning strategy for the concatenated mtDNA dataset was assessed by means of the software PartitionFinder v1.0.163, using the Bayesian information (BI) criterion and the ‘greedy’ search method. This analysis suggested that the HKY substitution model best fitted all the data partition considered (1st, 2nd, and 3rd position for either COI or CYTB fragments).

The phylogenetic relationships between haplotypes were inferred by means of the maximum likelihood algorithm as implemented in PhyML 3.1064, using default settings for all parameters but the substitution model (HKY) and the type of tree improvement (SPR and NNI). The estimated tree was then converted into an haplotype network using the software HaplotypeViewer65.

The overall pattern of genetic variation observed at level of the entire dataset, suggested that the mtDNA variation might be not entirely reflective of the population structure and history of S. salamandra in Italy (see Results). Consequently, we refrained from using mtDNA variation to estimating historical demographic processes, phylogeographic dynamics, and their time line.