Evolutionary history and postglacial colonization of an Asian pit viper (Gloydius halys caucasicus) into Transcaucasia revealed by phylogenetic and phylogeographic analyses

It has been generally acknowledged that glacial climates at the time of the Pleistocene altered the patterns of species distributions, prompting latitudinal and altitudinal distribution shifts in several species, including poikilothermic species commonly known for their thermal sensitivity. However, the historical phylogeographic patterns of such species have remained largely unknown. Here, we present the historical biogeographic, phylogenetic, and phylogeographic relationships of the Caucasian pit viper, G. h. caucasicus, based on two mtDNA (cyt b and ND4) and one nDNA (c-mos) genes. This pit viper represents the westernmost member of the Crotalinae subfamily in the Palearctic and occurs in a variety of habitats, from 30 m to 3,000 m above sea level. In Iran, it is distributed on the northern and southern slopes of the Alborz Mountains, rendering it a target for phylogenetic and phylogeographic studies of a terrestrial poikilothermic animal. Our study identified four Iranian lineages of G. h. caucasicus along the northeastern to northwestern slopes of the Alborz Mountains and southern Azerbaijan (Talysh Mountains). Diversification of the Iranian lineages highlights population expansion and subsequent isolation into four plausible refugial areas during the Quaternary paleo-climatic oscillations, confirmed by our molecular dating and historical biogeographic analyses. The results of coalescence-based simulations support the incursion of the species from northeastern Iran to the western end of the Alborz, and then toward Transcaucasia via two directions: northern and southern slopes of the Alborz Mountains. Furthermore, our results clearly implied that G. h. caucasicus should be elevated to species rank and further referred to as G. caucasicus (Nikolsky, 1916).

Bayesian species delimitation and genetic distances among the G. halys/G. intermedius complex species. Our BPP analysis with algorithm 1 and under three different scenarios of prior values revealed that the different prior values for θ and τ0 did not alter the posterior speciation probabilities. Based on the results of our first scenario, each of the nine delimited taxa corresponding to G. halys halys, G. h. caucasicus, G. caraganus, G. cognatus, G. stejnegeri, G. rickmersi, G. shedaoensis, G. changdaoensis, and G. intermedius were strongly supported with 95% Bayesian posterior probability (PP = 1), considering three models with small and large ancestral sizes, as well as deep and shallow divergences. The genetic distances among G. changdaoensis, G. intermedius, G. shedaoensis, G. cognatus, G. stejnegeri, and G. caraganus vary from 1.0%-5.4% (the blue cells in Table S3), whereas the average genetic distance between the four Iranian lineages of G. h. caucasicus and the other taxa in the complex varies from 3.7-5.6% (the grey cells in Table S3), which is comparable to that among valid species members of this complex. According to our second scenario, we did not obtain high Bayesian posterior probabilities for the four Iranian clades of G. h. caucasicus (0.93-95%) in our BPP analysis, under the three models with small and large ancestral sizes, as well as deep and shallow divergences. Moreover, the genetic distances among the Iranian lineages including the two clades of Central Alborz (LarNP-CA and CA), the eastern clade (KD-EA), and the western clade (WE-Az) vary from 2.4-2.9% (the green cells in Table S3); however, the distance drops to 0.15% between the two clades of Central Alborz (LarNP-CA and CA), in concordance with node support values of BI and ML gene trees.  (Fig. 2). Within G. h. caucasicus, in accordance with the phylogenetic tree, we found four significant clusters from northeastern Iran to western Alborz-Azerbaijan. These four clusters were separated from each other by 18-40 mutational steps. Most central haplotypes in LarNP-CA, CA, and WA-Az clusters were shared by 2-5 individuals, while within the KD-EA cluster, haplotypes mostly corresponded to single individuals. The WA-Az cluster included six closely related haplotypes comprising the most western known locality for the complex in the western Palearctic.
Maximum and minimum nucleotide diversity was attributed to CA and LarNP-CA lineages, respectively. Additionally, haplotype diversity ranged from 0.667 to 0.978 in WA-Az and KD-EA lineages, respectively (Table 1). spatial genetic structure of G. h. caucasicus lineages in northern Iran and Azerbaijan. Using the concatenated mtDNA + nDNA dataset, the BAPS results divided the samples into the four clusters concordant with the phylogenetic tree and haplotype network; (i) KD-EA including Hezar Masjid Mountains, Khorasan, and Golestan provinces, (ii) LarNP-CA comprising eastern and central parts of Mazandaran province as well as Lar National Park, (iii) CA containing western Mazandaran and eastern Gilan provinces, and (iv) WA-Az including western Gilan and Ardebil provinces, as well as the southern mountains of Azerbaijan. Additionally, AMOVA on our concatenated mtDNA dataset proved a high rate of variation (89.38%) among the four clusters, and the fixation index (FST) confirmed a significant genetic structure among the clusters (Table S1)  Historical biogeography at the palearctic scale. Based on the Akaike information criterion, the historical biogeography analysis using the concatenated mtDNA + nDNA dataset revealed that the DIVALIKE + J model holds the strongest support compared to other analyses performed in BioGeoBEARS (Table S2) and dispersal, extinction, and cladogenesis parameters were estimated at 0.000431, 0.022, and 1.00E-12, respectively. We have found some allopatric speciation and founder-event speciation in evolution of this complex in different time slices (Fig. 4). Furthermore, among the four different scenarios tested, the second one (Scenario HB in Table S2) is characterized by the highest AICc value, under the DIVALIKE + J model (Fig. 4). The best hypothesis suggests that the ancestor of the complex was widespread through regions A and G at 3.62 (2.30-5.32

Coalescent analyses and simulations of G. h. caucasicus.
Both ABC coalescence simulations (direct and the logistic approaches) using the concatenated mtDNA + nDNA dataset led to the conclusion that the Scenario EH 3 is the best supported scenario (Pp = 0.844) compared to the other hypotheses (Pp = 0.458 and 0.411). Furthermore, model choice validation using PODS indicates that adequate power exists for selecting the true hypothesis among competing hypotheses with an acceptable Type I error rate (35%) and low Type II error rate (16%). These results hint at the incursion of the species from northeastern Iran to the western end of the Alborz and then toward the southern mountains of Azerbaijan (Talysh Mountains) through two directions, via northern and southern slopes of the Alborz Mountains. Furthermore, our results may suggest that the observed regional phylogeographic pattern may likely reflect colonization into four allopatric refugia during the Pleistocene glaciation episodes. species Distribution Modelling. Our SDM based on the Biomod framework showed high average values of discrimination capacity (AUC = 0.93-0.97) and classification accuracy (TSS = 0.89-0.92) for the four modelling methods. The final ensemble model illustrates that the most suitable landscape for G. h. caucasicus is the southern and northern slopes of the Alborz Mountains (Fig. 5). The variables of proximity to forest cover, roughness, maximum temperature of the warmest month, annual precipitation, and temperature seasonality had the highest average contribution over the ten replications and four models, indicating that the suitable habitat for the species in northern Iran is characterized by both landscape-based (i.e. woody alpine habitats) and climatic variables. We assessed the amount of protection granted to G. h. caucasicus in northern Iran by overlaying the Biomod suitability map with areas occupied by the lineages and the newest map of protected areas network of Iran, including national parks, protected areas, and wildlife refuges (Fig. 5). We estimated the effectiveness of the current protected area network to be 29.7% for KD-EA, 39.1% for LarNP-CA, 57.6% for CA, and 24.8% for WA-Az.

Reassessment of the taxonomic status of the Caucasian pit viper (G. h. caucasicus).
Recently, a number of taxa formerly viewed as subspecies of G. halys have been raised to full species rank based on genetic evidence. Among these are G. h. caraganus, submitted for reconsideration as a species (G. caraganus) (Eichwald, 1831) by Wagner et al. 17 and two other subspecies (G. h. cognatus and G. h. stejnegeri), elevated to full species rank by Shi et al. 18,19 . Simonov et al. 40 proposed the elevation of all currently recognized subspecies, G. halys to species   18 ) indicate that the distances vary from 1.0-5.4% ( Fig. S1 and the blue cells in Table S3), which conforms to the results of Shi et al. 19 . They also estimated the genetic distance among G. stejnegeri, G. cognatus, and G. h. halys to range from 2.7-5.0% 19 . However, the average genetic distance between the four Iranian clades of G. h. caucasicus and the other species varies from 3.7-5.6% ( Fig. S1 and the grey cells in Table S3), which is even greater than the distance observed by Shi et al. 19 between G. cognatus and G. stejnegeri (2.7-4.84%). In addition, the genetic distance between G. changdaoensis, G. intermedius, G. shedaoensis, G. cognatus, G. stejnegeri, and G. caraganus varies from 1.0%-5.4% (the blue cells in Table S3), while the average genetic distance among the four Iranian clades of G. h. caucasicus and the other taxa in the complex varies from 4.2-5.2% (the grey cells in Table S3). This is also noticeable in the haplotype network (Fig. 2). The results suggest that G. h. caucasicus could be considered as a distinct species. This is reconfirmed by our Bayesian multi-locus species delimitation approach (under different scenarios), which strongly supported each of the nine previously delimited taxa in the complex with maximum Bayesian posterior probability (P = 1).
However, the pairwise genetic distance among the Iranian lineages of G. h. caucasicus (clades KD-EA, WE-Az, LarNP-CA, and CA) drops to 1.5-2.9%, (Fig. S1 and the green cells in Table S3), as Bayesian posterior probabilities of our second scenario dropped to 0.93%-95% under all different priors.
Furthermore, Khani et al. 29 separated three populations of G. caucasicus in the Alborz Mountains based on seven metric and 21 meristic traits. Their study showed that populations of eastern Alborz, Lar National Park and western Alborz, corresponding respectively to KD-EA, LarNP-CA, and WA-Az in our study, are significantly different with respect to morphological traits. This means that they failed to differentiate between LarNP-CA and CA, apparently due to the lack of samples from Central Alborz (including western Mazandaran and eastern Gilan provinces).
However, the results of a study conducted by Malek-Mohammadi et al. 25 based on a 775 bp D-Loop dataset did not find any significant distinction among populations of G. caucasicus in the Alborz Mountains 25 . Another study on the phylogenetic relationships of G. caucasicus in Iran using 629 bp of the cyt b gene from 16 individuals in north-east of Iran (Khorasan province) and Central Alborz (Lar National Park and Gachsar area) concludes that the species belongs to the G. halys/G. intermedius complex, within which controversial phylogenetic relationships still remain 26 .
They also suggest that all samples of G. caucasicus in northern Iran belong to a single population. However, the distinction between the G. caucasicus clade and other closely related clades (G. intermedius, "G. saxatilis", and G. shedaoensis) have low support (Pp = 0.36 and bootstrap values = 59% 26 ). The genetic distance between G. caucasicus and the species previously mentioned was much lower in comparison to our findings (0.6%-0.9% versus 4.2%-5.2%). This discrepancy may result from taxonomic misidentification for the sequences that Rastegar-Pouyani et al. 26 obtained from GenBank. As a result of the high taxonomic uncertainty within this group, along with difficulties with morphological identification of the taxa, many GenBank entries for this complex have erroneous taxonomic names. This mislabeling of species names may result in incorrect estimation of genetic distances between taxa.
Thus, in the light of molecular and morphological evidences, we argue that G. h. caucasicus in northern Iran should be elevated to species rank and further referred to as G. caucasicus (Nikolsky, 1916). We therefore refer to G. caucasicus as such and use this labeling throughout subsequent text.
Allopatric divergence and dispersal during the pleistocene oscillations. Diversification of the Gloydius genus. In view of our biogeographical results, the DIVALIKE + J model was regarded as the most probable pattern of dispersal, vicariance, and extinction for the Gloydius genus in the Palaearctic (Table S2, Scenario HB 2). We found no evidence of ancestral range switching throughout the distribution range of the complex. Accordingly, this model appeared as the best biogeographical model of ancestral expansion for the complex [41][42][43] . During the Pleistocene, historical expansion of Gloydius populations was in alignment with dispersal patterns of other species occupying similar geographical distribution ranges. They all dispersed in a similar manner from the eastern Palearctic toward central regions and lower latitudes, a movement which led to the establishment of new populations in a discrete refugium 44,45 . Moreover, the absence or minor levels of gene flow among eastern (G. h. halys, G. stejnegeri, and G. cognatus) and central (G. caucasicus, G. rickmersi, and G. caraganus) Palearctic species around 1.0-2.8 Mya was proposed as the best biogeographical scenario based on AIC and AICc weight (Table S2), which could be indicative of vicariance due to geographical isolation of populations in multiple and isolated refugia. During the Last Glacial Maximum (LGM), glaciers advanced upon lands toward lower latitudes and covered more than 30% of the earth's surface, creating a major impact on the overall climate of the planet, including Iran's 46 . Some geomorphological evidence suggests that Iran has substantially and prolongedly altered during the Quaternary period 47 . According to Ehlers 48 , the climate of Iran has experienced a severe reduction in temperature, and consequently, a moderate increase in precipitation in montane habitats during the early Würm (approximately 100 kya). Kehl 49 notes that the Quaternary climate in northern and western Iran remained dry and cold during glacial periods, while it was warm and wet during interglacial periods. The effects of glacial periods on mountains of Iran (including Azerbaijan, Kurdistan, Alam-Kouh, and Damavand) and adjacent     Fig. 3), while the divergence among populations of G. caucasicus in northern Iran comprised three phases of divergence during the Pleistocene, which highlights the effects of climate oscillations on isolation of populations. During the early Pleistocene (2.5-1.84 Myr), Earth's climate was cooler and drier compared to the mid-Pliocene 58 . As a result of the reduction in Earth's temperature, suitable habitats were confined to more southerly latitudes (40-50 degrees of latitude) and lower elevations 59 . Given that G. rickmersi and G. caraganus are sister clades to G. caucasicus (Fig. 1), it could be inferred that as a consequence of the cooling of high latitudes, one could expect gene flow to be driven toward more southerly latitudes such as Afghanistan and northeastern Iran (Fig. 4). The first divergence due to vicariance in populations of G. caucasicus in northern Iran is estimated to have occurred around 1. It could be concluded that populations of G. caucasicus in northern Iran have undergone multiple expansions and contractions during glacial and interglacial periods. Nevertheless, no shared haplotypes were observed between the lineages. This can be explained upon the assumption that the gene flow from northeastern to central and western Alborz was unidirectional, meaning that no genes were transferred back to ancestral populations. It could be presumed that the warming of the earth from the early Holocene to the present might have contributed to population fragmentation in mountains and led to a breakdown of gene flow between the different regions.
Finally, our results could underline the existence of multiple glacial refugia in the Alborz during climatic oscillations of Pleistocene. This complies with the results of previous studies [60][61][62][63][64][65][66] , which suggested the Hyrcanian forests as an isolated refugium during Quaternary oscillations. Isolation of organisms eventually gave rise to vicariance in a middle-sized geographical area in the central Palearctic. Such multiple glacial refugia contributed to the interruption or reduction of gene flow, and consequently increased genetic drift, resulting in formation of endemic haplotypes and new subspecies 67 , which, following the last glacial period when the climate became more favorable, expanded toward suitable habitats and shaped the current patterns of distribution.

Conservation Units and Management propositions. Conservationists have long been in a quandary
regarding the continuing controversy over the definition of conservation units at the species or subspecies level, stressing the need for its reconsideration 39,68,69 . Evolutionary significant units (ESUs) generally refer to taxa that merit independent conservation management because they have evolved separately 70,71 .
Among the many definitions assigned to the concept of ESUs, we chose the one adopted by Fraser and Bernatchez 39 , as its overall focus remains on isolated populations. They state that lineages with particularly restricted patterns and levels of intraspecific gene flow are to be considered as ESUs for conservation. Therefore, we proposed the four isolated lineages of G. caucasicus identified in northern Iran as four ESUs including (i) KD-EA ESU comprising the Hezar Masjid Mountains, Khorasan, and Golestan provinces, (ii) LarNP-CA ESU including Lar area as well as Mazandaran province, (iii) CA ESU including northern and western slopes of the Alborz Mountains in western Mazandaran and eastern Gilan provinces, and (iv) WA-Az ESU comprising western Gilan and Ardebil provinces.
The International Union for Conservation of Nature (IUCN) has classified G. monticola (Likiang Pit Viper) from China as Data Deficient (DD), "G. saxatilis" (Rock Mamushi) from China, Korea, and Russia as Least Concern (LC), and G. shedaoensis (Shedao Island Pit Viper) from China as Vulnerable (VU). However, the conservation statuses of the other species in the two complex groups of G. blomhoffi and G. halys/G. intermedius (including G. caucasicus) have received little consideration. Although our results confirmed that about 37.8% of suitable G. caucasicus habitats are located within the network of protected areas in northern Iran, this species is currently threatened by various factors such as agricultural development, overgrazing of livestock, destruction of rangelands, hunting and killing by local people and/or tourists, mortality due to vehicle collisions on roads, restricted movement of individuals among population patches, and large-scale hunting and capturing for vaccine and serum production 26 (about 1000 capturing licenses are certified annually). Unfortunately, hunting/capturing of this species is mostly done when snakes emerge from hibernation and have not yet had the chance to reproduce. In general, populations of this species have experienced dramatic declines over the past decades, to the extent that snake catchers complain that it is now hardly possible to locate well-populated sites for this species.
However, G. caucasicus is not listed as a protected species according to the Department of Environment of Iran, accompanying many other reptiles that have been largely neglected from the list of protected species. Moreover, the lack of awareness and limited knowledge regarding venomous snakes has instilled deep fear in local people and even park rangers to such an extent that they would habitually kill a snake upon their very first encounter. Therefore, we recommend that the Department of Environment of Iran should (i) establish and declare new protected areas throughout the distribution range of the Caucasian pit viper, according to the four lineages identified in this study, (ii) establish safe zones in current protected areas that cover suitable habitats of the Caucasian pit viper; (iii) prevent or reduce legal hunting and venom collection from populations of the Caucasian pit viper until

Material and Methods
Sampling, PCR amplification, and DNA sequencing. We obtained sequence information for 41 individuals of the Caucasian pit viper, representing 15 regions from northeastern to northwestern Iran and Azerbaijan (Fig. 6, Table S4). Tissue samples contained three clips from the outer edge of ventral scales for each specimen, except two museum samples from Azerbaijan, for which muscle tissue was used. Captured snakes were released into their capture location immediately. All methods were performed in accordance with the relevant guidelines and regulations. This study was licensed by the Iranian Department of Environment under permits No. 94/6049 and 96/3631. Total genomic DNA was extracted from tissue samples using a Qiagen DNeasy Tissue kit (Qiagen, Courtaboeuf, France) or by phenol/chloroform protocol 72 . We amplified two fragments of the mitochondrial genome including 1125 base pairs (bp) of the cyt b and 678 bp of the ND4. Furthermore, we partially sequenced one nuclear proto oncogene c-mos (567 bp), which evolves at a slower rate than mtDNA 73 . All genes were amplified by polymerase chain reaction (see Table S5 for PCR protocols) and sequenced using primers designed by previous studies (L14910/H16064 74 and ND4/Leu 75 for the two mtDNA genes, and S77 and S78 for c-mos 76 ). PCR products were sent to Eurofins Genomics (Ebersberg, Germany) or SYNTOL Company (Moscow, Russia) for sequencing on an ABI 3730 automated DNA sequencer (Applied Biosystems).

Sequence alignment and data analyses.
Sequences of samples were examined using SeqScape version 2.6 (Applied Biosystems). We also obtained 56 sequences (comprising 13 species) from GenBank [17][18][19]26,27,[77][78][79] (Table S4). All sequences were aligned using ClustalW implemented in MEGA v.6 80 . Protein coding sequences were converted into amino acid residues to check for stop codons as a result of pseudogene generation. We used DnaSP version 5.0 81 to calculate mitochondrial diversity indices including haplotype and nucleotide diversities, number of haplotypes, and polymorphic sites. Nucleotide composition and genetic distances were analyzed using the uncorrected genetic distance including 1000 bootstraps in MEGA v.6.  49 to identify the best partitioning schemes and model of sequence evolution for each partition using the "greedy" algorithm and the Bayesian Information Criterion (BIC). Bayesian Inference (BI) and Maximum Likelihood (ML) phylogenetic analyses were carried out using the selected scheme. For the BI analysis, we used MrBayes 3.1.2 82 and the analysis was run using one cold and three heated chains (MC3) for 40 million generations, sampling every 1000th generation and discarding the first 25% of the trees as burn-in. Convergence was examined using Tracer v1.5 83 and checked with the convergence diagnostic parameters performed in MrBayes. A ML phylogenetic analysis was carried out using the selected model in IQ-TREE version 1.6.2 84,85 and 1000 non-parametric bootstrap replicates. Haplotype network. The concatenated sequences of the two mitochondrial genes in this study were 1803 bp in length, whereas the majority of sequences downloaded from GenBank were shorter. To avoid any potential bias caused by uneven sequence lengths among samples, we used an evenly-matched length of sequences for our analyses. We made an mtDNA matrix with 70 individuals (39 samples of G. caucasicus and 31 samples of the other taxa of G. halys/G. intermedius complex) with 1618 bp. We then developed a haplotype network in order to visualize the genetic relationships among the complex haplotypes. Haplotype networks were estimated with TCS 86 implemented in PopART (http://popart.otago.ac.nz).

Phylogenetic relationship and
Spatial analyses of genetic variability. We analyzed population structure and determined the amount of mixture between population clusters on our concatenated mtDNA + nDNA dataset (including 2370 bp) using the Bayesian Analysis of Population Structure software (BAPS v.6.0) 87,88 . We allowed K (number of clusters) to vary from 1-9 in order to calculate the best value for K. Then, an AMOVA was executed through Arlequin version 3.1 89 with populations grouped into the best number of K identified for population clusters. Also, we used F-statistics with 10000 permutations to estimate the proportion of genetic variability among different fixation indices, FST (i.e. variance among populations relative to the total variance).
Barcoding gap analysis. We used the concept of the 'barcoding gap' to determine the threshold of species level in the G. halys/G. intermedius complex using the mtDNA dataset. We calculated genetic distances among the taxa of the complex (clade F in Shi et al. 18 ) along with the four lineages for G. caucasicus in northern Iran, using an evenly-matched length of sequences from our mtDNA dataset. Pairwise inter-and intraspecific genetic distances were calculated using uncorrected p-distances by MEGA v.6 80 . Then, we plotted frequency distribution histograms of pairwise inter-and intraspecific distances.
Bayesian species delimitation. We adopted a Bayesian multi-locus species delimitation approach implemented in BPP 3.1 [90][91][92] to verify the speciation patterns within the complex based on our concatenated mtDNA + nDNA dataset (gene-partitioned), with the ML and BI topologies from this study serving as the guide tree. We tested two different scenarios: (i) considering the nine major clades corresponding to the nine taxa of the complex obtained by the BI and ML analyses, and (2) including 12 putative clades (the four Iranian clades of G. caucasicus and the remaining species of the complex). This method estimates population size (θ) and divergence time (τ) parameters and then applies a reverse-jump MCMC (rjMCMC) algorithm to calculate posterior probabilities for species delimitation. BPP assumes that there is neither recombination within a locus nor gene flow between species 92 . It also assumes neutral clock-like evolution and employs the JC69 mutation model; therefore, it could only be used for closely related species with sequence divergences not much higher than 10% 93 . We evaluated the neutrality of the two mtDNA genes by Hudson-Kreitman-Aguadé tests (HKA) 94 in DnaSP v 5.10 81 . Moreover, we assessed our nDNA recombination through the pairwise homoplasy index (PHI) test 95 in Splitstree4 96 .
A Dirichlet distribution was employed with α = 2 to compensate for variation in mutation rates among loci. A gamma prior (G) was applied to specify the population size parameter θ ( ) and root age (τ0) of the species tree. As BPP is sensitive to the prior values 97 , we made three replicate runs under three different combinations of gamma-distributed priors for ancestral θ and root age (τ0) 98-101 : (i) assuming relatively large ancestral population sizes and deep divergences, θ ~ G(1,10) and τ0 ~ G(1,10); (ii) assuming relatively small ancestral population sizes and shallow divergences among species, θ ~ G (2,2000) and τ0 ~ G (2,2000); and (iii) a conservative combination of priors that could fit models with fewer species θ ~ G(1,10) and τ0 ~ G (2,2000). Each analysis of 10 6 rjMCMC generations was run twice from different starting seeds (+1 and −1) with a burn-in period of 105 using algorithm 1 (α = 2 and m = 1). For all speciation events, we conservatively regarded speciation probability values > 0.95 as strong support.
Historical biogeography analysis. Molecular dating and divergence time. We estimated divergence times using BEAST 1.8.0 102 on our concatenated mtDNA + nDNA dataset with three calibration points including (i) divergence of three populations of the genus Porthidium in South America, some 3.5 Mya 103 , using a normal distribution model (mean = 3.5 Mya, SD = 0.51 Myr, and 95% CI = 2.5-4.5 Myr), (ii) divergence between Crotalus and Sistrurus before 9 Mya 104 , using a lognormal prior model with a zero offset of 9 Mya (mean = 1 Mya and SD = 1 Myr) 105 , and finally (iii) divergence of the Eurasian vipers clade (genera Macrovipera, Montivipera, and Vipera) about 20 Mya suggested by fossil data 106,107 , using a lognormal prior model with a zero offset of 17 Myr (mean = 1 Mya, SD = 1 Myr, and 95% CI = 17-36 Myr) 105 .
We also included some sequences of four species of Montivipera, two species of Macrovipera, two species of Vipera, two species of Sistrurus, four species of Crotalus, four species of Porthidium, as well as seven outgroups (G. brevicaudus, G. ussuriensis, G. blomhoffi, G. strauchi, G. rubromaculatus, G. liupanensis, and G. tsushimaensis) in our dataset as calibration points (see Table S4). PartitionFinder 1.1.1 49 was used to select the best data partition and evolutionary models in our molecular dating. We also adopted the Birth-Death Process model because it is a proper model when sequences from different species are included in a dataset 108 . The fitness of three molecular clock models (Strict, Exponential relaxed, and Lognormal relaxed) was tested by Bayes factor analysis in Tracer v1.5 83 , using value of 2LnBF 109 . Each analysis was performed using two independent runs of 40 million generations, sampled every 1000 generations, with the first 25% discarded as burn-in. Tracer was used to evaluate acceptable levels of MCMC chain mixing, the stationary likelihoods and appropriate lengths of burn-in (25%), as well as to estimate effective sample sizes for all parameters.   (Fig. 4). We also estimated the geographic location of the ancestors of the G. halys/G. intermedius complex, employing a statistical method implemented in PhyloMapper 1b1 112 , optimized by 10,000 replications.
Evolutionary hypothesis testing. We applied coalescent simulations to test three alternative hypotheses (Scenarios EH 1-3) regarding the demographic history of G. h. caucasicus, using an Approximate Bayesian Computation (ABC) approach on our concatenated mtDNA + nDNA dataset. In the first scenario tested, (i) we considered fragmentation of a single ancestral source population, which supposes that the four lineages could have diverged from a single ancestral refugium, approximately, at most, up to the Last Glacial Maximum, consistent with Weichselian (in Scandinavia and northern Europe), Würm (in Alps), and Wisconsin (in North America) glaciations, and then colonized different climatic and environmental niches throughout the Alborz Mountains (Fig. 7a). In the two other scenarios, we supposed multiple glacial refugia (or long-term geographical isolation among regions), which predicts that the four lineages may have developed under the effects of diversification from multiple refugia in northern Iran and two demographic history scenarios were tested to demonstrate whether (ii) the gene flow from northeastern Iran to western Alborz and Azerbaijan has experienced a one-way flow through the northern slopes, or (iii) a two-way gene flow through the northern and southern slopes of the Alborz Mountains. In a one-way gene flow, we considered an incursion from KD-EA to LarNP-CA, then to CA, and proceeding to WA-Az (Fig. 7b), whereas in a two-way gene flow, we tested a diversification from KD-WA to LarNP-CA, then to CA through the northern slopes of the Alborz Mountains, along with another diversification from KD-EA to WA-Az through the southern slopes of the Alborz Mountains (Fig. 7c). We used an Approximate Bayesian Computation (ABC) approach in the program DIYABC 2.1.0 113 to obtain the relative probabilities for the competing hypotheses. In this approach, summary statistics of our molecular data were calculated and compared to the dataset simulated earlier based on the modelled scenarios. Then, Euclidean distances between our simulated dataset and the observed dataset were calculated by a local linear regression. Finally, we only kept 10 subsets of the closest 2% of our simulated data to the observed data in order to compute posterior distributions, which enabled us to prioritize our modelled scenarios based on approximate marginal likelihoods and find the best-fit model. We used summary statistics including number of haplotypes as well as segregating sites, mean pairwise differences, mean between-sample pairwise differences, number of private segregating sites, and FST values.
We also used uniform priors with a lower and an upper bound for population size of 10 to 7 × 10 5 , and divergence times of 2.46 × 10 5 for t1, 4.30 × 10 5 for t2, 4.86 × 10 5 for t3, and 6.86 × 10 5 for Tt generations in the past. Furthermore, we assumed a generation time of 3 years, the age of sexual maturity for G. brevicaudus 6 , as a closely related species to Gloydius 114 . In addition, Yakovleva 115 has reported the minimum body length of the sexually matured specimens of G. halys sensu lato in Kyrgyzstan as 330 mm, which corresponded to the age of 3 years in our data. Moreover, all populations could include a discrete size-change event. We estimated the posterior probability of each hypothesis using both the direct approach (on the 500 closest datasets) as well as the logistic approach (on the 1% closest to the observed data). Then, to calculate posterior distributions of parameters, we used a local linear regression on 1% of the accepted closest simulated data merely based on the most likely hypothesis. To evaluate the strength and accuracy of our ABC model selection, we simulated 1000 test datasets (pseudo-observed datasets) under each of the competing hypotheses and calculated the probability of type I and type II errors, assuming the defined priors in the historical model. We determined the effective population sizes (Ne) using θ-values estimated by the ML and the coalescent-theory approach in MIGRATE 3.2.1 116 . We ran the analysis with 10 short chains of 200,000 steps, followed by three long chains of two million steps, sampling every 20 steps following a burn-in of 10,000 steps. Then, we calculated Ne using the equation for maternally inherited mtDNA (θ = Ne µ). We considered µ = . × − 3 9 10 8 , based on the mean rate of sequence evolution of approximately 0.01306 substitutions per Myr, using BEAST 1.8.2 102 . species Distribution Modelling. We first compiled data of the species occurrence (84 points) with 10 climatic, land cover, and physiographic variables to build a species distribution model (SDM) for the species. For climatic variables, we used the WorldClim dataset 117 , a set of 19 climatic variables with ~1 km resolution. Due to the high correlation between the climatic variables, we first calculated pairwise correlation coefficients among the variables and then screened them to low correlated (r < 0.75) variables. Accordingly, we obtained Bio1 (annual mean temperature), Bio4 (temperature seasonality), Bio5 (maximum temperature for the warmest month), Bio12 (annual precipitation) and Bio13 (maximum precipitation for the wettest month). Land cover variables including distance to forest patches, distance to herbaceous cover with shrubs and sparse trees, and distance to herbaceous cover, were generated in ArcMap 10.3 based on cover types of Globcover v. 2.1 118 . Based on the Shuttle Radar Topography Mission (SRTM) elevation model (http://srtm.csi.cgiar.org), we also used altitude and topographic roughness (i.e., standard deviation of altitude for all raster cells within a 5 × 5 km moving window) as the most important variables depicting physiographic heterogeneity.
To generate a habitat suitability map, we then conducted four SDM algorithms, including generalized linear models (GLM), generalized boosting models (GBM), maximum entropy (MaxEnt), and random forest (RF), and combined them into a final ensemble model using Biomod 2 package in R v.3.3.2 119 . To reduce bias caused by randomly selected occurrence points for model construction, we replicated the modelling based on a 10-fold cross-validation approach, using a different subset of 25% of the occurrence records to test each model. Model performance was evaluated based on the area under the curve (AUC) of a receiver operating characteristic (ROC) plot and the true skill statistic (TSS). The final ensemble model was obtained by weighted averaging the individual models proportional to their AUC scores. Finally, we used the boundaries of populations, estimated from BAPS, to separate the boundary of the continuous suitability map of the lineages from northeastern to northwestern Iran.