The Asian plethodontid salamander preserves historical genetic imprints of recent northern expansion

The Korean Peninsula, located at the southern tip of Northeast Asia, has never been covered by ice sheets and was a temperate refugium during the Pleistocene. Karsenia koreana, the sole Asian plethodontid salamander species, occurs only on the southern half of the Korean Peninsula and is thought to have found various climatic refugia. Despite its phylogenetic and biogeographic importance, no population-level genetic analysis has been performed on this species. Here we study the population genetic structure of K. koreana using mitochondrial and microsatellite loci to understand the recent historical dispersion process that shaped its current distribution. Overall, the genetic distance between populations correlated well with the spatial distance, and the genetic structure among populations showed signs of a unilateral northward expansion from a southernmost refugium population. Given the distinct genetic structure formed among the populations, the level of historical gene flow among populations appears to have been very low. As the estimated effective population size of K. koreana was also small, these results suggest that the small, restricted populations of K. koreana are extremely vulnerable to environmental changes that may require high levels of genetic diversity to cope with. Thus, special management strategies are needed to preserve these remnant populations.

This study was designed to uncover the population genetic structure of K. koreana on the Korean Peninsula to understand the historical dispersion process shaping the structure. Two types of genetic markers were used. First, mitochondrial COI (cytochrome c oxidase I) and Cyt b (cytochrome b) were used to estimate the historical migration and isolation processes of K. koreana populations. Second, microsatellites were used to determine the level of gene flow among populations and to estimate more recent demographic information compared to mitochondrial markers. Novel microsatellite markers were developed in this study. Considering that K. koreana is a rare species worldwide that needs careful management, the results of this study provide important baseline data that will inform future conservation strategies.

Methods
Sample collection. Karsenia koreana individuals were collected from April 2018 to August 2019 throughout all regions where this species is known to exist (Fig. 1). A tip of tail was clipped for each individual and stored in 70% EtOH until DNA extraction. After sample collection, the living individuals were released back to the original site of capture. A total of 204 individuals from 11 populations, whose localities are geographically separated, were used for this study (Table 1).
Ethics declaration. The sampling protocol was approved by the Korean Ministry of Environment. The sample collection in the field was carried out under the strict guideline on ethical animal experimentation protocols provided by Seoul National University Institutional Animal Care and Use Committee (SNUIACUC) and the guideline provided in the permits conforming to the Wildlife Protection and Management Act of the Korean Ministry of Environment.
Laboratory protocols for the loci characterization. Genomic DNA was extracted using a DNeasy Blood and Tissue Kit (QIAGEN, Hilden, Germany) following the manufacturer's protocol. The quantity and quality of each DNA sample were assessed using an Epoch Microplate Spectrophotometer (BioTek, Winooski, VT, USA). Successfully extracted DNA samples were diluted to 10-20 ng/μL and stored at − 20 ℃ until use for the genetic analyses.
For the two mitochondrial loci, two novel primers sets were developed (Supplementary Information Table S1) utilizing Primer3 31 Table S1). Polymerase chain reaction (PCR) for both mitochondrial loci were performed in a volume of 30 μL containing 1X Ex Taq buffer with 2 mM MgCl 2 , 0.2 mM dNTP mixture, 1 μL of 10 μM forward and reverse primers each, 1 U Ex Taq polymerase (Takara Bio, Shiga, Japan) and 1 μL of template DNA using a TaKaRa PCR Thermal Cycler Dice Gradient (Takara Bio). The thermal cycle profile for the loci consisted of an initial denaturing at 94 ℃ for 5 min; 35 cycles of a denaturing at 94 ℃ for 45 s, an annealing at 60 ℃ for 1 min and an extension at 72 ℃ for 3 min (for Cyt b: 65 ℃ for 5 min); and a final extension at 72 ℃ for 10 min. This extension step at lower temperature for longer time of Cyt b was necessary, probably due to the putative secondary structure of flanking regions by poly A and T 32 , which was verified in our preliminary experiments. The PCR products were purified and sequenced by Macrogen Inc. (Seoul, South Korea) and Cosmo Genetech Inc. (Seoul, South Korea) on an AB 3730xl DNA analyzer (Applied Biosystems, Foster City, CA, USA).
The novel microsatellite markers were isolated by Macogen. Whole genomic DNA was subjected to pair-end sequencing using a MiSeq platform (Illumina, San Diego, CA, USA). The extracted reads were processed to remove adaptor sequences, and RepeatModeler 33 (http:// www. repea tmask er. org/) was used for the identification of repetitive DNA sequences. SSR Finder 34 (ftp:// ftp. grame ne. org/ pub/ grame ne/ archi ves/ softw are/ scrip ts/) was used for annotating reads containing simple repeat sequence motifs. A total of 52 candidate markers (32 tetra-, Mitochondrial diversity and structure. The quality checking, trimming and editing of the mitochondrial sequence data were carried out using Geneious Prime. The sequences were aligned using ClustalW 35 under the default setting, implemented in MEGA X 36 . No gaps or ambiguous bases were found. We estimated the following parameters for each population based on the concatenated sequences of COI and Cyt b using DnaSP 5.10.01 37 ; number of haplotypes, haplotype diversity (H d ) 38 , nucleotide diversity (π) 38 and sequence diversity (k, average number of nucleotide differences) 39 . Haplotype sequences of the two loci obtained in this study were deposited in GenBank (Accession Nos. MT106778-MT106825; Supplementary Information Table S2). The number and frequency of private haplotypes ( P H) were also calculated based on the haplotype data.  49 . For the BI analysis, two independent Metropolis Coupled Markov Chain Monte Carlo (MCMC) runs of 10 7 generations were conducted using four chains per run with three heated and one cold (temperature set to 0.1). We sampled trees every 500 generations and discarded 25% of the trees as burn-in. We used TRACER 1.7.1 50 to assess convergence of parameter estimates and posterior probabilities. The remaining trees were summarized to obtain a 50% majority-rule consensus tree. The tree was visualized using  19 . The means and standard deviations of the normal distribution for these priors were chosen to reflect arithmetical medians of 95% credible intervals.
Given the population-level phylogenetic relationships among K. koreana lineages may be relatively close, we assigned a strict clock model 52 , coalescent constant population model and GTR + I + G substitution model under the default setting. Nested sampling algorithm 53,54 posteriorly confirmed this combination of model parameters among multiple alternatives. The MCMC of BEAST consisted of three independent runs of 10 8 generations with sampling log and tree files every 1000 generations. After running, the convergence of chains was verified and burn-in periods were determined in TRACER to ensure the effective sample sizes (ESS) for all parameters were over 200. Three independent MCMC runs were combined in LogCombiner 2.6.1. Predetermined 10% burn-in trees were discarded and the final phylogenetic tree was annotated by maximum clade credibility type and median node heights in TreeAnnotator 2.6.0. The tree was drawn by FigTree 1.4.3.
Extended Bayesian skyline plot (EBSP) 55 was implemented in BEAST to estimate historical demographic change. An unpartitioned HKY + I substitution model 56 was applied, and prior selection, clock model, parameter settings and MCMC setup were the same as those used for the divergence time estimation. To incorporate a time scale into the analysis, we assigned two additional priors; normal distributed clock rate (mean: 6.8629E-3, sigma: 5.9743E-4) and lognormal distributed MRCA prior (mean: 2.318, sigma: 0.3031). The result was plotted with log-scale population size by time (Ma) using R package 57 based on 'plotEBSP' function provided in BEAST. The likelihood of historical demographic expansion was tested based on Tajima's D 58  To infer the evolutionary history of K. koreana populations, over 100 scenarios were predefined to include both northward and southward dispersal patterns of different geographically probable orders among genetic clusters. The scenarios were examined in DIYABC 2.1.0 78 in a tournament fashion. Genetically close populations were grouped into a single cluster to lower the computational load, resulting in seven populations/clusters. More than 10 5 simulations per scenario were performed and scenarios were compared based on posterior probabilities. If the comparison results of the direct and logistic approaches were inconsistent, the confidence of high-ranked scenarios were evaluated again to discriminate among them. For the last two selected scenarios, 10 6 simulations were implemented for each to estimate the parameters of effective population size and branching point.  shared haplotypes, the other populations had their own unique haplotypes. The haplotypes of the populations in close geographical proximity tended to be close to each other in the haplotype network (Fig. 2).

Mitochondrial analysis.
The overall topology of ML and BI trees were nearly identical (Supplementary Information Figure S1 and S2 The historical demography analysis inferred a continuously stable population size, represented by a typical J-shaped skyline plot that reflects faster molecular evolution on a shorter time scale 79 (Fig. 3). Tajima's D and Fu's F S tests were conducted for each mitochondrial locus and the results are summarized in Supplementary Information Table S3. None of these tests rejected the null hypothesis of neutral evolution with constant population size.
Microsatellite analysis. Ten tetra-, two tri-and two di-nucleotide microsatellite markers were chosen, as they successfully amplified and contained adequate levels of polymorphism within and among populations. Although no large allelic dropout was detected, the presence of null alleles was suspected in all loci analyzed. Null alleles were found in only one or two populations across the loci and were not found associated with any specific population. This suggests that the potential for substructure or inbreeding within populations was low. The overall null allele frequencies in loci K1039 and K1040 were higher than that in the low-frequency zone (see Dakin and Avise 80 ), and loci K1011 and K1040 had null alleles in two or more populations. We included these three loci (K1011, K1039 and K1040) in subsequent analyses as the analyses results did not differ with or without these loci (see Oromi et al. 81 ). We did not detect any signature of linkage disequilibrium among the loci used (data now shown).  Information Table S6). Moreover, all populations had an M-ratio greater than 0.68, considered as the threshold level of historical bottleneck (Supplementary Information Table S6).
Based on pairwise-F ST and -R ST , the overall level of genetic differentiation among populations was fairly high (Table 3). Since the overall values of pairwise-R ST were much larger than those of pairwise-F ST (Table 3), this genetic differentiation is likely due to spatial isolation between populations rather than genetic drift, resulting from population size fluctuations. In particular, populations [JA], [JE] and [GY] exhibited higher levels of genetic differentiation compared to those of the other populations (Table 3) (Table 3). This pattern was also evident in the IBD test results (Fig. 4). Figure S3). Group [JA + JE] was clearly separated from the other groups without overlap (Supplementary Information Figure S3). The membership of each of these four groups is related to spatial proximity (Supplementary Information Figure S3). In our Bayesian Structure analysis, the delta K method implemented in Structure Harvester was unable to identify an optimal number of genetically distinguishable clusters. The result, K = 2, is the minimal value, and may be a result of underestimation 77,82 . The overall clustering pattern generated by PCoA is reflected in the Structure analysis (Fig. 5) Figure S4).

PCoA grouped populations into four clusters: [JS + PC + SC], [BE + JC], [DJ + GJ + GY + HC] and [JA + JE], with slight overlaps between groups (Supplementary Information
DIYABC found two scenarios having the highest statistical probability (Supplementary Information Table S7 and Figure S5). In the first scenario (Supplementary Information Table S7 and Figure S5 Table 1.  Information Table S8 and Figure S5).

Discussion
In general, Karsenia koreana showed distinct population genetic structure. The various approaches that we   ana (π = 0.00011-0.00237) are relatively lower than those of other plethodontid species, e.g., European Hydromantes species (π = 0.003-0.037) 83,84 and Gyrinophilus porphyriticus (π = 0.031) 85 ; and East Asian salamander species, e.g., Pachyhynobius shangchengensis (π = 0.0345) 86 and Hynobius quelpaertensis (π = 0.0174-0.0214) 87  Overall, the results of genetic structure at the population-level were congruent for mitochondrial DNA and microsatellite data. The spatial location of the populations correlated well with the genetic distance between populations using both genetic markers. Considering that this species does not actively move long distances and tends to stay in a limited habitat, it is unlikely that gene flow over long distances occurs. However, results from both mitochondrial and microsatellite loci, suggest strong gene flow has occurred between populations [PC], [JS] and [SC]. This is probably because populations occupying a complex, continuous mountain terrain have access to a shady, humid environment that allows for the relatively active movement of individuals.
An incongruence between the population structures results based on the two genetic markers (microsatellite, mitochondrial DNA) should be noted. Microsatellite analysis via Structure and PCoA clustered together populations [DJ + GJ] and [HC], while [HC] was relatively well separated from the other populations based on mitochondrial analyses (the haplotype network and phylogenetic trees). In the mitochondrial results, [GJ] was relatively well separated from [DJ]; yet one haplotype of [GJ] grouped with the [DJ] haplotypes. However, these two genetic markers exhibit different modes of inheritance and act on different evolutionary time scales, thus it is not uncommon to observe such incongruences, and many theories and hypotheses provide plausible explanations for such findings [90][91][92][93] .
Microsatellites and mitochondrial DNA provided different estimates in the effective population size, a finding that should be noted considering that this species is endangered 90,94 . Fluctuation by genetic drift is likely to occur in mitochondrial data, which provided smaller estimates of effective population size. Thus genetic drift may have caused the differences in the patterns of diversity and structure between the two markers. Historical demographic analyses based on both of the markers used in this study was unable to reject the null hypothesis (a constant population size) and additionally indicated small population sizes.
Phylogeography. During the glacial advances of the Pliocene and the glacial cycles of the Pleistocene, animal species in the Northern Hemisphere were restricted to a southern refugium 3,95 . Later, the retreating ice sheets allowed many animals to recolonize northward. However, if the environments in the southern refugium contain sufficient resources, populations may stay in the refugium and/or adapt to the surrounding areas, rather than recolonize their original habitats. Gómez and Lunt 4 proposed that within a big refugium composed of complex mountain ranges, multiple small refugia could form. The idea of small refugia existing within a big refugium has been proposed in many studies 5,[96][97][98] , including empirical studies on plethodontids 99,100 and other East Asian animals [101][102][103][104][105] . This scenario also provides a reasonable interpretation of our data, i.e., genetically distinct K. koreana populations exist in small refugia that are distributed in or around the mountainous terrain of the Korean Peninsula, which in total may be considered one big refugium.
Previous studies inferred that the ancestor of K. koreana crossed from western North America to East Asia approximately 65 Ma 19,22 . Since K. koreana is confined to the Korean Peninsula, we hypothesize that the current populations are the surviving relics of the species, and the present-day population genetic structure is the   Fig. 3). To clarify this, additional phylogenetic analyses are needed using different types of genetic markers, such as nuclear genes or ddRAD markers.
Implications for conservation. The delimitation of a species′ 'Management Unit' (MU) is important in conserving its current population genetic diversity 106 . A MU is defined as "a functionally independent population of a species that has formed under restricted levels of gene flow." 106  Presently, the IUCN red list designates Karsenia koreana as a species of "Least Concern," 23 mostly because it is widespread on the Korean Peninsula. However, our data suggest that historical gene flow between the populations was low, isolating populations and increasing the influence of genetic drift. Therefore, although this species is widespread, the average effective population size is very small. Conserving this species requires special conservation strategies that take into account the preservation of all seven MUs.

Data availability
All data needed to evaluate the conclusions in the paper are present in the paper and the Supplementary Information materials. Sequences in the paper have been deposited at GenBank (Accession Nos. MT106778-MT106825).