An unexpected genetic diversity pattern and a complex demographic history of a rare medicinal herb, Chinese asparagus (Asparagus cochinchinensis) in Korea

Range-wide population studies of wide spread species are often associated with complex diversity patterns resulting from genetically divergent evolutionary significant units (ESUs). The compound evolutionary history creating such a pattern of diversity can be inferred through molecular analyses. Asparagus cochinchinensis, a medicinally important perennial herb, is in decline due to overharvesting in Korea. Eight A. cochinchinensis populations in Korea and three populations from neighboring countries (China, Japan and Taiwan) were examined using nine nuclear microsatellite loci and three chloroplast microsatellite loci to characterize molecular diversity patterns. The average within-population diversity was limited likely due to long-term bottlenecks observed in all eight populations. High pairwise FST values indicated that the populations have largely diverged, but the divergences were not correlated with geographic distances. Clustering analyses revealed a highly complex spatial structure pattern associated with two ESUs. Approximate Bayesian Computation (ABC) suggests that the two ESUs split about 21,000 BP were independently introduced to Korea approximately 1,800 years ago, and admixed in secondary contact zones. The two ESUs found in our study may have different habitat preferences and growth conditions, implying that the two genetically divergent groups should be considered not only for conservation and management but also for breeding programs in agricultural areas.


Results
Genetically identical clones were very rare in all eight populations (150 genets in 158 samples). We found no scoring error in nuclear microsatellite profiles during the genotyping procedure. Null alleles were present for two loci with low (0.1) to moderate (0.35) frequencies. One of the two loci (AC069; see Kim et al. 2017 for detailed information about the marker loci) 21 appeared to have null alleles present in all eight populations while null alleles were present in only two populations for the other locus (AC079). A significant departure from Hardy-Weinberg Equilibrium (HWE) was shown in locus 3 (AC014, high heterozygosity) and locus 8 (AC069; an excess of homozygotes with null allele present). We found that two loci (AC008, AC017) were linked with the remaining 11 loci in most populations (significant linkage disequilibrium, LD, at P < 0.005). Because some genetic analyses such as STRUCTURE and DIYABC depend on strong assumptions of marker independence and HWE, we purged four markers (AC008, AC014, AC017, and AC069) that violated the assumptions from all downstream analyses.
The expected heterozygosity (He), observed heterozygosity (Ho) and number of alleles per population (Na) averaged over individuals varied among the populations (Table 1). He ranged from 0.22 to 0.39 and Na was between 1.44 and 2.56 (Table 1). In most populations, the F IS values were close to zero except for HAM (F IS = 0.34 ± 0.15; the latter number is the Standard Error, SE) and OKN (F IS = 0.31 ± 0.20). We do not present F IS values here because most values were not significantly different from zero. Pairwise F ST values were all statistically significant and differed greatly among population pairs; the lowest was 0.097 for the YGW/TPE pair (see Table 1 for population abbreviations) and the highest was 0.631 for HAM/OKN (Table 2). According to a Mantel test, there was no correlation between genetic divergence (F ST ) and geographic distance (Euclidean distance; r = 0.01, P = 0.43; Fig. S1).
Overall, we found limited haplotype variation (Table 1). With the three cpSSRs, 15 length variations were detected (4-6 alleles per locus), resulting in 15 haplotypes for 158 samples from eight populations (Table S1; Fig. 1). Of the 15 haplotypes, 10 (haplotype codes 1, 2, 3, 5, 8, 9, 10, 11, 14, and 15) were found in a single or in only two individuals (frequency ~1%). About 40% of the low frequency haplotypes were found in population TPE. About 60% of all samples were predominantly assigned to a single haplotype (12), followed by the next most frequent haplotype (7; comprising ~20% of the total sample; Table 1). Haplotype diversity was highest in TPE (see Table 1 for population abbreviations and locations), whereas two populations exhibited haplotype diversity of zero (GND & HAM; Table 1). The pairwise genetic differentiation among the populations varied widely from zero to 1.0 across population pairs ( Table 2). The Φ PT values between population pairs within Korea were low (less than 0.1), whereas those estimated for population pairs between Korean populations and the Chinese population (GND) were much higher (0.77-1.00; Table 2). Population differentiation between Korean populations and the Japanese population (OKN) was also very high (0.71-0.87; Table 2). AMOVA results revealed that ~70% of the total haplotype variation stems from regional groups (df = 3; sum squares = 37.21), while less than 30% of the variation comes from within-populations (df = 150; sum squares = 25. 19).
K, the number of randomly mating subgroups that best explain our data, was equal to 2 based on delta K as estimated from the STRUCTURE results (Fig. S2). Along with the best K, we present bar plots for K = 2 to 8 to show the clustering patterns with various numbers of K clusters (Fig. S3). The STRUCTURE results for the optimal K (K = 2) clearly show that the populations largely diverge into two groups with an admixture pattern observed in WND and OKN ( Fig. 2; see Table 1 for populations details). The Chinese (GND) and Taiwanese (TPE) populations were genetically distinct, while the Japanese population (OKN) shared genetic affinity with both populations (GND & TPE; Fig. 2). Of five Korean populations, two (BOR & HAM) shared alleles with the Chinese population, whereas the other two populations (NAH & YGW) were genetically affiliated with the Taiwanese population (Fig. 2). One southern population, WND, revealed a pattern similar to that of the Japanese population, which is associated with the admixture between the Chinese and Taiwanese populations (Fig. 2). As the K values become larger, YGW and OKN show unique assignment patterns, although overall the split pattern of two groups remains consistent throughout varying K values (Fig. S3).
PCoA results were consistent with the STRUCTURE results, which largely separated two distinct groups particularly on the first PC axis (PC1, explained 26% of the molecular variance; Fig. 3). Likewise, BOR and HAM were clustered with GND, whereas NAH and YGW were clustered with TPE (Fig. 3). OKN and WND were positioned between the two clusters, although they were rather closer to the latter cluster with TPE (Fig. 3). A post-hoc AMOVA analysis with the arranged groups based on both STRUCTURE and PCoA analyses (3 groups: BOR, GND & HAM; NAH, TPE & YGW; OKN & WND) revealed that the populations were genetically more diverged within groups (F SC = 0.25) than across groups (F CT = 0.31; Table 3). The genetic variance was largely partitioned to within individuals (47%), followed by among groups (25%; Table 3).
Based on both Sign and Wilcoxon tests, there was no evidence of recent population bottlenecks for all eight populations under both IAM and SMM mutation models (Table 4). In contrast, we found signatures of long-term bottlenecks in all populations. G-W indices (M) were much lower than the critical value of M = 0.73, as estimated from the data (Table 4). Of the total 28 population pairs (56 combinations of pairs for both directions),   (Table S1).  Table 1  www.nature.com/scientificreports www.nature.com/scientificreports/ approximately 90% of population pairs showed ~1% or fewer migrants per generation (Table S2). We found a slightly higher frequency of migrants between two population pairs (BOR-HAM & NAH-WND; ~5% or less). Overall, the migration rates were approximately equal for both directions, but a few population pairs showed migration rates which were significantly skewed toward one direction (Table S2). Notably, the Chinese, Taiwanese and Korean population pairs, TPE-YGW and GND-BOR, exhibited the highest migration rates (~20%; Table S2). The frequencies of migrants from the Chinese and Taiwanese populations (TPE and GND to Korean YGW and BOR were fairly high (~20%), whereas frequencies of migrants in the opposite directions were much lower (~2% or less; Table S2).
In the DIYABC results, of the nine demographic history models, scenario 7 was the model of choice with the highest posterior probability for both direct and logistic posterior probability estimates, followed by scenario 8 and scenario 4 ( Table 5). The best evolutionary model suggested that the two distinct genetic sources (GND, China and TPE, Taiwan) found in the assignment analyses were diversified about 7,000 generations ago from  Table 1 for population abbreviations, sample locations and sample sizes.  www.nature.com/scientificreports www.nature.com/scientificreports/ the ancestral population (Table 5; Figs 4 and S5). The Korean and Japanese populations then derived from the GND and TPE nearly 600 generations ago (Table 5 (Table 5; Fig. 4). Of the two admixed populations, the Japanese population (OKN) diverged earlier than the Korean admixed population (WND; Table 5; Figs 4 and S5). In the model checking, the summary statistics of the observed data were within the first three PCs estimated from the posterior and prior distributions of the simulated data for scenario 7 (Fig. S4). Also, for the most part, in scenario 7, the summary statistics of observed data did not show significant differences from the ones estimated from the simulated data (Table S3). Type I error of scenario 7 was high (direct measure & logistic measure = ~0.3), indicating that there is a high probability for the chosen scenario to be rejected when it is true. In contrast, Type II errors of scenario 7 from both the direct and logistic estimates were very low (direct measure = ~0.03; logistic measure = ~0.05) providing high confidence with regard to avoiding false positives.

Discussion
Population divergence is a strong driving force for species range shift and lineage divergence. Asparagus cochinchinensis is distributed widely in east Asia, yet the habitat types vary greatly throughout the range. Besides the complex pattern of within and among population-level genetic diversity, our molecular analyses on A. cochinchinensis provide two major findings. First, our assignment analyses and the ABC model of choice suggest that there are two distinct ESUs (GND, China & TPE, Taiwan) which diverged around LGM and that later each ESU migrated to neighboring regions. Secondly, with the migration (~1,800 BP), the two ESUs were admixed perhaps through secondary contacts independently in Korea (WND) and Japan (OKN).
We found significant departures from HWE in two loci. Heterozygote deficiency observed in one of the two loci likely derived from the presence of null alleles. In contrast, the other locus that deviated from HWE showed an excess of heterozygotes (average observed heterozygosity = 0.75), yet what caused the heterozygote excess is not clear. A few explanations can be considered: small sexual and self-incompatible populations, over dominance, negative assortative mating and asexual reproduction 22 . A. cochinchinensis is an outcrossing dioecious plant that can clonally propagate through tuberous roots 9,23 . Thus, all four mechanisms are likely causes of heterozygote excess. Unfortunately, with the current data set, we cannot explicitly determine the underlying mechanism of the heterozygote excess for the locus. Future study may further examine factors leading to heterozygote excess for certain loci using both molecular tools and breeding experiments with a variety of sample sizes, as in Stoeckel et al. 22 .  Table 5. Summary of the DIYABC analysis. The direct posterior probability estimates (95% confidence interval) and those calculated by logistic regression for the top 1% of simulated data sets closest to the empirical data for all scenarios are present. For the most probable evolutionary scenario (7), the posterior mean and median values with 95% confidence intervals of parameters integrated in the scenario of choice are provided. The most probable scenario is marked in bold for the posterior probability estimates.
www.nature.com/scientificreports www.nature.com/scientificreports/ We found much lower within-population genetic variation in A. cochinchinensis (mean He = 0.32; mean Ho = 0.29 Table 1) than in other short-lived perennials estimated from microsatellite data (He = 0.55; Ho = 0.53) 24 . In plants, life history traits and several evolutionary forces (i.e. effective population size (Ne), selection, and demographic changes) influence the level of genetic diversity 25,26 . Since outcrossing plants often show higher within-population genetic diversity 25 , the lowered genetic variation in our results is rather surprising. Inbreeding and clonal reproduction may negatively influence the level of genetic diversity depending on the dispersal and pollination modes, even in dioecious species 27,28 . However, clonal assignment results failed to find significant contribution of clonal reproduction in A. cochinchinensis collected for the study. We only found a significant inbreeding rate in one population, HAM (F IS = 0.34). However, the genetic diversity in HAM was not the lowest among the eight populations (Table 1). No recent bottleneck events (within the last 3-5 generations) were detected in both the Bottleneck and DIYABC results (Tables 4 and 5). Furthermore, the best evolutionary scenario identified by DIYABC suggested that the Ne estimates are moderate to large (>325; Table 5). One plausible explanation for the reduction in genetic diversity is long-term demographic changes. In fact, the M ratios (G-W index) computed for all eight populations were much lower than the critical value (M = 0.73) estimated using Critical_M, which strongly suggests long-term population bottlenecks 29 .
The average pairwise F ST values of A. cochinchinensis across all population pairs (0.43; Table 2) were clearly higher than the average F ST values estimated in long-lived perennial plants (microsatellite-based data F ST = 0.31) 24 . In general, outcrossing plants show lower population divergence (e.g. average F ST = 0.22 for 71 outcrossing spp.) 24 , yet A. cochinchinensis populations exhibited high divergence for an outcrossing species. Coupled with outcrossing nature, long distance dispersal is also likely to alleviate population divergence. Although there is a complete lack of numerical data examined for A. cochinchinensis thus far, in several congeneric species the most common dispersal vector, birds, can transport the seeds up to 12 km 30,31 . Long distance migration by avian vectors facilitates gene flow among local populations, which genetically homogenizes populations. Despite the dispersal capability and the outcrossing nature of A. cochinchinensis, we found inflated population divergence, somewhat opposite to what was expected. The migration rates estimated among population pairs also supported the limited gene flow for population pairs. For example, the population pair with the largest F ST (OKN-HAM, F ST = 0.63) had very low migration rate (0.01; Table S2). As shown in the lowered migration rates, limited gene flow among population pairs over long geographic distances might partly be responsible for the large F ST values. This can be supported by our large geographic-scale data collection effort (the longest distance between population pairs, ~2000km). To investigate more closely whether the limited gene flow is associated with geographic distance, we performed a correlation test between genetic divergence and geographic distance (Mantel test).  Table 1  www.nature.com/scientificreports www.nature.com/scientificreports/ Notably, we found no significant relationship between genetic divergences and geographic distances. Population pairs within Korea showed greater F ST values (e.g. F ST between HAM and YGW = 0.5) compared to those from population pairs across different nations (F ST between TPE and YGW = 0.1; Table 2). The estimated migration rates also revealed a similar pattern. Despite the long geographic distance, the highest migration rate assessed was between the Chinese population (GND) and the Korean population (BOR; 0.21; Table S2). The F ST values found here are closely associated with the genetically affiliated groups of populations as identified in the STRUCTURE and PCoA results ( Table 2; Figs 2 and 3). AMOVA results from both maternally inherited cpSSRs and nrSSRs were also rather complex for an outcrossing plant, where most of genetic and haplotype variance derived from the group level and, not from the individual level. The complex patterns of genetic variation and population divergence are consistent with the hypothesis of multiple ESUs. As explained by the hypothesis, past and recent species range expansions of divergent lineages accompanied with demographic changes is likely the causal mechanism of the unusual patterns of population divergence. One can argue that the divergence results are due to stochasticity and/or unrealistic assumptions built into the indirect measure of gene flow (F ST estimates), i.e., no mutation, no selection and approximately equal population sizes 32 . To rule out the alternative hypotheses, we examined the spatial population structure among the eight populations through assignment analyses and competed plausible evolutionary hypotheses utilizing the ABC approach.
Consistent with the population divergence pattern, the clustering analyses (PCoA & STRUCTURE) revealed a complex pattern of population structure that does not reflect geographical distances among populations (Figs 2  and 3). Despite the geographic proximity, four Korean A. cochinchinensis populations were separately allied with two genetically distinct Chinese and Taiwanese population (Figs 2 and 3). One Korean population (WND) was genetically affiliated with the Japanese population, which is geographically more distant than the four remaining Korean populations (Figs 2 and 3). Because the delta K measures showed multiple peaks in association with varying Ks, we presented assignment plots for multiple K values and more closely examined the clustering patterns. The overall divergence pattern dividing the eight populations into two lineages, the Chinese population represented by GND and the Taiwanese population (TPE) remains consistent throughout varying K clusters (K 2 to K 6) with a few exceptions (Fig. S3). For example, K 6 (the cluster number with third highest delta K; Table S4; Fig. S2) revealed the same split pattern of two major clusters as in K 2, but with more complexity. YGW and TPE predominantly consisted of the same cluster (yellow) while BOR and GND shared a different cluster (red; Fig. S2). HAM, however, was assigned to a unique pink cluster. Similarly, NAH was mostly assigned to the blue cluster that is distinct from the YGW cluster pattern. Although the higher K values showed more complex genetic structure patterns, there are a couple of important patterns in common across various K numbers. First, the Chinese and the Taiwanese populations always appeared to be distinct from each other resulting in at least two diverged clusters. The genetic affinities among Korean populations did not mirror the geographic proximity, instead, the Korean populations consisted of the two largely diverged clusters (ESUs).
Cryptic diversity associated with ESUs and/or cryptic species greatly influence the spatial structure among populations. One approach to diagnose the cryptic diversity was to determine if groups share greater genetic affinity across regions than within the same region 1,33 . The clustering pattern observed in our nrSSR data indicated that A. cochinchinensis in Korean populations likely consists of at least two ESUs with admixture between the two clusters, possibly in secondary contact zones. The clustering results are also congruent with the hypothesis proposed in previous studies based on the distribution and ecological characteristics of the relevant habitats 14,18,34 . In addition, the weak correlation between geographic distance and population divergence from the Mantel test (r = 0.01, P > 0.5; Fig. S1) supports this cryptic diversity pattern.
The divergence pattern observed in our haplotype analysis (cpSSR) differed notably from the pattern found in the nrSSR data. Compared to biparental nuclear markers, maternally transmitted chloroplast markers often show greater population divergence with a high level of variance (near zero to complete fixation) due to various factors, including a seed-only migration mode, the absence of meiotic recombination and increased sensitivity to stochasticity 35,36 . Our cpSSR data showed a rather high level of population divergence with highly inflated heterogeneity (Φ PT values ranging from 0 to 1; Table 2). The Korean populations were not highly diversified, whereas the haplotype divergences were very high between population pairs across Korea and China as well as Korea and Japan. As the cpSSR has much lower evolutionary rates 37 , the pattern of haplotype divergence among populations observed might only be the signature of ancient diversification. The time since demographic events, such as past and recent migrations may not be long enough to accumulate new mutations; thus, the demographic events may not have contributed to the haplotype divergence pattern.
Alternatively, the haplotype divergence pattern may solely stem from random drift, as chloroplast markers are much more sensitive to stochastic events than biparental nuclear markers 36 . Moreover, the pattern might result from sampling bias and/or from the extremely limited variation; thus, extreme caution should be taken when interpreting the pattern. Of 15 haplotypes, most were identified by rare alleles and were in very low frequencies. A single haplotype (12) predominantly occurred in most Korean populations, while the Chinese population (GND) was solely assigned to another single haplotype (7). The Taiwanese population (TPE) and one Korean population (NAH) showed the largest haplotype diversity and shared similar haplotypes ( Fig. 1; Table S1).
To shed light on the complex divergence pattern and reconstruct the evolutionary path of A. cochinchinensis in Korea, we utilized the ABC approach and devised nine final evolutionary scenarios. The most probable scenario (scenario 7) demonstrated that lineage separation between two ancestral populations (GND, China; TPE, Taiwan) dated back to LGM (~21,000 BP; considering an average generation time of three years). This is a crude estimation to some extent, but it is consistent with the major changes in the distribution of the vegetation during LGM throughout east Asia 38 . The ABC results also ruled out the stochasticity hypothesis, as our data did not support the scenario with independent divergence from an ancestor with subsequent founding events (see scenario 2 & scenario 9 in Table 5; low probabilities shown in models associated with founding events in preliminary runs, (2019) 9:9757 | https://doi.org/10.1038/s41598-019-46275-9 www.nature.com/scientificreports www.nature.com/scientificreports/ results not provided). Particularly, scenario 9 was added in the DIYABC analysis to determine if there were recent population bottlenecks after the initial splits, migrations and admixtures set in the scenario 7, the most likely model ( Fig. 4; Appendix S1). The probability of scenario 9 was low in the direct measure and zero in the logistic estimation (Table 5). Although our data could not confirm an precise number of ESUs due to limited sampling in China, in the best scenario, there were at least two ESUs in A. cochinchinensis supporting the hypothesis of ESUs. Each ESU can be an important management unit and should be treated separately for conservation and management practices. Based on the pattern of the admixture between the two ESUs in WND and OKN, there is no complete reproductive barrier between the two ESUs, thus the two ESUs may not be as largely divergent as separate species that are morphologically cryptic.
According to the best model, the Korean populations diverged from the Chinese and Taiwanese populations approximately 1,800 years ago when there were active trading events among the three countries (Korea, China and Japan; Table 5; Fig. 4). The species has been heavily used as a traditional medicine for a long time in Korea. The name of the species and its usages were listed in several ancient medicine books, yet there were no records of natural populations 39 Figs 2 and 4). The scenario is again consistent with the species introduction hypothesis proposed by distribution and genetic studies 14,18 . The introduction history was also explained by the clustering pattern we found in the PCoA and STRUCTURE results (Figs 2 and 3). Notably, one Korean (WND) and one Japanese (OKN) population shared alleles with both ESUs, perhaps through secondary contact in Korea and Japan.
Scenario 8 with the second highest probability suggested nearly identical demographic history except for the WND admixture event whereas the third best scenario (4) largely differ from the top two scenarios ( Fig. 4; Appendix S1). The direct estimate of posterior probability for the scenario was nearly as good as that in our scenario of choice; however, the statistical support was certainly not significant (95% CI overlapping with zero in both the direct and logistic estimates of probability; Table 5). In scenario 4, most of the Korean populations are derivatives of GND except for one group (YGW & NAH, pop 5), which derived from TPE. Scenario 4 showed a high logistic posterior probability (about half that of the scenario of choice), but the direct estimate was nearly zero and the 95% CI completely overlaps with zero. Among five repeated runs with varying parameter sets, we compared two runs showing good shapes of parameter posterior probability-distributions. One of the runs with a broader t1 range had a clear peak for the t1 posterior probability-distribution; however, the run exhibited a poor overall model fit (i.e. the first three PCs in the observed data were not present within the simulated prior distribution). Therefore, we selected the run with a better model fit for further discussion and presented the results here.
Despite its ecological and economical importance as a rare medicinal plant, there is a serious lack of knowledge about the evolutionary features for A. cochinchinensis. To date, our study is the first empirical study to examine the patterns of genetic diversity across East Asia for the species. Through a genetic diversity analysis, we determined that there are at least two ESUs and inferred the detailed demographic history of A. cochinchinensis in Korea. The plants have been widely used for several medicinal purposes and farmers started to cultivate the plants in Korea without knowledge of their genetic entities. Our findings of multiple ESUs will provide valuable information to those who cultivate the species. For example, the two distinct ESUs may require different growth conditions. In addition, the two ESUs may have to be closely monitored to prevent losing one of them through hybridization, as found in two locations in Korea and Japan.

Materials and Methods study species. Asparagus cochinchinensis is a dioecious perennial herb in the genus Asparagus, subgenus
Asparagus, a group of dioecious taxa 9 . Sexual differentiation of flowers in Asparagus occur in the late stage of development by aborting pistils or stamens selectively, which suggests that the evolution of sex in Asparagus was derived from hermaphrodite ancestors 9,40 . The species can be identified from its close relatives by having three cladodes coming from one fascicle 13 . The climbing stems of A. cochinchinensis grow up to 2 m and are slightly woody 13,23 . The plants flower from May to June and produce green berries with 1-2 black seeds maturing in early fall 13,23 . The fleshy berries attract frugivorous birds as seed disperser in the genus 10,41 . Although studies empirically investigating the reproduction mode of A. cochinchinensis are lacking, the plant is likely to reach maturity in approximately two to five years for first flowering based on observations made of allied species in the same genus 10 . The species can also clonally grow from the root 34 . sample collection and DNA isolation. In the springs of both 2015 and 2016, we collected young leaves of 158 individuals from eight populations (~20 samples from each population) from Korea, China, Japan and Taiwan (Table 1; Fig. 2). Under the supervision of local botanists, we chose one population confirmed to be a natural population from neighboring countries, China, Japan and Taiwan. Due to over harvesting and private farming, only a small number of natural populations remain in Korea. Among those, we randomly chose five collecting sites with a minimum distance of 50 km between populations (Fig. 2). As A. cochinchinensis is only distributed south of Chung-Cheong province on the Korean peninsula 14 , there was no need of sampling in North Korea. The plants were identified based on the taxonomic keys formerly described 13,42 . Prior to field sampling, we obtained all required permits from the Ministry of Environment and from the local governments. Two voucher specimens were prepared for each population and were deposited in the National Institute of Natural Resources herbarium (voucher numbers, NIBRVP0000556137-NIBRVP0000556141; NIBRVP0000601487-NIBRVP0000601500). Leaf samples were stored in plastic bags with silica-gel desiccant until DNA extraction. Genomic DNA from dried leaves was isolated using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) (2019) 9:9757 | https://doi.org/10.1038/s41598-019-46275-9 www.nature.com/scientificreports www.nature.com/scientificreports/ following the manufacturer's protocol. We measured the quantity and quality of extracted DNAs in a NanoDrop ND1000 (Thermo Fisher Scientific, Massachusetts, USA; quality cutoff, OD 260/280 ratio between 1.7-1.9) and visualized in 1% agarose-gel electrophoresis. The isolated DNAs then were stored at −20 °C until further use.
Nuclear and chloroplast microsatellite genotyping. PCR amplification for 13 microsatellite markers (nrSSR) was conducted to genotype all 158 individuals using primer sets developed by Kim et al. 21 . PCR reactions were performed in a 25 μl volume containing 2.5 μL of 10× Ex Taq buffer (TaKaRa Bio, Otsu, Japan), 2 μL of 2.5 mM dNTPs, 0.01 μM of each of the forward and reverse primers, 0.1 μL of Ex Taq DNA polymerase (5 units/ μL) (TaKaRa Bio), 5-10 ng of template DNA, and distilled water (Sigma-Aldrich Co., St. Louis, Missouri, USA). The PCR cycling conditions were as follows: an initial denaturation step at 98 °C for 5 min followed by 30 cycles of denaturation at 95 °C for 1 min, annealing at 55-57 °C for 1 min, and an extension at 72 °C for 1.5 min, with a final extension step at 72 °C for 10 min. The fluorescently labelled (HEX, FAM) PCR products were pooled with the size standard GS500LIZ (Applied Biosystems, USA) and the amplified fragments were separated out in an ABI 3730XL automated sequencer (Applied Biosystems, USA). Microsatellite profiles were examined on GeneMarker program v. 2.40 (Softgenetics LLC) with automated scoring. The scoring results were manually checked in the final step. We evaluated presence of null alleles and possible scoring errors using FreeNA 43 . Additionally, we amplified chloroplast microsatellites (cpSSR) for 50 individual genotypes subsampled from the total of 158 samples genotyped by nuclear microsatellites using three universal primer pairs 44 and three primer pairs developed for the allied genus Maianthemum in the family, Asparagaceae 45 . Of the six primer pairs, three were successfully amplified and polymorphic at the population level. We assayed genetic variation with the three cpSSRs for all 158 genotypes and used the variation for further analyses. Data analysis. Given the possibility of clonal propagation, we first tested for clonality to avoid presence of multiple clones within each population. All 158 samples were assigned to genets based on the 13 nrSSRs in GenoDive v. 2.0b23 46 . To discriminate true clones from clones produced through sexual reproduction by chance, the probability of identical MLG due to sexual reproduction by chance was calculated using the binomial probability function Psex 47 . Because there were only a few identical clones, we did not account for clonality in the subsequent analyses. We estimated the genetic diversity parameters; He, Ho, Na and F IS using Arlequin v. 3.5 and GENALEX v. 6.502 48,49 . We tested for significant deviation from HWE and LD in 13 microsatellite loci within each population using Fisher's exact test 50 in Arlequin. Bonferroni corrections for multiple comparisons (adjusted P values) were applied. Four of thirteen microsatellite markers significantly deviated from HWE and/or were not independent of the remaining markers. Certain genetic analyses, such as STRUCTURE and ABC assume HWE and/or independence among markers. To avoid bias influenced by the number of markers used between different analyses, we screened out those 4 markers from all downstream analyses. We calculated pairwise F ST between all population pairs in Arlequin with 1,000 permutations for the significant alpha. We tested for Isolation by Distance based on the pairwise genetic distance (Slatkin's linearized F ST = F ST /(1 − F ST )) and log-transformed Euclidean distance for all population pairs in GENALEX 49,51 . For statistical significance of the correlation coefficient (r), 1,000 random permutations with replacement were used.
Haplotype diversity of the cpSSRs was assessed as the total number of haplotypes (Nh), haplotype diversity (Hehap) and pairwise genetic differentiation among populations (Φ PT ), and the analogue of GST for haplotype data (Nei's coefficient of gene differentiation; Nei, 1973) in GENALEX 49 . We used 1,000 permutations with replacements for the significance tests. The hierarchical distribution of haplotype variation among four regional groups (Korea, China, Taiwan, and Japan) was evaluated by analysis of molecular variance (AMOVA) implemented in Arlequin 48 . Given the limited variation found in our haplotype analysis, we examined genetic structures only by plotting the distributions of haplotypes using a median-joining network in Network 5 (http:// www.fluxus-engineering. com). Alleles were recoded as size of fragments, where 1-bp size change was given a weight of 1 52 .
Population structures was examined using a Bayesian model-based clustering approach with the correlated allele frequencies model 53 implemented in STRUCTURE v. 2.3.4 54 . As described above, we screened out four microsatellite markers that were violating the assumptions of panmixia and independence among markers and used the admixture model. Ten repeats of independent runs were conducted with 1,000,000 MCMC iterations following 100,000 steps as burn-in for each K from 1 to 8. We inferred the optimal number of clusters, K based on ΔK as estimated following Evanno et al. 55 in STRUCTURE HARVESTER v. 0.6.94 56 . We used CLUMPP v. 1.1.2 to summarize the individual ancestry coefficients from 10 STRUCTURE repeats with the greedy option 57 . The summary of the results was visualized in DISTRUCT v. 1.1 58 . We also performed a Principal Coordinate Analysis (PCoA) on the pairwise Nei's genetic distance estimated for all 158 individuals using GENALEX. AMOVA was used to partition molecular variance hierarchically within and between clusters defined from both STRUCTURE and PCoA in Arlequin. We compared the estimated values against values calculated from 1,000 resampled data sets for statistical significance.
We examined reduction in population size derived by historical and recent bottlenecks using the Garza-Williamson index (G-W index; M-ratio) implemented in Arlequin and BOTTLENECK v. 1.2.02 59,60 , respectively. Because bottleneck events are assumed to reduce the allele number more rapidly than the allele size ranges, the M-ratio test uses the ratio between the number of alleles and the allele size range 60 . BOTTLENECK detects recent population decline within a few generations by finding a significant excess or deficit of heterozygosity relative to an equilibrium state 61 . A simulation study revealed that the M-ratio is better suited for identifying a historical bottleneck than BOTTLENECK 29 . We ran BOTTLENECK under the infinite allele model (IAM) and stepwise mutation model (SMM) with Sign and Wilcoxon's sign rank tests for statistical significance. We also estimated the proportion of migrants among the eight populations within the recent past (the last few generations) www.nature.com/scientificreports www.nature.com/scientificreports/ using BAYESASS 3.0.1 62 . We used 10,000,000 MCMC iterations with a 1,000,000 burn-in and, then sampled every 2,000 generations during the process. The default settings were used for the mixing parameters.
The most probable evolutionary history of A. cochinchinensis in Korea was determined through the ABC model approach implemented in DIYABC v. 2.1.0 63 . For the ABC computation, we used all 12 microsatellite loci by combining nine nrSSRs and three cpSSRs. Using both cytoplasmic and nuclear molecular data would allow more insights into admixture among genetically distinct lineages. We generated nine likely evolutionary scenarios that summarize the divergence and admixture history among the five Korean populations and the three populations of neighboring countries. (Fig. 4). We formulated a "ghost" population to account for an un-sampled genetic source that might have contributed to divergence patterns in the Korean populations 64 . To simplify the models and ease computational challenges, we delimited the populations to 6 clusters based on the clustering pattern resulting from STRUCTURE, PCoA and the geographic data. Prior to the final nine scenarios, two preliminary DIYABC runs with various numbers of scenarios were conducted to finalize the scenarios to be investigated. We eliminated scenarios that were not biologically justified and chose the final nine plausible evolutionary scenarios (see Appendix S1 for detailed descriptions of the scenarios).
For the nine nrSSRs, we used the default priors of the mutation rate, whereas the prior values of the mutation rate for the three cpSSRs were changed to achieve better posterior distributions (Table S5). Given much lower mutation rates of cpSSRs than nrSSRs as estimated from former studies 37,65 , we scaled the mutation rate of the cpSSRs to be low (Table S5). We used the following summary statistics for each population: mean expected heterozygosity, mean allele size variation and mean number of alleles. For two sample summary statistics, F ST , the classification index and the genetic distance were used. We set uniform priors on all demographic parameters (Table S63). As LGM strongly influenced the current distribution of plant species in north east Asia 38 , the historical divergence time (t2) for the species might have dated back to as early as 21,000 years using a generation time of three years (see Vivian-Smith & Gosper 10 for the generation time). To retain a clear signature for the posterior distribution of each parameter, we repeated the DIYABC runs five times with different mutation models and parameter sets, particularly for divergence times (t1 and t2) with the finalized nine scenarios. A simulation of each scenario was performed for 3 × 10 6 iterations based on neutral coalescence.
We determined the most probable scenario based on posterior supports of each scenario estimated by two approaches: (1) direct estimates by the frequency of a given scenario with the 500 data sets generating summary statistics that most closely matched the observed data and (2) logistic regression 63 . We determined the demographic parameters of the most probable scenario from 10,000 simulated datasets closest to the observed data. We assessed the goodness-of-fit for each model by the Principal Component Analysis (PCA) approach implemented in DIYABC using by model checking function. The function evaluates the discrepancies between the simulated and the observed data. To determine the level of confidence of the chosen scenario, we estimated Type I and Type II errors using the confidence option in DIYABC with a simulation of 500 datasets. Type I error for scenario 7, the scenario of choice, was estimated as the probability at which scenario 7 is rejected when it is the true scenario. Type II error was calculated as the probability of choosing a scenario when it is not the true scenario.