Introduction

Southwestern China, with 13 000 vascular plant species (López-Pujol et al., 2011), constitutes one of the world’s major biodiversity hot spots (Myers et al., 2000). Approximately 29.2% of these plant species are endemic to the region (López-Pujol et al., 2011). Some of these endemics are ‘palaeoendemic lineages’, whose historical distribution was much wider and which are today restricted to a few, often disjunct, refugia. Designing effective conservation actions for such narrow palaeoendemics often requires knowledge of the contemporary population structure of the species, along with an understanding of the main processes shaping it (Schwartz et al., 2007). It is now well established that, even within this glacial sanctuary for plants, Pleistocene climate cycling had an important influence on species’ distributions, genetic structure and phylogeographic patterns (for example, Qiu et al., 2011; Liu et al., 2012). In addition, all forest stands in Southwestern China have undergone some kind of land-cover change since the mid-twentieth century because of logging and clearing for agriculture (Zhang, 2000), including the destruction of the natural habitat of subtropical plant species (Long et al., 2003). Such disturbance may leave distinct genetic signatures, such as population bottlenecks, reduced within-population genetic diversity and/or elevated levels of inbreeding (Dubreuil et al., 2010). Despite widespread recognition that both contemporary and historical processes may play important roles in shaping present-day genetic patterns, it has proved challenging to disentangle their respective contributions (Zellmer and Knowles, 2009).
Various approaches have been used to disentangle the relative roles of contemporary and historical processes, including using historical (museum) samples to compare genetic variation before and after a decline (Gutiérrez-Rodríguez et al., 2011), or applying different analytical methods to extant samples (Johnson et al., 2009; Chiucchi and Gibbs, 2010) to estimate historical and contemporary gene flow or effective population size (Ne). More recently, a powerful and flexible approach, approximate Bayesian computation (ABC), has been developed to estimate demographic and historical parameters (for example, effective population size and divergence time) and to compare alternative scenarios quantitatively (Beaumont, 2010). In addition, molecular markers with different substitution rates can capture signatures of historical and contemporary processes, and could therefore help to determine the relative contributions of postglacial migration and more recent gene flow to genetic structure. However, no study has so far attempted to disentangle the relative contributions of long-term (Late Quaternary) range fragmentation vs more recent (human-induced) deforestation to the spatial distribution of genetic diversity in paleoendemic plant species from Southwestern China. In this study, we investigated the influence of past climate changes and contemporary anthropogenic disturbance on the population genetic structure and connectivity of Dipteronia dyeriana, a paleoendemic species.

Dipteronia Oliv. (Angiosperm Phylogeny Group (APG), 1998; McClain and Manchester, 2001) is one of numerous Cenozoic woody relict genera of the North Temperate floristic region that were once more species-rich and widely distributed in both Asia and North America, during the Paleocene and the Paleocene–Oligocene, respectively (McClain and Manchester, 2001; Manchester et al., 2009). Today, however, the genus comprises only two extant species, D. sinensis Oliv. (Oliver, 1889) and D. dyeriana Henry (Henry, 1903), both of which are hypothesized to be palaeoendemics of mainland China (Manchester et al., 2009); phylogenetic results are congruent with this palaeoendemic status (Renner et al., 2008). The extant distribution of D. dyeriana, the focal species of the present study, is restricted to 6 km² in the southeastern part of Yunnan Province (Figure 1), although the species is historically documented from south-central Yunnan and the neighboring province of Guizhou, where it is now extinct (Zhang, 2000; Su et al., 2006). In contrast, D. sinensis has a wider, allopatric distribution in central and southern China (McClain and Manchester, 2001). D. dyeriana is a diploid (2n=18), deciduous shrub or small tree up to 10–15 m tall, with pinnately compound leaves (Ying et al., 1993). Currently, only five natural populations of D. dyeriana remain in southeastern Yunnan: three in Wenshan County and one each in Mengzi and Pingbian counties (Figure 1). However, there is no documented evidence on whether forest clearing by humans since the mid-twentieth century has contributed to population extinction and/or the disruption of population connectivity in this species. D. dyeriana is andromonoecious, flowering from April to May and fruiting between August and October. The species reproduces, at least in part, by insect-mediated outcrossing, and its winged fruits are apparently dispersed by wind (Tian et al., 2001).
Regarding its overall conservation status, D. dyeriana is listed as ‘endangered’ by both the Chinese Plant Red Book (Wang and Xie, 2004) and the IUCN (International Union for Conservation of Nature; Sun, 1998). Geographic patterns of genetic variation in D. dyeriana have previously been surveyed in five natural populations using inter-simple sequence repeats (ISSRs; Qiu et al., 2007) and in three using random amplified polymorphic DNA (RAPD) markers (Li et al., 2005). These studies revealed high genetic differentiation (ISSRs: GST=0.375; RAPD: GST=0.421) and low within-population diversity (Nei’s gene diversity: ISSRs, 0.091; RAPD, 0.159). This information has been critical for the conservation of the species. However, because these markers are dominant, the ISSR and RAPD data could not provide detailed insights into the timescales of population divergence or the roles of contemporary vs historical processes in determining the current pattern of disjunct population remnants and genetic variation.

Figure 1

Location of the five extant natural populations of Dipteronia dyeriana in Yunnan Province, China. Also shown is the geographic distribution of 10 cpSSR haplotypes (H1–H10) found in 74 individuals from these 5 populations.

Here, we gathered both nuclear (codominant) and chloroplast microsatellite (SSR) data from the five natural populations of D. dyeriana to gain further insights into the historical and contemporary evolutionary processes shaping the genetic architecture of this paleoendemic species. This will allow us to assess the effects of anthropogenic habitat modification on this taxon, thereby informing conservation efforts towards this species. Our specific goals were to (1) characterize the overall patterns of genetic diversity as well as range-wide genetic structure of D. dyeriana, (2) investigate the levels of recent and historical gene flow among populations and (3) test several alternative hypotheses regarding the demographic history of the species (including changes in population size and diversification among populations over evolutionary time) using coalescent-based ABC methods.

Materials and methods

Sample collection

Silica-dried leaf material of D. dyeriana was obtained from all currently known natural populations (WZ, WM, WC, PB, MZ; Figure 1; Table 1). Fourteen or 15 plants were sampled from each site, giving 74 individuals in total (Table 1); these samples had previously been surveyed for ISSRs (Qiu et al., 2007). Voucher specimens representing all sampled populations are stored at the Herbarium of Yunnan Laboratory for Conservation of Rare, Endangered and Endemic Forest Plants (YCP) (Qiu et al., 2007). All 74 individuals were genotyped at both nuclear microsatellite (nSSR) and chloroplast microsatellite (cpSSR) loci.

Table 1 Geographic and genetic characteristics of the five Dipteronia dyeriana populations studied

Chloroplast and nuclear microsatellite genotyping

Total genomic DNA was extracted from the dried leaf tissue using DNA Plantzol Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. We PCR-amplified 10 cpSSR regions (Ycf5, Ycf5-2, PsbC-TrnS, Rrn5-TrnR, RpoB, Rps3-Rp122, TrnR-Rrn5, Rp12-Rp123, TrnL-16SrRNA and TrnL) and 13 nSSR loci using primers developed specifically for Dipteronia (Yang et al., 2008) and D. dyeriana (Chen et al., 2011), respectively. Primers were labeled with either 6-FAM or HEX fluorescent dye (Applied Biosystems, Foster City, CA, USA). PCR amplifications were performed on a GeneAmp9700 DNA Thermal Cycler (Perkin-Elmer, Waltham, MA, USA) using the protocols described in He et al. (2012) and Chen et al. (2011), and PCR products were separated on a MegaBACE 1000 (GE Healthcare Biosciences, Pittsburgh, PA, USA). Alleles were scored manually with the aid of GENETIC PROFILER (version 2.2; GE Healthcare Biosciences).

Genetic variation and population differentiation at chloroplast SSR loci

Because the chloroplast genome does not recombine, the alleles at all cpSSR loci in each individual were combined into a haplotype. For each population, we computed haplotypic diversity (Hcp; Pons and Petit, 1995), the number of chlorotypes (nacp) and rarefied haplotypic richness (HR, standardized to 14 individuals) using the program CONTRIB (version 1.02; Petit et al., 1998). Population structure was analyzed by comparing two coefficients of population divergence, GST and RST, using the program PERMUT and CPSSR (version 2.0; Pons and Petit, 1996). GST is based only on haplotype frequencies, whereas RST takes into account both haplotype frequencies and the similarities between haplotypes. An RST significantly higher than GST (1000 permutations; P<0.05) indicates that genealogically closely related haplotypes tend to occur together within populations, implying the presence of phylogeographic structure (Pons and Petit, 1996).
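The two within-population cpSSR statistics above can be sketched in Python. This is a minimal illustration, not CONTRIB itself: it assumes Hcp is the unbiased gene diversity n/(n − 1)(1 − Σp²) applied to haplotype frequencies, and that HR is the expected number of distinct haplotypes in a random subsample of g individuals drawn without replacement (rarefaction). CONTRIB’s exact output conventions may differ, and the function names and example counts are hypothetical.

```python
from math import comb


def haplotypic_diversity(counts):
    """Unbiased haplotype diversity n/(n-1) * (1 - sum(p_i^2)).

    counts: number of individuals carrying each haplotype in one population.
    """
    n = sum(counts)
    if n < 2:
        return 0.0
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_p2)


def rarefied_richness(counts, g):
    """Expected number of distinct haplotypes in a subsample of g individuals.

    Each haplotype with count c is missed with probability C(n-c, g) / C(n, g);
    math.comb returns 0 when g > n - c, i.e. the haplotype cannot be missed.
    """
    n = sum(counts)
    if g > n:
        raise ValueError("rarefaction size g exceeds the sample size")
    total = comb(n, g)
    return sum(1.0 - comb(n - c, g) / total for c in counts if c > 0)
```

For instance, hypothetical counts [13, 1, 1] (15 individuals, three haplotypes) give Hcp ≈ 0.257 and, rarefied to 14 individuals, HR ≈ 2.87. Rarefying every population to the same g is what makes richness comparable across the unequal sample sizes (14 or 15 plants per site).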

Genetic variation and population differentiation at nuclear SSR loci

CERVUS (version 2.0; Kalinowski et al., 2007) and FSTAT (version 2.9.3.2; Goudet, 2012) were used to calculate, for each nSSR locus, the number of alleles (A), observed heterozygosity (HO), expected heterozygosity (HE) and the within-population inbreeding coefficient (FIS). Departures from Hardy–Weinberg equilibrium (HWE) were tested for significance using a Markov chain (dememorization 1000, 100 batches, 1000 iterations per batch) in GENEPOP (version 4.2; Rousset, 2008). Null allele frequencies were estimated using the program FREENA (Chapuis and Estoup, 2007) following the Expectation Maximization method of Dempster et al. (1977). For each population, we estimated three measures of genetic variability across all loci, namely allelic richness (AR, standardized to 14 individuals), HO and HE, as well as FIS, using FSTAT (version 2.9.3.2; Goudet, 2012). Using the same program, differentiation among populations was quantified as FST (Weir and Cockerham, 1984).
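The per-locus statistics can be illustrated with a short Python sketch. It is not the CERVUS or FSTAT implementation: it assumes Nei’s (1978) small-sample correction 2n/(2n − 1) for HE and the simple ratio estimator FIS = 1 − HO/HE, whereas FSTAT reports Weir and Cockerham’s (1984) estimator, so values may differ slightly. The function name and the genotype input format are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Single-locus summary for one population.

    genotypes: list of (allele1, allele2) tuples; None marks a missing genotype.
    Returns A (number of alleles), HO, HE with Nei's (1978) correction
    2n/(2n-1), and the simple ratio estimator FIS = 1 - HO/HE.
    """
    typed = [g for g in genotypes if g is not None]
    n = len(typed)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    # observed heterozygosity: share of individuals carrying two different alleles
    ho = sum(1 for a, b in typed if a != b) / n
    # allele frequencies from the 2n gene copies
    counts = Counter(allele for g in typed for allele in g)
    two_n = 2 * n
    sum_p2 = sum((c / two_n) ** 2 for c in counts.values())
    he = two_n / (two_n - 1) * (1.0 - sum_p2)
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return {"A": len(counts), "HO": ho, "HE": he, "FIS": fis}
```

For four hypothetical individuals genotyped as 1/2, 1/1, 2/2 and 1/2, the sketch gives HO = 0.5, HE = 4/7 ≈ 0.571 and FIS = 0.125, that is, a slight heterozygote deficit.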

Genetic subgroups in the entire nSSR sample were identified using the Bayesian clustering model implemented in STRUCTURE (version 2.3.4; Pritchard et al., 2000). This model assigns individual multilocus genotypes to a given number of clusters (K) so as to minimize Hardy–Weinberg and linkage disequilibrium within clusters (Gao et al., 2007). The analysis was performed under the admixture model, assuming independent allele frequencies among populations, with K varying from 1 to 5. For each value of K, we performed 10 runs, each with a burn-in of 10⁴ and a run length of 10⁵ iterations. The posterior probability of the data (ln P(D)) for a given K was computed from the runs with the highest probability for each K (see Pritchard et al., 2000). In addition, we used the method of Evanno et al. (2005) to estimate ΔK, a statistic based on the second-order rate of change of the likelihood function with respect to K. To further explore genetic relationships between populations, genetic distances (DA; Nei et al., 1983) between all pairs of populations were calculated from allele frequencies using POPTREE2 (Takezaki et al., 2010). The resulting DA matrix was subjected to cluster analysis using the unweighted pair-group method with arithmetic means (UPGMA; Sneath and Sokal, 1973), with bootstrap support assessed from 5000 replicates. Isolation-by-distance (IBD) effects were tested with pairwise geographical and genetic (FST) distances using the Isolation by Distance Web Service (Jensen et al., 2005) (http://ibdws.sdsu.edu/~ibdws/), with 10 000 permutations.
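The ΔK statistic and a Mantel-type permutation test for IBD can be sketched as follows. Both are illustrations under stated assumptions, not the STRUCTURE or IBDWS code: ΔK is computed here as |L(K+1) − 2L(K) + L(K−1)| divided by the standard deviation of ln P(D) across replicate runs at K, with L(K) taken as the run mean; the Mantel test is one-tailed (positive association), uses raw, untransformed distances and permutes the population labels of the genetic matrix. Function names are hypothetical.

```python
import random
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K (Evanno et al., 2005) from replicate STRUCTURE runs.

    lnp_by_k maps each K to a list of ln P(D) values, one per run.
    L''(K) is computed from run means, so Delta K is only defined where
    K-1, K and K+1 were all run.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k, values in lnp_by_k.items():
        if k - 1 not in means or k + 1 not in means or len(values) < 2:
            continue
        sd = stdev(values)
        if sd == 0:
            continue  # undefined when all runs at K agree exactly
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        delta[k] = abs(second_diff) / sd
    return delta


def _pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(geo, gen, n_perm=9999, seed=1):
    """One-tailed Mantel test between two symmetric distance matrices
    (lists of lists); returns (r, permutation P-value)."""
    n = len(geo)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [geo[i][j] for i, j in pairs]

    def r_for(order):
        # permute rows and columns of the genetic matrix together
        y = [gen[order[i]][order[j]] for i, j in pairs]
        return _pearson(x, y)

    r_obs = r_for(list(range(n)))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(order)
        if r_for(order) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

With only five populations there are 5! = 120 distinct label permutations, so an exact permutation P-value for a positive association cannot fall below 1/120 (about 0.008), however many random permutations are run.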

Estimating contemporary and historical gene flow

We estimated ‘contemporary’ interpopulation migration rates (over the past few generations; mc, the fraction of immigrant individuals) using a Bayesian approach in BAYESASS (version 3.03; Wilson and Rannala, 2003). For D. dyeriana, we took the contemporary timescale to be fewer than five generations. This method allows for deviations from HWE but assumes linkage equilibrium among loci and constant migration rates over the two generations before sampling. The mixing parameters (Δ values) were adjusted so that the acceptance rates of proposed changes fell between 40% and 60% of the total iterations, ensuring that the parameter space was adequately searched (Wilson and Rannala, 2003). Each Markov chain Monte Carlo run consisted of 3 × 10⁶ iterations after a burn-in of 10⁶ iterations. Convergence was assessed by comparing the posterior probability densities of inbreeding coefficients and allele frequencies across 10 replicate runs, each started with a different random seed, and the log-probability traces of the runs were compared in TRACER (version 1.4.8; Rambaut and Drummond, 2007). The Bayesian deviance measure (Spiegelhalter et al., 2002) was used to identify the run with the best model fit. Historical gene flow (over a much longer period, about 4Ne generations in the past; Beerli and Felsenstein, 2001) was estimated using Bayesian inference in MIGRATE (version 6.0; Beerli, 2008). We estimated mutation-scaled migration rates (M = mh/μ, where mh is the historical migration rate and μ is the mutation rate per generation, taken for nSSRs as 10⁻³; Udupa and Baum, 2001). MIGRATE was run under a Brownian motion mutation model with constant mutation rates across loci and with starting parameters based on FST calculations. We used slice sampling and a uniform prior distribution for M (range=0–1000, mean=500, delta=100). Following a burn-in of 50 000 iterations, each run visited a total of 1 000 000 parameter values and recorded 20 000 genealogies at a sampling increment of 50.
We used a static heating scheme at four temperatures (1, 1.5, 3 and 6) to efficiently search the genealogy space. In the results, we report the mean and 95% credible intervals for mh.
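Two small calculations in this subsection can be sketched in Python: converting MIGRATE’s mutation-scaled M into mh with the assumed μ = 10⁻³, and choosing among replicate BAYESASS runs by Bayesian deviance, taken here as −2 times the mean log-likelihood of the post-burn-in trace (one common form of the Spiegelhalter et al. measure). Parsing the program output files is not shown: the traces are assumed to be plain lists of log-likelihood values, and all names are hypothetical.

```python
from statistics import mean

MU_NSSR = 1e-3  # nSSR mutation rate per generation assumed in the text


def historical_migration_rate(M, mu=MU_NSSR):
    """Convert MIGRATE's mutation-scaled rate M = mh / mu into mh."""
    return M * mu


def bayesian_deviance(loglik_trace, burn_in=0):
    """Posterior mean deviance, D = -2 * mean(log-likelihood), after burn-in."""
    kept = loglik_trace[burn_in:]
    return -2.0 * mean(kept)


def best_run(traces, burn_in=0):
    """Name of the replicate run with the lowest mean deviance (best fit)."""
    return min(traces, key=lambda name: bayesian_deviance(traces[name], burn_in))
```

With μ = 10⁻³, M values of 18 and 71 correspond to mh = 0.018 and 0.071, the range of historical migration rates reported in the Results.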

ABC analysis of population demography

To infer the population demography of D. dyeriana, we used an ABC approach, which allows alternative demographic models to be compared and their parameters estimated without calculating complex likelihoods (Beaumont, 2010). Based on the results of the STRUCTURE analysis (see Results), the five populations were divided into two genetic groups: an east group (WC, WM and WZ) and a west group (MZ and PB). First, four population size change models (Figure 2), nearly the same as those used by Tamaki et al. (2016), were built and applied to each group (see Supplementary Appendix S1 for detailed descriptions of the four models). The generalized stepwise mutation model (GSM; Estoup et al., 2002) was used for both nSSR and cpSSR. The GSM has two parameters, the mutation rate per generation (μ) and a geometric parameter (P). P ranges from 0 to 1 and represents the proportion of mutations that change allele size by more than one repeat step; P=0 corresponds to a strict stepwise mutation model (SMM). For nSSR, we excluded six loci, five with high frequencies of null alleles (DAG14, DAG38, DAG88, DAG327 and DAG368; see Results) and one with a large indel (DAG307), and used only the remaining seven loci in this analysis. We simulated seven independent loci, with an upper limit of 30 on the repeat number. The mean μ across the seven loci was fixed at 10⁻³ (Udupa and Baum, 2001), and the μ of each locus was drawn at random from Gamma (shape, rate) in every simulation. The prior for the shape parameter was Uniform (0.5, 5), and the rate parameter was then calculated as shape/mean μ. The prior for the mean value of P was Uniform (0, 1), and the P of each locus was drawn at random from β (a, b), where a = 0.5 + 199 × mean P and b = a × (1 − mean P)/mean P (Excoffier et al., 2005).
For cpSSR, we simulated five linked regions, with an upper limit of 5 on the repeat number. The prior for μ was log-uniform (10⁻⁵, 10⁻³), because reported cpSSR mutation rates are much lower than those of nSSRs (Provan et al., 1999). P was set to 0 (that is, an SMM was assumed), and the same values of μ and P were used for all five linked regions. Therefore, all four models have three additional free parameters related to the mutation model: the shape for nSSR, the mean P for nSSR and μ for cpSSR.
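The per-simulation prior draws for the mutation model can be sketched with Python’s standard library in place of R. The sketch assumes Gamma (shape, rate) is parameterized so that its mean is shape/rate (hence rate = shape/mean μ), and it redraws mean P if it is exactly 0 to avoid a division by zero; the constants and function names are hypothetical.

```python
import math
import random

MEAN_MU_NSSR = 1e-3  # mean nSSR mutation rate fixed in the text
N_NSSR_LOCI = 7      # nSSR loci retained for the ABC analysis


def beta_params(mean_p):
    """Beta(a, b) parameters whose mean a / (a + b) equals mean_p."""
    a = 0.5 + 199 * mean_p
    b = a * (1 - mean_p) / mean_p
    return a, b


def draw_mutation_priors(rng):
    """One joint draw of the mutation-model parameters for a single simulation."""
    # nSSR: per-locus mu ~ Gamma(shape, rate), rate = shape / mean mu, so the
    # Gamma mean equals MEAN_MU_NSSR. random.gammavariate takes a SCALE
    # parameter, hence scale = 1 / rate.
    shape = rng.uniform(0.5, 5.0)
    rate = shape / MEAN_MU_NSSR
    mu_nssr = [rng.gammavariate(shape, 1.0 / rate) for _ in range(N_NSSR_LOCI)]

    # nSSR: per-locus GSM parameter P ~ Beta(a, b) centred on mean P ~ U(0, 1)
    mean_p = 0.0
    while mean_p == 0.0:  # avoid division by zero in beta_params
        mean_p = rng.uniform(0.0, 1.0)
    a, b = beta_params(mean_p)
    p_nssr = [rng.betavariate(a, b) for _ in range(N_NSSR_LOCI)]

    # cpSSR: one mu ~ log-uniform(1e-5, 1e-3) shared by all regions; SMM (P = 0)
    mu_cp = math.exp(rng.uniform(math.log(1e-5), math.log(1e-3)))

    return {"shape": shape, "mean_p": mean_p, "mu_nssr": mu_nssr,
            "p_nssr": p_nssr, "mu_cp": mu_cp, "p_cp": 0.0}
```

Because a/(a + b) = mean P and a + b = 199 + 0.5/mean P, the per-locus P values stay fairly tightly clustered around the drawn mean P.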

Figure 2

The four population size change models compared, each applied to both the east and west groups. Model 1, standard neutral model; model 2, exponential growth model; model 3, instantaneous size change model; model 4, exponential growth after an instantaneous size change. NCUR, current effective population size in the number of diploid individuals; G, growth rate (NT=NCUR × exp(G × T)); T, time of the population size change, in generations; NANC, ancestral effective population size in the number of diploid individuals.

All priors were generated using R (version 3.3.1; R Core Team, 2016), and simulations were conducted using FASTSIMCOAL2 (version 2.5.2.21; Excoffier and Foll, 2011). When simulating cpSSR, the effective population size was set to half that for nSSR, because D. dyeriana is andromonoecious and all individuals can act as maternal trees. Therefore, 2 × NCUR and NCUR gene copies were passed to the coalescent simulator for nSSR and cpSSR, respectively. For each model and group, 10⁶ simulations were run, and summary statistics were calculated using ARLSUMSTAT (version 3.5.2; Excoffier and Lischer, 2010) for nSSR and our own R script for cpSSR. The mean and s.d. of the number of alleles, expected heterozygosity and allele size range were used as nSSR summary statistics, and the number of haplotypes and gene diversity as cpSSR summary statistics, giving a total of eight summary statistics. The tolerance rate was set to 0.001, and the 1000 simulated data sets nearest to the observed data set were used for model comparison and parameter estimation. The neural network regression method implemented in the ABC package (version 2.1; Wegmann et al., 2010) was used to calculate the posterior probabilities of the models and the posterior distributions of the parameters (Csilléry et al., 2012). Parameters were logistically transformed when estimating their posterior distributions, to keep the adjusted values within the prior range. Posterior modes and 95% highest posterior density (HPD) intervals were calculated using the CODA package (version 0.18; Plummer et al., 2006). Time parameters were converted from generations to years assuming a generation time of 20 years.
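The rejection step can be sketched in Python. This deliberately simplified version is plain rejection ABC, not the neural-network regression adjustment used in the analysis: it assumes each summary statistic is scaled by its standard deviation across simulations before Euclidean distances are computed, and it estimates posterior model probabilities as the proportion of accepted simulations from each model, which assumes equal prior probabilities and equal numbers of simulations per model. The names and data layout are hypothetical.

```python
import math
from collections import Counter
from statistics import pstdev


def abc_rejection(observed, simulations, tolerance):
    """Keep the fraction `tolerance` of simulations closest to the observed data.

    observed    : list of summary statistics for the real data
    simulations : list of (model_label, params, stats) tuples
    """
    columns = list(zip(*(stats for _, _, stats in simulations)))
    scales = [pstdev(col) or 1.0 for col in columns]  # guard constant columns

    def distance(stats):
        return math.sqrt(sum(((s - o) / sc) ** 2
                             for s, o, sc in zip(stats, observed, scales)))

    n_keep = max(1, round(tolerance * len(simulations)))
    ranked = sorted(simulations, key=lambda sim: distance(sim[2]))
    return ranked[:n_keep]


def model_posterior(accepted):
    """Crude posterior model probabilities: acceptance proportion per model."""
    counts = Counter(label for label, _, _ in accepted)
    return {label: c / len(accepted) for label, c in counts.items()}
```

The parameter values of the accepted simulations (the middle element of each tuple) form the unadjusted posterior sample; a regression adjustment such as the one used here would then correct them for the remaining distance to the observed statistics.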

Second, we built four population divergence models (Figure 3; see Supplementary Appendix S1 for detailed descriptions of the four models). Of the three free mutation-model parameters in the population size change analysis (shape, mean P and μ for cpSSR), we fixed the mean P and the cpSSR μ at 0.6 and 2.2 × 10⁻⁴, respectively, according to the results of the population size change analysis (see Results). The prior for the shape parameter was again Uniform (0.5, 5), so all population divergence models have one additional free parameter related to the mutation model, the shape. Simulations were conducted as in the population size change analysis. In addition to the eight summary statistics for each group, we calculated the average FST across the seven nSSR loci and the FST of cpSSR haplotypes, so that a total of 18 summary statistics were used for model comparison and parameter estimation, which were also conducted as in the population size change analysis. Finally, to verify the fit of the models to the observed data, posterior predictive simulations using 1000 samples drawn at random from the posterior distribution were conducted in both the population size change and divergence analyses; summary statistics were calculated from these simulations and compared with those of the observed data.
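One simple way to compare posterior predictive simulations with the observed data is to locate each observed summary statistic within its simulated distribution. The sketch below is an assumption about how such a comparison might be made, not the authors’ procedure: it reports the quantile of each observed statistic among the posterior predictive values and flags statistics falling outside the central 95%. Function names and statistic labels are hypothetical.

```python
def ppc_quantile(observed, simulated):
    """Fraction of posterior-predictive values below the observed statistic
    (ties counted as one half)."""
    below = sum(1 for s in simulated if s < observed)
    ties = sum(1 for s in simulated if s == observed)
    return (below + 0.5 * ties) / len(simulated)


def ppc_report(observed_stats, simulated_stats, alpha=0.05):
    """For each named statistic, return (quantile, inside central 1 - alpha)."""
    report = {}
    for name, obs in observed_stats.items():
        q = ppc_quantile(obs, simulated_stats[name])
        report[name] = (q, alpha / 2 <= q <= 1 - alpha / 2)
    return report
```

A model whose observed statistics mostly fall near the centre of their predictive distributions (quantiles near 0.5) reproduces the data well; quantiles close to 0 or 1 point to aspects of the data the model fails to capture.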

Figure 3

The four population divergence models compared. NCUR_E and NCUR_W, current effective population sizes in the number of diploid individuals for the east and west groups, respectively; G, growth rate (NT=NCUR × exp(G × T)); TDIV, divergence time in generations; mEW and mWE, migration rates per generation from east to west and from west to east, respectively, backward in time; NANC, ancestral effective population size in the number of diploid individuals. Fixed values were used for parameters shown in bold (NCUR_E and NCUR_W=1100, and G=−0.0084).

Analysis of recent population bottlenecks

We used Wilcoxon’s signed rank test and the mode-shift test implemented in BOTTLENECK (version 1.2.02; Piry et al., 1999) to detect population declines over extended and more recent time scales, respectively (2Ne−4Ne generations in the past vs a few dozen generations ago; Luikart and Cornuet, 1998). In Wilcoxon’s test, recently bottlenecked populations are expected to show a heterozygosity excess, that is, an expected heterozygosity (HE) higher than the heterozygosity expected under mutation–drift equilibrium given the observed number of alleles. In the mode-shift test, non-bottlenecked populations at mutation–drift equilibrium are expected to have a larger proportion of low-frequency alleles (<0.1) than of alleles at intermediate frequencies (‘L-shaped’ distribution), whereas in bottlenecked populations the situation should be reversed (‘shifted mode’ distribution; Luikart and Cornuet, 1998; Spencer et al., 2000). We performed 10 000 simulations under the SMM and the two-phase model (TPM), with a variance of 12 as recommended by Piry et al. (1999). Significant P-values in Wilcoxon’s test and/or ‘shifted mode’ distributions were taken as evidence of bottlenecks.
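The qualitative mode-shift descriptor can be sketched as follows, assuming that alleles (pooled across loci within a population) are binned into ten frequency classes of width 0.1 and that a distribution is called ‘shifted’ when any class above 0.1 holds more alleles than the lowest class. This follows the rule described above only in outline; BOTTLENECK’s own binning may differ, and the function names are hypothetical. The Wilcoxon test, which requires coalescent simulations of the equilibrium heterozygosity, is not shown.

```python
def frequency_classes(allele_freqs, n_classes=10):
    """Count alleles in equal-width frequency classes (0-0.1, 0.1-0.2, ...)."""
    counts = [0] * n_classes
    for f in allele_freqs:
        if not 0.0 < f <= 1.0:
            raise ValueError("allele frequencies must lie in (0, 1]")
        idx = min(int(f * n_classes), n_classes - 1)  # f = 1.0 goes to the top class
        counts[idx] += 1
    return counts


def is_mode_shifted(allele_freqs):
    """True if any intermediate class holds more alleles than the <0.1 class."""
    counts = frequency_classes(allele_freqs)
    return any(c > counts[0] for c in counts[1:])
```

An L-shaped population, with most alleles rare, returns False; a population whose rare alleles have been lost to drift after a bottleneck tends to return True.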

Results

Genetic diversity and differentiation for cpSSRs

Ten cpSSR markers were used to genotype 74 individuals from five populations of D. dyeriana, but only five markers (Ycf5, Ycf5-2, Rrn5-TrnR, RpoB and Rps3-Rp122) were polymorphic; the other five loci (PsbC-TrnS, Rp12-Rp123, TrnL-16SrRNA, TrnL and TrnR-Rrn5) were monomorphic. Only the five polymorphic loci were therefore used in further analyses of variation.

When combined, the five polymorphic cpSSR markers identified 10 haplotypes (H1–H10; Supplementary Appendix S2; Figure 1), giving a high total haplotypic diversity at the species level (Hcp=0.627). Haplotypic diversity per population (Hcp) ranged from 0.257 (WZ) to 0.867 (MZ), with an average of 0.607 (Table 1). Haplotypic richness (HR) also varied considerably among populations, from 1.067 (WZ) to 4.383 (MZ), with an average of 2.757. Populations of the western group (MZ, PB) had on average higher haplotypic diversity (mean Hcp=0.676) than those of the eastern group (WZ, WM, WC; mean Hcp=0.449). H1 was present in all populations, and most other haplotypes were found in two (H4, H5, H8, H9) or more populations (H2, H3, H5), except two (H7, H10) restricted to the most diverse population, MZ. Overall population differentiation was very low (GST=0.062, RST=0.023), and RST was not significantly greater than GST (P=0.732), providing no evidence of phylogeographic structure in the cpDNA of D. dyeriana.

Genetic diversity and differentiation for nuclear microsatellites

Screening all 74 individuals of D. dyeriana at the 13 nSSR loci revealed a total of 208 alleles, with 9 to 38 alleles per locus (Supplementary Appendix S3). Observed (HO) and expected (HE) heterozygosity per locus across all populations ranged from 0.068 to 0.649 and from 0.391 to 0.952, respectively. Five of the 13 loci (DAG88, DAG14, DAG368, DAG327 and DAG38) deviated significantly from HWE after Bonferroni correction (Supplementary Appendix S3) and exceeded the threshold null allele frequency (ν=0.15) in all five populations. Because null alleles are known to bias estimators of genetic diversity and differentiation (Chapuis and Estoup, 2007), these five loci were excluded from subsequent analyses.
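The null-allele screening can be sketched with a simplified expectation-maximization (EM) estimator in the spirit of Dempster et al. (1977). The assumptions are not necessarily those of FREENA: apparent homozygotes a/a are split into true a/a and a/null according to the current allele frequencies, failed amplifications are treated as null homozygotes, and loci whose estimated null-allele frequency exceeds ν = 0.15 are flagged. The function names and the NULL label are hypothetical.

```python
from collections import Counter

NULL = "null"  # hypothetical label for the non-amplifying allele


def em_null_allele(genotypes, max_iter=10000, tol=1e-10):
    """EM estimate of allele frequencies, including a null allele, at one locus.

    genotypes: list of (a, b) tuples of visible alleles; None marks a failed
    amplification, treated here as a null homozygote.
    """
    n = len(genotypes)
    visible = sorted({a for g in genotypes if g is not None for a in g})
    start = 1.0 / (len(visible) + 1)
    p = {a: start for a in visible}
    p[NULL] = start
    for _ in range(max_iter):
        counts = Counter()
        for g in genotypes:
            if g is None:
                counts[NULL] += 2
            elif g[0] != g[1]:
                counts[g[0]] += 1
                counts[g[1]] += 1
            else:
                a = g[0]
                # E-step: posterior probability that an apparent a/a is truly a/a
                w = p[a] ** 2 / (p[a] ** 2 + 2 * p[a] * p[NULL])
                counts[a] += 2 * w + (1 - w)
                counts[NULL] += 1 - w
        # M-step: frequencies from the expected allele counts (2n gene copies)
        new_p = {a: counts[a] / (2 * n) for a in p}
        converged = max(abs(new_p[a] - p[a]) for a in p) < tol
        p = new_p
        if converged:
            break
    return p


def loci_to_exclude(null_freqs, threshold=0.15):
    """Loci whose estimated null-allele frequency exceeds the threshold."""
    return [locus for locus, r in null_freqs.items() if r > threshold]
```

In a constructed sample of 100 individuals (32 apparent A/A, 32 apparent B/B, 32 A/B and 4 failed amplifications), which exactly matches Hardy–Weinberg expectations for frequencies 0.4, 0.4 and 0.2, the estimated null-allele frequency converges to about 0.2, so such a locus would be flagged at ν = 0.15.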

At the population level, average genetic diversity was generally high (AR=2.140, HE=0.540, HO=0.450), being lowest in population WZ (1.598, 0.308 and 0.359, respectively) and highest in population MZ (2.522, 0.708 and 0.735; Table 1). Populations of the western group had on average higher diversity (for example, HE=0.829) than those of the eastern group (HE=0.460). Inbreeding coefficients (FIS) ranged from 0.036 to 0.318, with an average of 0.160 (Table 1). The overall FST (0.140) was significantly different from zero at the species level, indicating moderate genetic differentiation among populations (Balloux and Lugon-Moulin, 2002).

In the STRUCTURE analysis, the number of gene pools (K) was not straightforward to determine from the posterior probability of the data, because ln P(D) increased progressively from K=1 to K=9 (Supplementary Appendix S4). In contrast, the ΔK statistic indicated a peak in the rate of change of ln P(D) at K=2 (Supplementary Appendix S4). At K=2, the populations were separated into two clusters: cluster I (red) was present at high frequency (87%) in the three eastern populations (WC, WM, WZ), whereas most individuals (84%) from the two western populations, MZ and PB, were assigned to cluster II (green; Figure 4). In the UPGMA phenogram (Supplementary Appendix S5), populations tended to cluster according to geography: the three adjacent eastern populations (WC, WM, WZ) formed a strongly supported cluster (93% bootstrap support), whereas the two western populations (PB, MZ), which lie further apart (see Figure 1), were also genetically more distinct from each other. Mantel tests revealed no significant isolation by distance, that is, no significant correlation between geographical and genetic distances (r=0.796, P=0.968).

Figure 4

Histogram of the STRUCTURE analysis for the model with K=2 (showing the highest ΔK). Each color corresponds to a suggested cluster, and a vertical bar represents a single individual. The x axis corresponds to population codes. The y axis presents the estimated membership coefficient (Q) for each individual in the different clusters.

Contemporary and historical migration rates

Multiple runs of BAYESASS yielded low levels of contemporary gene flow (mc, the fraction of individuals that are immigrants) among most populations (Table 2). Of the 20 pairwise estimates, only three showed evidence of moderate contemporary gene flow, all with WZ as the source of migrants (from WZ to WC, WM and PB, with mc=0.240, 0.268 and 0.247, respectively), whereas the other 17 had mc values below 0.035 with 95% confidence intervals overlapping zero (Table 2). Taken together, contemporary gene flow in D. dyeriana thus appears to be low to moderate and to occur from east to (north)west. In contrast, historical migration rates (mh=Mμ) estimated in MIGRATE were mostly significantly different from zero and relatively high, ranging from 0.018 to 0.071 (Table 2).

Table 2 Mean (±95% CI) contemporary migration rate (mc, fraction of immigrant individuals) estimated from BAYESASS and historical migration rate (mh=Mμ) estimated from MIGRATE across the five natural populations of Dipteronia dyeriana

Population size change history

In the east group, model 2 had the highest posterior probability (0.566; Table 3), and its structural parameters, NCUR and G, each showed a clear single peak (Supplementary Appendix S6). Model 4 also had a relatively high posterior probability (0.345), but the parameters common to the two models, NCUR and G, had nearly equal values (1100 and 8.4 × 10⁻³, respectively), and the trajectories of the two models had nearly the same shape over the past 68 000 years (Supplementary Appendix S7). Both models indicate that the east group has undergone rapid population growth since 15 000 years ago, after a low effective population size lasting more than 50 000 years. In the west group, by contrast, models 1 and 3 had similarly high posterior probabilities (0.394 and 0.373, respectively; Table 3). The structural parameters of models 1 and 3 each showed a clear single peak, except for T in model 3 (Supplementary Appendix S8), and NCUR was similar in the two models (1100). Although model 3 is an instantaneous size change model, the ratio of NCUR to NANC was only 1.58 (=1194/754), indicating little change in size (Supplementary Appendix S7). The west group has therefore maintained a relatively stable effective population size for a long time. Posterior predictive simulations of the best-supported model in each group showed good agreement between observed and simulated data sets (Supplementary Appendices S9 and S10). In addition, regardless of the mutation model used (TPM or SMM), Wilcoxon’s test revealed no significant heterozygosity excess in any population (all P>0.05; Supplementary Appendix S11), as expected in the absence of a historical bottleneck. Similarly, in the mode-shift test, all populations showed an L-shaped allele frequency distribution, providing no evidence of a recent bottleneck.

Table 3 Posterior model probability (P), posterior mode and 95% HPD of parameters for the four population size change models compared in the east and west groups

Population divergence history

Model IMWE (isolation with unidirectional migration from west to east) had the highest posterior probability of all compared models (0.737; Table 4). The posterior distributions of all structural parameters of this model differed in shape from their priors (Supplementary Appendix S12). The posterior modes (95% HPD) of NANC, TDIV and mWE were 1004 (33–2736), 17.3 kya (6.7–1689.3 kya) and 0.0017 (0.0006–0.0029), respectively. The number of migrants per generation (NemWE), calculated by multiplying NCUR (=1100) by mWE, was 1.9 (0.7–3.2). Posterior predictive simulations of the best-supported model showed good agreement between observed and simulated data sets (Supplementary Appendix S13).
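The arithmetic in this paragraph can be reproduced with a few helper functions. This is a sketch: the function names are hypothetical, the growth formula NT = NCUR × exp(G × T) is the one given in the captions of Figures 2 and 3 (T in generations before present), and the 20-year generation time is taken from the Methods.

```python
import math

GENERATION_TIME_YEARS = 20  # generation time assumed in the Methods


def number_of_migrants(n_cur, m):
    """Nm: current effective size (diploid individuals) times migration rate."""
    return n_cur * m


def kya_to_generations(kya, gen_time=GENERATION_TIME_YEARS):
    """Convert thousands of years before present into generations."""
    return kya * 1000 / gen_time


def size_at(t_gen, n_cur, g):
    """Effective size t_gen generations ago under N_T = N_CUR * exp(G * T)."""
    return n_cur * math.exp(g * t_gen)
```

With NCUR = 1100, the posterior mode of mWE (0.0017) and its HPD limits (0.0006 and 0.0029) give Nm ≈ 1.9 (0.7–3.2), and a divergence time of 17.3 kya corresponds to about 865 generations.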

Table 4 Posterior model probability, posterior mode and 95% HPD of parameters for the four population divergence models compared

Discussion

Southwestern China is one of the centers of temperate plant species diversity and endemism (López-Pujol et al., 2011) and a well-known glacial refugium for a number of Tertiary relict trees (Zou et al., 2002; Qiu et al., 2011), where logging and agricultural expansion have fragmented lowland forest habitats. Our study of D. dyeriana, an endangered paleoendemic, provides insights into opportunities for ecological and genetic restoration, and explores how past climate changes and contemporary anthropogenic disturbance have influenced the genetic structure and connectivity of D. dyeriana in particular, and of other paleoendemic species more generally.

Patterns of genetic diversity

Plant species that are endemic, restricted to islands or have relatively small population sizes often have reduced within-population diversity compared with more common and widespread species because of drift, founder events and other stochastic processes (Frankham, 1997; Cole, 2003). In the present study, our nSSR-based HE (HE=0.540) is higher than the average nSSR-based HS (the equivalent of HE in the present study) reported for endemic species (HS=0.42) in the review by Nybom (2004), and only slightly lower than the average nSSR-based HS for regional or widespread species (HS=0.65 and 0.62, respectively). Our nSSR-based HE is also comparable to values reported for other endemic, outcrossing plant species in East Asia, for example, Michelia coriacea (0.505; Zhao et al., 2012) and Kirengeshoma palmata (0.60; Yuan et al., 2014). As expected for codominant microsatellite markers with high mutation rates, D. dyeriana exhibited much higher average within-population diversity at nSSRs than previously reported for anonymous dominant markers (RAPD: HS=0.159, Li et al., 2005; ISSRs: HS=0.093, Qiu et al., 2007). In terms of cpDNA, D. dyeriana also retains a high level of within-population genetic diversity, as revealed by cpSSR analysis (Hcp=0.607; HR=2.757). The scarcity of population genetic studies of trees or shrubs based on chloroplast microsatellite data precludes a direct comparison of our diversity statistics for this marker. However, the number of cpSSR haplotypes found in D. dyeriana falls within the range observed in some more widely distributed angiosperms, such as Vitellaria paradoxa (7 haplotypes; Fontaine et al., 2004), Castanopsis hystrix (14 haplotypes; Li et al., 2007) or Saruma henryi (11 haplotypes; Zhou et al., 2010), although comparisons among different genera should be treated with caution.
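The HE and HS values compared above are gene diversities: 1 − Σ pᵢ² at each locus, averaged over loci. The minimal Python sketch below assumes allele counts per locus are available and, by default, applies Nei's small-sample correction 2n/(2n − 1); whether the reported values used this correction is not stated here, and the allele counts and function names are hypothetical.

```python
def expected_heterozygosity(allele_counts, unbiased=True):
    """Gene diversity at one locus from allele counts (number of gene copies per allele)."""
    total = sum(allele_counts)  # number of sampled gene copies (2n for n diploids)
    if total < 2:
        raise ValueError("need at least two gene copies")
    h = 1.0 - sum((c / total) ** 2 for c in allele_counts)
    if unbiased:
        # Nei's small-sample correction: multiply by 2n / (2n - 1)
        h *= total / (total - 1)
    return h


def mean_expected_heterozygosity(loci, unbiased=True):
    """Average H_E over loci; `loci` is a list of allele-count lists."""
    return sum(expected_heterozygosity(c, unbiased) for c in loci) / len(loci)


# Hypothetical allele counts at three nSSR loci in one population of 20 diploids
loci = [[10, 20, 10], [30, 5, 5], [12, 12, 8, 8]]
print(round(mean_expected_heterozygosity(loci), 3))
```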

High genetic diversity in rare plants can be attributed to a number of factors (Zawko et al., 2001), such as demographic stability, a recent reduction in population size with insufficient time since isolation for diversity to be lost, or extensive, recurrent gene flow (Maguire and Sedgley, 1997; Chiang et al., 2006). A better understanding of these factors is important not only for reconstructing the evolutionary history of species but also for their conservation (Frankham, 1995). For D. dyeriana, the moderate genetic diversity within populations and overall is unlikely to be attributable to ongoing or recent (pollen- and seed-mediated) gene flow (see the BAYESASS gene flow results, Table 2); rather, it points to relative demographic stability and/or expansion over historical and even more ancient timescales (Figures 2 and 3), as discussed below. In addition, none of the five extant populations seems to have suffered a recent genetic bottleneck (see below). These results support the hypothesis that D. dyeriana populations have persisted in Southwestern China for a long time, including through the last glacial maximum (LGM, 18 000 years before present). This conclusion is not surprising, because the latitudinal band between 23° N and 25° N in subtropical China, which encompasses the extant range of D. dyeriana (Table 1; Figure 1), has long been regarded as a crucial refugium for warm-temperate (‘broad-leaved’) evergreen (WTE) forest species that shifted their ranges southward during the LGM (Yu et al., 2000; Qiu et al., 2011). The role of refugia as long-term stores of genetic variation, and hence as reservoirs of high genetic diversity, is widely recognized for plant species (Hewitt, 2000; Lee et al., 2006; Qiu et al., 2011; Sakaguchi et al., 2011).
Several factors may have contributed to maintaining these high levels of genetic diversity: the outcrossing mating system of the species, the long lifespan of adult plants and their high resprouting ability (Qiu et al., 2007). At the level of population groups, the western populations (MZ and PB) harbor considerably more genetic diversity than the eastern cluster (WC, WM and WZ) at both cpSSR and nSSR loci (Table 1). The larger long-term effective population size of the western group (see the ABC results) has likely contributed to its higher diversity.

Values of FIS per population ranged from 0.036 to 0.318, with an average of 0.160 (Table 1), indicating an overall excess of homozygotes relative to HWE expectations in most populations (except MZ). In fact, a high rate of selfing may occur in D. dyeriana, which is known to be self-compatible (Qiu et al., 2007). Although D. dyeriana bears paired samaras whose seeds are wind-dispersed, it occurs mainly along the edges of WTE forest. Its fragmented occurrence in this forest-margin habitat may act as a barrier to effective seed dispersal and reduce the potential for long-distance dispersal. In addition, insect pollination may also be restricted and occur largely within populations (Zhang et al., 2006). Corroborating this, our BAYESASS analyses show that contemporary gene flow between populations was low (Table 2). It is worth noting that our field observations revealed generally high seed production in natural populations of D. dyeriana but often surprisingly low seedling establishment (Z-Q Ouyang, unpublished data). This may indicate that local populations suffer from a deficit of outcross pollen and from inbreeding depression due to selfing or mating between close relatives (Qiu et al., 2007).
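The FIS values above measure the deficit of heterozygotes relative to Hardy–Weinberg expectations. The single-locus Python sketch below assumes the simple moment estimator FIS = 1 − HO/HE with uncorrected HE; population genetics software commonly uses other estimators (for example, Weir and Cockerham's f), so values computed this way need not match Table 1 exactly. The genotypes and function name are hypothetical.

```python
from collections import Counter


def f_is(genotypes):
    """F_IS = 1 - H_O / H_E for one locus; genotypes are (allele, allele) tuples."""
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if h_exp == 0.0:
        return float("nan")  # monomorphic locus: F_IS undefined
    return 1.0 - h_obs / h_exp


# Hypothetical genotypes at one locus: a deficit of heterozygotes gives F_IS > 0
genotypes = [(1, 1), (1, 1), (2, 2), (2, 2), (1, 2)]
print(round(f_is(genotypes), 2))  # prints 0.6
```

Here only one of five individuals is heterozygous (HO = 0.2) although allele frequencies of 0.5 each predict HE = 0.5, giving a positive FIS of the kind that selfing or biparental inbreeding would produce.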

Population structure and contemporary vs historical gene flow

The interpopulation differentiation assessed with microsatellite markers (FST=0.140) was lower than previous estimates using RAPD (GST=0.421; Li et al., 2005) and ISSRs (GST=0.375; Qiu et al., 2007). For nSSR markers, homoplasy resulting from the high mutation rate may lead to underestimation of differentiation (Estoup et al., 2002). On the other hand, the dominant nature of ISSR and RAPD markers can also bias estimates of differentiation. Although the absolute values of differentiation may differ between markers, they lead to similar conclusions: (1) substantial differentiation among populations was found with both codominant microsatellites and dominant RAPD/ISSR markers at a small spatial scale; and (2) a predominant part of the total differentiation is due to the subdivision between the eastern and western parts of the species’ range (see below). The congruent results obtained with different markers most likely indicate that marker differences (mutation rates, homoplasy, estimation biases) had only a minor impact on the inferred differentiation (Mariette et al., 2001).

Because of their uniparental inheritance, cytoplasmic genomes typically have a lower effective population size than nuclear genomes and are therefore expected to show a higher level of population differentiation (Birky et al., 1983). Moreover, as cytoplasmic genomes are maternally inherited in most plant species, the more limited dispersal ability of seeds compared with pollen may further enhance population differentiation at cytoplasmic loci relative to nuclear loci (Ennos, 1994). As a result, FST estimates for cytoplasmic markers are often higher than those for nuclear markers (see, for example, Petit et al., 2005). However, D. dyeriana shows the opposite pattern: population differentiation based on the cpSSR data was unexpectedly low (GST=0.062). Relatively low differentiation in D. dyeriana at maternally inherited cpDNA markers could be attributed either to recent gene exchange between long-isolated populations or to recent fragmentation of a large population with recurrent gene flow. Under long-term isolation, each population would have experienced genetic drift independently, resulting in high frequencies of rare haplotypes in each population (Kimura et al., 2002). Alternatively, if the populations were separated only recently, there would have been insufficient time for genetic drift to act in each small fragmented population, and most populations would share common haplotypes (Printzen et al., 2003). In line with the latter interpretation, all haplotypes except two (H7 and H10, both specific to population MZ) were shared among populations (Table 1; Figure 1). Recent fragmentation of a large population with recurrent gene flow is therefore the more likely explanation. This hypothesis is also corroborated by the ABC analysis (see below).
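For a non-recombining, haploid cpSSR haplotype, Nei's GST can be computed directly from haplotype frequencies as (HT − HS)/HT. The Python sketch below is a minimal version that weights populations equally and applies no sample-size correction; that choice is an assumption, and the estimator used for the reported GST=0.062 may differ. The frequencies and function names are hypothetical and illustrate how widely shared haplotypes yield a low GST.

```python
def gene_diversity(freqs):
    """1 - sum of squared haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T, weighting populations equally.

    pop_freqs: one list of haplotype frequencies per population, all in the same
    haplotype order (zeros for haplotypes absent from a population).
    """
    k = len(pop_freqs)
    h_s = sum(gene_diversity(f) for f in pop_freqs) / k
    pooled = [sum(col) / k for col in zip(*pop_freqs)]
    h_t = gene_diversity(pooled)
    if h_t == 0.0:
        return float("nan")  # all populations fixed for the same haplotype
    return (h_t - h_s) / h_t


# Hypothetical populations sharing most haplotypes: low differentiation
shared = [[0.5, 0.3, 0.2, 0.0], [0.4, 0.3, 0.2, 0.1], [0.5, 0.2, 0.2, 0.1]]
print(round(nei_gst(shared), 3))
```

By contrast, populations fixed for different private haplotypes, as expected under long-term isolation with drift, give GST = 1.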

In the STRUCTURE analysis (Figure 4), individuals from the eastern and western parts of the species’ range were largely clustered into two groups (K=2). In addition, the UPGMA analysis showed much closer genetic relationships among the eastern populations (WZ, WM, WC) than among the more disjunct populations from the west (MZ and PB) (Supplementary Appendix S5). According to our ABC analysis, the divergence of the two groups was estimated at 865 generations, or 17 299 years ago assuming a generation time of 20 years for D. dyeriana. Estimating generation time remains difficult in forest trees with long reproductive spans. Although D. dyeriana bears its first fruits at 10 years of age, it reaches the canopy only after 20 years, at which stage the trees acquire their full reproductive potential (Ouyang et al., 2005). If we instead assume a generation time of 25 years, the divergence time becomes approximately 21 600 years ago, which is consistent with a climate-induced vicariant event at the LGM (Clark et al., 2009). Both STRUCTURE and BAYESASS analyses indicated that populations of the two groups are not completely isolated: STRUCTURE showed admixture of the two clusters in PB (Figure 4), and contemporary gene flow appears to occur in an east-to-west direction (Table 2). Our ABC analyses likewise revealed unidirectional contemporary gene flow from the eastern to the western group (Nem=1.9; 0.7–3.2). This unidirectional gene flow is likely associated with the recent postglacial range expansion of the eastern group (see below). Despite restricted contemporary gene flow between most pairs of populations, historical migration rates were mostly significantly different from zero and generally relatively high (Table 2). The historical estimates summarize gene flow over many generations, a period much longer than that reflected by the contemporary estimates.
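The dates above follow from multiplying the ABC divergence time in generations by an assumed generation time. The minimal Python sketch below makes that conversion explicit for the 20- and 25-year generation times discussed. Because the conversion is linear, a date or interval reported in years can be re-expressed under another generation time by a simple ratio; the example assumes the reported HPD bounds of TDIV were converted with the same 20-year generation time. Function names are illustrative.

```python
def generations_to_years(generations, generation_time):
    """Convert a time in generations into years for an assumed generation time."""
    return generations * generation_time


def rescale_years(years, old_generation_time, new_generation_time):
    """Re-express a date estimated under one generation time under another."""
    return years * new_generation_time / old_generation_time


T_DIV_GENERATIONS = 865  # ABC posterior mode of divergence time, in generations
for g in (20, 25):
    years = generations_to_years(T_DIV_GENERATIONS, g)
    print(f"generation time {g} y: divergence about {years:,} years ago")

# HPD of T_DIV in kya (6.7-1689.3), assumed to rest on 20-year generations,
# re-expressed for 25-year generations
print([round(rescale_years(t, 20, 25), 1) for t in (6.7, 1689.3)])
```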
The observed inconsistency between contemporary and historical gene flow in this study suggests that habitat destruction, mainly due to anthropogenic deforestation and land use, particularly since the mid-twentieth century (Zhang, 2000), may have reduced population connectivity and increased genetic differentiation through ongoing genetic drift in isolated populations, possibly owing to population declines resulting from recent bottlenecks (Zhang, 2000; Su et al., 2006). Yet, our BOTTLENECK analyses showed no evidence of a recent bottleneck in any of the studied populations. Nonetheless, given the long generation time of this species (20 years) and its low historical effective population size (Table 3), populations that have recently declined in census size may not have undergone a severe reduction in effective population size (a genetic bottleneck) (Pimm et al., 1989; Yuan et al., 2014). In fact, our ABC analyses indicate that the effective population size of D. dyeriana has historically been low, likely because severe climatic oscillations during the last glacial–interglacial cycle forced populations into small refugial areas (see below). We also found some evidence of inbreeding within D. dyeriana populations, which may result from small effective population size and limited contemporary gene flow.

Population history

Fossil pollen analyses have previously indicated that during glacial periods, WTE forest retreated southward as far as 24° N on the Asian mainland (Yu et al., 2000; Harrison et al., 2001). The mountainous areas of subtropical China may have harbored refugial populations of WTE species through periods of adverse climatic conditions (Provan and Bennett, 2008; Qiu et al., 2011). Nevertheless, the increasingly cold and arid climate at the onset of the LGM (Zhang et al., 2005) would have caused further regression of subtropical forest, with grassland in the lowlands and open woodland in the uplands (Adams and Faure, 1997; see also Malhotra and Thorpe, 2004). In line with these palaeovegetation-modeling predictions, our ABC analyses suggest that the western and eastern groups of D. dyeriana likely persisted in a long-term refuge in Southern China since the beginning of the last glacial period, about 100 000 years ago, whereas the increasingly cold and arid climate at the onset of the LGM might have fostered the fragmentation of D. dyeriana within refugia (Ouyang et al., 2001; Sun, 2002). Although both groups have similar current effective population sizes, they experienced strikingly different histories following their initial divergence. For the western group, our ABC analyses suggest a relatively stable effective population size in a long-term refuge. The genetic evidence for persistence of D. dyeriana at the western margin of the Laojunshan Nature Reserve, Yunnan Province, is noteworthy, because landscape features there likely restricted WTE habitats during the postglacial period. In contrast, the eastern group experienced an approximately 500-fold population expansion beginning about 15 000 years ago.
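The approximately 500-fold expansion can be checked against the growth parameter reported for the east group (G = 8.4 × 10−3). The Python sketch below assumes constant exponential growth, N(t) = NCUR · exp(−G·t) for t generations before present, with G expressed per generation and a 20-year generation time. The parameterization and the per-generation units of G are assumptions for illustration, since the text does not state them; variable and function names are illustrative.

```python
import math

G = 8.4e-3       # growth rate of the east group; assumed to be per generation
GEN_TIME = 20    # years per generation, as assumed in the text
N_CUR = 1100     # current effective population size of the east group


def fold_change(rate, generations):
    """N_now / N_then after `generations` of constant exponential growth at `rate`."""
    return math.exp(rate * generations)


def generations_for_fold(rate, fold):
    """Generations of exponential growth needed to multiply population size by `fold`."""
    return math.log(fold) / rate


onset_generations = 15_000 / GEN_TIME  # growth onset 15 000 years ago
fold = fold_change(G, onset_generations)
print(f"fold change since onset: {fold:.0f}")
print(f"implied effective size at onset: {N_CUR / fold:.1f}")
print(f"years needed for 500-fold growth: {generations_for_fold(G, 500) * GEN_TIME:.0f}")
```

Under these assumptions, 750 generations of growth at G give roughly a 540-fold increase, and a 500-fold increase takes just under 15 000 years, in line with the figures in the text; a different definition of G would change these numbers.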
This expansion could well indicate that the species benefited from WTE forest expansion driven by the warmer and more humid climate promoted by a stronger Asian summer monsoon during the Holocene in China (Zheng, 2000; Wu et al., 2010). Recent population expansions have previously been inferred for several plant species from China and are generally interpreted as postglacial expansions from southern LGM refugia (Huang et al., 2002, 2003; Su et al., 2005).

Conclusions and conservation implications

The patterns of genetic diversity and gene flow in D. dyeriana appear to reflect both the species’ natural population history and recent human impact. In line with palaeovegetation-modeling predictions, coalescent-based ABC analyses suggest that the western and eastern groups of D. dyeriana likely persisted in a long-term refuge in Southwestern China since the beginning of the last glacial period, about 100 000 years ago, whereas the increasingly cold and arid climate at the onset of the LGM might have fostered the fragmentation of D. dyeriana within refugia. Following their divergence, the western group maintained a relatively stable effective population size, whereas the eastern group experienced an approximately 500-fold population expansion, probably as a result of the warmer and more humid climate promoted by a stronger Asian summer monsoon during the Holocene. Although this study found no clear loss of genetic diversity due to human activities, recent habitat fragmentation appears to have reduced population connectivity and increased genetic differentiation through ongoing genetic drift in isolated populations, possibly owing to decreased population sizes over recent decades.

Examining the large-scale genetic structure and population demography of endangered tree species provides useful information on conservation units and guidelines for restoration or plantation efforts in forest conservation and breeding programs of many tree species, especially economically important ones (Tsuda and Ide, 2005; Sutherland et al., 2010; Bodare et al., 2013). The information provided by the present study is likewise relevant to the conservation of D. dyeriana. Moderate genetic diversity was detected at both the species and population levels, evenly distributed within and among populations (Table 1). This seems to rule out genetic impoverishment and inbreeding depression as immediate causes of threat. Based on the clusters identified with nSSRs, we suggest that D. dyeriana be managed as two separate units, corresponding to the WC, WZ and WM populations and the MZ and PB populations, respectively. The MZ–PB unit, located outside the reserve, preserves the highest genetic diversity (considering both nSSR and cpSSR markers; Table 1) and the largest numbers of private alleles (see also Qiu et al., 2007). Because these populations face a high risk of losing genetic variation through ongoing habitat destruction, with likely detrimental consequences for the long-term evolutionary viability of the entire species, conservation action is urgently required. A broad genetic sample from both units should therefore be preserved in ex situ conservation programs (seed banks and botanic gardens). This would allow future reintroductions or population reinforcements, whose success will depend heavily on the genetic quality of the available ex situ sample (see, for example, Rita and Cursach, 2013; Fernández-Mazuecos et al., 2014).
In addition, although large-scale transplantation of individuals to mitigate inbreeding may not currently be needed over most of the species’ range, the proposed conservation units should inform adaptive management of these natural resources. Moreover, it will be necessary to evaluate the ecological dynamics not only of the target species but also of the whole ecosystem associated with it, and to design a more practical approach to ecosystem management in this biodiversity hot spot.

Data archiving

Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.t8q1g.