Introduction

Glacial cycles during the Quaternary period have played important roles in shaping the diversity patterns of contemporary populations (Hewitt 2000, 2004), even in the warm temperate zone where the land was not covered with ice sheets (Tsukada 1974; Minato and Ijiri 1976). Severe climatic oscillations drove repeated north–south and highland–lowland migrations of communities and ecosystems, thereby strongly affecting the genetic structure and the distribution of organisms they hosted (reviewed in Taberlet et al. 1998; Hewitt 1999, 2004). Thus, current geographic patterns of genetic diversity within species reflect their responses to expansions and contractions of their habitats associated with the glacial cycles, particularly the numbers, sizes, and locations of refugia; the timing and severity of bottlenecks; and the rates of recolonization.

Molecular phylogeographic studies of the genetic structure of extant populations have provided detailed insights into apparent glacial and postglacial distributions of numerous species (Avise 2000). Key issues for phylogeographic analyses include the numbers and locations of refugia, which remain uncertain despite intensive recent investigations, especially in Europe (Provan and Bennett 2008; Svenning et al. 2008; Stewart et al. 2010) and North America (Soltis et al. 2006; Shafer et al. 2010). In Europe, Svenning et al. (2008) found indications that temperate trees were probably largely confined to the traditionally recognized southern refugia in and around Mediterranean lowlands, but that several boreal tree species may have been widespread not only in these southern areas but also in several areas of northern Europe during the Last Glacial Maximum (LGM). The existence of previously unknown, or cryptic, northern refugia has also been proposed (Birks and Willis 2008; Provan and Bennett 2008; Rull 2009; Stewart et al. 2010; Parducci et al. 2012; Tzedakis et al. 2013; de Lafontaine et al. 2014), based on the phylogeographic patterns of several species together with evidence from pollen and plant macrofossils. The eastern edge of East Asia, including East China, Japan, and the Korean Peninsula, harbors a diverse temperate flora, and the locations of refugia throughout the Quaternary glacial cycles have had important effects on its current genetic structure (Qiu et al. 2011). For East Asian temperate plants, fluctuations in sea level throughout the Quaternary caused population fragmentation, and these taxa were restricted to distinct refugia (Qiu et al. 2011). Recent phylogeographic studies of the warm temperate trees in this region have also detected multiple localized refugia (Huang et al. 2002; Cheng et al. 2005, 2006; Wu et al. 2006; Lee et al. 2013, 2014; Xu et al. 2015), though most of them were based on a single locus (chloroplast DNA). Tests of refugia models with multilocus genetic data, based on the estimated impacts of demographic events and the associated genetic processes on patterns of species' genetic variation, can help to clarify the history during and after the glacial periods in more detail.

Advances in population genetics have led to the development of statistical approaches based on coalescent models for estimating population genetic parameters, reconstructing demographic histories, and testing related hypotheses using Bayesian or likelihood-based frameworks (Knowles and Maddison 2002). In these approaches, a hypothesis is considered more probable than postulated alternatives if all the available data are more likely under the applied model and underlying assumptions (Hickerson et al. 2010). In addition, a more versatile tool for phylogeographic analysis, termed the approximate Bayesian computation (ABC), has been recently introduced. This “likelihood-free” method relatively easily enables the formulation of model-based inferences in a Bayesian setting for evaluating the demographic or evolutionary scenarios (Pritchard et al. 1999). Because it relies on only summary statistics (Beaumont et al. 2002) rather than evaluations of the overall probability of the data, we can make models flexibly to conform to the situations and compare them without much computational effort. ABC has been successfully used very recently to estimate the demographic parameters relating to various complex demographic scenarios and to assess the effects of the past climatic oscillations (Gao et al. 2012; Budde et al. 2013; Li et al. 2013; Liu et al. 2014a, b; Louis et al. 2014; Wang et al. 2016; Yoichi et al. 2016; Chen et al. 2017; Ren et al. 2017).

The following biogeographic scenarios have been proposed to describe the glacial and postglacial history of Japan’s warm temperate ecosystems, based on fossilized pollen data and reconstructions of past climate. The pollen record indicates that refugial populations of broadleaved evergreen forests existed at the warmer southern end of Kyushu in the main islands as well as in the Ryukyu Islands and migrated northward from these southern refugia after the LGM (Tsukada 1974; Matsuoka and Miyoshi 1998). However, based on historical climate data and the cool-temperature tolerance of the component species of these forests, some ecologists have proposed that these forests might also have survived in multiple refugia along the Pacific coasts of the main islands as well as at the southern end of Kyushu during the glacial periods (Maeda 1980; Kamei and Research Group for the Biogeography from Würm Glacial 1981; Nakanishi 1996; Hattori 2002). It has nevertheless been difficult to resolve the detailed demographic history of the plant species currently inhabiting the broadleaved evergreen forests, especially Castanopsis-type forests, because of the limited fossilized pollen records, the relatively slow molecular evolution of chloroplast DNA in plants (Wolfe et al. 1987; Chase et al. 1993; Aoki et al. 2005), and the extremely low levels of intraspecific variation in the chloroplast DNA of Japanese broadleaved evergreen species (Aoki et al. 2003, 2004a, 2016). Applying the ABC method to reconstruct the demography of the dominant species of this forest type can help to clarify its history in detail, including the locations of refugia, divergence times, and changes in population size.

In the present study, we examined the demographic history of Castanopsis sieboldii (Fagaceae), the dominant tree species in Japan’s broadleaved evergreen forests, by conducting the ABC method partially using the expressed sequence tag (EST)-associated microsatellite genetic variation analyzed in Aoki et al. (2014). For this, we first used an ABC framework to identify the most likely model of demographic history among the several alternative scenarios based on Bayesian clustering of current genetic variation. We then applied the model to estimate the demographic parameters describing effective population sizes of the species at several points in time and the recent migrations in their demographic histories. Finally, we combined the results of the species distribution modeling (SDM) with the ABC analyses and those of previous phylogeographic studies of other organisms inhabiting the same climatic zone to reconstruct the glacial and postglacial history of Japan’s broadleaved evergreen forests.

Materials and methods

Plant materials

We used fresh or silica-gel-dried leaves of 792 C. sieboldii trees collected from 33 populations (Table S1, Fig. 1) covering most of the species’ natural distribution in Japan. C. sieboldii is represented by two varieties, var. sieboldii and var. lutchuensis, which are not distributed sympatrically: C. sieboldii var. sieboldii is found in the main islands, whereas C. sieboldii var. lutchuensis is found in the Ryukyu Islands and Kyushu (southern regions from Amamioshima) (Yamazaki and Mashiba 1987b). The varieties differ morphologically: the seeds of C. sieboldii var. lutchuensis are larger and wider than those of C. sieboldii var. sieboldii; however, intermediate morphologies are often observed (Yamazaki and Mashiba 1987a).

Fig. 1

Locations of the 33 sampled Castanopsis sieboldii populations. Numbers correspond to the population numbers in Table S1. The dotted line indicates the coastline during the LGM, about 18,000–24,000 years ago

C. sieboldii var. sieboldii is sometimes codistributed with its congeneric species Castanopsis cuspidata var. cuspidata in the main islands of Japan. Hybrids with intermediate morphology sometimes occur, especially at sites where the two species coexist (Kobayashi and Hiroki 2003). Our previous study by Aoki et al. (2014) examined the epidermis of the leaves from each sampled individual, since this is the most effective way of discriminating the two species and hybrids (Yamazaki and Mashiba 1987b). In this study, we identified individuals with double epidermal cell layers as C. sieboldii var. sieboldii, with no ancestral contributions from C. cuspidata var. cuspidata. In subsequent analyses, we used the six C. sieboldii var. lutchuensis populations in the Ryukyu Islands and the 27 C. sieboldii var. sieboldii populations in the main islands, which do not contain any individuals of C. cuspidata var. cuspidata.

The minimum temperature of the coldest month delimiting the altitudinal distribution of Castanopsis was estimated to be 0–1 °C (Hattori 1985). Castanopsis is an insect-pollinated genus, so its fossilized pollen is difficult to detect. Fossilized pollen of Castanopsis trees dating to the LGM has been found in the Ryukyu Islands (Kuroda 1998). Fossilized pollen of broadleaved evergreen tree genera associated with Castanopsis-type forests (i.e., Myrica and/or Podocarpus) dating to the LGM has been recorded in southwestern Kyushu (at a mean frequency of several percent to 10%), in northern Kyushu and on the Japan Sea side (several percent), and in the Ryukyu Islands (10%) (Kuroda 1998; Takahara and Takeoka 1992). Several fossilized pollen records of Castanea/Castanopsis (less than 1% to several percent) also date to the LGM on the Japan Sea side (Takahara and Takeoka 1992; Miyoshi et al. 1999) and along the Pacific coasts (Kanauchi 2005); however, this pollen cannot be attributed to Castanopsis with certainty.

Acorns of Fagaceae, including Castanopsis, are dispersed by animals that transport and cache them, especially jays and rodents. Jays disperse acorns over longer distances, for example, 100–1900 m (Darleyhill and Johnson 1981), 250–1000 m (Gómez 2003), and 3–500 m (Pons and Pausas 2007), than rodents do, for example, 4–37 m (Jensen and Nielsen 1986) and 30–40 m (Miyaki and Kikuzawa 1988).

EST-SSR analysis

Total DNA was extracted from each leaf sample using either the CTAB (hexadecyltrimethylammonium bromide) method (Murray and Thompson 1980) or the method of Doyle and Doyle (1987) after removing the polysaccharides using HEPES buffer (pH 8.0) (Setoguchi and Ohba 1995). We determined the genotypes of each sample based on 32 pairs of nuclear microsatellite markers (expressed sequence tags-simple sequence repeats, EST-SSRs) used in the study of Aoki et al. (2014). The DNA at each EST-SSR locus was amplified using a QIAGEN Multiplex PCR Kit following the manufacturer’s recommended protocol. PCR products were detected using a PRISM 3100 sequencer in conjunction with GENESCAN software, and genotypes were scored using GENOTYPER software (both supplied by Applied Biosystems).

Choosing suitable loci for ABC analyses

Because the ABC analysis requires simulating neutral loci without null alleles to estimate past population demography, we checked each locus for neutrality and for the frequency of null alleles. First, we converted the genotype data from allele sizes to repeat numbers; indels were not taken into consideration in this procedure. For alleles with indels outside the SSR region, we reassigned the allelic call to the nearest allele in size. Second, we estimated the frequencies of null alleles at each SSR locus within a population using INEst 2.0 (Chybicki 2014). A distinctive feature of this program is that it accounts for possible inbreeding within a population when estimating null allele frequencies. We removed two loci with null allele frequencies >0.05. Third, we compared the distribution of the FST values over all loci to their expected distributions under an island model with the assumption of neutrality using the LOSITAN program (Antao et al. 2008), based on fdist as described by Beaumont and Nichols (1996). To calculate the approximate P values for each locus, 10,000 independent loci were generated and the simulated FST distribution was compared with the observed FST values. This made it possible to identify outliers in a one-step process by defining them as observed FST values falling outside the 99% confidence interval for the simulated group. We removed three outlier loci in this process. In total, we removed five of the 32 EST-SSR loci scored for C. sieboldii (two loci with high null allele frequencies and three outlier loci) and used the remaining 27 loci in the subsequent analyses.

Genetic diversity

We calculated the total number of alleles (NA), the average gene diversity within populations (HS) (Nei 1978), the observed heterozygosity (HO), and the FIS fixation indices (Weir and Cockerham 1984) across all populations at each locus to estimate departures from the Hardy–Weinberg equilibrium (HWE). The significance of deviations of FIS from zero was evaluated by permutation tests with sequential Bonferroni correction.

We calculated the average NA, the unbiased heterozygosity (Nei 1978), and the allelic richness (El Mousadik and Petit 1996), based on data from a minimum sample size of 17 per population, using MSA (Dieringer and Schlotterer 2003) and FSTAT version 2.9.3.2 (Goudet 2002) software.

Genetic structure

We used the genotype data, based on the 27 EST-SSR loci, of 792 C. sieboldii trees collected from 33 sites. We estimated the genetic structure among populations by constructing a neighbor-joining (NJ) tree (Saitou and Nei 1987) based on DA distances (Nei et al. 1983) between all pairs of populations using MSA and PHYLIP (Felsenstein 1989). We used 10 C. cuspidata populations as outgroups; these populations contain no individuals of C. sieboldii var. sieboldii (they consist only of individuals with the C. cuspidata type of leaf epidermal cells in the study of Aoki et al. (2014)).

We used the Bayesian clustering method implemented in TESS version 2.3 (François et al. 2006; Chen et al. 2007; Durand et al. 2009) to elucidate the genetic structure among the C. sieboldii populations, because TESS may detect fewer clusters than STRUCTURE and is suitable for data that contain clines and for sampling that covers the species’ geographic distribution (François et al. 2006; Chen et al. 2007; Durand et al. 2009). A total of 100 runs were performed for each value of K (2–10), each with 50,000 Markov chain Monte Carlo iterations after 10,000 burn-in iterations, using the admixture model. Following the manuals (François et al. 2006; Chen et al. 2007; Durand et al. 2009), we retained the 20% of runs with the lowest deviance information criterion values and estimated the Q-matrix for each K value as the mean of the permuted matrices across replicates using CLUMPP (Jakobsson and Rosenberg 2007). The most appropriate cluster number (K) was selected using the ΔK criterion of Evanno et al. (2005).

Estimation of population demographic history

We used the four distinct genetic groups of C. sieboldii identified by the NJ and Bayesian clustering methods for the subsequent ABC analyses: the Ryukyu Islands, the West (western parts of the main islands along the Pacific coast), the East (eastern parts of the main islands along the Pacific coast), and the Japan Sea (Japan Sea side of the main islands). We then constructed four plausible alternative scenarios for the diversification of these four genetic groups (Fig. 2). Model 1 assumed one refugium in southern Kyushu and another in the Ryukyu Islands. This model corresponds to the classic history supported by the fossilized pollen record of broadleaved evergreen trees at the LGM (Tsukada 1974). We therefore assumed that the Ryukyu Islands and the West groups were derived from these refugia and that the other two groups diverged recently from the West. Model 2 assumed that a refugium as large as those in the West and the Ryukyu Islands also existed in the coastal region on the Pacific Ocean side of the East. This model corresponds to the alternative history supported by past climatic conditions and the cool-temperature tolerance of broadleaved evergreen trees (Hattori 2002; Kamei and Research Group for the Biogeography from Würm Glacial 1981). We therefore modeled the Ryukyu Islands, the West, and the East groups as each derived from its own refugium, and the Japan Sea group as recently diverged from the West group. Model 3 was based on the phylogeographic tree constructed from mitochondrial DNA sequences of Curculio hilgendorfi, the host-specific predator of C. sieboldii acorns (Aoki et al. 2008). This tree was chosen because the association between the host plant C. sieboldii and its host-specific insect is known to be very tight (Aoki et al. 2005). Moreover, the geographic patterns of genetic differentiation of the two species appeared to be similar (Aoki et al. 2008, 2014). This model assumes that the Ryukyu Islands group diverged first from the other three groups, that the East then diverged from the other two groups, and that the West and the Japan Sea diverged last. Model 4 was constructed based on the result of the TESS analysis at K = 2, in which the West and the Japan Sea showed genetic admixture between the ancestries that dominated in the Ryukyu Islands and the East. We assumed that all four groups grew exponentially, with gene flow (except in model 4), after the four groups were established (from T1 to the present). We assumed no migration in model 4, because in this model the multiple ancestries observed in the West and the Japan Sea groups were explained solely by genetic admixture at T1. Symmetric migration was considered only between the Ryukyu Islands and the West, between the West and the Japan Sea, and between the West and the East groups, because these three group pairs showed relatively low FST values (FST < 0.02; Figure S1) and are located adjacent to one another. Moreover, where source–sink relationships existed, asymmetric migration from sink to source, in the direction of coalescence, was assumed (e.g., from the Japan Sea to the West and from the East to the West in model 1, and from the Japan Sea to the West in model 2). In preliminary analyses, we also tested models without exponential growth or without migration (other than model 4), but support for these models was much lower (data not shown). We conducted coalescent simulations for each of the four models, compared the models, and estimated the parameters using the ABC technique (Beaumont et al. 2002).

Fig. 2

The four candidate models compared for the population history of Castanopsis sieboldii. R, W, J, and E correspond to the Ryukyu Islands, the West, the Japan Sea, and the East groups, respectively, in Figs. 3 and 4

To reduce computational costs, we used 100 randomly selected diploid individuals per group. To check for sampling bias, we repeated the random draw of 100 individuals per group 100 times and calculated the summary statistics each time. For each of the four groups, we calculated the average and standard deviation across the 27 loci of NA, the heterozygosity, and the allele size range. Pairwise FST values and the overall FST among the four groups across all loci were also calculated. A total of 31 summary statistics for the observed data were calculated using arlsumstat version 3.5.2 (Excoffier and Lischer 2010). We compared the statistics across the replicate draws and confirmed that sampling bias was very small.
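The count of 31 summary statistics can be checked by enumerating them. The following is a minimal sketch in Python; the short labels (NA, H, RANGE, and the FST names) are illustrative placeholders and are not the column names written by arlsumstat.

```python
from itertools import combinations

# Group labels follow the text: Ryukyu Islands, West, Japan Sea, East.
GROUPS = ["R", "W", "J", "E"]
# Per-group statistics; each is summarized by its mean and SD over loci.
# These labels are hypothetical, chosen only for this illustration.
PER_GROUP_STATS = ["NA", "H", "RANGE"]
MOMENTS = ["mean", "sd"]


def summary_statistic_names():
    """List one label per summary statistic described in the text."""
    names = [
        f"{moment}_{stat}_{group}"
        for group in GROUPS
        for stat in PER_GROUP_STATS
        for moment in MOMENTS
    ]
    # Six pairwise FST values among the four groups.
    names += [f"FST_{a}{b}" for a, b in combinations(GROUPS, 2)]
    # One overall FST among the four groups.
    names.append("FST_total")
    return names


names = summary_statistic_names()
print(len(names))  # 4 groups x 3 statistics x 2 moments + 6 + 1 = 31
```

The enumeration gives 24 within-group statistics, 6 pairwise FST values, and 1 overall FST value.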

We assumed that the current effective population size (NCUR) was common to the four groups, because the averages and standard deviations of NA, the heterozygosity, and the allele size range did not differ greatly among the four groups (Figure S1), which indicated that the groups had experienced similar recent population histories. Moreover, each group was assumed to have shrunk at rate G backward in time from the present to T1, i.e., to have grown from T1 to the present [NT = NCUR exp(G × T), where NT is the effective population size at time T]. The prior for G was a uniform distribution from −0.01 to 0. The size of the population ancestral to all four groups (NANC) was also set. Priors for NCUR and NANC were independent uniform distributions from 1 to 10^4. All population sizes were expressed as numbers of diploid individuals and were converted into numbers of haploid individuals by multiplying by two when passed to the coalescent simulator. The times at which the demographic events occurred (T) were expressed in generations. Priors for the Ts were drawn jointly from a uniform distribution from 1 to 10^4, subject to T1 < T2 < T3. The number of migrants per generation (Nm) was used as the migration parameter, and this value divided by NCUR (i.e., the migration rate) was passed to the coalescent simulator. A single common value was used for all migration events. The prior for Nm was a uniform distribution from 0.1 to 20. In model 4, Pij indicates the proportion of genes within group i that coalesce into group j at T1. Priors for PWR and PJR were uniform distributions from 0 to 1, and PWE and PJE were then set to 1 − PWR and 1 − PJR, respectively. W, R, J, and E denote the West, the Ryukyu Islands, the Japan Sea, and the East groups, respectively.
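As an illustration of these priors, the sketch below draws one parameter set in Python and derives the quantities mentioned in the text. It is not the authors' script: the function name, the dictionary keys, and the use of sorting to impose T1 < T2 < T3 are assumptions made for this example.

```python
import math
import random


def draw_demographic_priors(rng):
    """Draw one parameter set from the uniform priors given in the text."""
    n_cur = rng.uniform(1, 1e4)       # current effective size (diploid individuals)
    n_anc = rng.uniform(1, 1e4)       # ancestral effective size
    growth = rng.uniform(-0.01, 0.0)  # G; negative means smaller sizes in the past
    # Assumption: the ordering T1 < T2 < T3 is imposed by sorting three draws.
    t1, t2, t3 = sorted(rng.uniform(1, 1e4) for _ in range(3))
    nm = rng.uniform(0.1, 20)         # number of migrants per generation
    return {
        "NCUR": n_cur,
        "NANC": n_anc,
        "G": growth,
        "T1": t1,
        "T2": t2,
        "T3": t3,
        "Nm": nm,
        "m": nm / n_cur,                        # migration rate for the simulator
        "N_T1": n_cur * math.exp(growth * t1),  # N_T = N_CUR * exp(G * T) at T = T1
        "NCUR_haploid": 2 * n_cur,              # haploid size for the simulator
    }


params = draw_demographic_priors(random.Random(1))
for name, value in params.items():
    print(name, value)
```

Because G is never positive, the size at T1 computed this way is never larger than NCUR, which matches the assumption of growth from T1 to the present.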

The generalized stepwise mutation (GSM) model was used as the microsatellite mutation model (Estoup et al. 2002). The GSM model has two parameters, the mutation rate (μ) and the geometric parameter (PGSM). The value of PGSM can range from 0 to 1, and PGSM = 0 corresponds to the strict stepwise mutation model. To accommodate variation among loci, locus-specific μ and PGSM values were generated as random effects, using a slightly modified version of the method of Excoffier et al. (2005). The value of μ for microsatellites is generally in the range from 5 × 10^−5 to 5 × 10^−3, and several studies have used 5 × 10^−4 (Estoup and Angers 1998; Estoup et al. 2002). In this study, we set the mean value of μ to 5 × 10^−4, and the value for each locus was assumed to follow a gamma distribution with shape and rate parameters. The shape parameter was drawn from a uniform prior from 0.5 to 2.0, and the rate parameter was calculated as shape/(mean value of μ). The mean value of PGSM among loci was drawn from a uniform prior from 0 to 0.8, and the value for each locus was assumed to follow a beta distribution with parameters α and β. Following Excoffier et al. (2005), α = 0.5 + 199 × (mean PGSM) and β = α × (1 − mean PGSM)/(mean PGSM) were used.
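The hierarchical draw of locus-specific mutation parameters can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than the authors' code: the function and key names are hypothetical, and the lower bound of the prior on the mean PGSM is nudged slightly above 0 so that β stays finite.

```python
import random

MEAN_MU = 5e-4  # mean mutation rate per locus per generation, as set in the text
N_LOCI = 27     # number of loci retained for the ABC analyses


def draw_locus_mutation_parameters(rng, n_loci=N_LOCI):
    """Draw hyperparameters, then locus-specific mu and P_GSM values."""
    shape = rng.uniform(0.5, 2.0)
    rate = shape / MEAN_MU  # gamma mean = shape / rate = MEAN_MU
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    mu = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_loci)]

    # Assumption: avoid mean_p == 0, which would make beta infinite.
    mean_p = rng.uniform(1e-6, 0.8)
    alpha = 0.5 + 199 * mean_p
    beta = alpha * (1 - mean_p) / mean_p  # beta mean = alpha / (alpha + beta) = mean_p
    p_gsm = [rng.betavariate(alpha, beta) for _ in range(n_loci)]
    return {
        "shape": shape,
        "rate": rate,
        "mean_p": mean_p,
        "alpha": alpha,
        "beta": beta,
        "mu": mu,
        "p_gsm": p_gsm,
    }


draw = draw_locus_mutation_parameters(random.Random(7))
print(draw["shape"] / draw["rate"])                    # mean of the gamma: 5e-4
print(draw["alpha"] / (draw["alpha"] + draw["beta"]))  # mean of the beta: mean_p
```

Setting the rate to shape/(mean μ) fixes the gamma mean at 5 × 10^−4 whatever the shape, and the stated β fixes the beta mean at the drawn mean PGSM.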

Coalescent simulations were replicated 1 × 10^4 times for each model using fastsimcoal2 version 2.5.2.21 (Excoffier and Foll 2011). The same 31 summary statistics as those calculated for the observed dataset were calculated using arlsumstat. Because the number of summary statistics was large, we used machine learning approaches throughout to reduce the effects of the curse of dimensionality in model choice and parameter estimation. ABC model choice via random forest (ABC-RF), implemented in the abcrf package in R, was conducted (Pudlo et al. 2016). ABC-RF allows models to be compared with a relatively small number of simulations and thus saves considerable time. One thousand trees were constructed, and the best model was selected by the classification votes of the random forest. The posterior probability of the best model and the confusion matrix among the compared models were also calculated. We ran 2 × 10^6 simulations for the best model, and the posterior distributions of the parameters were estimated by the neural network regression method implemented in the abc package in R (Blum and Francois 2010; Csillery et al. 2012). The tolerance was set to 5 × 10^−4 to retain the 1000 simulated datasets closest to the observed dataset. The number of neural networks was set to 20. To prevent the estimated posteriors from falling outside the lower or upper bounds of the priors during regression, the logit transformation option of the abc function was used. The posterior mode and the 95% highest posterior density (HPD) interval were calculated using the density function in base R and the HPDinterval function in the coda package (Plummer et al. 2006), respectively. Time was converted to years assuming a generation time for C. sieboldii of either 25 or 100 years, as short and long expected values. The short and long generation times correspond to the average age at first reproduction (Avise 2000) and the age midway between the age at maturity and the maximum lifespan (Petit and Hampe 2006), respectively.
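The relation between the tolerance and the number of retained simulations can be illustrated with a plain rejection step. The sketch below is a simplified stand-in, not the procedure of the abc package: it uses unscaled Euclidean distances and omits the neural network regression adjustment, and the toy data and function names are made up for the example.

```python
import math
import random


def accepted_count(n_simulations, tolerance):
    """Number of simulations retained for a given tolerance (acceptance fraction)."""
    return round(n_simulations * tolerance)


def rejection_step(observed, simulated, tolerance):
    """Return indices of the simulations closest to the observed statistics.

    Plain Euclidean distance is used for brevity; this is only the rejection
    step, without scaling of the statistics or regression adjustment.
    """
    n_keep = accepted_count(len(simulated), tolerance)
    distances = [math.dist(observed, stats) for stats in simulated]
    order = sorted(range(len(simulated)), key=distances.__getitem__)
    return order[:n_keep]


# Values from the text: 2 x 10^6 simulations and a tolerance of 5 x 10^-4.
print(accepted_count(2_000_000, 5e-4))  # 1000 retained datasets

# Toy usage with made-up two-statistic "simulations".
rng = random.Random(0)
toy_sims = [(rng.random(), rng.random()) for _ in range(2000)]
kept = rejection_step((0.5, 0.5), toy_sims, tolerance=0.05)
print(len(kept))  # 5% of 2000 toy simulations
```

The retained parameter sets would then be adjusted by regression on the summary statistics, which is the step carried out with neural networks in the analysis described above.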

Finally, a posterior model check was performed to assess the goodness-of-fit of the model, based on the estimated posterior distributions, to the data (Csillery et al. 2012). A total of 1000 new genotype datasets were generated using parameters randomly sampled from the posterior distributions. Summary statistics were calculated and compared with the observed data. To validate the parameter estimation, posterior quantiles for each parameter were calculated with 1000 pseudo-observed datasets (Cook et al. 2006; Wegmann et al. 2010). When the parameter estimation is unbiased, the distribution of posterior quantiles can be approximated by a uniform distribution. To test the deviation from a uniform distribution, a Kolmogorov–Smirnov test was conducted using R.
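The logic of the posterior quantile check can be sketched with toy data. The Python example below is an illustration only and does not reproduce the R analysis: the "posteriors" are artificial uniform samples, the Kolmogorov–Smirnov statistic is computed by hand, and the critical value uses the standard large-sample approximation 1.358/√n for the 5% level.

```python
import random


def ks_statistic_uniform(values):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def posterior_quantile(true_value, posterior_sample):
    """Fraction of posterior draws below the value used to simulate the data."""
    return sum(draw < true_value for draw in posterior_sample) / len(posterior_sample)


rng = random.Random(42)
N_DATASETS, N_DRAWS = 500, 200  # toy sizes, smaller than the 1000 used in the text

calibrated, biased = [], []
for _ in range(N_DATASETS):
    truth = rng.random()  # parameter value behind one pseudo-observed dataset
    # Toy "posteriors": one matches the distribution of the truth,
    # the other is shifted toward low values (underestimation).
    good_post = [rng.random() for _ in range(N_DRAWS)]
    low_post = [0.5 * rng.random() for _ in range(N_DRAWS)]
    calibrated.append(posterior_quantile(truth, good_post))
    biased.append(posterior_quantile(truth, low_post))

d_calibrated = ks_statistic_uniform(calibrated)
d_biased = ks_statistic_uniform(biased)
critical_5pct = 1.358 / N_DATASETS ** 0.5  # large-sample approximation

print(d_calibrated, d_biased, critical_5pct)
```

In the underestimating case the quantiles pile up near 1.0 and the statistic is far above the critical value, which is the pattern that signals underestimation of a parameter.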

Species distribution modeling

To assess the refugia of C. sieboldii under LGM (21,000 years BP) climate conditions, the geographical distribution of potential habitats was estimated using the existing species distribution model of the target species (Nakao et al. 2014). The model was constructed with the presence/absence records of the species as the response variable and four climatic factors (Bio6, mean of daily minimum temperature of winter; Bio10, mean temperature of warmest quarter; Bio18, precipitation of warmest quarter; and Bio19, precipitation of coldest quarter) as explanatory variables (Nakao et al. 2014). The climatic variables were extracted from the WorldClim database (http://www.worldclim.org) at a spatial resolution of 2.5 arc-minutes. We also predicted potential habitats under LGM climate conditions using three LGM climate datasets (i.e., MIROC, CCSM, MPI-ESM) from WorldClim.

Results

Genetic diversity

The EST-SSR loci were highly polymorphic (Table S2). Across all populations, the FIS values deviated significantly and positively from zero at two loci. High levels of genetic diversity were also observed within each population (the mean number of alleles, expected heterozygosity, and allelic richness were 5.226, 0.574, and 4.888, respectively; Table S1). Among the four genetic groups used in the ABC analyses, the average allelic richness per population was highest in the Ryukyu Islands group (5.337), followed by the West (5.191), the East (4.680), and the Japan Sea (4.659). The number of alleles unique to each group, expressed per population, was also highest in the Ryukyu Islands group (2.667), followed by the West (1.000), the East (0.846), and the Japan Sea (0.625).

Genetic structure

The NJ tree of C. sieboldii, with C. cuspidata populations as outgroups, contained four clusters corresponding to populations from the Ryukyu Islands and from the West, the East, and the Japan Sea side of the main islands (Fig. 3).

Fig. 3

Neighbor-joining tree based on Nei’s genetic distances (DA) among the 33 Castanopsis sieboldii populations over 27 EST-SSR loci. Numbers after the outgroup species, C. cuspidata, correspond to the population numbers in the study of Aoki et al. (2014). Values in italics are percentages of 1000 bootstrap replicates supporting the respective nodes

The Bayesian clustering of the C. sieboldii populations indicated that ΔK was highest at K = 2; however, fractions of ancestry from three or four clusters (K = 3 or 4) were also found (Fig. 4). At K = 2, the black cluster dominated in the populations of the Ryukyu Islands (97%), whereas the white cluster dominated in the populations of the East (98%) and was also present at high frequency in the populations around the Japan Sea (88%). The light gray clusters at K = 3 and 4 dominated in the populations around the Japan Sea. At K = 4, the populations of the West and around the Japan Sea showed genetic admixture among three clusters (the West: black, 47%; white, 15%; light gray, 36%; the Japan Sea: black, 2%; white, 6%; light gray, 75%).

Fig. 4

Genetic relationships among the 33 Castanopsis sieboldii populations estimated using the Bayesian clustering method, TESS (Durand et al. 2009)

The NJ tree and the TESS admixture analysis of C. sieboldii recovered nearly the same four distinct genetic groups (the Ryukyu Islands, the West, the East, and the Japan Sea). Only population No. 20 was assigned differently: to the Japan Sea group in the NJ tree but to the West group in the TESS analysis. We assigned population No. 20 to the West genetic group in the subsequent ABC analyses because it is located on the Pacific Ocean side and was therefore not suitable for inclusion in the Japan Sea genetic group. Populations No. 13, 14, and 19, which are geographically intermediate between the eastern and western parts of the main islands and formed a cluster intermediate between the eastern and western clusters in the NJ tree, were assigned to the East genetic group in the subsequent ABC analyses because, in the TESS analysis, the black cluster that dominated in the populations of the Ryukyu Islands was less frequent (8%, 9%, and 11% in No. 13, 14, and 19, respectively) and the white cluster that dominated in the populations of the East was more frequent (64%, 65%, and 42%, respectively).

Model comparison and parameter estimation for the demographic modeling

Before presenting the results of model comparison and parameter estimation, we summarize their validation in this paragraph. The confusion matrix of the model comparison among the four demographic models is shown in Table S3. The classification error rates of the individual models ranged from 0.010 to 0.238, and the overall classification error rate was 0.116 (Tables 1 and S3). These low classification errors indicate that a wrong model is rarely selected with high confidence. The distributions of summary statistics predicted from the posterior distribution of the best model were congruent with the observed values (Figure S2). We therefore concluded that the best model fitted the observed data well. The posterior quantiles of each parameter did not deviate significantly from a uniform distribution, except for PGSM (Figure S3). The distribution of the posterior quantiles of PGSM was biased toward 1.0. These results suggest that the parameters other than PGSM can be estimated without bias, but that PGSM may be underestimated. However, because PGSM is a nuisance parameter, this is unlikely to affect the interpretation of the following results.

Table 1 Fraction of votes, classification error rate, and posterior probability (PP) of the best model, which were estimated by random forest composed of 1000 trees based on a trained set of 10,000 simulations

Model 1 was selected as the best of the four candidate models, with a posterior probability of 0.906 (Table 1). The posterior distributions of all parameters of the best model clearly differed from the priors (Table 2; Figure S4), and each showed a clear single peak. The posterior modes (95% HPD) of NCUR and NANC were 5463 (2349–9239) and 1517 (13–8809), respectively. The posterior mode (95% HPD) of the effective population size at T1, estimated using G, was 1468 (989–3228; Figure S5).
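As a rough consistency check on these values, the growth relation NT = NCUR exp(G × T) from the Methods can be inverted to see which G links the reported posterior modes. This Python sketch is only indicative: modes of marginal posteriors need not satisfy the relation jointly, the T1 mode of 431 generations is taken from the results below, and the implied G is a derived illustration rather than an estimate reported in the study.

```python
import math


def size_at_time(n_cur, growth, t):
    """N_T = N_CUR * exp(G * T), with T in generations before present."""
    return n_cur * math.exp(growth * t)


def implied_growth(n_cur, n_t, t):
    """Solve N_T = N_CUR * exp(G * T) for G."""
    return math.log(n_t / n_cur) / t


# Posterior modes reported in the text: NCUR = 5463, N at T1 = 1468, T1 = 431.
g = implied_growth(n_cur=5463, n_t=1468, t=431)
print(g)                            # about -0.003 per generation
print(size_at_time(5463, g, 431))   # recovers the size at T1 used above
```

The implied value lies inside the prior range for G (−0.01 to 0), so the three reported modes are mutually compatible with the assumed growth model.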

Table 2 Prior and posterior distributions of parameters in the best model (model 1)

The posterior modes (95% HPD) of T1 and T2 were 431 (95–2453) and 1184 (110–8311) generations ago, respectively. Assuming a short generation time (25 years per generation), the posterior modes of T1 and T2 correspond to 10,775 and 29,600 years ago, respectively. Assuming a long generation time (100 years per generation), they correspond to 43,100 and 118,400 years ago, respectively.
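The conversion from generations to calendar years above is a simple multiplication by the assumed generation time. A minimal sketch using the posterior modes and the two generation times stated in the text (the variable and function names are illustrative only):

```python
# Posterior modes in generations and the two assumed generation times (from the text).
posterior_modes_generations = {"T1": 431, "T2": 1184}
generation_times_years = (25, 100)

def generations_to_years(generations, years_per_generation):
    """Convert a time in generations to a time in years."""
    return generations * years_per_generation

# Nested mapping: parameter -> generation time -> years before present.
years_ago = {
    name: {g: generations_to_years(t, g) for g in generation_times_years}
    for name, t in posterior_modes_generations.items()
}
```

The same function can be applied to the HPD bounds to see how strongly the dating depends on the assumed generation time.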

The posterior mode (95% HPD) of Nm was 9.70 (6.08–17.82). According to Sewall Wright's one-migrant-per-generation rule, one migrant per generation is sufficient to prevent complete differentiation of populations (Wright 1931). Because Nm was substantially larger than 1.0 (the lower bound of the 95% HPD was 6.08), we considered that migration was effective.
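To give a sense of scale for the Nm threshold, the sketch below uses the classical infinite-island approximation FST ≈ 1/(4Nm + 1) for diploid nuclear loci. This calculation is not part of the original analysis; it assumes equilibrium, equal population sizes, and symmetric migration, so the values are indicative only.

```python
def expected_fst(nm):
    """Approximate equilibrium FST under Wright's infinite-island model."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)

fst_one_migrant = expected_fst(1.0)   # the one-migrant-per-generation threshold
fst_estimated = expected_fst(9.70)    # posterior mode of Nm reported in the text
```

Under these assumptions, one migrant per generation corresponds to an FST of 0.2, whereas the posterior mode of Nm corresponds to a value roughly an order of magnitude lower.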

Species distribution modeling

Potential habitats under present climate conditions were located mainly in lowlands around the coastal regions. Potential habitats under LGM climate conditions were located mainly on the Pacific Ocean side (southern to northwestern Kyushu, southern Shikoku to the western Kii Peninsula, and around the Izu Peninsula), including areas that are below sea level at present (Fig. 5).

Fig. 5

Potential habitats of Castanopsis sieboldii estimated by species distribution models for the present and the Last Glacial Maximum (LGM). The potential habitats at the LGM are shown as the average of three Global Climate Models (MIROC, CCSM, and MPI-ESM) (Figure S6)
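Averaging the LGM predictions across the three Global Climate Models, as described in the caption, amounts to a cell-by-cell mean of the habitat-suitability grids. A minimal sketch, assuming the three predictions are already aligned on the same grid; the 2 × 2 arrays are toy values, not model output, and `ensemble_mean` is a hypothetical helper name:

```python
import numpy as np

# Toy habitat-suitability grids (hypothetical values, NOT model output).
suitability_by_gcm = {
    "MIROC":   np.array([[0.9, 0.3], [0.6, 0.0]]),
    "CCSM":    np.array([[0.6, 0.3], [0.3, 0.0]]),
    "MPI-ESM": np.array([[0.3, 0.3], [0.0, 0.0]]),
}

def ensemble_mean(grids):
    """Cell-by-cell mean of equally shaped suitability grids."""
    stacked = np.stack(list(grids), axis=0)  # shape: (n_models, rows, cols)
    return stacked.mean(axis=0)

lgm_mean = ensemble_mean(suitability_by_gcm.values())
```

Cells on which the models disagree end up with intermediate values, so the averaged map is best read alongside the individual predictions (Figure S6).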

Discussion

The numbers and locations of glacial refugia in several areas of the world, including Japan, are key phylogeographic concerns and have been intensely debated. The ABC analysis of the C. sieboldii genetic dataset provided much stronger support for the scenario in which the Ryukyu Islands and West groups were derived from separate refugia and the East and Japan Sea groups diverged from the West group than for the alternative scenarios. Available fossil pollen data for broadleaved evergreen trees during the LGM (Matsuoka and Miyoshi 1998) suggest that refugia for Castanopsis-type broadleaved evergreen forests were present in the Ryukyu Islands and, within the main islands, in southwestern Kyushu. The populations of the Ryukyu Islands and of the West group, which includes Kyushu, show higher levels of genetic diversity, ancient diversification in the ABC analysis, and a higher probability of potential habitats under LGM climate conditions in the SDM analysis. Together, these findings suggest that these populations have remained sufficiently large for ancestral polymorphism to be retained from the glacial periods to the present day.

The time at which the East and Japan Sea groups diverged from an ancestral population of the West group was estimated to be 431 generations ago. Under coalescent theory, basic demographic parameters related to population size, timescale, and migration must be scaled by the rate of genetic drift or by the mutation rate (Hey and Nielsen 2004; Wakeley 2008). For the timescale, the parameter is expressed in terms of mutations, as the number of generations × mutation rate (Hey and Nielsen 2004). In this study, we therefore fixed the mutation rate at a moderate level (5 × 10⁻⁴) and estimated the number of generations. Because we used EST-SSRs, whose mutation rate is generally considered to be lower than that of genomic SSRs (Cubry et al. 2014), T1 and T2 might be underestimated. Moreover, lacking accurate information on the generation time of Castanopsis, we assumed generation times of 25 and 100 years in the ABC analyses. When considering generation time, it is important to distinguish among age at maturity, generation time, and lifespan (Petit and Hampe 2006). For instance, according to the first complete life table for a tree, the estimated generation time of the palm Euterpe globosa is 101 years, intermediate between its age at maturity (50 years) and maximum observed lifespan (156 years) (van Valen 1975). We therefore adopted the more probable 100-year generation time, under which the East and Japan Sea groups diverged from the West group approximately 43,100 years ago. This date predates the coldest stage of the LGM, around 18,000 years ago (Maeda 1980; Tsukada 1983; Martinson et al. 1987). These results suggest that the current populations of C. sieboldii descended from at least four isolated populations that were established prior to the LGM, although the groups in the eastern parts and around the Japan Sea were established more recently from ancient southern refugia. Thus, these results provide evidence for the presence of LGM refugia in the eastern parts of Japan and around the Japan Sea, as well as in the Ryukyu Islands and the western parts of Japan.
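The direction of the possible bias in T1 and T2 noted above follows from the mutation scaling of time. As a sketch, let τ denote the mutation-scaled time (a symbol introduced here for illustration only), T the time in generations, and μ the mutation rate per generation:

```latex
\tau = T\mu
\quad\Longrightarrow\quad
T = \frac{\tau}{\mu},
\qquad
\mu_{\text{true}} < \mu_{\text{assumed}}
\;\Longrightarrow\;
T_{\text{true}} = \frac{\tau}{\mu_{\text{true}}}
> \frac{\tau}{\mu_{\text{assumed}}} = T_{\text{estimated}}
```

For a given scaled time, a true EST-SSR mutation rate lower than the assumed 5 × 10⁻⁴ therefore implies that the actual divergence times are older than the estimates.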

There was clear evidence of an approximately fourfold expansion in historical effective population size from the glacial periods to the present (Figure S5). Chronological data on the Earth's climate (Martinson et al. 1987) and pollen records from Japanese forests (Maeda 1980; Tsukada 1983) suggest that the Japanese climate was coldest around 18,000 years ago, started to warm about 12,000 years ago, and was warmest about 6000–8000 years ago. Our results suggest that small populations of the four genetic groups may have survived in their respective glacial refugia through several glacial and interglacial periods of the Quaternary and that, when the climate began to warm after the LGM, these populations recovered and expanded from the coastal regions to inland areas. Migration also occurred between adjacent populations, from the southwestern to the eastern regions of the main islands of Japan, from the LGM to the present.

In the main islands of Japan, genetic differentiation between western and eastern populations similar to that observed in C. sieboldii has also been detected in several other species (both plants and animals, listed below) of broadleaved evergreen forests. The geographical boundaries between the western and eastern populations are often reported to lie in a region spanning Shikoku and Chugoku (the western end of Honshu) to the Kii Peninsula. Examples among plant species include Elaeocarpus sylvestris var. ellipticus (Aoki et al. 2004b), Photinia glabra (Aoki et al. 2006), Myrsine seguinii (Aoki et al. 2011), and Camellia japonica (Ueno 2015); examples among insect species include Curculio hilgendorfi, a specific predator of C. sieboldii seeds (Aoki et al. 2008), and Rhynchaenus dorsoplanatus, a specific leaf miner of C. sieboldii (Aoki et al. 2010). Genetic boundaries within several species of the warm to cool temperate zones in Japan also appear to be located in this region: for example, Pinus (Miyata and Ubukata 1994; Iwaizumi et al. 2013), Cerasus (Tsuda et al. 2009), Zanthoxylum (Yoshida et al. 2010), Padus, Carpinus, and Magnolia (Iwasaki et al. 2010, 2012), and Acer (Yoshimaru and Matsumoto 2015) among plants; Curculio (Aoki et al. 2009) among insects; and Cervus (Nagata et al. 1999), Macaca (Kawamoto et al. 2007), Petaurista (Oshida et al. 2009), Ursus (Yasukochi et al. 2009), and Lepus (Nunome et al. 2010) among mammals. Such shared geographic differentiation among multiple codistributed taxa may help to elucidate the relative influence of major historical events, such as climate changes during the glacial and interglacial periods (Avise 2000). The genetic uniqueness of the eastern populations shared by various component species of broadleaved evergreen forests also suggests the existence of eastern refugia in the main islands.

There is no clear palynological evidence that considerable numbers of Castanopsis-type broadleaved evergreen trees survived the LGM in the eastern regions of the main islands (Figure S7). However, the frequencies of fossilized pollen of broadleaved evergreen trees in the eastern parts of Japan began to increase markedly at the southern ends of the Kii, Izu, and Boso Peninsulas (located sequentially eastward along the Pacific Coast, close to the route of the warm Kuroshio Current) from 7500, 8500, and 7500 years ago, respectively, approximately 500–1500 years after similar increases at the southern end of Kyushu (Matsushita 1992). Paleontological evidence also indicates that several other broadleaved evergreen trees were present on the Boso Peninsula about 9000 years ago (Okazaki et al. 2011). Moreover, the genetic diversity and distinct chloroplast DNA haplotypes of these types of trees suggest the existence of refugia around the Kii Peninsula (Aoki et al. 2004b; Liu et al. 2013). The potential habitats of C. sieboldii under LGM climate conditions were also located on the Pacific Ocean side, including areas that are below sea level at present. Thus, sites at the southern tips of these peninsulas along the Pacific Coast are candidate locations for the cryptic eastern refugia in Japan.

Genetic boundaries between the Japan Sea and Pacific Ocean populations have been observed in various plant species of the temperate zones in Japan, for example, Cryptomeria (Tsumura et al. 2007), Fagus (Fujii et al. 2002; Hiraoka and Tomaru 2009), and Euonymus (Iwasaki et al. 2012). Fossil pollen evidence also indicates that Cryptomeria was present in the Japan Sea region at the LGM (Kawamura 1977). Fossilized pollen of Castanea/Castanopsis (from less than 1% to several percent) has likewise been recorded at the LGM in the Japan Sea region (Takahara and Takeoka 1992; Miyoshi et al. 1999). This information and the ABC-based demographic analyses of C. sieboldii suggest that Castanopsis forests could have survived in small refugia along the Japan Sea coast. The SDM did not support refugia in this region, probably because it is too coarse to take into account topographic effects on microclimate.

Conclusion

The results of this study may help to resolve the demographic dynamics of species in warm temperate ecosystems in Japan during the Quaternary period. Our ABC-based demographic simulations and SDM of the dominant tree Castanopsis (Fagaceae), which serves here as an indicator of the behavior of species in broadleaved evergreen forests, indicate that at the LGM at least one refugium was located in the Ryukyu Islands and that others were located in three regions of the main islands of Japan: the western parts, the eastern parts, and around the Japan Sea. The results also provide ABC-based indications of demographic expansions driven by glacial oscillations. Our analyses provide a foundation for further ABC-based explorations of the demographic history of species in warm temperate broadleaved forests during and after the last glacial period, and a basic model for future phylogeographic studies using this approach.

Data archiving

Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.5sb1219.