Introduction

Ancient geological events (e.g. the uplift of mountains and volcanic eruptions) and climate change (e.g. glacial/interglacial cycles) shape the divergence of species at the intraspecific and interspecific levels (Favre et al. 2015; Hewitt 2004). Long-term environmental heterogeneity along altitudinal or latitudinal gradients also contributes to genetic differentiation among species (Gugger et al. 2013). The evolutionary processes accompanying this heterogeneity can be reconstructed from contemporary patterns of genetic diversity and genetic differentiation (Qiu et al. 2011). For example, phylogeographic studies have indicated that many species in northern Europe have migrated from three southern peninsulas (the Iberian, Italian and Balkan Peninsulas) since the last glacial maximum (LGM; Comes and Kadereit 1998). Understanding the effects of orogeny, climatic oscillations and environmental heterogeneity on the spatial genetic patterns of species, especially dominant species of vegetation and endangered species, is informative not only for disentangling the complex evolutionary histories of species, but also for predicting the responses of organisms to future climate change.

Southwest China is a global biodiversity hotspot owing to its complex topographic and drainage systems (Yang et al. 2004). It is rich in vascular plant species and harbours remarkably high levels of endemic and relict species (López-Pujol et al. 2011). The timing and geological mechanisms underlying the establishment of this biodiversity hotspot have long fascinated botanists and evolutionary biologists (Meng et al. 2017; Zhang et al. 2010; Zhao and Gong 2015). A growing number of studies have indicated that topographic and drainage diversity in this region has contributed greatly to the pattern of genetic diversity in tropical–subtropical plants (Yue et al. 2012; Zhang et al. 2010). However, these studies have mainly focused on shrubs (Liu et al. 2009; Yue et al. 2012), temperate deciduous trees (Tian et al. 2015; Zhao and Gong 2015) and conifers (Ma et al. 2006; Wang et al. 2013). Upper canopy trees play a crucial role in the ecological functions and biodiversity of evergreen broadleaved forests (EBLFs; Fang et al. 2014), yet the dynamics and genetic structures of these key tree species in tropical–subtropical Asia have received little attention (Jiang et al. 2016).

Subtropical EBLFs constitute the primary native vegetation type of southwest China and thus play important ecological and economic roles in the area (Harrison et al. 2001). Strong macrofossil and palynological evidence has indicated that EBLFs were widely established in southern China no later than the Middle Miocene (Jacques et al. 2011; Sun et al. 2011). Molecular data and species distribution models (SDMs) have indicated that the EBLFs have been stable and widespread in East Asia since the LGM (Jiang et al. 2016; Xu et al. 2015), although biome reconstructions based on palynological data have indicated that this vegetation type retreated south of 24°N during the LGM (Harrison et al. 2001). Southwest China contains an extensive transitional zone between two floristic regions, the Sino-Japanese and Sino-Himalayan floristic sub-regions. The boundary between these two sub-regions has been referred to as the ‘Tanaka–Kaiyong line’ (Li and Li 1992; Tanaka 1954), which comprises the ‘Tanaka line of Citrus Distribution’ and the ‘Kaiyong line of Orchid Distribution’ (Li and Li 1997). The Tanaka line is also the distribution boundary of a series of closely related plant species and genus pairs in subtropical Asia (Li and Li 1997; Tanaka 1954). However, the biogeographical significance of this line is still poorly understood. Based on comparisons of the regional flora, the pattern of geographical replacement between closely related species and genera on the two sides of the Tanaka line has been hypothesized to be an outcome of long-term geographical isolation and environmental heterogeneity (Li and Li 1997). The plant species found east of the Tanaka line have been considered to be relict species, while those to its west are considered to be more recently derived (Sun 2001). Recent phylogeographic studies in southwest China have also revealed prominent genetic divergence between the east and west sides of the Tanaka line (Tian et al. 2015; Zhang et al. 2006; Zhao and Gong 2015). These studies have mainly attributed this striking pattern to the geographical barriers and isolation associated with the rapid uplift of the Yunnan–Guizhou Plateau and the intensification of the Asian monsoons since the late Miocene along the Tanaka line and throughout adjacent regions (Fan et al. 2013; Tian et al. 2015; Zhao and Gong 2015). The contribution of long-term environmental heterogeneity between the two sides of the Tanaka line to local differentiation and adaptation has been largely overlooked by current phylogeographic studies.

Quercus kerrii, which belongs to the Group Cyclobalanopsis (Fagaceae), is a widespread species in southwest China and north Indo-China. Its distribution spans across the Tanaka line. This region is also known as the species diversity centre for the Eurasian oaks, especially for the Group Cyclobalanopsis (Luo and Zhou 2000). Quercus kerrii is a keystone species in semi-dry to humid EBLFs at low elevations (50–1300 m asl).

Frequent introgression between closely related species of oaks (Eaton et al. 2015) can result in biased estimates of intraspecific genetic structure. Quercus kerrii and Q. austrocochinchinensis are sister species, as inferred by molecular phylogenetic reconstruction using ITS sequences (Deng et al. 2013) and RAD-seq (Hipp et al. 2015). Quercus austrocochinchinensis is an endangered species consisting of only four small populations. Previous investigations of their morphologies and molecular markers have demonstrated that ongoing hybridization and introgression occur between the two species (An et al. 2017). However, the observed introgression of SSR alleles was asymmetric, occurring only from Q. kerrii to Q. austrocochinchinensis (An et al. 2017). When hybridization occurs between a dominant species and a rare species, genetic assimilation of the rare species is the more likely outcome (Levin et al. 1996). Therefore, introgression between Q. kerrii and Q. austrocochinchinensis has probably contributed little to the genetic diversity and differentiation of Q. kerrii. Furthermore, Q. kerrii is a keystone species in local low-elevation EBLFs, and it plays a critical role in biological conservation and regional carbon storage and cycling. Quercus kerrii thus provides a unique opportunity to investigate the roles of environmental factors in the genetic diversification of species in tropical–subtropical China.

In this study, we examined population samples of Q. kerrii from southwest China to (1) reveal spatial genetic patterns in Q. kerrii; (2) investigate the historical dynamics of Q. kerrii populations in southwest China; (3) explore the influence of environmental factors on the genetic diversity of Q. kerrii and genetic divergence between Q. kerrii populations. This study provides new insight into historical vegetation dynamics and the flora evolution of southwest China EBLFs.

Materials and methods

Potential distribution and environment factors analysis

Maximum entropy models implemented in MaxEnt v3.3.3k (Phillips et al. 2006) were used to compare the geographic distributions of Q. kerrii in the present and LGM periods. Asian subtropical–tropical lowland ecosystems are severely affected by human disturbance (e.g. the expansion of cropland). As a result, many Q. kerrii populations previously recorded in herbaria no longer exist. Most herbarium records also lack specific GPS coordinates, so specimen localities recorded only at the village level can be converted only to imprecise coordinates. Additionally, Asian oak specimens are commonly misidentified in the main herbaria: we found that quite a few specimens of other taxa had been identified as Q. kerrii (e.g. Q. franchetii, Q. aliena, Q. griffithii and even species of Lithocarpus and Castanopsis). To ensure the accuracy of the species localities, we used only the 44 GPS records collected in this study to estimate the potential species distributions. Nineteen bioclimatic variables characterizing temperature and precipitation patterns for the recent (1950–2000) and LGM (CCSM) periods were downloaded from WorldClim at a 2.5-arc-min resolution (available online at http://www.worldclim.org/). Multicollinearity among variables was examined using a Pearson correlation matrix estimated in R v3.3.3 (https://www.r-project.org/). Each subset of highly correlated variables (r > 0.8) was reduced to a single variable.
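The correlation-based variable reduction can be illustrated with a minimal sketch. The original analysis was run in R; the Python code below is only an illustration under stated assumptions. The function name `drop_correlated`, the toy data, the use of the absolute value of r, and the rule of keeping the first variable encountered in each correlated subset are all assumptions, since the text does not state which variable of a subset was retained.

```python
import numpy as np

def drop_correlated(values, names, threshold=0.8):
    """Greedily keep variables whose |r| with every kept variable is <= threshold.

    values: 2-D array, rows = sites, columns = variables.
    Assumption: the first variable of each correlated subset is retained.
    """
    corr = np.corrcoef(values, rowvar=False)  # Pearson matrix, columns as variables
    kept = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data (hypothetical): 6 sites x 3 variables; bio_2 nearly duplicates bio_1.
toy = np.array([
    [10.0, 10.1, 5.0],
    [12.0, 12.2, 3.0],
    [14.0, 13.9, 6.0],
    [16.0, 16.1, 2.0],
    [18.0, 18.0, 7.0],
    [20.0, 20.2, 4.0],
])
selected = drop_correlated(toy, ["bio_1", "bio_2", "bio_3"])
print(selected)
```

In this toy case bio_2 is dropped because it is almost perfectly correlated with bio_1, whereas bio_3 is only weakly correlated with bio_1 and is kept.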

According to the geographic distribution of Q. kerrii in China, we divided the 44 populations into four groups: (1) 20 populations from the Lancang River and adjacent regions (group LCJ); (2) 10 populations from the Red River drainage and adjacent regions (group HH); (3) 11 populations from the Nan-pan-jiang River basin and the east side of the Tanaka line (group NPJ); and (4) 3 populations from Hainan Island in the South China Sea (group HN). The detailed distribution and collection information is summarized in Table 1 and Fig. 1b. As groups HH and LCJ are separated by only a short geographic distance and are both located on the west side of the Tanaka line, we refer to the two groups collectively as HH–LCJ. We also estimated the present potential distribution range of Q. kerrii separately for the HH–LCJ and NPJ subpopulations. For the all-sites and HH–LCJ models, we used 75% of the sites for training and 25% for testing, and the mean of ten replicate runs was used for all estimations. The jackknife procedure (Pearson et al. 2007) was applied to model the NPJ subpopulations because this group contained fewer than 25 sites. Each site was removed in turn from the total of n sites, and a model was built using the remaining n−1 localities, so that n models were built for a data set with n sites. The mean of the n models was used as the potential distribution of the species. Model performance was evaluated by the area under the curve (AUC) value. Species distribution maps of Q. kerrii in the present and LGM periods were visualized in ArcGIS 10.2 (Environmental Systems Research Institute, Inc., ESRI; Redlands, CA, USA).
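The leave-one-out logic of the jackknife procedure can be sketched as follows. This is not MaxEnt itself: `fit_predict` is a hypothetical stand-in for any routine that trains a model on a list of sites and returns predicted suitability values, and `toy_fit_predict` is a deliberately trivial placeholder used only to make the sketch runnable.

```python
import numpy as np

def jackknife_mean(sites, fit_predict):
    """Average the predictions of n models, each trained on n-1 sites."""
    sites = list(sites)
    predictions = []
    for i in range(len(sites)):
        train = sites[:i] + sites[i + 1:]  # leave site i out
        predictions.append(np.asarray(fit_predict(train), dtype=float))
    return np.mean(predictions, axis=0)

def toy_fit_predict(train):
    # Hypothetical stand-in "model": predicts the mean of the training values.
    return [np.mean(train)]

result = jackknife_mean([1.0, 2.0, 3.0], toy_fit_predict)
print(result)
```

With three sites, three models are built (on sites 2 and 3, 1 and 3, and 1 and 2), and their predictions are averaged cell by cell.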

Table 1 Sampling information and genetic diversity of Quercus kerrii populations
Fig. 1 a The distribution area of this study. b Geographic distribution and c network of 12 cpDNA haplotypes of Quercus kerrii. b The pie charts reflect the occurrence frequency of each haplotype in each population. Haplotype colours correspond to those shown in the lower-left panel. The two dotted lines represent the Tanaka line and species genetic barriers, respectively. c The circle sizes are in proportion to the number of individuals of each haplotype. Haplotypes are coloured yellow, red, green and violet to represent the populations belonging to groups LCJ, HH, NPJ and HN, respectively

Principal component analysis (PCA) of the climate variables at the sample sites was performed using R v3.3.3 (https://www.r-project.org/). The estimated principal components summarized the overall pattern of variation in the 19 climate variables among populations during the present and LGM periods. To quantify the environmental change at each population since the LGM (E change), we calculated the absolute value of the difference between the standardized PC1 score for the present period (E pre) and that for the LGM period (E LGM). After standardization, values near 0 indicate that the environment has been relatively stable since the LGM, while values near 1 indicate substantial environmental change. The present and LGM environmental conditions, as well as the environmental changes since the LGM at the sampling sites, were visualized using ArcGIS 10.2 (ESRI). The t test was performed in R v3.3.3 (http://www.r-project.org/) to compare climatic factors between groups.
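One possible reading of the E change calculation is sketched below. The text does not specify whether the PCA was fitted jointly to both periods or how the scores were standardized, so two choices here are assumptions made only for illustration: a single PCA is fitted to the stacked present and LGM data so that PC1 scores share one axis, and the absolute differences are rescaled to the 0–1 range by min-max scaling. The function names and toy values are hypothetical.

```python
import numpy as np

def pc1_scores(X):
    """PC1 scores of column-standardized data, obtained via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def e_change(present, lgm):
    """|PC1(present) - PC1(LGM)| per site, rescaled to [0, 1].

    Assumptions: joint PCA on stacked periods; min-max rescaling.
    """
    present = np.asarray(present, dtype=float)
    lgm = np.asarray(lgm, dtype=float)
    scores = pc1_scores(np.vstack([present, lgm]))
    n = len(present)
    diff = np.abs(scores[:n] - scores[n:])
    span = diff.max() - diff.min()
    return (diff - diff.min()) / span if span > 0 else np.zeros_like(diff)

# Toy data (hypothetical): 4 sites x 2 climate variables per period.
present = [[20, 1000], [22, 1200], [24, 1400], [26, 1600]]
lgm = [[19, 950], [20, 1000], [21, 1300], [20, 1100]]
change = e_change(present, lgm)
print(change)
```

Because both periods are projected onto the same axis, the arbitrary sign of the principal component does not affect the absolute differences.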

Population sampling and genotyping

A total of 403 individuals were sampled from 44 natural populations of Q. kerrii in China from October 2009 to August 2014 (Table 1). For each population, individuals that were at least 10 m away from each other were sampled. Fresh leaf samples were cleaned and stored in silica gel immediately until DNA extraction was conducted. Voucher specimens for each individual were collected and are stored at the Herbarium of the Shanghai Chenshan Botanical Garden (CSH).

Total genomic DNA was extracted from silica-dried leaf tissue using the modified CTAB method (Doyle 1987). After screening previously published universal primer sets, two chloroplast DNA (cpDNA) regions, psbA-trnH (Shaw et al. 2005) and ycf1 (Dong et al. 2015), and the 10 nuclear microsatellite loci (nSSR; Table S1) with the highest levels of polymorphism were selected to examine the genetic diversity and population structure of the 44 Q. kerrii populations. The PCR protocols for cpDNA and nSSR loci followed the procedures described by Jiang et al. (2016). Successfully amplified PCR products of cpDNA and nSSR markers were sequenced or genotyped by Shanghai Majorbio Bio-pharm Technology Co., Ltd (Shanghai, China). The cpDNA sequence chromatograms were checked and assembled using Sequencher version 4.1.4 (Gene Codes Corporation, Ann Arbor, MI, USA). The sequences were then aligned using CLUSTAL_W implemented in MEGA 6.0 (Tamura et al. 2013) and subsequently adjusted manually. The allele sizes of each microsatellite marker were scored using GENEMARKER version 2.2.0 (Soft Genetics LLC, State College, PA, USA). MicroChecker version 2.2.3 (Van Oosterhout et al. 2004) was used to check for genotyping errors and null alleles.

cpDNA analysis

The haplotypes of the two combined cpDNA regions were identified using DnaSP 5 (Librado and Rozas 2009) and deposited in GenBank (accession number given in the ‘Data Archiving’ section). Spatial distribution maps of haplotypes were drawn using ArcGIS 10.2 (ESRI). Haplotype diversity and nucleotide diversity were calculated using ARLEQUIN v3.5 (Excoffier and Lischer 2010). Total gene diversity (H T), within-population gene diversity (H S), and the differentiation coefficients G ST (based solely on haplotype frequencies) and N ST (which also takes similarities among haplotypes into account) were calculated for all populations and for each group using HAPLONST (Pons and Petit 1996). An N ST value significantly higher than G ST according to a U-statistic indicates that there is phylogeographic structure within the populations. A median-joining network (Bandelt et al. 1999) of cpDNA haplotypes was constructed using the programme Network 4.6.1.0 (available at http://www.fluxus-engineering.com/sharenet_rn.htm).

Tajima’s D statistic (Tajima 1989) and Fu’s F S (Fu 1997) were calculated in ARLEQUIN v3.5 (Excoffier and Lischer 2010) to test the population expansion hypothesis. Significantly negative values of these statistics are a signature of recent population expansion. Mismatch distributions were computed in ARLEQUIN v3.5 (Excoffier and Lischer 2010) for the cpDNA dataset, and significance was tested by comparing the empirical estimates with the distribution obtained from 1000 permutations of the data. A multimodal distribution indicates population equilibrium, whereas a unimodal distribution indicates a population expansion (Rogers and Harpending 1992). Sum-of-squared deviations, Harpending’s raggedness index and their p-values were computed to assess deviations from the sudden expansion model.
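For readers unfamiliar with the statistic, Tajima’s D compares the mean number of pairwise differences with the number of segregating sites scaled by a sample-size constant. The sketch below implements the standard formula of Tajima (1989) for a set of aligned, equal-length sequences. It is not the ARLEQUIN implementation: it does not handle gaps, missing data or indels coded as single mutations, it does not compute p-values, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt, nan

def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (no gap handling)."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    S = sum(1 for j in range(length) if len({s[j] for s in seqs}) > 1)
    if n < 4 or S == 0:
        return nan  # the variance term is zero for n < 4; D is undefined
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Toy alignment (hypothetical): every variant is a singleton, as expected
# under a star-like genealogy, so D should be negative.
toy = ["TAAA", "ATAA", "AATA", "AAAT"]
d = tajimas_d(toy)
print(d)
```

An excess of rare variants, as in the toy alignment, pulls the mean pairwise difference below its expectation and yields a negative D.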

nSSR analysis

We calculated the H e (expected heterozygosity) and A r (allelic richness) values for populations with five or more individuals. The pairwise F ST, A r, pA r (private allelic richness), H e and H o (observed heterozygosity) values were also calculated at the group level. The A r and pA r values were calculated by rarefying the allele data to 6 (population level) or 38 (group level) gene copies using HP-RARE (Kalinowski 2005). Pairwise F ST, H o and H e values were calculated using ARLEQUIN v3.5 (Excoffier and Lischer 2010).
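Rarefaction makes allelic richness comparable among samples of unequal size by computing the number of alleles expected in a random subsample of g gene copies. The sketch below uses the standard rarefaction formula for a single locus in a single sample; it is not the HP-RARE code and does not cover private allelic richness. The function name and the example counts are hypothetical.

```python
from math import comb

def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in g gene copies drawn without
    replacement from a sample with the given per-allele counts."""
    N = sum(allele_counts)
    if g > N:
        raise ValueError("rarefaction size g exceeds the number of gene copies")
    total = comb(N, g)
    # For each allele: 1 - P(allele is absent from the subsample)
    return sum(1 - comb(N - c, g) / total for c in allele_counts)

# Hypothetical locus with two alleles observed 3 times each, rarefied to
# 2 gene copies: each allele is present with probability 1 - 3/15 = 0.8.
ar = rarefied_allelic_richness([3, 3], 2)
print(ar)
```

When g equals the full sample size, the function returns the observed number of alleles.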

Both Bayesian clustering and principal coordinate analyses (PCoA) were used to estimate the genetic structure of Q. kerrii. Bayesian clustering was performed with InStruct (Gao et al. 2007) using the admixture model. The number of clusters (K) was set to vary from 1 to 10. Each cluster was repeated 10 times. We performed a run length of 200,000 MCMC (Markov chain Monte Carlo) iterations with a burn-in of 50% of the chain length for each cluster (K). The best fit for the number of clusters was determined using the ΔK method (Evanno et al. 2005). Ten runs of InStruct with the optimum K value were aligned using CLUMPP (Jakobsson and Rosenberg 2007) based on a greedy algorithm. PCoA can be used to visualize similarities or dissimilarities of microsatellite data based on a genetic distance matrix without any population genetic model assumptions. We performed the PCoA using GenAlEx 6.5 (Peakall and Smouse 2006) based on the genetic distances among populations and among groups. The first three (Gp1, Gp2 and Gp3) and two (Gg1 and Gg2) principal coordinates at population level and group level were visualized respectively.
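The ΔK statistic of Evanno et al. (2005) can be sketched as below. It divides the absolute second-order rate of change of the mean log-likelihood across successive K values by the standard deviation of the log-likelihood among replicate runs at K. The sketch applies the second difference to the per-K means, which is one common way of implementing the method and is an assumption here; the log-likelihood values are invented for illustration and are not taken from this study.

```python
import numpy as np

def delta_k(lnl_by_k):
    """Delta K per K from replicate log-likelihoods.

    lnl_by_k: dict mapping K -> list of log-likelihoods from replicate runs.
    Delta K is undefined for the smallest and largest K tested.
    """
    ks = sorted(lnl_by_k)
    mean = {k: np.mean(lnl_by_k[k]) for k in ks}
    sd = {k: np.std(lnl_by_k[k], ddof=1) for k in ks}
    out = {}
    for k in ks[1:-1]:
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            out[k] = second_diff / sd[k]
    return out

# Hypothetical log-likelihoods: three replicate runs for K = 1..4.
toy_lnl = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -780.0, -790.0],
}
dk = delta_k(toy_lnl)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK requires values at K−1 and K+1, it cannot be evaluated at K = 1, so the method cannot by itself detect the absence of structure.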

Landscape pattern of Q. kerrii

A general linear model (GLM) analysis performed in R v3.3.3 (http://www.r-project.org/) was used to estimate the correlation of genetic diversity (A r and H e) with environmental and geographic factors at the population level. Five variables were used as explanatory covariates: present-period environment (E pre), LGM-period environment (E LGM), environmental change since the LGM (E change), latitude and longitude. The initial model included all explanatory covariates, and the most suitable model was selected by backward elimination. The significance of genetic differences among the Q. kerrii groups was tested using the Gp1 and C A (Cluster A from the Bayesian clustering) values.

Alleles in Space (Miller 2005) was used to explore possible genetic discontinuities and barriers among populations based on cpDNA and nSSR polymorphism. To remove the component of genetic distance among populations that is attributable to geographical distance, residual genetic distances were used to identify the possible discontinuities and barriers. The residual genetic distances were derived from the linear regression of all pairwise genetic distances on geographical distances (Manni et al. 2004). First, a connectivity network of sample locations was generated using the Delaunay triangulation rule (Brouns et al. 2003). Then, the residual genetic distances between locations connected in the network were placed at the midpoints of each connection to form a three-dimensional surface plot, in which the x and y axes correspond to the geographical coordinates and the z axis represents the residual genetic distance. The possible genetic barriers were estimated using Monmonier’s maximum difference algorithm (Monmonier 1973). First, the initial barrier segment was identified as the connection with the greatest residual genetic distance between any two locations joined in the Delaunay triangulation network. Second, the barrier was extended in one direction until an external edge of the connectivity network was reached. Finally, the second step was repeated in the opposite direction.
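The residual-distance step and the choice of the initial barrier segment can be sketched as follows. This is a simplified illustration, not the Alleles in Space implementation: it assumes that the pairwise distances are supplied as two parallel one-dimensional arrays, one entry per connection, and it omits the Delaunay triangulation and the barrier extension. The function name and the toy values are hypothetical.

```python
import numpy as np

def residual_distances(geo, gen):
    """Residuals of genetic distance after linear regression on geographic distance."""
    geo = np.asarray(geo, dtype=float)
    gen = np.asarray(gen, dtype=float)
    slope, intercept = np.polyfit(geo, gen, 1)  # least-squares line
    return gen - (slope * geo + intercept)

# Hypothetical distances for four connections in the network.
geo = [10.0, 20.0, 30.0, 40.0]   # geographic distance (e.g. km)
gen = [0.1, 0.2, 0.9, 0.4]       # genetic distance
res = residual_distances(geo, gen)

# The initial barrier segment in Monmonier's algorithm is the connection
# with the largest residual genetic distance.
start = int(np.argmax(res))
print(res, start)
```

In the toy data the third connection is far more differentiated than its geographic distance predicts, so it would seed the barrier.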

Results

Past and present distribution modelling

After excluding the highly autocorrelated climate variables, a total of 10 variables were used for the modelling analysis. The AUC values for the climate modelling exceeded 0.99, indicating a good performance of the models. Among all sites, habitat suitability values of populations were 0.14–0.78 and 0.12–0.72 for the present and LGM periods, respectively. The range of Q. kerrii populations in the present period was larger than that of the LGM period (Fig. 4). To quantitatively analyse the extent of species expansion, we calculated the ratio of the LGM and present potential distribution areas based on the maximum sensitivity plus specificity value as the species presence/absence threshold. The ratio of the present range to the LGM range was 1.27. The environmental factor of total precipitation in the warmest quarter (bio_18) explained more than half of all variation (54.85%, SD = 1.23) in the distribution of Q. kerrii, and the remaining nine environmental factors contributed about 45.15% of the variation. The modelling result indicated that there are non-overlapping distributions of Q. kerrii in the present period between the HH–LCJ and NPJ regions (Fig. 5).
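The range ratio reported above can be computed from two suitability grids once a presence/absence threshold is fixed. The sketch below is an illustration under stated assumptions rather than the procedure actually used: it applies the same threshold to both periods and counts grid cells as if they had equal area, which is only approximate for a 2.5-arc-min latitude/longitude grid. The function name, the threshold value and the toy grids are hypothetical.

```python
import numpy as np

def range_ratio(present, lgm, threshold):
    """Ratio of the number of suitable cells (present / LGM) at a threshold."""
    present_cells = np.count_nonzero(np.asarray(present, dtype=float) >= threshold)
    lgm_cells = np.count_nonzero(np.asarray(lgm, dtype=float) >= threshold)
    if lgm_cells == 0:
        raise ValueError("no suitable cells in the LGM grid at this threshold")
    return present_cells / lgm_cells

# Hypothetical 3 x 3 suitability grids and threshold.
present_grid = [[0.6, 0.7, 0.1], [0.5, 0.8, 0.2], [0.1, 0.9, 0.0]]
lgm_grid = [[0.5, 0.6, 0.1], [0.2, 0.7, 0.1], [0.0, 0.8, 0.0]]
ratio = range_ratio(present_grid, lgm_grid, threshold=0.4)
print(ratio)
```

A ratio above 1 indicates that more cells are classified as suitable in the present period than in the LGM.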

PC1 scores of the environmental variables ranged from −2.3 to 2.3 for populations in the LGM period (E LGM) and from −2.0 to 1.4 in the present period (E pre; Fig. 4). Groups LCJ and HH had similar habitats during both the present (E pre) and LGM (E LGM) periods, and the environmental change since the LGM (E change) did not differ significantly between them (Table 5). In contrast, all three environmental factors differed significantly between the group east of the Tanaka line (NPJ) and the groups west of it (LCJ, HH and the combined HH–LCJ; Table 5).

cpDNA results

The two cpDNA regions were successfully sequenced from 370 Q. kerrii individuals from 44 populations. After alignment, the 1058 bp of cpDNA sequence contained 17 polymorphisms, comprising 2 indels and 15 nucleotide substitutions. The indels were treated as single-base mutations in the following analyses. In total, 12 haplotypes were identified (Fig. 1b), of which H2 was the most common and was shared by all groups. H1 and H3 were less common: H1 occurred in both groups LCJ and HH (i.e. HH–LCJ), whereas H3 occurred only in group LCJ. H4 and H6 were rare haplotypes but were shared by several groups. H8 was shared by two populations within group NPJ. Four unique haplotypes, H9, H10, H11 and H12, were each fixed in a single population of group NPJ (populations 32, 33, 36 and 39, respectively). The haplotypes were arranged in a star-like network (Fig. 1c). Haplotype diversity and nucleotide diversity were zero in more than 80% of the populations (Table 1). The highest haplotype diversity estimates were observed in populations 10 (h = 0.56) and 25 (h = 0.53), and the highest nucleotide diversity estimates were found in populations 25 (π = 2.02 × 10⁻³) and 29 (π = 0.32 × 10⁻³; Table 1). Groups NPJ and HN exhibited the highest and lowest degrees of genetic differentiation, respectively (Table 2). The degree of differentiation within groups LCJ and HH was similar. No phylogeographic structure was detected within any group or among all the populations of Q. kerrii.

Table 2 Measures of genetic diversity of the Quercus kerrii at group level based on cpDNA and nSSR data

Mismatch distribution analysis for cpDNA was performed to assess the demographic dynamics. The observed unimodal distribution for all samples suggested a recent history of population expansion (Fig. S1). Both neutrality test statistics were negative but non-significant (Tajima’s D = −1.142, p = 0.131; Fu’s F S = −2.088, p = 0.241), and thus provide only weak support for the sudden population expansion hypothesis.

nSSR results

A total of 105 alleles were obtained from the 10 nSSR loci, with the number of alleles varying from 5 to 22 per locus (Table S1). The A r and H e values of Q. kerrii at the population level were 2.27–3.20 and 0.46–0.69, respectively (Table 1). The genetic diversity indices (H o, H e, A r and pA r) at the group level indicated that group NPJ had the highest genetic diversity, and groups LCJ and HH had similarly low genetic diversity (Table 2). Pairwise F ST values among groups were 0.004–0.185 (Table 3). Genetic differentiation was highest between group HN and the other groups and lowest between groups LCJ and HH.

Table 3 Pairwise F ST values between the groups of Quercus kerrii. F ST values are shown below the diagonal and p-values above the diagonal

In the Bayesian clustering analysis, the highest ΔK was obtained when K = 2 (Fig. S2). Genetic admixture was widespread in Q. kerrii populations (Table 1). PCoA for the nSSR data at population and group levels revealed consistent patterns (Fig. 2). Group HN was clearly distinct from the other groups. There was no genetic differentiation between groups LCJ and HH. However group NPJ was separated from HH–LCJ.

Fig. 2 Plots of the first three and two coordinates of the principal coordinates analysis (PCoA) at population level a and group level b based on the nSSR pairwise differentiation matrix for Quercus kerrii

GLM analysis indicated that the model that best explained the genetic diversity (A r and H e) of Q. kerrii contained the variables E change, latitude and E LGM, but only E change and latitude were significantly correlated with A r and H e (p < 0.01; Table 4). E change had a stronger influence on the genetic diversity of Q. kerrii than latitude. The genetic structure differed significantly between all pairs of groups except LCJ and HH (Table 5).

Table 4 Values from the best regression model relating the nSSR genetic diversity of Quercus kerrii to environmental and geographic factors, selected by backward elimination
Table 5 P-values of t tests between groups. Values below the diagonal are for genetic structure (Gp1 and C A, in that order); values above the diagonal are for environmental factors (present-period environment, LGM-period environment and environmental change since the LGM, in that order)

The genetic landscape shape analysis also produced somewhat similar patterns of population differentiation among cpDNA and nSSR markers (Fig. 3). The surface plots of cpDNA and nSSR genetic distances showed a drastic drop in group HN, which was consistent with the lower genetic diversity indices observed in this group. The nSSR and cpDNA genetic distances increased from groups HH–LCJ to NPJ. Meanwhile, a genetic barrier for cpDNA was detected between group HH–LCJ and the other two groups based on Monmonier’s maximum difference algorithm (Fig. 1b).

Fig. 3 Genetic landscape shape analysis for Quercus kerrii populations based on a cpDNA variation and on b nSSR variation at ten microsatellite loci. The x and y axes show the geographical locations within a Delaunay triangulation network constructed among the sampled populations. Surface heights reflect genetic distances among populations

Discussion

Demographics and distribution dynamics

East Asia was not directly affected by extensive and unified ice sheets during the Quaternary period (Liu 1988), but the climatic fluctuations of this region, especially since the LGM, played an important role in shaping present species distributions and spatial genetic patterns (Qiu et al. 2011). We found that the potential distribution area of Q. kerrii in the present period is 1.27 times greater than that of the LGM period, suggesting that the distribution range of Q. kerrii has expanded northward since the LGM (Fig. 4). The neutrality tests for the cpDNA sequences did not detect significant demographic expansion, but the star-shaped haplotype network (Fig. 1c) and the unimodal mismatch distribution (Fig. S1) imply that a demographic expansion occurred in Q. kerrii. The consistency of the SDM and molecular results indicates that the range expansion of Q. kerrii could have been coupled with demographic expansion. This dynamic fits the general pattern of organisms responding to climatic change, in which species experienced southward retreat and northward colonization during the glacial/interglacial cycles (Hewitt 2004). However, the fluctuation of the distribution range of Q. kerrii was far smaller than that indicated by previous reconstructions of the East Asian biome using palynological data. These analyses have indicated that the EBLFs of East Asia retreated southward to a latitude of about 24°N during the LGM (Harrison et al. 2001; Yu et al. 2000). The distributions of other dominant tree species of EBLFs in south China also suggest that subtropical tree species have had relatively stable distributions since the LGM. For example, phylogeographic studies on Quercus glauca in southeast China (Xu et al. 2015) and Quercus schottkyana in southwest China (Jiang et al. 2016) indicate that their distributions have been stable and widespread since the LGM. 
The diverse topography and low-amplitude climatic fluctuations of southwest China during the Quaternary period have allowed this region to act as a refugium for many plants by providing relatively stable and suitable habitats.

Fig. 4 Potential species distributions in the a Last Glacial Maximum and b present periods. The colours of the rectangles in the lower left refer to different ranges of habitat suitability. Dots represent the sampled populations in our study, and dot colours represent the different ranges of PCA1 scores estimated from 19 environmental variables

The distributions of Q. kerrii and Q. schottkyana (another evergreen oak species in the Group Cyclobalanopsis) mostly overlap in southwest China. However, quantitative analysis of the SDM results indicated that Q. kerrii has undergone a larger expansion of its potential distribution range since the LGM than Q. schottkyana, with potential distribution expansions of 1.27 and 1.03 times, respectively (Jiang et al. 2016). In addition to seed and pollen dispersal distances, the propensity of a species to shift its altitudinal range at the regional scale affects its range dynamics in response to climatic fluctuations. Species are expected to track warming and cooling climates by shifting their ranges to higher and lower latitudes or elevations, respectively (Chen et al. 2011; Zhu et al. 2012). Owing to the climatic diversity along the vertical gradient of mountains, species that are distributed at middle elevations of montane areas can easily migrate upslope or downslope as the climate warms or cools, thus tracking suitable habitats. Quercus kerrii and Q. schottkyana have similar biological traits, including limited seed dispersal ability, wind pollination and substantial longevity. Both are also locally dominant species of EBLFs in southwest China; therefore, the two species might exhibit similar local adaptability, though this is limited by their distinct elevational distributions. Quercus schottkyana is commonly found at elevations of 1500–2500 m asl, in the middle mountain zone; this species may therefore be able to migrate upward or downward along mountains in response to warming or cooling climates. The main distribution of Q. kerrii in southwest China is restricted to low elevations (50–1300 m asl) in the river gorges and the basins scattered throughout the mountains. 
Therefore, as the climate cools, its suitable habitats become more limited, with a tendency for the species distribution to retreat to southern low-land refugia but expand northward when climate warms, for example, during the Quaternary glacial/interglacial cycles. The species’ distinct levels of resistance to low temperatures in their habitats may underlie the different demographic dynamics of the two species during the Quaternary glacial cycles.

Genetic isolation and environmental heterogeneity

We determined that the genetic diversity and genetic structure of Q. kerrii populations differed between the NPJ and HH–LCJ groups, which is consistent with the genetic barriers detected between the two groups (Fig. 1b). Separate SDM analyses for the HH–LCJ and NPJ regions also indicated the existence of unsuitable habitat zones between the two regions (Fig. 5). Overall, this genetic differentiation and these geographical isolation zones are geographically consistent with the Tanaka line, a straight line running approximately between 98°E, 28°N and 108°E, 19°N (Fig. 1b). Although it is difficult to estimate the precise divergence time between the HH–LCJ and NPJ groups based on our data, our results suggest that the isolation between the NPJ and HH–LCJ groups of Q. kerrii might have coincided with the establishment of the Tanaka line.
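Because the Tanaka line is described as a straight line between two coordinates, assigning a sampling site to its eastern or western side reduces to a sign test on a two-dimensional cross product. The sketch below is one possible way to do this, assuming a planar approximation in longitude/latitude space; the function name and the test coordinates are hypothetical and are not the population coordinates used in this study.

```python
# Endpoints of the Tanaka line as given in the text: (longitude °E, latitude °N).
TANAKA_NW = (98.0, 28.0)
TANAKA_SE = (108.0, 19.0)


def side_of_tanaka_line(lon, lat):
    """Return 'east', 'west' or 'on' for a point relative to the Tanaka line.

    Treats longitude/latitude as planar coordinates, which is a rough
    approximation that is adequate only for a coarse east/west assignment.
    """
    (x1, y1), (x2, y2) = TANAKA_NW, TANAKA_SE
    # z-component of the cross product (line direction) x (point - NW endpoint).
    # The line runs from north-west to south-east, so a positive value means
    # the point lies on the north-eastern ("east") side.
    cross = (x2 - x1) * (lat - y1) - (y2 - y1) * (lon - x1)
    if cross > 0:
        return "east"
    if cross < 0:
        return "west"
    return "on"


# Hypothetical test points, for illustration only.
print(side_of_tanaka_line(108.0, 28.0))  # north-east of the line
print(side_of_tanaka_line(98.0, 19.0))   # south-west of the line
```

The line is extended beyond its two endpoints in this sketch, so sites far to the north-west or south-east are still assigned to a side.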

Fig. 5

Potential species distribution areas of Q. kerrii west of the Tanaka line (yellow) and east of the Tanaka line (green). The maximum training sensitivity plus specificity threshold was used to determine the species presence threshold. The sizes of the red dots represent the extent of climate change since the Last Glacial Maximum.
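The threshold rule named in the caption converts continuous suitability scores into presence/absence by choosing the cutoff that maximizes sensitivity plus specificity on the training data. The sketch below illustrates the idea under stated assumptions: background points are treated as pseudo-absences, candidate cutoffs are the observed scores, and all names and scores are hypothetical. It is not the implementation used by the SDM software in this study.

```python
def max_sss_threshold(presence_scores, background_scores):
    """Return (threshold, sensitivity + specificity) for the best cutoff.

    A point is predicted present when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    candidates = sorted(set(presence_scores) | set(background_scores))
    best_threshold, best_sum = None, -1.0
    for t in candidates:
        # Sensitivity: fraction of training presences predicted present.
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        # Specificity: fraction of background points predicted absent.
        specificity = sum(s < t for s in background_scores) / len(background_scores)
        if sensitivity + specificity > best_sum:
            best_threshold, best_sum = t, sensitivity + specificity
    return best_threshold, best_sum


# Hypothetical suitability scores, for illustration only.
presence = [0.9, 0.8, 0.7, 0.4]
background = [0.1, 0.2, 0.3, 0.5, 0.6]

threshold, total = max_sss_threshold(presence, background)
print(threshold, total)
```

Cells with a suitability score at or above the returned threshold would then be mapped as potential distribution area.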

The Tanaka line is an important biogeographic boundary describing the biota of the East Asian tropical–subtropical regions. A series of prominent species or genus pairs are separated by the Tanaka line (Li and Li 1992). Previous phylogeographic studies on perennial herbs (Zhang et al. 2006), shrubs (Fan et al. 2013) and deciduous trees (Tian et al. 2015) found clearly divergent intraspecific spatial genetic patterns on the two sides of the Tanaka line. The complexity of the topography and climate of tropical–subtropical Yunnan is closely related to the uplift of the Himalayas and the consequent establishment of modern Asian monsoon systems in the Neogene (Shi et al. 1999). Fossil evidence suggests that the rapid uplift of the Yunnan–Guizhou Plateau occurred during the late Miocene and Pliocene, coinciding with the key stage in the establishment of the modern Asian monsoon systems (Favre et al. 2015; Jacques et al. 2014). Paleoclimate reconstruction based on fossil assemblages demonstrated that the Southeast Asian and East Asian monsoons co-occurred in Yunnan during the late Miocene (Jacques et al. 2011), with a weaker monsoon intensity than is typical today (Li et al. 2015). The modern East Asian monsoons were gradually established and grew stronger between the Pliocene and Quaternary (An et al. 2001). The area east of the Tanaka line is mainly affected by the East Asian monsoon, while the area to the west is under the strong influence of the Southeast Asian monsoon (i.e. the Indian Ocean monsoon; Li and Li 1992). Evidence from fossil assemblages, palaeogeography and paleoclimate data suggests that the biogeographic significance of the Tanaka line is related to historical tectonic movements and/or long-term environmental differentiation, which have acted as a barrier to plant dispersal and gene flow between populations on the east and west sides of the Tanaka line (Li and Li 1997; Zhao and Gong 2015).

A large number of phylogeographic studies have indicated that the formation and establishment of drainage systems (which are also related to the uplift of the Yunnan–Guizhou Plateau) played a key role in disrupting gene flow between populations distributed across valleys or adjacent rivers (Yue et al. 2012). However, our study did not reveal such relationships between contemporary drainage systems and the spatial genetic patterns of Q. kerrii. Within groups, the haplotype distributions of Q. kerrii populations on either side of rivers showed no obvious differences (Fig. 1b). The populations comprising the two groups HH and LCJ are separated by the Lancang River and Red River, respectively, but each group of populations exhibits similar levels of genetic diversity and differentiation (Tables 2, 3 and 5). This indicates that, despite the poor dispersal ability of Q. kerrii acorns, the contemporary drainage patterns in southwest China are not the main factor that has shaped the current genetic patterns of the species.

The genetic pattern present in Q. kerrii suggests that the recent environmental heterogeneity to the east and west of the Tanaka line might play a key role in shaping genetic differentiation patterns within species. We detected higher genetic differentiation and greater genetic distances among populations east of the Tanaka line (group NPJ) than among those west of the Tanaka line (i.e. group HH–LCJ; Fig. 3). Correspondingly, the climatic factors on the east and west sides of the Tanaka line differ significantly both at present and at the LGM (Fig. 4 and Table 5). The area east of the Tanaka line has experienced more severe climatic fluctuations since the LGM than the west side (Fig. 5). Meanwhile, our niche modelling demonstrated that there was no overlap in the potential distributions of the NPJ and HH–LCJ groups, which further implies that niche differentiation might have occurred between the populations east and west of the Tanaka line (Fig. 5). Environmental heterogeneity across a landscape can create genetic heterogeneity through evolutionary processes, e.g. natural selection, genetic drift and the founder effect (Antonovics 1971; Mitton 2000). The populations from group NPJ, which have the highest genetic diversity and genetic differentiation indices, are located around the periphery of the distribution area of Q. kerrii. These populations may therefore experience more severe and more divergent environmental conditions than those in the core distribution area, which may, in turn, enhance divergence driven by local adaptation.

Generally, regions with long-term habitat stability have higher genetic diversity. In contrast, we found that environmental change since the LGM is significantly positively correlated with the genetic diversity of Q. kerrii (Table 4). Both interspecific hybridization and the admixture of individuals from different isolated populations within unstable habitat areas can boost species genetic diversity (Ortego et al. 2014; Ortego et al. 2012). There is ongoing hybridization and introgression between Q. kerrii and Q. austrocochinchinensis; however, introgression from Q. kerrii to Q. austrocochinchinensis is predominant (An et al. 2017). Moreover, introgression at low to moderate levels makes only a limited contribution to increasing genetic diversity in oaks (Ortego et al. 2014). Therefore, the admixture of populations with different genealogical origins in unstable habitats during climatic fluctuations seems a better explanation for the positive correlation observed between environmental change and genetic diversity in Q. kerrii.

Our study revealed some of the factors that influence the responses of key EBLF oak species to climatic change. The lowland EBLF species Q. kerrii is more sensitive to climatic change than the mid-elevation montane species Q. schottkyana. The EBLFs at low elevations in Indo-China might be more vulnerable to climatic change than those in mid-elevation montane areas; in turn, more conservation and management attention is needed for species occupying these areas. The regional genetic differentiation and genetic distances of Q. kerrii were uneven. Environmental heterogeneity between the east and west of the Tanaka line plays an important role in shaping genetic diversity and differentiation. Our study supports the hypothesis that long-term isolation and environmental heterogeneity between populations east and west of the Tanaka line drive inter- and intraspecific divergence of plants in the tropical and subtropical regions of Southeast Asia, which in turn underlines the biogeographic significance of the Tanaka line. However, the relative contributions of isolation and local adaptation to the divergence between the populations located east and west of the Tanaka line still require further investigation using more taxa and high-throughput genotyping of genome-wide markers.

Data archiving

The cpDNA sequences reported in this study were deposited in GenBank under accession numbers MF138014–MF138030. The nuclear microsatellite (nSSR) data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.0r20b