Distinct late Pleistocene subtropical-tropical divergence revealed by fifteen low-copy nuclear genes in a dominant species in South-East China

In East Asia, genetic divergence is usually considered to be correlated to different floristic regions, however, subtropical-tropical divergence is largely ignored, compared to widely explored temperate-subtropical divergence. Lindera aggregata (Lauraceae), a dominant species in South-East China was selected to address this issue. Fifteen low-copy nuclear genes (LCGs) and four chloroplast DNA (cpDNA) fragments were used to detect its evolutionary history. In LCGs, STRUCTURE and dated Bayesian phylogeny analyses detect distinct subtropical-tropical divergence since late Pleistocene. Approximate Bayesian calculation (ABC) further supports the distinct subtropical-tropical divergence, and close related Taiwan and South China populations are diverged at the last interglacial. Isolation by distance, isolation by environment and isolation by resistance analyses suggest the current climatic difference rather than geographical distance contributes to the genetic differentiation. Principle component analysis shows populations of tropical cluster occur in warmer area with higher precipitation. Ancestral area reconstruction based on Bayesian phylogeny indicates that ancestral L. aggregata populations are distributed in tropical region. In cpDNA, although unique haplotypes are found in tropical region, distinct subtropical-tropical divergence is absent. In conclusion, distinct late Pleistocene subtropical-tropical divergence of L. aggregata is triggered by climate. It is likely that L. aggregata is originated in Southwest-South China and experienced hierarchical dispersal from south to north. The South China Sea land bridge has dual role in connecting or isolating Taiwan and mainland populations since the last glaciation.


Results
Haplotype distributions and Bayesian inference of cpDNA. The combined four cpDNA fragments revealed ten haplotypes and one haplotype (H3) was widely distributed in our previous work 20 . In the present study, six additional haplotypes are obtained through six additional substitution sites (Table 1 and Supplementary Table S1). The haplotype H3 is also widely distributed and more diverse haplotypes are also found in southern region ( Supplementary Fig. S1). All the six new haplotypes are private in the added populations. Especially, Taiwan population is composed of two private haplotypes (H14 and H15, Supplementary Fig. S1). Although low lineage divergence is also found in Bayesian phylogeny 20 , haplotypes from Taiwan (H14 and H15) and haplotypes from populations in South China (H11 in NAN and H1 in BLSH) form a monophyletic lineage ( Supplementary Fig. S1).
Genetic structure and Bayesian inference of LCGs. In our previous study 20 , eighteen sampled populations were divided into three distinct clusters with one located in south, the second in the north-eastern part and the third in the north-western part using LCGs. In the present study, K = 5 is the most possible number of genetic clusters using 29 populations based on LnP(D) and ΔK ( Supplementary Fig. S3). The north-western and north-eastern clusters remain the same, while additional populations in the admixture and tropical region form other three clusters, which are southern cluster (in brown) and Taiwan cluster (in blue) in tropical region www.nature.com/scientificreports/ and north-central cluster (in yellow) located in the middle region (Fig. 1a). In the Bayesian phylogeny, the subtropical-tropical divergence is supported although the lineage of Taiwan and southern cluster has low posteriors probability supports (Fig. 1b). The divergence time of subtropical-tropical populations is estimated as 109.7 thousand years ago (ka) with 95% highest posterior density interval (HPD) as 72.9 to 171.9 ka and Taiwan-South China populations as 90.7 ka (95% HPD, 66.5-119.2 ka). High genetic differentiation is found between tropical and subtropical clusters with the highest found between Taiwan and other clusters ( Table 2). Taiwan cluster has low genetic diversity, especially haplotype richness, A R , while the other four clusters have similar level of genetic diversity ( Table 2).

DIYABC analyses and ancestral area reconstructions. In DIYABC, scenario 3 (southern cluster and
Taiwan cluster coalescing first and then coalescing with subtropical clusters) which supports the subtropicaltropical divergence receives the highest support (PP = 0.67) compared to scenario 1 (PP = 0.23, southern cluster and subtropical clusters coalescing first and then coalescing with Taiwan cluster) and 2 (PP = 0.10, Taiwan cluster and subtropical clusters coalescing first and then coalescing with southern cluster) (Fig. 2). Based on a generation time of approximate 10 years 20 , the divergence times of subtropical-tropical and Taiwan-South China populations are estimated as 154 ka (95% HPD, 64-275 ka) and 112 ka (95% HPD, 54-173 ka), respectively. Other parameter estimations are shown in Table 3 and Supplementary Fig. S4. BBM analysis in RASP indicates the most possible distribution of ancestral L. aggregata populations is located in tropical region (D, 76%) and they experience vicariance between tropical and South China region after dispersal (D → CD → C|D, Fig. 3). The ancestral distribution of subtropical populations is likely within South China (C, 64%). Table 2. Pairwised genetic differentiation (F ST ) and genetic diversity as measured by π and haplotype richness, A R , among the five potential clusters as inferred by STRU CTU RE in low-copy nuclear genes (LCGs).  Figure 2. The three divergence scenarios with posteriors probability (PP) among southern, northern and Taiwan clusters of Lindera aggregata (a-c). The effective population size of the three clusters is labeled as N south , N north and N tw . t1/t2, divergence times for the depicted event. Table 3. Posterior median estimation and 95% highest posterior density interval (HPD) for demographic parameters in scenario 3 of Lindera aggregata in DIYABC. N south , N tw and N north represent current population size of southern, Taiwan and northern cluster; t1/t2, divergence times in years when a generation time of 10 years is applied for the depicted event; μ, mutation rate per generation per locus. Based on geographic, environmental and genetic distance, Mantel test and partial Mantel test show significant IBE (r = 0.24, P = 0.03), or (r = 0.23, P = 0.04) when accounting for geographic distance, while no significant IBD (r = 0.11, P = 0.08), or (r = 0.07, P = 0.20) when accounting for environmental distance (Table 4, Supplementary  Fig. S5). The multiple matrix regression with randomization (MMRR) also indicates significant IBE (r = 0.23, P = 0.04) while not IBD (r = 0.07, P = 0.38). Mantel test further shows significant IBR (r = 0.24, P = 0.03). The first two axes of the principal component analysis, PCA, on climate data for the investigated populations explained 80.4% (axis 1: 57.7%, axis 2: 22.7%) of the total variation. Figure 4 and Supplementary Table S3 show that the tropical populations tend to occur in warmer areas with higher precipitations compared to subtropical populations.

Discussion
Late Pleistocene subtropical-tropical divergence triggered by climate. With additional 11 sampled populations, especially in the tropical and genetic admixture region 20 , more detailed genetic structure is revealed and distinct subtropical-tropical divergence is detected. Although one widespread haplotype (H3), more diverse haplotypes in southern region and no distinct structure are consistently found using chloroplast markers 20 , various analyses including genetic differentiation distribution, genetic structure, Bayesian phylogeny, ABC modeling of nuclear markers, which have much higher mutation rate 24 , successfully detect the recent subtropical-tropical divergence that can be only traced back to late Pleistocene (109.7/154 ka).
The IBD and IBE analyses indicate climatic difference rather than geographical distance is responsible for the genetic differentiation of L. aggregata. The PCA analysis shows that tropical populations are tend to occur in warmer areas with higher precipitations compared to subtropical populations, similar to our previous study 20 . It should be noticed that a significant IBE does not necessarily imply the occurrence of adaptation to local environments 25,26 . Short evolutionary history (since late Pleistocene) occurred in small scale compared to the   22 , and the resistance distance was calculated based on the ENM at present using 10 low correlated bioclimatic variables 20 .
The absence of significant IBD shows genetic barrier of Wuyi and Nanling Mountains in other species 20,27,28 would act as dispersal corridor in L. aggregata as introgressions from three subtropical clusters are found 20,29 . The distinct clustering of subtropical red and green or even yellow cluster may be caused by demographic expansions. Waters et al. 30 suggest the genetic partitioning of re-colonizing genotypes could potentially produced by a combination of occasional northward long distance dispersal (LDD) and high-density blocking. In L. aggregata, occasional LDD dispersed by birds 31 combined with high-density hindering would cause genetic partitioning in subtropical region 30,32 although low-level of genetic differentiations are found ( Table 2). Taiwan Strait 33,34 and South China Sea (SCS) 35 should act as geographical barriers between Taiwan and the mainland populations even though no significant IBD is found. High pairwised genetic differentiations between Taiwan and other clusters combined with low genetic diversity (especially A R ) in LCGs and two unique chloroplast haplotypes indicate Taiwan population is long-term isolated 36 . Bayesian phylogeny of LCGs further implies that it is likely a relict population. Thus, the Taiwan Strait and SCS have successfully impeded both pollen and seed dispersal between Taiwan and the mainland. While in Qiongzhou Strait, shared nuclear and chloroplast genetic component between Hainan island and the mainland ( Fig. 1 and Supplementary Fig. S1) signify effective seed and pollen dispersal.

Dual role of SCS and hierarchical south-north dispersal.
Our previous work indicates L. aggregata have experienced postglacial northward range expansions from long-term persisted southern refugia populations 20 . The present study with additional populations could trace its earlier demographic history in tropical region. Dual role of SCS land bridge and hierarchical south to north dispersal shape the present distribution of L. aggregata populations.
In Taiwan, its cpDNA haplotypes are closely related to that from South China although limited variations are found ( Supplementary Figs. S1 and S2). This closer relationship is further confirmed in ABC estimations using LCGs (Fig. 1). DIYABC estimates a similar mutation rate range ( Table 2) compared to Ye et al. 20 and a similar divergence time (112 ka) between Taiwan and South China populations with BEAST (90.7 ka). The time falls into the last interglacial when sea level raised 37 . The sea level fluctuations of SCS due to glacial-interglacial alternations can both provide dispersal corridor or barrier 38,39 . The L. aggregata populations likely showed continuous distribution between South China and Taiwan in early stage, while geographic isolation afterwards has resulted in genetic distinctness. Previous study on Quercus championii (Fagaceae) also suggests the Pleistocene SCS land bridge contributed to flora of Taiwan island by dispersal from ancestral Southwest China-Southeast Asia 35 . Some terrestrial vertebrates also display similar patterns 40 .
Hierarchical south to north dispersal of L. aggregata is mainly supported by the RASP analysis. Lindera is supposed to be tropical Asia origin as the ancestral types are concentratedly distributed in Southwest and South China 41 . Chloroplast phylogenomics further indicate that L. aggregata is closely related to species that are distributed in Southwest and South China 42 . As RASP indicates that the ancestral populations of L. aggregata are located within tropical region (Fig. 3). It can be speculated that populations in Taiwan and South China (brown and blue cluster, Fig. 1) are firstly established, likely from Southwest and/or South China. Then populations in www.nature.com/scientificreports/ narrow belt between Nanling-Wuyi Mt. and SCS (yellow cluster, Fig. 1) are formed from tropical populations. At last, north-most populations (red and green clusters) are colonized from southern refugia populations, likely located in South China (C) floristic sub-region as show in RASP analysis (Fig. 3), after LGM 20 .

Conclusion
Different floristic regions in East Asia are likely result in different evolutionary histories. The present study firstly shows distinct subtropical-tropical divergence using a dominant species in the evergreen broadleaved subtropical forests in East Asia. It is an important case study supplement to the widely investigated temperate-subtropical differentiation related genetic divergence. The genetic divergence is mainly attribute to the current climatic conditions and the present distribution of L. aggregata populations are formed through dual role of SCS land bridge and hierarchical south to north dispersal since the late Pleistocene. In conclusion, although South-East China harbors low heterogeneity of topography and climate, subtropical-tropical divergence may also be established. However, further genetic investigations on other species are needed to verify this assumption.

Methods
Sampling and sequencing. We collected leaf samples of a total of 92 individuals from 11 populations (  44 with an admixture model and assuming allele frequencies to be correlated among populations. Ten independent runs were performed for each number of populations (K) from 1 to 10 with 100,000 Markov chain Monte Carlo (MCMC) steps of burn-in, followed by 1,000,000 steps. LnP(D) and ΔK were applied to determine the most likely number of clusters. Pairwised genetic differentiation (F ST ) and genetic diversity as measured by π and A R of LCGs among potential clusters were calculated in SPADS 1.0 45 .

Bayesian phylogeny and divergence time estimation.
In LCGs, the Bayesian phylogeny of 29 populations (population WGSH was excluded due to sequencing problems) was inferred using *BEAST 46 in BEAST 2.4 47 with all partitions unlinked. The substitution models for all loci were the same as Ye et al. 20 . A strict clock model and Yule process with a piecewise linear and constant population size model were applied. The length of the MCMC algorithm was set to 2 × 10 9 steps with sampling every 2 × 10 4 steps, and the first 20% was discarded as burn-in. The estimated substitution rate of 3.4 × 10 −9 with 95% HPD of 1.8 × 10 −9 -5.8 × 10 -9 site −1 year −1 was applied to estimate divergence time 20 .

DIYABC analysis.
As the lineage including the Taiwan population (TW) and populations in southern cluster (NAN, FCG, GJ, HN, BLSH and ZHSH) does not receive significant support (see "Results"), the subtropicaltropical divergence was further checked through approximate Bayesian calculation (ABC) in DIYABC 2.0 48 . All individuals were subdivided into TW, southern cluster, and northern cluster (including north-western, northeastern and north-central clusters) in subtropical region according to STRU CTU RE and Bayesian phylogeny analyses (see "Results"). Three possible scenarios were simulated (Supplementary Table S2). Each simulation was summarized by the following summary statistics: number of segregating sites, mean and variance of pairwise differences, private segregating sites, and mean and variance of number of rarest nucleotide at segregating sites with cluster, mean of segregating sites, mean of pairwise differences, and F ST between pairs of clusters. The simulation was repeated 1 × 10 6 times for each scenario. To compare the posterior probability of three scenarios, the 3 × 10 4 (1%) simulated data sets closest to the observed data set were selected for the logistic regression and 300 for the direct approach. After choosing the best scenario, we estimated parameter posterior distributions taking 1 × 10 4 (1%) simulated data sets closest to the observed data set for the local linear regression, after applying a logit transformation to the parameter values. Two independent runs were performed in all simulations.
Ancestral area reconstructions. In order to reconstruct the geographical diversification of L. aggregata, License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.