Tectonic and climatic impacts on the biota within the Red River Fault, evidence from phylogeography of Cycas dolichophylla (Cycadaceae)

Dramatic crustal deformation and river incision in Southwest China induced by the Indo-Asian collision have long been argued to contribute to the complicated landscapes, heterogeneous environment and abundant biodiversity in this region. However, biological impacts in promoting intraspecific phylogeographical subdivision and divergence along the Red River Fault zone (RRF) remain poorly understood. To investigate the possible biological effects of tectonic movements and environment variations within the RRF, the phylogeography of Cycas dolichophylla-an endemic but widely distributed Cycas in Southwest China and North Vietnam along the RRF were carried out based on four chloroplast DNA intergenic spacers (cpDNA), three nuclear DNA sequences (nDNA) and 16 simple sequence repeat variations (SSR). Two different phylogeographical patterns were detected: a Southwest-Northeast break across the RRF disclosed by chlorotypes and a China-Vietnam separation revealed by SSR. A Bayesian skyline plot from cpDNA data demonstrated a historical increasing, but a recent declining, dynamic in population size during the Pleistocene. Consequently, we infer it is the local environmental variation during Cenozoic that contributed to the complex landscape and microclimate mosaics, facilitating speciation and divergence of C. dolichophylla. Subsequently, the Quaternary climatic fluctuations coupled with human activities profoundly influenced the genetic structure and demographic history of this species.

The aligned length of three nuclear genes, GTP, PHYP and PPRC, were 573-574 bp, 940-946 bp as well as 720 bp, respectively. For GTP, 14 polymorphism loci were identified with 12 haplotypes confirmed (Table 1). Haplotype 1,7,8,9 (G1,[7][8][9] clustered together, while the others formed another lineage with the divergence time closing to 0.64 Ma (95% HPD: 0.25-1.09 Ma) ( Fig. 1b; Supplementary Fig. S1a). G2 was the most common haplotype, occurring in 11 of the 13 populations, with its geographic distribution range covering both southwestern and northeastern of the RRF. Nevertheless, although it was not a widely shared haplotype, G5 occupied an interior (ancestral) rather than tip (derived) position in the network with the derived haplotypes showed a satellite-like distribution (Fig. 1b). All of the individuals sampling from PB were restricted to a private haplotype (G11), whereas the BN population was fixed by G1 with one MG sample included (MG-9). As for PHYP, 22 haplotypes were defined based on 26 polymorphism loci (Table 1). These haplotypes were further grouped together in two clades, namely, P5 and P17 formed one clade and the other 20 haplotypes shaped another one where four mutational steps were taken for this separation (Fig. 1c). The timing of most recent common ancestors of these two lineages was estimated to 0.58 Ma (95% HPD: 0.28-0.92 Ma) ( Supplementary Fig. S1b). Similar to the results of GTP, population PB still occupied a private haplotype (P17) and BN was restricted to P1 although this haplotype was shared by CP, HS, MG and XL (Table 1). Interestingly, the population of MLP defined seven distinct haplotypes (P10-16) by a unique 34 bp or 68 bp deletion accompanied by several mutations. The most widely shared haplotype, P2, was in the interior position of the network with nine populations embraced and the other haplotypes formed a star-like structure including two small 'loops' (Fig. 1c). Eleven haplotypes were identified on the base of 11 polymorphism loci of PPRC. Two lineages were also confirmed by the maximum clade credibility tree with divergence time approaching to 0.53 Ma (95% HPD: 0.21-0.87 Ma) ( Supplementary Fig. S1c). However, relationships inferred from Network remained ambiguous because of the presence of multiple closed 'loops' (Fig. 1d). Apart from BN, CP and PB, haplotype 3 (R3) swept all of the remaining populations occurring in either southwest or northeast of the RRF. No matter in MP-derived strict consensus trees or the maximum clade credibility trees inferred by BEAST, all the combined cpDNA and three nDNA sequences suggested the monophyly of C. dolichophylla with fine support (MP bootstrap for cpDNA: 100%; GTP: 86%; PHYP: 86%; PPRC: 86%) ( Fig. 2; Supplementary Fig. S1a-c).
As for SSR data, a total of 217 alleles were identified by 16 microsatellite loci in 255 individuals from 13 populations (Supplementary Tables S1 and S2). Results of Bayesian clustering inference ( Supplementary Fig. S1e,f) indicated that the most likely number of clusters was 2 for all the C. dolichophylla populations when the Δ K statistics were applied. When considering the second fittest number of grouping, which suggested as K = 3, it was almost similar to the case in K = 2 with only minor subdivision. Additionally, all the structure analysis derived from UPGMA and PCo ( Supplementary Fig. S1d,h) supported for two grouping division. The BARRIER analysis ( Supplementary Fig. S1g) showed that there was only one major genetic boundary with a 58% mean bootstrap value, separating BN from the remnant twelve populations. Therefore, two lineage clusters corresponding to cluster I and cluster II in UPGMA were applied in this study. The AMOVA analysis (Table 2) reflected that there were nearly parallel contributions on molecular variations both among and within populations derived from PHYP (50.90% vs 49.10%), GTP (55.08% vs 44.92%) and PPRC (59.24% vs 40.76%). However, 74.3% of genetic variations within populations and 25.7% among populations were detected from SSR data. The fixation indexes calculated by these three nuclear genes as well as SSRs were all greater than threshold 0.25, indicating significant genetic variation between populations. The gene flows inferred from nuclear genes were lower than the critical value 1 (Table 2). However, results of analysis of gene flows between each pair of the 13 populations in C. dolichophylla based on SSR data (Supplementary Table S3) were disparate with cpDNA and nDNA, which exhibited that XL and BX exchanged genetic materials more frequently

Markers
Source of variation d.f. with other populations no matter from South Yunnan or Vietnam than others. Moreover, the highest level of gene flow was detected between DV and NH whose geographic distance was considered to be the closest among these populations.

Percentage of variation (%) Fst Nm
Population genetic diversity and structure. For the cpDNA data, the average within-population diversity (H S = 0.320) was much lower than the total diversity (H T = 0.994), and both N ST (0.834) and G ST (0.678) were high (Table 3). U-test showed that N ST was significantly larger than G  Table S4). For nDNA, multimodal or bimodal distributions were detected with the appearance of non-significant SSD and H Rag (Supplementary Fig. S3b-d). The results of neutrality test and Fu's Fs for GTP and PPRC were similar, with positive but not significant values acquired. Nevertheless, the case for PHYP was discrepant: all the SSD statistic and H Rag value were non-significant, along with negative results of neutrality test and Fu's Fs (except for Fu and Li's D*) (Supplementary Table S4).
Historical demography inferred by Bayesian skyline plot based on a jointed cpDNA matrix uncovered two dynamic events: during about 0.57-0.70 Ma, C. dolichophylla experienced population expansion (Fig. 3a). However, it began to contract in effective population size at approximately 0.10 Ma with a rapid decline rate detected during about 70 Ka. Population demographic histories deduced from nDNA revealed that C. dolichophylla have experienced population size contraction mainly during the late Pleistocene to Holocene (Fig. 3b-d).
For both the Sign and Wilcoxon tests (Supplementary Table S5), no significant heterozygosity excess was detected under the S.M.M assumption. However, on the postulation of a more conservative and powerful model (T.P.M), Sign test revealed significant excess of heterozygosity in PB. As for the Wilcoxon test, it seemed to indicate that BN and CP have experienced significant heterozygosity excess, while NHC and PB suffered the most significant excess of heterozygosity, all of which implied a departure from mutation-drift equilibrium 23 . Moreover, a normal L-shaped mode-shift distribution was forecasted in each of the population. All the G-W indices did not exceed the threshold level of 0.68 (Supplementary Table S5).

Discussion
A major finding of the present study was that C. dolichophylla comprised two geographically distinct lineages as inferred from the chlorotype network analysis, namely a southwestern clade distributed to the southwest of the RRF (including PB) and a northeastern clade occupying the northeast region (Fig. 1a). This genealogical split was geographically largely consistent with one phytogeographical boundary, 'Tanaka Line (TKL)' 24 , except for two southwestern-clade chlorotypes, C17 and C21, that occurred in the northeast of RRF. These two chlorotypes escaping from southwest lineage was best explained by the local adaptation to novel selective pressures during the species' range expansion into geographically and ecologically different environments. However, the missing of ten  interior haplotypes was mainly due to the species' long evolutionary history during historical climate oscillations, geological as well as recent human activities 25,26 .
The network of nuclear haplotypes, no matter in GTP, PHYP or PPRC, all failed to register a prominent genealogical break across the RRF although several exclusive haplotypes were discovered in southwestern and northeastern clades, respectively (Fig. 1b,c). Phylogenetic relationships of nucleotypes inferred from BEAST trees remained confusion (Supplementary Fig. S1a-c). Such phylogenetic ambiguity could be explained by the widely shared nucleotypes between southwestern and northeastern populations (e.g. G2, P2 and R3), which may to the retention of ancestral polymorphisms 27 . Interestingly, the results of SSR data (UPGMA and Bayesian clustering inference) revealed another distinct scene ( Supplementary Fig. S1d,f): C. dolichophylla was split into two groups divided by national boundary, a Chinese group with populations from BN, JM, MLP and PB and a Vietnam group with the populations from northern Vietnam, as well as two Chinese populations, HS and MG. The explanation for two Chinese populations nested in Vietnam lineage might be the closer geographical distance to Vietnam, accompanied with the influence of southern drained Red River system as well as the frequent human activities, such as commercial trades and the introduction of exotic populations. The observation of much stronger phylogeographical division across the RRF in cpDNA as opposed to nDNA and SSR is associated with the different inheritance modes and evolutionary rates between nuclear and organelle genomes 28 .
The discovery of two distinct groups with cpDNA data was not unexpected, given the complex topography and heterogeneous environmental conditions in Southwest China, especially along the RRF. However, for a newly radiated species, the local habitat differences in the immense region with varying altitudes, soils, slopes as well as ecological selection from pollinators and seed transporters, seemed more promising factors to explain the diversification of C. dolichophylla than geological uplift. On one hand, the re-diversification of Cycas mainly happened in the late Miocene 13,29 and the divergent time derived from chlorotypes of C. dolichophylla was during the early Pleistocene (1.94 Ma). On the other hand, most of the tectonic process in Southwest China happened during or even before the Miocene, such as uplift, crustal thickening, strike-slip and river incision [1][2][3][4]11 . Therefore, we proposed that it is the heterogeneous topography along the Red River region that acted as physical barriers to restrict gene flow between cycads, with global climate changes 30 resulted in the diversification of Cycas species in this region. Subsequently, the local environmental variation contributed to the formation of microclimate pockets 31 , providing suitable and stable environment for species to survive through hostile climate fluctuation and accumulate genetic variations that gave rise to the speciation of C. dolichophylla, as well as the subsequent differentiation of the Southeastern and Northwestern subclades 32 . However, the fluctuation of Quaternary climate also made particular impacts on the population dynamics of this newly formed species, which was evident from our BSP inferences and neutrality tests ( Fig. 3; Supplementary Table S4).
During the largest Naynayxungla Glaciation (0.72-0.50 Ma) when valley glaciers dominated 33 , it was difficult for this cliff species to flourish, but underwent range contraction, limiting intraspecific gene flows and further diversification of nDNA lineages. The subsequent interglacial period provided opportunity for C. dolichophylla to undergo spatial expansion and population growth. Nonetheless, with the persistence of cold conditions until the late Ionian Stage (0.3-0.13 Ma) together with the onset of Riss glacial cycle, the enlarged C. dolichophylla populations began to shrink. Moreover, during the last stage of the last glacial period (32-12 ka), cold and arid climatic conditions established, accompanied by a strengthened Asian winter monsoon dominating over continental Southeast Asia 34 . Being adapted to warm and moist conditions, such abnormal climate variations restricted C. dolichophylla to scattered habitats and thus hindered gene flow between populations. Although the Bottleneck analysis based on SSR data failed to register any recent bottleneck event on this species, the small G-W indices implied a historical bottleneck, which coincided with the demography history (Supplementary Table S5). Concurrently, genetic drift probably played a key role in differentiation, regarding the small historical population size of C. dolichophylla estimated by LDNe (Supplementary Table S6).
In conclusion, our results together with evidence from previous researches jointly support the perspective that it is the climate changes and environmental heterogeneity (pockets of microclimate) that gradually cast the evolutionary history of C. dolichophylla. That is, tectonic activities, river incision as well as the reinforcement of Asian monsoon systems didn't work directly on speciation, but local habitat differences coupled with ecological selection from pollinators and seed transporters gave birth to the speciation of C. dolichophylla. Whereas the alternation of glacial and interglacial periods from the mid-Pleistocene onwards played critical roles in shaping this species' population structures and demographic history.

Methods
Population sampling. A total of 255 foliage samples from 13 populations of C. dolichophylla in RRF were collected, which covered almost the entire range of this species ( Fig. 1; Supplementary Table S7). Each population was represented by 11-28 individuals. Fresh leaves were dried in silica gel immediately after collection for DNA extraction. Ten individuals per population were randomly selected for chloroplast and nuclear DNA sequencing, with C. edentata and C. rumphii being employed as outgroups. All of the 255 individuals were used for the microsatellite study.
DNA extraction, amplification and sequencing. Genomic DNA was extracted from the silica-dried leaves by modified CTAB method 35 . Four cpDNA intergenic spacers, atpH-atpI 36 , psbM-trnD 36 , trnL-trnF 37 and trnS-trnG 36 , as well as three single-copy nDNA genes, GTP-binding protein Era (GTP) 38  Data analysis of DNA sequences. Prior to conducting all analyses, congruence test was generated by PAUP* 4.0 b10 46 for the four cpDNA sequences. Although the result suggested indistinctive degree of homogeneity between those fragments (P < 0.5), a combined cpDNA matrix was still applied for the following analyses under previous suggestions 47 , whereas the three nDNA sequences were analysed separately. Genetic diversity indices were calculated by DnaSP, version 5.0 48 . Permut v.1.0 was used to calculate the mean genetic diversity within populations (Hs), total genetic diversity for species (H T ) and coefficient of population differentiation (G ST , N ST ) 49 . Analysis of molecular variance (AMOVA) 50 was conducted to partition genetic variation among and within populations. All of those analyses were performed with Arlequin version 3.1 51 . To uncover the reticulate evolutionary relationships between haplotypes, networks of chlorotypes and nuclotypes were constructed on Network, version 4.2.0.1 52 . Large fragment of indels were treated as a single mutation. Phylogenetic relationships among haplotypes were inferred based on Maximum Parsimony (MP) and Bayesian Inference (BI) methods with C. edentata and C. rumphii served as outgroups. For the MP trees, multi-trees heuristic search and tree bisection-reconnection branch swapping were adopted with 1000 repeats bootstrap analysis performing on PAUP. For BI tree search, the Markov chain Monte Carlo (MCMC) runs started from independent random trees and extended for 1.0 × 10 7 generations, were repeated twice with trees sampled every 1000 generations implemented in MrBayes 3.1.2 53 .
Divergence times among haplotypes were estimated under a Bayesian framework in BEAST 1.6.1 54 with HKY + G and HKY applied as the most suitable nucleotide models for joint cpDNA and three nDNA fragments, respectively. Since there was no well-documented evolutionary rate available for Cycas, previously estimated rates for seed plants, 1.01 × 10 −9 and 5.1-7.1 × 10 −9 substitutions per site per year for synonymous sites (s/s/y) for cpDNA and nDNA, respectively, were adopted in this analysis 55 . MCMC chains were run for 1.0 × 10 7 generations, sampling once every 1000 generations. All the above parameters were performed by BEAUti 1.6.1 in the BEAST package. TreeAnnotator 1.6.1 was used to create a maximum clade credibility tree with the first 10% trees discarded as burn-in 56 . Pairwise mismatch distribution and neutrality tests [57][58][59] were operated in DnaSP. Historical demography of C. dolichophylla was inferred from Bayesian skyline plots, which were generated by BEAST. Maps were drawn using the software ArcGIS version 10.2 (http://desktop.arcgis.com).
Data analysis of SSR data. The indices of genetic diversity and genetic differentiation were calculated using GenAlEx version 6.4.1 60 . Gene flow (Nm) between populations were calculated with formula Nm = (1-F ST )/4F ST 61 , with the F ST acquired from Arlequin. AMOVA analysis was performed on Arlequin. Isolation by distance (IBD) was tested by performing Mantel tests in GenAlEx using a correlation of geographic distance with genetic distance for all pairs of populations. To infer genetic structures of sampled populations and individuals, unweighted pair group mean analysis (UPGMA) were carried out on TEPGA, version 1.3 62 , with 5000 of permutations. Meanwhile, an individual-based principal coordinate analysis (PC O ) was visualized by the program MVSP, version 3.12 63 . A model-based Bayesian clustering method was applied to assign individuals from 13 populations using the STRUCTURE version 2.3 64 . Twenty independent runs were performed for each set, with grouping number (K) ranging from 1 to 20. For each run, a burn-in of 1 × 10 5 iterations and 1 × 10 5 subsequent MCMC steps were performed. In addition, the second-order rate of change of the natural logarithm probability of the data with respect to the number of clusters (Δ K) was evaluated to deduce the best fit number of grouping with the online tool STRUCTURE HARVESTER V.0.6.93 65 . Barrier boundary analysis was carried out to test whether there were genetic borderlines existed between populations based on genetic distance matrices performing on BARRIER version 2.2 66 .
To detect recent bottlenecks caused by reduction in effective population size, the observed gene diversity was compared with the equilibrium gene diversity given the observed number of alleles implementing in BOTTLENECK version 1.2.02 67 . The Garza-Williamson index (G-W) 68 was also estimated for testing early shrink in population size using Arlequin software. Effective population sizes (N E ) with the linkage disequilibrium (LD) method implemented in LDNe 69 at three levels of lowest allele frequency (0.01, 0.02, 0.05) for a 95% confidence interval were calculated for conservation strategic design.