High species diversity may result from recent rapid speciation in a ‘cradle’ and/or the gradual accumulation and preservation of species over time in a ‘museum’1,2. China harbours nearly 10% of angiosperm species worldwide and has long been considered both a museum, owing to the presence of many species with hypothesized ancient origins3,4, and a cradle, as many lineages have originated as recent topographic changes and climatic shifts—such as the formation of the Qinghai–Tibetan Plateau and the development of the monsoon—provided new habitats that promoted remarkable radiation5. However, no detailed phylogenetic study has addressed when and how the major components of the Chinese angiosperm flora assembled to form the present-day vegetation. Here we investigate the spatio-temporal divergence patterns of the Chinese flora using a dated phylogeny of 92% of the angiosperm genera for the region, a nearly complete species-level tree comprising 26,978 species and detailed spatial distribution data. We found that 66% of the angiosperm genera in China did not originate until early in the Miocene epoch (23 million years ago (Mya)). The flora of eastern China bears a signature of older divergence (mean divergence times of 22.04–25.39 Mya), phylogenetic overdispersion (spatial co-occurrence of distant relatives) and higher phylogenetic diversity. In western China, the flora shows more recent divergence (mean divergence times of 15.29–18.86 Mya), pronounced phylogenetic clustering (co-occurrence of close relatives) and lower phylogenetic diversity. Analyses of species-level phylogenetic diversity using simulated branch lengths yielded results similar to genus-level patterns. Our analyses indicate that eastern China represents a floristic museum, and western China an evolutionary cradle, for herbaceous genera; eastern China has served as both a museum and a cradle for woody genera. 
These results identify areas of high species richness and phylogenetic diversity, and provide a foundation on which to build conservation efforts in China.
Species composition within a geographic area is the result of historical processes including speciation, extinction, migration6 and ongoing ecological interactions. The extent to which each process has contributed to spatial and temporal patterns of biodiversity, as well as community assembly, varies across the landscape. The biodiversity patterns within a region may result from a recent increase in the rate of speciation that has generated a cradle of biodiversity. Alternatively, biodiversity may derive from the presence of numerous surviving ancient lineages, together forming a museum region. The process of speciation and the maintenance of ancient lineages need not be mutually exclusive, and some regions have features of both cradles and museums.
The evolutionary history of regional floras has typically been addressed using specific taxa as exemplars7,8,9 or by examining the entire flora at various taxonomic levels10,11,12. These investigations provide insights into historical factors, including geological history, climatic shifts and evolutionary processes, that might have contributed to modern geospatial patterns of biodiversity13,14. Concomitantly, these studies lay the foundation for decision-making in conserving biodiversity. However, few studies have explored the biodiversity patterns of a large region incorporating dated phylogenies and detailed distribution data.
China, which is home to 30,000 of the approximately 350,000–400,000 vascular plant species worldwide15, is ideal for investigating patterns of biodiversity because of its large size, range of habitats, considerable biological diversity and heterogeneous physical geography. Whether areas within China serve as cradles or museums remains unclear, as floristic components of putative ancient origin3,4 and of recent diversification5 have both been discovered. It has previously been suggested16, on the basis of comparisons between the taxonomic richness of vascular plants in China and the United States, that the greater species diversity in China reflects the region’s complex topography and long connections with tropical South-East Asia. On the basis of patterns in species richness (using 555 endemic seed plant species), mountainous regions of central and southern China have been identified as the main centres of plant endemism17. Previous studies have attributed most of the geographic variation in species richness of woody plants in China to temperature seasonality18 and the extent of winter cold19. Notably, to our knowledge, no previous study has incorporated both phylogenetic and spatial components to address the evolutionary history of the Chinese flora.
We conducted a broad assessment of spatio-temporal divergence patterns and of the assembly of the Chinese angiosperm flora, using a robustly dated phylogeny as well as species distribution data (i) to document the relative proportions of ancient and recent divergences that shaped the extant Chinese angiosperm flora in various geographic regions; (ii) to investigate the differential spatio-temporal divergence patterns of woody and herbaceous genera and their relationships with environmental variables; and (iii) to compare genus- and species-level measures of phylogenetic diversity and explore their conservation implications for the Chinese flora.
Our phylogeny resolved evolutionary relationships among all major angiosperm lineages in China (Extended Data Fig. 1), yielding topologies that are highly similar to those for angiosperms as a whole20,21. Our estimates of divergence times based on penalized likelihood and PATHd8 are congruent with one another, and agree with those obtained in recent studies of angiosperms on a global basis22,23 (Extended Data Fig. 2). Divergence time estimates show that 66% of Chinese angiosperm genera originated during the Neogene and Quaternary periods; the remaining genera diverged in the Palaeogene (29%) and Cretaceous (5%) periods. Additionally, the herbaceous genera have diversified much more rapidly than the woody genera during the past 30 million years (Extended Data Fig. 3).
We divided China into 100-km × 100-km grid cells, evaluated age variance within grid cells (Extended Data Figs 4, 5), and calculated mean divergence times (MDTs) and median divergence times of genera within each grid cell (Fig. 1; Extended Data Figs 6, 7; Supplementary Information). Mapping the MDTs of all genera revealed a transition belt that coincides with the modern 500-mm isoline of annual precipitation, which marks the boundary between humid–semi-humid and arid–semi-arid areas24 (eastern China versus western China, Fig. 2). Both MDT and null-model analyses indicate that eastern China has older lineages (red grid cells, Fig. 1a, j), particularly in central to southern China. By contrast, western China, and especially the Qinghai–Tibetan Plateau, contains taxa that have diverged more recently (blue grid cells, Fig. 1a, j). Furthermore, our genus-level analyses demonstrate that eastern China is phylogenetically overdispersed with higher phylogenetic diversity, and that western China shows phylogenetic clustering with lower phylogenetic diversity (Extended Data Fig. 8). These findings are also observed in analyses of phylogenetic diversity based on multiple species-level trees, in which taxa that lacked target DNA sequences were provided with meaningful branch lengths using a birth–death clock model (see Methods; Extended Data Fig. 9). The flora of the Cape of South Africa likewise shows phylogenetic structure—the western region is phylogenetically clustered, and the eastern region is overdispersed10. However, taxon richness is decoupled from phylogenetic diversity in the Cape of South Africa; in China, taxon richness and phylogenetic diversity are positively correlated.
Western China includes the arid north-western portion of the country and most of the Qinghai–Tibetan Plateau (Fig. 2). A fundamental climate shift may have occurred in western China as recently as the early Miocene, owing to the uplift of the Qinghai–Tibetan Plateau and subsequent development of the Asian monsoon24,25. Of the 111 genera that occur only in western China, 76% originated in the past 20 million years and only 24% originated before this time. In western China, a much higher percentage of herbaceous than woody genera has originated since 30 Mya (Fig. 3a). Moreover, genera that occur only in western China are predominantly members of only a few clades (Apiales, Asterales and Brassicales), most of which have much younger divergence times than the major clades of eastern China (Fig. 2; Extended Data Table 1). MDTs calculated from the youngest 25% of herbaceous genera in each grid cell also indicate that western China—in particular the Qinghai–Tibetan Plateau—has younger lineages (Fig. 1f) than eastern China, which further suggests that western China represents a cradle for herbaceous angiosperms.
Mountainous areas of eastern China have been proposed as refugia for plants that originated in the early Cretaceous or late Jurassic periods26,27 because their geological environment and climate (including orogenic movements, annual temperature and annual precipitation) may have experienced little change since the Cretaceous28. Of the 1,026 genera that occur only in eastern China, 39% originated before 20 Mya and 61% arose more recently than this. Both herbaceous and woody genera diverged at similar rates throughout geological time (Fig. 3a). The 20 major clades with the largest number of genera occurring only in eastern China are distributed throughout the ordinal-level time-tree from early-diverging clades (for example, Alismatales, Asparagales, Magnoliales and Ranunculales) to later-diverging lineages (for example, Asterales, Gentianales and Lamiales) (Fig. 2; Extended Data Table 1). MDTs based on the youngest 25% and oldest 25% of genera in each grid cell reveal that eastern China has old herbaceous lineages (Fig. 1f, i), but has both old and young woody lineages (Fig. 1e, h). Eastern China may have served as a museum for herbaceous genera, but as both a museum and a cradle for woody genera.
The mean annual precipitation (MAP) and mean annual temperature (MAT) have higher explanatory power for the MDTs of the herbaceous genera (Fig. 4c, f) than of the woody genera (Fig. 4b, e). These patterns may reflect the heterogeneity in rates of evolution between herbaceous and woody lineages. Herbaceous plants are well known to have higher substitution rates owing to their shorter generation times, which perhaps allows them to respond more quickly to environmental change through increased genetic divergence and speciation rates23,29.
The spatial divergence and diversity patterns of angiosperms detected here do not simply track latitude in China; instead, MDT and phylogenetic diversity decrease from south-east to north-west (Fig. 2a; Extended Data Fig. 8d, g). Our results suggest that water and temperature are important in limiting the dispersal of species from humid and warm regions into drier and colder areas. The effects of topography, with elevation rising markedly from east to west, and of the East Asian monsoon climate are so extensive that temperature and precipitation decline from south-eastern to north-western China rather than along the latitudinal gradient that might be expected in flatter regions.
On the basis of a species-level phylogenetic tree and distribution data with ‘county’ as the basic unit, we inferred that the species richness and phylogenetic diversity in protected areas cover approximately 88% and 96%, respectively, of the total species richness and phylogenetic diversity in China. For conservation planning, these values may be overestimates that result from the coarse scale of our distributional data, as most nature reserves are smaller than Chinese counties. Notably, areas with the top 5% highest phylogenetic diversity and standardized effect size of phylogenetic diversity (SES-PD) are mainly located in several provinces of eastern China (Fig. 3b): Guangdong, Guangxi, Guizhou and Hainan for genus-level phylogenetic diversity hotspots, and Yunnan for species-level phylogenetic diversity. These areas are also hotspots for threatened plants in China30. However, in contrast to western China, protected areas in eastern China are fragmented (Fig. 3b), largely as a result of urbanization and administrative division. Our data suggest the need to establish more connections between existing nature reserves and national parks that span provincial borders to conserve plant lineages of ancient and recent origins in eastern China, as well as the other organisms that depend on these floristic elements. These findings should be of broad interest to evolutionary and conservation biologists, and serve to stimulate better-informed conservation planning and research.
Sequences of four plastid genes (atpB, matK, ndhF and rbcL) and one mitochondrial gene (matR) were used to reconstruct the phylogeny of Chinese vascular plants31. Generic circumscriptions were based on ref. 15. We sampled one species for the 1,173 genera with only one species in China. For the 1,736 genera with 2–30 species in China, two species were sampled from each genus. For the 267 genera with more than 30 species in China, approximately 10% of the species of each genus were sampled, reflecting intrageneric diversity. We downloaded all available sequences for the target DNA regions from GenBank; if more than one sequence was available for the same locus for a species, the longest one of good quality was selected. For genera with sequences that were unavailable in the public database (781 genera in total), we generated new sequences from leaf materials, collected from the field for 513 genera and from specimens from the Chinese National Herbarium (PE) for 47 genera. There are 231 genera that remain unavailable because we failed to obtain the materials or amplify the target sequences. Details of DNA extraction, PCR, sequencing, alignment, accession numbers of sequences and phylogeny reconstruction have previously been published31.
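The tiered sampling rule above can be written as a small function. This is a minimal sketch: the function name is hypothetical, and rounding "approximately 10%" up to a whole species is an assumption, because the text does not state how fractional values were handled.

```python
def n_species_to_sample(n_species_in_china: int) -> int:
    """Number of species to sample for one genus under the tiered rule.

    Assumption: "approximately 10%" for genera with more than 30 species
    is implemented as 10% rounded up to a whole species.
    """
    if n_species_in_china < 1:
        raise ValueError("a genus must contain at least one species")
    if n_species_in_china == 1:
        return 1  # only one species in China: sample it
    if n_species_in_china <= 30:
        return 2  # 2-30 species in China: sample two
    # More than 30 species: ceiling of 10%, using integer arithmetic
    return (n_species_in_china + 9) // 10


# Hypothetical genus sizes, for illustration only
for size in (1, 12, 30, 31, 150):
    print(size, n_species_to_sample(size))
```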
Divergence time estimation
We used the penalized likelihood method as implemented in treePL32 (https://github.com/blackrim/treePL) to date divergence times of Chinese angiosperms based on the optimal maximum likelihood phylogram obtained with RAxML 8.0.2233 in the CIPRES Science Gateway34, after excluding the outgroups (for example, lycophytes, ferns and gymnosperms). Our dated phylogeny included 5,864 species native to China, representing 2,665 genera from 273 families or approximately 92% of the angiosperm genera of China. We validated the available fossils and selected 138 calibrations for dating analyses (Supplementary Table 1 in Supplementary Information). The ‘prime’ option was applied to identify the best optimization parameters, and a ‘thorough’ analysis was then carried out with the optimal parameters so determined (opt = 1, optad = 1 and optcvad = 4). To identify the best smoothing parameter, which sets the penalty on rate variation across the phylogram, a ‘random subsample and replicate cross-validation’ was conducted with treePL. Confidence intervals for each node were calculated following previously published methods22. To account for uncertainty in branch length estimates, we generated 100 bootstrap replicates with the topology fixed to the above maximum likelihood phylogram but with varying branch lengths, and then ran treePL on each of these 100 replicates. Age statistics for all nodes were summarized with TreeAnnotator v.1.8.435.
We also used an alternative dating method, PATHd836, to estimate divergence times of Chinese angiosperms. The calibrations for the PATHd8 analysis were identical to those used for the treePL analysis, except that the crown age of angiosperms was set to 138 Mya instead of a maximum of 140 Mya and a minimum of 136 Mya (as in treePL) because PATHd8 requires one fixed calibration. Both treePL and PATHd8 are rate-smoothing methods, but PATHd8 sequentially takes averages over path lengths from an internode to all its descending terminals, one pair of sister groups at a time37, where smoothing is done stepwise for each node separately; by contrast, smoothing in treePL is done simultaneously over the tree. The correlation between ages at all nodes based on the treePL and PATHd8 analyses was assessed with Spearman’s rank correlation analysis in R v.3.2.038.
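A minimal sketch of the Spearman comparison between node ages from the two dating methods, written without external libraries. The node ages are invented for illustration; in R, the same statistic is available as cor(x, y, method = "spearman"), which also assigns average ranks to ties.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages (Mya) of the same five nodes under the two methods
treepl_ages = [136.0, 98.4, 61.2, 33.5, 12.1]
pathd8_ages = [138.0, 101.9, 58.7, 35.0, 11.4]
print(spearman_rho(treepl_ages, pathd8_ages))
```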
To evaluate whether dates for this regional time-tree are biased owing to the geographic sampling, we used a correlation analysis to compare our estimated divergence times with recent global-scale angiosperm time-tree reconstructions22,23; one of these represents a family-level time-tree with multiple fossil calibration points22, and the other is a species-level time-tree with dense taxon sampling (32,223 species) and fewer calibrations23. The stem age of each family was extracted for the Spearman’s rank correlation analyses. Only family ages were compared (with families circumscribed following ref. 20), because different genera and species were included in the three studies. Ages of genera were extracted from our dichotomous time-tree estimated by treePL for the downstream analyses. For monophyletic genera, stem ages were extracted directly by tracing their stem node. For genera that are polyphyletic or paraphyletic (380 out of 2,665), the stem age of each monophyletic lineage was extracted and the oldest one was selected as the age of the genus. The numbers of angiosperm genera that originated during specified geological timespans are provided in Extended Data Fig. 3, with the global temperature changes since 65 Mya39 indicated.
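The rule for assigning an age to each genus, and the binning of genera by geological period behind the proportions reported in the main text, can be sketched as follows. Genus names and ages are hypothetical; the period boundaries (66.0 and 23.03 Mya) come from the International Chronostratigraphic Chart, and assigning an age that falls exactly on a boundary to the older period is an assumption.

```python
from collections import Counter
from typing import Sequence

# Period boundaries in Mya (International Chronostratigraphic Chart)
K_PG_BOUNDARY = 66.0   # Cretaceous / Palaeogene
PG_NG_BOUNDARY = 23.03  # Palaeogene / Neogene


def genus_stem_age(lineage_stem_ages: Sequence[float]) -> float:
    """Stem age of a genus: the single stem age if it is monophyletic,
    otherwise the oldest stem age among its monophyletic lineages."""
    if not lineage_stem_ages:
        raise ValueError("a genus needs at least one lineage stem age")
    return max(lineage_stem_ages)


def period_of_origin(age_mya: float) -> str:
    """Assign a stem age to a period (boundary ages go to the older period).
    Ages >= 66 Mya are treated as Cretaceous because the angiosperm crown is
    younger than the base of the Cretaceous."""
    if age_mya >= K_PG_BOUNDARY:
        return "Cretaceous"
    if age_mya >= PG_NG_BOUNDARY:
        return "Palaeogene"
    return "Neogene+Quaternary"


# Hypothetical genera: stem ages (Mya) of their monophyletic lineage(s)
genera = {
    "GenusA": [8.2],
    "GenusB": [31.0, 44.6],  # non-monophyletic: two lineages
    "GenusC": [72.5],
    "GenusD": [15.9],
}
counts = Counter(period_of_origin(genus_stem_age(a)) for a in genera.values())
print(counts)
```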
Distribution of angiosperm species in China
The spatial distribution data and information on growth form were assembled from nearly all published national and provincial floras, as well as some local floras, checklists and herbarium records. The spatial distribution data are at the county level (2,377 counties), with an average county size of approximately 4,000 km². To minimize bias arising from unequal sampling areas, we divided the map of China into 100-km × 100-km grid cells, and border grid cells with less than 50% of their area (that is, 5,000 km²) inside China were excluded from the analyses. Maps of China used in this study were adapted from standard maps released by the National Administration of Surveying, Mapping and Geoinformation of China (http://www.sbsm.gov.cn; review drawing number: GS(2016)1576). The gridded distribution database contained 1,409,239 occurrence records for 26,978 angiosperm species from 2,845 genera. After matching with the phylogeny, the final dataset included a total of 2,592 angiosperm genera (woody genera, n = 925; herbaceous genera, n = 1,501; genera with both woody and herbaceous species, n = 166).
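The border-cell filter amounts to a threshold on each cell's area inside China. In this sketch, the per-cell areas are hypothetical inputs that would come from a GIS overlay of the grid on the national boundary (not shown), and the cell IDs are invented.

```python
from typing import Dict, Set

CELL_AREA_KM2 = 100.0 * 100.0  # one 100-km x 100-km grid cell = 10,000 km²


def cells_to_keep(land_area_km2: Dict[str, float], min_fraction: float = 0.5) -> Set[str]:
    """IDs of cells whose area inside China is at least min_fraction of a full
    cell; with the default, border cells under 5,000 km² are dropped."""
    threshold = min_fraction * CELL_AREA_KM2
    return {cell for cell, area in land_area_km2.items() if area >= threshold}


# Hypothetical cell IDs and their in-country areas (km²)
areas = {"r10c42": 10_000.0, "r11c43": 5_000.0, "r12c44": 4_999.0, "r13c45": 1_250.0}
print(sorted(cells_to_keep(areas)))  # the last two border cells are excluded
```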
Spatial distribution of MDTs and null-model test for divergence hotspots
To explore the spatial divergence patterns of Chinese angiosperm genera, we calculated the weighted MDTs of all genera in each grid cell by integrating spatial distribution data with our dated phylogenetic tree. $\mathrm{AGE}_i$ represented the age of a genus i (i = 1, …, n) in a grid cell, and $S_i$ the species number in genus i in this grid cell. From this, MDT was calculated as:

$$\mathrm{MDT} = \frac{\sum_{i=1}^{n} \mathrm{AGE}_i \times S_i}{\sum_{i=1}^{n} S_i}$$

We further divided the genus dates in each grid cell into quartiles and calculated MDTs on the basis of the youngest and oldest quartiles, separately, in each grid cell. The MDTs based on the youngest quartile allowed us to recognize centres of recent divergence, whereas MDTs based on the oldest quartile detected ancient centres of divergence. To avoid potential bias from grid cells that had either relatively old or young genera, we ranked all genera from youngest to oldest, partitioned them into quartiles based on their ages, computed MDT in each cell for the absolute youngest 25% and the absolute oldest 25% of genera, and then mapped the results across China.
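A minimal sketch of the species-weighted MDT and of the nationwide (absolute) quartile subsets. Plain dictionaries stand in for the gridded database and the dated tree, the data are hypothetical, and placing floor(n/4) genera in each quartile is an assumption about how the partition was made.

```python
from typing import Dict, Optional, Set, Tuple


def weighted_mdt(cell: Dict[str, int], ages: Dict[str, float],
                 subset: Optional[Set[str]] = None) -> Optional[float]:
    """Species-weighted mean divergence time for one grid cell.

    cell maps genus -> number of its species in the cell (S_i);
    ages maps genus -> stem age in Mya (AGE_i).
    If subset is given, only those genera enter the calculation.
    Returns None when no genus qualifies.
    """
    pairs = [(ages[g], s) for g, s in cell.items() if subset is None or g in subset]
    total = sum(s for _, s in pairs)
    if total == 0:
        return None
    return sum(a * s for a, s in pairs) / total


def absolute_quartiles(ages: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """Youngest and oldest 25% of all genera nationwide, ranked by age.

    Assumption: each quartile holds floor(n / 4) genera; ties are broken
    by sort order.
    """
    ranked = sorted(ages, key=ages.get)  # youngest first
    k = len(ranked) // 4
    if k == 0:
        return set(), set()
    return set(ranked[:k]), set(ranked[-k:])


# Hypothetical genus ages (Mya) and one grid cell (species counts per genus)
ages = {"A": 5.0, "B": 10.0, "C": 20.0, "D": 40.0}
cell = {"A": 3, "D": 1}
young, old = absolute_quartiles(ages)
print(weighted_mdt(cell, ages))         # (5*3 + 40*1) / 4 = 13.75
print(weighted_mdt(cell, ages, young))  # only genus A contributes: 5.0
print(weighted_mdt(cell, ages, old))    # only genus D contributes: 40.0
```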
We designed a null model to identify ancient and recent divergence hotspots for the angiosperm flora of China. The mean ages of the youngest and oldest quartiles in each grid cell were used as the observed values for the null models, and genera were then randomly resampled from the pool of all genera investigated in China to obtain null distributions of ages for the youngest and oldest quartiles in each grid cell. The standardized effect size of the mean divergence time (SES-MDT) of genera in each grid cell was calculated as:

$$\mathrm{SES\text{-}MDT} = \frac{\mathrm{MDT}_{\mathrm{observed}} - \mathrm{MDT}_{\mathrm{random}}}{\mathrm{s.d.}(\mathrm{MDT}_{\mathrm{random}})}$$

where $\mathrm{MDT}_{\mathrm{observed}}$ is the observed MDT; $\mathrm{MDT}_{\mathrm{random}}$ is the expected MDT of the randomized assemblages (n = 999); and s.d.($\mathrm{MDT}_{\mathrm{random}}$) is the s.d. of the MDT for the randomized assemblages. Grid cells with SES-MDT values for the youngest quartile below −1.96 were identified as significant hotspots of recent divergence, whereas grid cells with SES-MDT values for the oldest quartile above 1.96 were identified as significant hotspots of ancient divergence. Considering that the evolutionary history of herbaceous and woody plants may differ40, the above analyses were also conducted separately for herbaceous and woody genera. Analyses of MDT were implemented in R and ArcGIS 10.1 (http://www.esri.com/).
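The SES-MDT test can be sketched as below. This is an illustration under stated assumptions, not the authors' Dryad code: the function names are hypothetical, and because the text does not say whether the null quartile means were species-weighted, this sketch uses unweighted means of the randomly drawn genus ages.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def ses(observed: float, null_values: Sequence[float]) -> float:
    """Standardized effect size: (observed - mean(null)) / s.d.(null)."""
    sd = statistics.stdev(null_values)  # sample s.d., as R's sd() computes
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - statistics.mean(null_values)) / sd


def null_quartile_means(pool_ages: Sequence[float], n_genera: int, n_reps: int = 999,
                        rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """For each replicate, draw n_genera ages without replacement from the
    national pool and record the mean of the youngest and oldest quartiles."""
    rng = rng or random.Random()
    pool = list(pool_ages)
    k = max(1, n_genera // 4)
    young, old = [], []
    for _ in range(n_reps):
        draw = sorted(rng.sample(pool, n_genera))
        young.append(statistics.mean(draw[:k]))
        old.append(statistics.mean(draw[-k:]))
    return young, old


def hotspot_labels(ses_young: float, ses_old: float, z: float = 1.96) -> List[str]:
    """A cell may qualify as a recent hotspot, an ancient hotspot, both or neither."""
    labels = []
    if ses_young < -z:
        labels.append("recent")
    if ses_old > z:
        labels.append("ancient")
    return labels


rng = random.Random(42)
pool = [rng.uniform(1.0, 130.0) for _ in range(2000)]  # hypothetical genus ages (Mya)
null_young, null_old = null_quartile_means(pool, n_genera=40, rng=rng)
# Hypothetical observed quartile means for one cell containing 40 genera
print(hotspot_labels(ses(3.0, null_young), ses(70.0, null_old)))
```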
Previous studies have demonstrated that the overall species richness patterns of birds are largely determined by the geographically wide-ranging species41,42,43, indicating that patterns may be driven by a subset of taxa and may not be representative of an entire biota. To explore whether MDT patterns for China are influenced largely by values for widespread species, we ranked genera from the most restricted to most widespread in China, partitioned the genera into quartiles on the basis of their range size and mapped MDT for each quartile following a previously published description41.
Spatial distribution of median divergence times
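Age variation within grid cells was evaluated by plotting divergence times in each grid cell (Extended Data Fig. 4) and by calculating the skewness and kurtosis of divergence times (Extended Data Fig. 5). To verify the MDT results, we also investigated the distribution patterns of the Chinese angiosperm genera by mapping the median divergence times (medianDT) based on all genera, and on the youngest and oldest quartiles, in each grid cell. The null model for the median divergence time applied a modified effect-size statistic44,45,46 and was calculated as:

$$\mathrm{SES\text{-}medianDT} = \begin{cases} \dfrac{\mathrm{medianDT}_{\mathrm{observed}} - \mathrm{medianDT}_{\mathrm{random}}}{1.486 \times \mathrm{MAD}_{\mathrm{random}}}, & \mathrm{MAD}_{\mathrm{random}} \neq 0 \\[2ex] \dfrac{\mathrm{medianDT}_{\mathrm{observed}} - \mathrm{medianDT}_{\mathrm{random}}}{1.253314 \times \mathrm{meanAD}_{\mathrm{random}}}, & \mathrm{MAD}_{\mathrm{random}} = 0 \end{cases}$$

where $\mathrm{medianDT}_{\mathrm{observed}}$ is the observed median divergence time; $\mathrm{medianDT}_{\mathrm{random}}$ is the expected median divergence time of the randomized assemblages (n = 999); $\mathrm{MAD}_{\mathrm{random}}$ is the median absolute deviation of the divergence times for the randomized assemblages; and $\mathrm{meanAD}_{\mathrm{random}}$ is the mean absolute deviation of the divergence times for the randomized assemblages.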
Richness and phylogenetic diversity
We calculated the generic richness, Faith’s phylogenetic diversity47 and SES-PD of the Chinese angiosperm genera on the basis of our ultrametric chronogram using the ‘picante’ package in R. Faith’s phylogenetic diversity is the sum of all phylogenetic branch lengths that connect species in a community. We calculated phylogenetic diversity as the length of the subtree that joins the genera in each grid cell to the root of the chronogram. SES-PD was calculated to correct for the positive correlation usually observed between phylogenetic diversity and species richness48. We first obtained a null distribution of the expected phylogenetic diversity values by shuffling taxon labels across the tips of the tree 999 times for each grid cell. SES-PD was then calculated by dividing the difference between the observed ($\mathrm{PD}_{\mathrm{observed}}$) and expected ($\mathrm{PD}_{\mathrm{random}}$) phylogenetic diversity by the s.d. of the null distribution (s.d.($\mathrm{PD}_{\mathrm{random}}$)):

$$\mathrm{SES\text{-}PD} = \frac{\mathrm{PD}_{\mathrm{observed}} - \mathrm{PD}_{\mathrm{random}}}{\mathrm{s.d.}(\mathrm{PD}_{\mathrm{random}})}$$
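Faith’s phylogenetic diversity, as defined above (the branch lengths joining the genera in a cell to the root), can be computed by walking from each tip towards the root and counting each branch once. This is a sketch on a hypothetical four-branch tree; in R, picante’s pd() with include.root = TRUE computes the same quantity.

```python
from typing import Dict, Iterable, Tuple

# Each non-root node maps to (parent, length of the branch above it)
Tree = Dict[str, Tuple[str, float]]


def faith_pd(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum of branch lengths on the paths joining the taxa to the root."""
    counted = set()
    total = 0.0
    for tip in taxa:
        if tip not in tree:
            raise KeyError(f"taxon {tip!r} is not in the tree")
        node = tip
        # Walk rootwards; stop at the root or at a branch already counted
        while node in tree and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Hypothetical ultrametric chronogram: ((A:1, B:1)X:2, C:3)root
tree = {"A": ("X", 1.0), "B": ("X", 1.0), "X": ("root", 2.0), "C": ("root", 3.0)}
print(faith_pd(tree, ["A", "B"]))  # 1 + 1 + 2 = 4.0
print(faith_pd(tree, ["A", "C"]))  # 1 + 2 + 3 = 6.0
```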
The net relatedness index (NRI) and the nearest taxon index (NTI) were calculated to investigate the phylogenetic structure (clustering or overdispersion) of angiosperm genera across China49. NRI is based on the mean phylogenetic distance (MPD), the average phylogenetic distance between all possible pairs of taxa within a grid cell, and primarily reflects structure in deeper parts of the phylogeny. NTI is based on the mean nearest taxon distance (MNTD), the mean phylogenetic distance between each taxon in a grid cell and its closest relative in that cell, and reflects shallower parts of the phylogeny. NRI and NTI were calculated as follows:

$$\mathrm{NRI} = -\frac{\mathrm{MPD}_{\mathrm{observed}} - \mathrm{MPD}_{\mathrm{random}}}{\mathrm{s.d.}(\mathrm{MPD}_{\mathrm{random}})}, \qquad \mathrm{NTI} = -\frac{\mathrm{MNTD}_{\mathrm{observed}} - \mathrm{MNTD}_{\mathrm{random}}}{\mathrm{s.d.}(\mathrm{MNTD}_{\mathrm{random}})}$$

where $\mathrm{MPD}_{\mathrm{observed}}$ and $\mathrm{MNTD}_{\mathrm{observed}}$ are the observed MPD and MNTD; $\mathrm{MPD}_{\mathrm{random}}$ and $\mathrm{MNTD}_{\mathrm{random}}$ are the averages of the expected MPD and MNTD of the randomized assemblages (n = 999); and s.d.($\mathrm{MPD}_{\mathrm{random}}$) and s.d.($\mathrm{MNTD}_{\mathrm{random}}$) are the standard deviations of $\mathrm{MPD}_{\mathrm{random}}$ and $\mathrm{MNTD}_{\mathrm{random}}$ for the randomized assemblages. The null distributions of MPD and MNTD were created by randomly selecting the observed number of genera in each grid cell 999 times, with all genera in the phylogeny as a sampling pool. Positive values of NRI and NTI indicate phylogenetic clustering, whereas negative values indicate phylogenetic overdispersion in a grid cell. NRI and NTI for woody and herbaceous genera were calculated separately to compare their phylogenetic structures across China.
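MPD and MNTD for one grid cell follow directly from pairwise phylogenetic distances, and NRI and NTI are their negated standardized effect sizes. The distances below are hypothetical (they correspond to the tree ((A:1,B:1):2,C:3)); in R, picante provides mpd(), mntd(), ses.mpd() and ses.mntd() for the same purpose.

```python
import statistics
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

# Pairwise phylogenetic distances keyed by unordered taxon pair
Dist = Dict[FrozenSet[str], float]


def pair_dist(dist: Dist, a: str, b: str) -> float:
    return dist[frozenset((a, b))]


def mpd(dist: Dist, taxa: List[str]) -> float:
    """Mean phylogenetic distance over all pairs of taxa in the cell."""
    pairs = list(combinations(taxa, 2))
    if not pairs:
        raise ValueError("MPD needs at least two taxa")
    return sum(pair_dist(dist, a, b) for a, b in pairs) / len(pairs)


def mntd(dist: Dist, taxa: List[str]) -> float:
    """Mean distance from each taxon to its closest relative in the same cell."""
    if len(taxa) < 2:
        raise ValueError("MNTD needs at least two taxa")
    nearest = [min(pair_dist(dist, a, b) for b in taxa if b != a) for a in taxa]
    return sum(nearest) / len(nearest)


def relatedness_index(observed: float, null_values: Sequence[float]) -> float:
    """NRI (from MPD) or NTI (from MNTD): -(obs - mean(null)) / s.d.(null).
    Positive values indicate clustering, negative values overdispersion."""
    return -(observed - statistics.mean(null_values)) / statistics.stdev(null_values)


dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
cell = ["A", "B", "C"]
print(mpd(dist, cell))   # (2 + 6 + 6) / 3
print(mntd(dist, cell))  # (2 + 2 + 6) / 3
```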
Regression analyses between MDT and two climatic variables
To explore the potential drivers of the spatial divergence patterns of Chinese angiosperms, MDT in each grid cell was regressed against the mean MAP and mean MAT of that grid cell using linear regression models in R. The adjusted R² was used to indicate the explanatory power of each variable, although these associations do not necessarily imply that the climatic variables determine MDT. Climatic data were downloaded from the WorldClim database version 1.4 (http://www.worldclim.org/) at a spatial resolution of 10 arc-minutes50.
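For a single predictor, the adjusted R² reported for each regression can be reproduced with ordinary least squares. This is a minimal sketch on hypothetical grid-cell values; in R, summary(lm(MDT ~ MAP))$adj.r.squared returns the same quantity.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares fit of y = a + b * x; returns (a, b, R², adjusted R²)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least three observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return a, b, r2, adj_r2


# Hypothetical grid cells: mean annual precipitation (mm) and MDT (Mya)
map_mm = [150.0, 320.0, 480.0, 900.0, 1400.0, 1800.0]
mdt_mya = [15.8, 17.1, 18.9, 21.6, 23.4, 24.9]
print(simple_ols(map_mm, mdt_mya))
```

With a single climatic predictor the adjustment only penalizes the one extra parameter, so adjusted R² stays close to R² when there are many grid cells.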
Species tree reconstruction and conservation implications
With our dated genus-level chronogram as the backbone, a species-level tree including 26,978 Chinese angiosperm species was generated by inserting species that were not sampled in our generic tree into the genera to which they belong, using the R package ‘S.PhyloMaker’51. Our species-level tree included approximately 96% of all known angiosperm species native to China; 1,098 aquatic species were not sampled. To mitigate the effect of polytomies on the calculation of phylogenetic diversity, we resolved polytomies in the reconstructed tree with a birth–death clock model52. We defined constraints based on the tree constructed with molecular data, and unresolved taxa were then placed within the relevant constraints. Node heights for each constraint were fixed on the basis of divergence time estimates. We then conducted a Bayesian analysis using MrBayes v.3.253 with the topological and node height constraints and with uniform (0.0, 10.0) priors on the birth–death (speciation and extinction) parameters. Two independent runs of 2,500,000 generations, sampled every 500 generations, were conducted to assess convergence and mixing; the first 750,000 generations were discarded as burn-in, and 1,000 of the post-burn-in trees were retained for further analyses. Species-level phylogenetic diversity and SES-PD were calculated on the basis of 10 trees randomly selected from the 1,000 trees, and Spearman’s rank correlation was used to assess the consistency of phylogenetic diversity or SES-PD patterns across trees. Grid cells with the top 5% highest values of both phylogenetic diversity and SES-PD were identified as hotspots of phylogenetic diversity (Fig. 3b). MDT analyses were not conducted on the species tree because the missing data rendered the variation between replicates uninformative. Further analyses can be performed once additional molecular data are collected for these species.
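Hotspot identification reduces to intersecting two top-5% sets. This sketch assumes that "top 5%" means the ceil(n × 0.05) highest-ranked cells and breaks ties at the cut-off arbitrarily; the text does not specify either detail, and the cell values are hypothetical.

```python
from typing import Dict, Set


def top_share(values: Dict[str, float], share_percent: int = 5) -> Set[str]:
    """IDs of the cells ranked in the top share_percent % by value.

    Assumption: the top share is the ceil(n * share_percent / 100) highest
    cells (at least one); ties at the cut-off are broken by sort order.
    """
    if not values:
        return set()
    k = max(1, (len(values) * share_percent + 99) // 100)
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:k])


def pd_hotspots(pd: Dict[str, float], ses_pd: Dict[str, float],
                share_percent: int = 5) -> Set[str]:
    """Cells that fall in the top share for BOTH PD and SES-PD."""
    return top_share(pd, share_percent) & top_share(ses_pd, share_percent)


# Hypothetical values for 40 grid cells
pd = {f"c{i}": float(i) for i in range(1, 41)}
ses_pd = dict(pd, c40=0.0)  # c40 is PD-rich but not beyond null expectation
print(pd_hotspots(pd, ses_pd))
```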
Spatial data of protected areas in China were compiled from two sources: (i) a previous publication30 that digitized nature reserves in mainland China, including 334 national, 857 provincial and 1,431 prefectural or county-level nature reserves (provided by Z.-Y. Tang); and (ii) 92 protected areas in Taiwan, downloaded from the Database of Protected Areas (https://www.protectedplanet.net/; accessed August 2017). Considering that most nature reserves were designated according to administrative units, we calculated richness and phylogenetic diversity in the protected areas with ‘county’ as the basic unit rather than by dividing China into grid cells. Each protected area was intersected with the map of China in ArcGIS to locate protected areas within counties. Species occurring in these counties were assumed to be protected, but counties in which protected areas covered less than 10% of the county area were excluded to reduce sampling bias.
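The coverage figures, such as the share of all species recorded from protected counties, can be sketched as a set union over qualifying counties. County names and values are hypothetical; coverage of phylogenetic diversity would follow the same pattern, comparing Faith’s PD of the protected species set with that of all species.

```python
from typing import Dict, Set


def protected_species_share(county_species: Dict[str, Set[str]],
                            protected_fraction: Dict[str, float],
                            min_fraction: float = 0.10) -> float:
    """Share of all recorded species occurring in at least one county whose
    protected-area cover is >= min_fraction (counties below it are excluded)."""
    all_species = set().union(*county_species.values())
    if not all_species:
        raise ValueError("no species records")
    protected: Set[str] = set()
    for county, species in county_species.items():
        if protected_fraction.get(county, 0.0) >= min_fraction:
            protected |= species
    return len(protected) / len(all_species)


# Hypothetical county records and protected-area cover fractions
county_species = {"county1": {"a", "b"}, "county2": {"c"}, "county3": {"b", "d"}}
protected_fraction = {"county1": 0.50, "county2": 0.05, "county3": 0.20}
print(protected_species_share(county_species, protected_fraction))  # a, b, d of 4 species
```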
Statistics and reproducibility
No statistical methods were used to predetermine sample size. Spearman’s rank correlation and linear regression analyses were conducted in R. Precise P values are provided to show statistical significance. Null-model tests (999 random replicates) were used to assess the significance of spatial patterns of diversity and divergence, with −1.96 and 1.96 as significance thresholds.
Example code used to conduct null-model test (written in R) can be found at Dryad: http://datadryad.org/resource/doi:10.5061/dryad.p89m3.
Sequences for phylogenetic analyses have previously been published31 and deposited in GenBank. The dated phylogeny is archived in Dryad: http://datadryad.org/resource/doi:10.5061/dryad.p89m3. The spatial distribution data are available from: http://www.darwintree.cn/resource/spatial_data. All other additional data are available from the corresponding author upon reasonable request.
We thank J.-Y. Fang, D.-Z. Li, K.-P. Ma, S.-Z. Zhang, H. Sun, J.-Q. Liu, Z.-H. Wang, X.-Q. Wang and H.-Z. Kong for help initiating this study. This research was supported by the National Key Basic Research Program of China (2014CB954100), the National Natural Science Foundation of China (31590822), the Chinese Academy of Sciences International Institution Development Program (SAJC201613), the National Natural Science Foundation of China and US National Science Foundation Dimensions Collaboration Project (31461123001), the US National Science Foundation (Open Tree of Life: DEB-1207915, DEB-1208428; ABI DBI-1458466 and DBI-1458640; iDigBio: EF-1115210 and DBI-1547229; US–China Dimensions of Biodiversity: DEB-1442280) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).