An updated floristic map of the world

Floristic regions reflect the geographic organization of floras and provide essential tools for biological studies. Previous global floristic regions are generally based on floristic endemism, lacking a phylogenetic consideration that captures floristic evolution. Moreover, the contribution of tectonic dynamics and historical and current climate to the division of floristic regions remains unknown. Here, by integrating global distributions and a phylogeny of 12,664 angiosperm genera, we update global floristic regions and explore their temporal changes. Eight floristic realms and 16 nested sub-realms are identified. The previously-defined Holarctic, Neotropical and Australian realms are recognized, but Paleotropical, Antarctic and Cape realms are not. Most realms have formed since Paleogene. Geographic isolation induced by plate tectonics dominates the formation of floristic realms, while current/historical climate has little contribution. Our study demonstrates the necessity of integrating distributions and phylogenies in regionalizing floristic realms and the interplay of macroevolutionary and paleogeographic processes in shaping regional floras.


The phylogeny of global angiosperm genera
We downloaded all the sequence data of seed plants available in GenBank (as of May 19, 2018) for the following genes commonly used in plant phylogenetic studies: 18S ribosomal DNA (18S rDNA), internal transcribed spacer region (ITS, including ITS1, 5.8S ribosomal DNA and ITS2), and 26S ribosomal DNA (26S rDNA) from the nuclear genome; ATPase β-subunit gene (atpB), Maturase K (matK), NADH dehydrogenase F (ndhF) and ribulose bisphosphate carboxylase large chain (rbcL) from the chloroplast genome; and Maturase R (matR) from the mitochondrial genome.This gene sample represents both fast (e.g., ITS) and slowly (e.g., 18S rDNA, rbcL, matR) evolving genes.When multiple sequences were available for a genetic marker of a species, the longest one with the latest publication time was kept.Sequences from hybrids, unidentified taxa and taxa with dubious identification were not included.
Accession number of the sequences that were used in our molecular analyses are available in Supplementary Data 6.
To construct the genus level dataset, we firstly assessed the monophyly of each genus with sequence data using the large species-level phylogeny generated by 1 .For the monophyletic genera, one representative sequence per genetic marker per genus was used as the placeholder of the genus.A total of 593 genera (4.7%) were identified as non-monophyletic.1) For the non-monophyletic genera caused by very few stochastic intruders from other genera, we removed the intruders' sequences from our database.2) For non-monophyletic genera with several clades, we identified all the monophyletic clades and estimated the number of species included in the tree of 1 for each clade.Then the largest clade of each non-monophyletic genus, which represented the majority of the genus, was used to represent the genus.3) For polyphyletic genera, we only selected species from the core monophyletic clades of each genus.These procedures ensured that we only combined sequences from species that belonged to the same monophyletic lineage.The final matrix included molecular data for 12,539 seed plant genera.
Because the alignment of some gene regions (particularly the ITS1 and ITS2) between very divergent groups is difficult and may lead to unwanted artefacts, we adopted an alignment strategy with the following steps.1) The sequences of each plant order were placed in a separate matrix and were aligned separately using L-INS-i strategy in MAFFT v7. 4 with the following parameters: --localpair --maxiterate 1000 -adjustdirectionaccurately.2) The order-level alignments were put together as a single fasta file.For each of these combined fasta files,subMSAtable file with the information on which sequences correspond to individual order-level alignments was created using the makemergetable.rbscript distributed with MAFFT v7. 4. 3) The order-level alignments were then merged and refined cross orders using using the --localpair -merge commands in MAFFT v7. 4.This approach decreases alignment errors and ensures higher consistency with currently accepted taxonomy (i.e.APG IV).
For the further phylogenetic analysis, to ensure better consistency with the higher-level seed plant relationships in APG IV, we constrained the phylogenetic analyses in RAxML v8.0.26 using the APG IV relationships among angiosperm orders and among eudicots, monocots and magnoliids.Similarly, recent comparative studies of angiosperms have used family-level topology to constraint for their phylogeny reconstruction 2 .Here we apply the constraints on the topology at the deeper nodes of the tree (i.e., at order level), as these are more likely to present problems given the character sample for this study (for example, some of these relationships have been elucidated using morphological data that is not used in our analyses).Phylogenetic analysis were partitioned by RAxML v8.0.26 3 and the GTRGAMMA model.The tree was dated with the program treePL v1.0 4 .The fossil calibration points used here followed 2 .Because the crown age of Angiosperms is still debated 5 , and different studies have provided estimates that vary from a maximum of about 280 Ma 6 to a minimum of 130 Ma 7 , we conducted three different dating analyses with different constraints on the age of the Angiosperm crown as the following: 1) min = 149 Ma, max = 256 Ma based on 8 ; 2) min = 140 Ma, max = 210 Ma based on 9 ; and 3) min = 140, max = 150 Ma as suggested by 10 .We checked whether families as defined in the APG IV were recovered monophyletic by our molecular phylogeny reconstructed.In the 423 families included in our dataset with molecular data, 404 (95.5%) families were recovered as monophyletic in our phylogeny.Only 19 (4.5%) families were found to be nonmonophyletic and most of these cases were due to the position of a few (often a single) genera, while core species of these families reman monophyletic (see Supplementary Data 4 for the list of non-monophyletic families).Families that were not recovered monophyletic are also families in which generic composition has been contentious (e.g.Capparaceae 11 ).Therefore, our phylogeny offers a good global overview of the Angiosperm phylogeny at the genus level.
To explore the potential influence of the fast-evolving genes (particularly ITS1 and ITS2) on phylogenetic topology, we reconstructed the phylogeny using sequence data without ITS conducted in RAxML v8.0.26 and dated it in the same method as described in the previous paragraph.Then we compared it with the phylogeny based on the full sequence dataset.We found that only 9.4% genera (1,176 genera of the total 12,456 of angiosperm genera with molecular data) were dropped when removing ITS from the reconstruction (see Supplementary Fig. 18a for the proportion of dropped genera in each family).The dropped genera were mainly from a few very large families (Supplementary Fig. 18b, e.g., Asteraceae, for which 267 out of 1454 genera were dropped), while the number of genera in 349 (82.5%) out of 423 families remained unchanged after removing ITS.Most importantly, the removal of ITS did not affect the overall topology of phylogeny.Specifically, we found that the topologies of 404 families (95.5% of all 423 families) remained the same after removing ITS.Some small changes in the topology of a few genera were found in the very large families.Moreover, we found that after removing ITS, the branch lengths of genera and the phylogenetic distances between genera, which is directly related with the estimation on phylogenetic beta diversity, were highly consistent with those estimated from the original phylogeny (Supplementary Fig. 18c, d; for branch lengths of genera, Pearson r = 0.81; for phylogenetic distances between genera, Mantel r = 0.965, p = 0.01).These results suggest that the removal of ITS did not dynamically change the phylogeny and the estimation of phylogenetic beta diversity used for the division of biogeographic regions in the following analysis.
After the molecular was dated, other currently described seed plant genera without sequence data were added to the dated phylogeny as polytomies based on current taxonomy, and then were resolved using the polytomy resolver following 12 conducted with BEAST v1.8.0 13 .For the BEAST analyses we used a birth-death model with uniform priors for the mean growth rate (λ − μ) and the relative death rate (μ/λ) parameters.The model was run for 11 million generations and the posteriors were sampled every 1000 generations.After discarding the first 10% of generations as burnin, we used Tracer v1.6.0 14 to examine the effective sampling sizes (ESS) of all parameters, chain mixing and convergence to a stationary distribution.The post burn-in sample showed convergence to a stationary distribution, good mixing and high ESS for all parameters (ESS > 200).Following 15 , the final ultrametric topologies that included all described angiosperm genera were produced based on the maximum clade credibility tree extracted from the post-burn-in sample.
Here we extracted the angiosperms from this seed plant phylogeny for further analysis, which contains 14,244 genera in total.We repeated the clustering and ordination analysis using the three phylogenies with different constraints for the crown age of angiosperms, and the results were highly consistent (see Supplementary Fig. 3).
In the main text we presented the results based on the phylogeny with a constraint of 140-210 Ma for the crown age of angiosperms as recent family-level phylogenies of angiosperms suggest that the crown age of angiosperms is ca.190-210 Ma 16,17 and those based on phylogenies with alternative dating schemes as supplementary materials.
It is noteworthy that all of the results based on the phylogenies with different scenarios of angiosperm crown age are highly consistent.
The phylogenies used in our study represent the most recent and most complete angiosperms genus-level phylogeny.Currently there are three published large angiosperms phylogenies 1,2,18 .However, the coverage of angiosperm genera in these phylogenies (i.e.8180 2 , 10,155 1 , and 8399 genera 18 ) is significantly lower than that of the phylogenies used in this paper which include molecular data for 12,539 genera.
The divergence time estimation of these phylogenies also differs.Although these phylogenies are based on a largely overlapping set of fossil calibrations and the same dating algorithm 1,2,18 , all except the phylogeny used here are based on a single dating scheme which does not allow to test the potential effect of alternative hypotheses for the crown age of angiosperms on our findings.In contrast, the phylogenies used here have been dated with three different constraints for the crown age of angiosperms.
Finally, although unsequenced taxa are included in both 1 and the phylogenies used here based on current taxonomy, these unsequenced taxa have been kept as unresolved polytomies in 1 , which makes it difficult to test the potential influences of the relatedness between these unresolved taxa on our findings.

Global distributions of angiosperm genera
The geographic standard units (hereafter GSUs) used for the compilation of species distribution data were generated following the Global Administrative Areas (GADM) database version 1 (https://gadm.org/data.html)and the World Geographic Scheme for Recording Plant Distributions (WGSRPD) as the base maps 19 .To reduce the effects of data deficiency and area on the estimation of genus richness, we merged small adjacent regions of WGSRPD and GADM into larger ones and splitted the large units of WGSRPD to small ones based on GADM.The final size of each GSU is ca. 4 o longitude × 4 o latitude (366,179.4± 20,245.85km 2 ).Small islands (<25,000 km 2 ) and the Antarctica were removed.Finally, the earth's landmass was divided into 420 GSUs (see Supplementary Data 2 for the detailed information of GSU).This scheme of geographic standard units has been used for the compilation of species distribution data in several previous studies 19,20 Distributional records for angiosperms were compiled at the species level from over 1,100 data sources, including published regional and local floras, floristic investigations, specimen records and online databases (see Supplementary Data 1 for the full list of data sources).We classified the raw distributional data into four types: coordinates, range maps, gridded distributions, and recorded localities.Depending on the types of the raw data, we applied different methods to reduce spatial conflicts between the original records and the boundaries of the GSUs used in our study and to improve the accuracy of species distributions in the final dataset.For species distributions recorded as coordinates (e.g.GBIF), we mapped the coordinates to GSUs using the 'inpolygon' function in MATLAB v2017a.For species distributions recorded as range maps, we digitized and georeferenced these boundaries in ArcGIS 10.0 and then intersected them with our GSUs.For species distributions recorded as grid cells, we overlapped these grid cells with GSUs used in our study.The record of a grid cell was kept only when the intersected area of the grid cell was larger than half of its size.
For the distribution data recorded as locality names, we searched them in the global geographic names service (http://www.geonames.org/)and then intersected the recorded boundary with the boundaries of GSUs in our study.When a recorded geographic unit intersected with more than one GSU, we assigned the record to the GSU that covered over 80% area of the recorded unit.
To improve the quality of species distribution data, we set a threshold for the number of data sources to keep an occurrence record of a species in a given GSU.For geographical units in Europe, Australia, China, South Africa, Madagascar and North America, an occurrence record of species in a geographical unit corroborated by at least 3 data sources was retained, leading to high confidence of the data quality in these regions.For the geographical units in Central America, Greenland, Amazon and Turkey, an occurrence record of species in a geographical unit corroborated by at least 2 data sources was retained, leading to medium confidence of the data quality in these regions.
The entire data was retained for India, North and Central Africa, and Patagonia because of data deficiency in these regions, leading to relatively low confidence of the data quality in these regions.December, 2022).We kept accepted names with the highest confidence level.
Taxonomic names that were identified as 'unresolved' in both COL and POWO were removed.Synonyms and misspelt names are replaced with the corrected/accepted names.The misspelt taxonomic names were corrected using the Taxonomic Name Resolution Service 4.0 (TNRS, https://tnrs.biendata.org/),which has been widely used in plant studies.
During data compilation, we also collected the status of species (i.e.being native, cultivated, introduced, invasive and hybrid) from regional data sources as much as we Finally, we compiled the distributional data for each genus by aggregating distribution data of all its species.The distribution maps of all genera were carefully verified manually by the members in the institutes of the last authors of this paper to improve the quality of the distribution data.We then integrated the phylogeny and distribution data and the final distributional database contains 384,771 records for 12,664 angiosperm genera (see Supplementary Data 2), representing 90.63% of the total 13,974 accepted genera in POWO 21 .

Taxonomic and phylogenetic beta diversity
Simpson beta diversity 22 modified by 23 was used to represent the dissimilarity in species assemblages among GSUs (Equation 1).
where a represents species shared between two geographic standard units, b and c represent species unique to each unit.We used this metric because it is not affected by the hierarchical data structure and provides unbiased estimation of compositional turnover across space 24,25 .Pairwise matrices of taxonomic beta diversity (calculated using the number of shared (a) and unique species (b, c) between two geographic standard units) and phylogenetic beta diversity (calculated based on the length of shared (a) and unique branches (b, c) of the overall phylogeny in each pair of geographic standard units) were generated for further clustering analysis using the above equation.

Floristic domains and sub-domains identified by hierarchical clustering
Hierarchical clustering analyses were conducted to group GSUs into floristic realms and sub-realms.With this method, the GSUs with the highest similarity (i.e.lowest distance) were first grouped together and then the most similar groups were grouped into clusters.This process was repeated until all GSUs were all grouped into a single cluster.Then a dendrogram was generated by the clustering analysis, which represented the hierarchical relationships among floristic realms and sub-realms.The hierarchical clustering analyses were conducted using the "hclust" function in stats (version 3.6.2) package in R 3.6.1 26 and for the matrices of taxonomic and phylogenetic beta diversity between the GSUs, separately.The whole procedure can be divided into two main parts as follows.

Part 1 Choosing a clustering algorithm
In the hierarchical analysis, several methods are widely used to estimate the distances between clusters.Thus, we need to choose the best algorithm prior to do the formal clustering analyses.Here, we compared seven clustering algorithms provided by the "hclust" function, "single", "complete", "median", "mcquitty", "average", "centroid" and "ward.D2", respectively.We chose the best one by comparing the performance and accuracy of these seven methods using the following approach proposed by Holt et al. 25 .
(i) Performance evaluation.This evaluation aims to choose the clustering algorithm that can best represent the floristic divergences with the lowest number of clusters.First, using each of the seven clustering algorithms, we grouped the GSUs into different numbers of clusters and calculated the between-cluster beta diversity.Second, we calculated the proportion of beta diversity explained by identified clusters (Pbeta) as the sum of between-cluster beta diversity divided by the sum of the whole beta diversity matrix following 25 .Third, we demonstrated the variations in Pbeta with the increase in the number of clusters for each algorithm.In general, with the increase in the number of clusters, the Pbeta increases.Following 25 , the best performing algorithm was identified as the one returning the minimum number of clusters when the Pbeta reached 99%.
(ii) Accuracy evaluation: This evaluation aims to choose the clustering algorithm that can represent the floristic divergences with the lowest biases.To do this, the cophenetic correlation coefficients were calculated for each clustering algorithm using the "cophenetic" function in stats (version 3.6.2) package 27 in R 3.6.1 26 .We firstly obtained the co-phenetic distance matrix by calculating the branch length distances separating each pair of tips (i.e.GSUs) on the dendrogram tree.Then the co-phenetic correlation coefficients were estimated as the Pearson product-moment correlation coefficient between the co-phenetic distance matrix and the beta diversity matrix between all GSUs.The clustering algorithm with the highest accuracy has the highest co-phenetic coefficients.
The above evaluations indicated that the "average" algorithm outperformed all tested alternatives (Supplementary Fig. 14).Therefore, the "average" method, also known as the Unweighted Pair Group Means Algorithm (UPGMA), was finally chosen for defining floristic realms and sub-realms of the global angiosperms.The geographic sizes of the identified floristic realms and sub-realms are similar to those of the "Kingdoms" and "Regions" in Takhtajian's floristic map 29 (see Supplementary Table 1 for the detailed comparison).It should be noted that the methods used here for defining floristic clusters are different from Takhtajian 29 .
Takhtajian's floristic "Kingdoms" and "Regions" are defined by endemic taxa that represent the distinctiveness of different floras 29 .Specifically, floristic "Kingdoms" are usually characterized by endemic families, subfamilies, and tribes; and floristic "Regions" are characterized by endemic genera (in few cases by endemic families).
Thus, we used different terminology i.e., floristic realms and sub-realms.

Sensitivity analyses for the uncertainty in identifying floristic realms
The GSUs on boundaries between floristic realms may contain mixed floristic components, which may lead to less "hard" boundaries 30,31 .Meanwhile, floristic regionalization may also be influenced by 1) phylogenetic uncertainty and 2) incomplete sampling across lineages 32 .Thus, we conducted the following analysis to evaluate the uncertainties.
(i) Regions with uncertainty in floristic realm division.Following 25 , we conducted the silhouette analysis to evaluate the performance of clustering.Specifically, the silhouette widths for each GSU was calculated as following: where si is the silhouette width of a GSU i; ai is the average dissimilarity between i and all other GSUs of the identified floristic realm to which i belongs; bi is the average dissimilarity between i and all GSUs of the nearest realm to which i does not belong.In general, higher positive silhouette widths indicate better classification of a GSU into a floristic realm 33 .In contrast, GSUs with negative silhouette widths are identified as uncertain regions that may contain mixed floristic elements 33 .As a result, most GSUs have positive silhouette widths, indicating that they are assigned to a floristic realm appropriately (Supplementary Fig. 2).Only a few regions near the boundaries between the Saharo-Arabian and Holarctic realms, between the Indian-Malaysian and Holarctic realms, and between the Neotropical and Chile-Patagonian realms have negative silhouette widths.
Thirdly, we assessed the robustness of our results to the choice of analytical approach by redefining the floristic realms using fuzzy c-means clustering 34 .Unlike the above-mentioned UPGMA method which assigns GSUs exclusively to one cluster, fuzzy c-means method estimates the likelihood of each GSU belonging to a certain cluster, under a given degree of fuzziness 35 .We conducted fuzzy c-means using "funny" function in the "cluster" package (version 2.1.4) 36in R 3.6.1 26 .The number of clusters was set to 8 to match the number of realms in the UPGMA analysis.We repeated silhouette analysis to evaluate the clustering performance (see Supplementary Results and Discussion for the detailed comparison with UPGMA clustering results).
(ii) Phylogenetic uncertainty.As mentioned above, we repeated the clustering and ordination analysis using the tree dated with three different constraints for the crown age of angiosperms.Furthermore, to evaluate the potential impact of phylogenetic uncertainty in the placement of genera which positions were resolved using polytomy resolver 12 and BEAST v1.8.0, we repeated our regionalization analyses using: 1) the single maximum credibility tree, 2) the phylogenetic trees only containing taxa with molecular data, and 3) five randomly sampled post burn-in posterior trees from the polytomy resolver.The results based on the phylogenies with different scenarios of angiosperm crown age, the molecular phylogenies and post burn-in posterior phylogenies are all highly consistent (see Supplementary Fig. 3-8).
(iii) Incomplete sampling.A recent species-level floristic regionalization conducted by Carta et al. ( 2022) was based on a fraction of vascular plants 32 .In order to evaluate the effect of incomplete sampling, we extracted the angiosperm genera included in Carta et al. (2022) (9,905 angiosperm genera) and then we conducted the same genus-level regionalization as reported in our study.We found that incomplete sampling did not change our floristic maps (Figure S9, also see Supplementary Results and Discussion).

Changes in floristic realms through time
To explore the phylogenetic depth at which the spatial divergence of floristic assemblages appears, we conducted the following steps.then collapsed all the descendent leaves of each branch encountered at each time, respectively.Second, the current geographic extension of an ancestral branch at a given time is represented by merging the geographic distributions of all its descendent leaves.
Third, we calculated the phylogenetic beta diversity between different GSUs at different geological times following the methods of 37,38 .Fourth, we applied the UPGMA clustering method to generate floristic realms at each geological time.The floristic realms were identified by cutting the UPGMA dendrograms to return the minimum number of clusters require to get Pbeta = 80%.Lastly, we used the NMDS ordinations to investigate the temporal changes in the dissimilarity of phylogenetic compositions between different realms following the method of 38,39 .Specifically, we illustrated the changes in the dissimilarity of floristic realms by overlaying the coordinates of the NMDS ordinations conducted for different geological times.This approach exploring the phylogenetic depth of the emergence of floristic realms have also been applied in previous studies [37][38][39][40][41] .Note that we do not intend to estimate the ancestral geographic ranges of phylogenetic branches or the ancestral floristic assemblages.

Effects of contemporary climate on floristic realm divisions
If the contemporary climate dominates the division between the identified floristic realms, the explanatory power of contemporary climate differences on the beta diversity between realms should be higher than those on the beta diversity within realms.The climate difference of a given pair of geographic standard units was defined as the Euclidean distance between them in a two-dimensional climatic space: where i and j represent two given geographic standard units, and MAT and MAP represent annual mean temperature and annual mean precipitation, respectively.The values of climate are standardized prior to analysis.
To evaluate whether contemporary climate dominates the division between the identified floristic realms, we compared the effect of contemporary climate differences on the beta diversity between and within realms using Ordinary Least Square (OLS) regressions with phylogenetic beta diversity as the response variable and the corresponding contemporary climate differences as the predictor.Specifically, for each sister pair of floristic realms (or cluster of realms) (i.e. two adjacent realms or cluster of realms in the UPGMA clustering dendrogram), we built the following three OLS regression models: PBD in,jn ~ Climate diff in,jn Equation ( 5) where m and n respectively represent the two clusters of realms, im represents the ith geographic standard unit located in the realm (or cluster of realms) m, jn represents the jth geographic standard unit located in the realm (or cluster of realms) n, and PBD represents the phylogenetic beta diversity between the geographic standard unit i and the geographic standard unit j.
Equations 4-6 evaluated the explanatory power of contemporary climate differences on the beta diversity within a single realm (or cluster of realms) m or n and those on the beta diversity between the two realms (or clusters of realms), respectively.
The Novozealandic realm contains only two basic geo-units and thus, we could not calculate R 2 for it.

Effect of historical climate and geographic isolation on floristic realms division
Plate tectonic dynamics could lead to changes in both geographic distance and dispersal barriers (e.g., ocean and mountains) among areas.These changes could subsequently lead to geographic isolations among floras, causing regional differences in evolutionary histories and result in distinct phylogenetic assemblages of plants.By combining all the paleo-digital elevation models obtained from Scotese's paleoatlas 42 , we produced detailed paleogeographic maps with a spatial resolution of 1 × 1 degree from 80 Ma up to the present in 1 Ma steps following 43 .The changes in the paleolocations and mean elevation of each 1-degree grid cell through time were estimated by combining the information on the dynamics of geological events involving sea floor spreading, continental rifting, continental collisions together with other indicators of paleotopography and bathymetry 42 .
Then, at every geological time interval of 1 Ma, we calculated the geographic isolation between each two geographic standard units following 44 .Specifically, we first estimated the coordinates of the center of each geographic standard units and identified their central grid cells at the present and in the geological history using the open source software GPlates v2.1 (http://www.gplates.org; 45).The geographic isolation between two geographic standard units at each geological time was then estimated as the geographic isolation between their central grid cells, which was calculated using the paleogeographic maps obtained from Scotese's paleoatlas 42 and the 'cosDistance' function 46 in the 'gdistance' R package (version 1.6) 47 .This function assigned different conductance to different grid cells and then identified the route with minimum total route cost between a given pair of grid cells.The conductance of a grid cell was 0 in the ocean and was inversely proportional to its elevation on land.The cost value of a grid cell was calculated as the reciprocal of the mean conductance of all of its adjacent cells, where 0 was not included.The neighboring cells of a focal grid cell included those in 16 directions, including the four orthogonal and four diagonal nearest neighbors (the resulting graph is called the 'king's graph', because it reflects all the legal movements of the king in chess), and the eight neighbors that could be connected by 'knight's moves' in chess.For comparison, we also calculated the conductance without considering differences in elevation for the land, and it returns similar patterns of geographic distances; and we also calculated the cost value using alternative concepts of neighbors provided in the 'cosDistance' function, including the four orthogonal neighboring cells of a focal grid (4 neighbors); the four orthogonal and four diagonal nearest neighbors (8 neighbors, i.e., the 'king's graph' as mentioned above).And all alternative calculations yield similar patterns of distances (Person r > 0.83).
Climate can influence species distributions and phylogenetic compositions of a region through filtering effects and its effects on evolutionary history.Hence larger climatic differences between regions may lead to higher turnover in species composition and hence higher beta diversity.Paleoclimate from 150 Ma up to the present was reconstructed by 43 from the geographic distribution of lithological indicators of climate at a temporal resolution of 1 Ma and a spatial resolution of 1 × 1 degree.These indicators included coal, evaporite, bauxite, tillite, glendonite dripstones and the fossil records of important taxa such as high latitude occurrences of palms, mangroves, and alligators 48,49 .The climate distance of a given pair of geographic standard units at a certain time interval was defined as the Euclidean distance between them in a two-dimensional climatic space (Equation 3).The MAT and MAP of a geographic standard unit at each geological time were estimated as the average value of all 1 × 1 degree grid cells within it.
To compare the relative effects of geographic isolation and climate distances between geographic standard units on the division of floristic realms, we calculated the Pearson correlation of geographic isolation and climate differences for each pair of floristic realms from 80 Ma to the present, and then conducted the hierarchical partitioning analysis using the following multiple regression model (Equation 7) with Gaussian residuals at each geological time separately: ln (PBD im,jn ) ~ ln (Climate diff im,jn,t ) + ln (Geographic iso im,jn,t ) Equation ( 7) where m and n respectively represent a sister pair of floristic realms (or cluster of realms) , im represents the ith geographic standard unit located in realm (or cluster of realms) m, jn represents the jth geographic standard unit located in realm (or cluster of realms) n, t represents the given geological time, and PBD represents the phylogenetic beta diversity between the geographic standard unit i of realm (or cluster of realms) m and the geographic standard unit j of realm (or cluster of realms) n.Then the relative R 2 of geographic isolation and climate distances were estimated at each geological time, and its temporal trend was plotted to demonstrate the changes in the drivers of the divisions between floristic realms.Both geographic isolation and climate distances were standardized prior to conducting regression models.

Contribution of different clades on floristic realms division
The divisions between floristic realms are shaped by clades whose descendant lineages show little overlap in their geographic distributions.The contribution of a clade to the division between two realms was evaluated using node-based analysis developed by 41 .The node-based analysis calculates a 'geographic node divergence' (GND) score for each node in a phylogeny, which measures the degree of mismatch in the geographic distribution of the two sister lineages diverging at a given node, i.e., descendants of one sister lineage did not persist in the geographic range of the other 41  We extracted subtrees from the global angiosperm phylogenies for realm divisions.
Each subtree contains all genera that are distributed in a given pair of realms.Then we used node-based analysis to detect the clades with GND scores over 0.65 and extracted the SOS scores of all the geographic standard units occupied by these clades.The contribution of each of these clades on the division of the two floristic realms was measured by the R 2 of the one-way analysis of variance (ANOVA) with SOS scores as the response variable, and the two realms as the predictor.Then we showed how the R 2 of different clades varied with their stem ages to evaluate the temporal changes in the contribution of different clades on realm divisions.To explore the major taxa driving realm divisions, we extracted the clades with the highest R 2 (90 quantile) for each pair of realms.The node-based analysis was conducted in R 3.6.1 26 using the "nodiv" package (version 1.4.0) 41.

Comparison with floristic regionalization based on taxonomic beta diversity
Floristic realms and their relatedness based on taxonomic beta diversity are consistent with phylogenetic-based floristic realms with three main exceptions (Supplementary Fig. 10).First, subtropical East Asia was grouped into the Indo-Malesian realm when taxonomic beta diversity was used, but was grouped into the Holarctic realm when phylogenetic beta diversity was used.Second, Mexico was grouped into the Chile-Patagonian realm when taxonomic beta diversity was used but was grouped into the Neotropical realm when phylogenetic beta diversity was used.
Compared with the regionalization based on phylogenetic beta diversity, the regionalization based on taxonomic beta diversity may be more strongly influenced by contemporary climate and geographical isolation.For example, East Asia has similar climate with the Indo-Malesian realm, while its flora is more similar to that of the Holarctic realm [50][51][52] .Similarly, Mexico shares a similar dry climate with the Chile-Patagonian realm 53 , while its flora is more similar to the North American flora 54 .These inconsistencies in climatic and floristic similarities may explain why taxonomic and phylogenetic beta diversity identify them as different realms.We further find that contemporary climate has a higher explanatory power on the taxonomic-based division than on the phylogenetic-based division of these realms (Supplementary Fig. 16).The relationship among (Australian, Chile-Patagonian), (African, Indo-Malesian) and (Neotropical, Chile-Patagonian) realms may mainly reflect current geographical isolation when taxonomic beta diversity is used, while their relationship may reflect plate tectonics when phylogenetic beta diversity is used (see Fig. 2 for the historical process of plate tectonics).These results demonstrate that floristic regionalization based on phylogenetic beta diversity is less affected by contemporary climate and better captures the floristic evolutionary history compared to the taxonomic-based regionalization 50 .

Comparison with the species-level floristic regionalization
A recent species-level floristic regionalization based on data of a fraction (ca.20%) of vascular plants divides the world into three realms, i.e. the Holarctic, Holotropical and Austral realms 32 .The Holarctic realm in 32 is similar to the Laurasian super-realm in our study, although the boundaries differ to some extent (Fig. 1).The Holotropical realm in 32  As a result, the number of floristic realms identified by the genus-level regionalization is similar to that in the previous regionalization schemes 29,56 and the boundaries well reflect the spatial divergences and the underlying evolutionary history in world's flora assemblages 54 .The differences in regionalization using different taxonomic levels may indicate that genus assemblages of plants may better capture the phylogenetic turnover across space at large scale 29 .

Robustness of regionalization based on UPGMA clustering
Recent studies suggest that regions along the edges of the identified floristic realms may contain mixed floristic components and might form a "transitional zone" 24,29 .Interestingly, our silhouette analysis for results based on the UPGMA methods shows that most GSUs near the edges of floristic realms are assigned to a realm with low uncertainty (Supplementary Fig. 2).We further identified the potential "transitional zone" which may be ignored in the UPGMA clustering by using fuzzy c-means as an alternative clustering method.We found that the floristic realms generated by the fuzzy c-means method are generally consistent with those generated by the UPGMA clustering, further indicating that the floristic realms are identified with low uncertainty (Supplementary Fig. 17).Even though, two differences were identified between the results based on the UPGMA and fuzzy c-means methods: 1) North America is clustered as a realm by the fuzzy c-means method but as a sub-realm in the Holarctic realm in UPGMA analysis; 2) Chile-Patagonian realm is not distinguished from Neotropical realm in the fuzzy c-means analysis.The fuzzy c-means method generates clusters based on differences among GSUs, while the UPGMA method generates clusters based on the differences between GSU groups and ignores the variations within groups 34,35 .The reasons for the difference between results of these two methods are likely because of a high proportion of Neotropical genera (Supplementary Data 5, also see Supplementary Fig. 19) in these regions (i.e.North America and Chile-Patagonia).
These results indicate that the identification of these realms/sub-realms may have lower confidence compared with the other realms, suggesting that further investigations on the boundaries of floristic regions in the New World are needed in future studies.
Because compared with the UPGMA method, the fuzzy c-means method requires the user to specify a number of initial clusters (k) and the degree of fuzziness when assigning the membership of each GSU 35 .The setup of k and fuzziness is generally ambiguous and lack evaluations on the robustness of the choices.Furthermore, the fuzzy c-means do not generate a dendrogram showing the relationships between identified floristic regions as the UPGMA method does, and hence cannot be used to define floristic regions within each identified realm.This limits the ability of the fuzzy c-means method to evaluate the evolutionary history of the divisions between floristic realms.We therefore included the results based on the UPGMA clustering in the main text and those based the fuzzy c-means clustering method in the supplementary materials for comparison.

Continent of Australia Australian Kingdom
Australian realm

Part 2
Identifying floristic realms and sub-realmsAll the GSUs were clustered into different clusters using UPGMA.The clustering process and the hierarchical relationships among clusters were shown using the UPGMA dendrogram.We cut the UPGMA dendrogram to get the minimum number of clusters required to reach Pbeta = 80%.These clusters of GSUs were then defined as floristic realms.Similarly, the UPGMA dendrogram was cut to get the minimum number of clusters required to reach Pbeta = 95%, and these clusters were then defined as floristic sub-realms.Finally, we identified eight floristic realms and 16 floristic subrealms.To illustrate the relationships between different floristic domains in a twodimension non-hierarchical space, we used the Nonmetric Multidimensional Scaling (NMDS hereafter) with Kruskal's neighbor-joining algorithm based on the beta diversity matrices among all GSUs28 .In NMDS plots, GSUs belonging to the same floristic realm were shown in the same color, and more similar GSUs occurred closer to one another.
First, we cut the phylogenetic trees at different geological times, including late Jurassic (160 Ma), Cretaceous (140 Ma, 120 Ma, 100 Ma and 80 Ma), Paleogene (60 Ma, 50 Ma, 40 Ma and 30 Ma), Neogene and Quaternary Periods (20 Ma, 10 Ma, 15 Ma, 10 Ma, 5 Ma and 0 Ma), and . Values of GND over 0.65 indicate significant distributional divergence between the two sister lineages, thus pinpointing the exact evolutionary splits associated with the division between floristic realms.The GND score of a given node is calculated as a summary of multiple specific overrepresentation scores (SOS), one for each geographic unit occupied by the clade containing two descendant lineages.A SOS score is the z score of the relative overrepresentation by one of the two descendant lineages in the geographic unit, as compared to the expectation generated by a matrix-swapping null model.Values of SOS close to zero thus indicate that tips from both descendant lineages are equally represented in a given geographic unit, and positive (or negative) SOS values indicate predominance of one of the two descendant lineages (see Supplementary Fig. 15 for an example of the result of node-based analysis).
covers the African, Indo-Malesian and Neotropical realms and north Chile-Patagonian realm in our study.The Austral realm in 32 covers the Australian and Novozealandic realms and southern Chile-Patagonian realm.The different floristic groupings between these two studies may be due to: 1) we included more complete taxa than Carta et al. (2022); 2) we used angiosperms at genus level while Carta et al. used vascular plants at species level, including some fern and gymnosperm species in addition to angiosperms.Thus, we further 1) extracted the angiosperm genera included in Carta et al. (2022) (9905 angiosperm genera) and 2) included both angiosperm and gymnosperm genera.Then we conducted the genus-level regionalization as the methods reported in our study.We found that both incomplete sampling and including gymnosperms did not change our floristic maps (FigureS8).These results suggests that the differences between our results and Carta et al. (2022) may reflect the difference in the taxonomic levels used for regionalization.Notably, we identified eight realms using genus-level phylogenetic similarity, in contrasting with species-level regionalization by Carta et al. (2022) which identified 3 realms.In the previous regionalization schemes such as Engler's 55 or Takhtajan's29 , the global-scale floristic classification is usually based upon endemic families and genera rather than endemic species29 .Although the methods for defining floristic clusters are different between our work and the previous regionalization schemes based on taxa endemism, we kept the same taxonomic level when conducting floristic maps.