Is phylogenetic diversity a surrogate for functional diversity across clades and space?

In the face of limited funding and widespread threats to biodiversity, conserving the widest possible variety of biological traits (functional diversity, FD) is a reasonable prioritization objective. Because species traits are often similar among closely related species (phylogenetic signal), many researchers have advocated for a phylogenetic gambit: maximizing phylogenetic diversity (PD) should indirectly capture FD. To our knowledge, this gambit has not been subject to a focused empirical test. Here we use data from >15,000 vertebrate species to empirically test it. We delineate >10,000 species pools and test whether prioritizing the most phylogenetically diverse set of species results in more or less FD relative to a random choice. We find that, across species pools, maximizing PD results in an average gain of 18% of FD relative to a random choice, suggesting that PD is a sound conservation prioritization strategy. However, this averaged gain hides important variability: for 10% of the species pools, maximizing PD can capture less FD than an averaged random scheme because of recent trait divergence and/or very strong trait conservatism. In addition, within a species pool, many random sets of species actually yield more FD than the PD-maximized selection, on average 36% of the time per pool. If the traits we used are representative of traits we wish to conserve, our results suggest that conservation initiatives focusing on PD will, on average, capture more FD than a random strategy, but this gain will not systematically yield more FD than random and thus can be considered risky.

[...] question of whether maximizing PD will actually capture more FD than prioritization schemes that ignore phylogeny has, to our knowledge, never been empirically tested [16]. While it may seem obvious that sampling species across the tree of life will capture high amounts of FD, a recent theoretical study demonstrated that PD could be a poor surrogate for FD and that, in some scenarios, prioritizing species on the basis of PD could actually capture less FD than if species were simply selected at random [16].

We clarify what our goals are in testing the utility of PD to capture FD. First, we take as given that maximizing PD is not the overarching goal per se of PD-maximization schemes, but rather that a PD-maximization strategy is valued for its ability to capture more FD compared to a strategy that ignores phylogeny. Second, asking whether PD maximization captures more FD than a random choice is fundamentally distinct (and a lower bar) from asking whether maximizing PD also maximizes FD [e.g. 15,19-21,23,24]. Finally, it is important to note that we are selecting species sets to maximize PD or FD within a region. While this is a simplification, as conservation actions often aim to select sets of areas (e.g. in reserve design), the only global phylogenetically-informed conservation initiative is species-centered (EDGE; Isaac et al. 2007). More fundamentally, the framework we use here allows us to directly test the fundamental phylogenetic gambit at the heart of all PD-based conservation [16]. Critically, the question we raise has been shown to be distinct from asking whether traits have phylogenetic signal (whether closely related species tend to share similar sets of traits), since PD can be a poor [...]

bioRxiv preprint first posted online Jan. 5, 2018; doi: http://dx.doi.org/10.1101/243923. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

This points to the need for empirical tests of whether, within a given species pool, sets of species selected to maximize PD actually contain more FD than sets of species selected without regard to evolutionary relatedness. We evaluate the PD-FD relationship for different species pools (taxonomic families and geographical assemblages, i.e., sets of species co-occurring at a given scale) using a large global dataset including trait, phylogenetic, and geographic range data for 4,616 species of mammals, 9,993 species of birds, and 1,536 species of tropical fish. Specifically, we measure FD as functional richness (see Methods) and compute, for any given species pool, an estimate of surrogacy (S_PD_FD [26,27]; Figure 1). S_PD_FD represents the amount of FD sampled by the set of species chosen to maximize PD, relative to the FD sampled by the optimal set of species selected to maximize FD directly, with both components controlled for the expected FD from a random species set of the same size. S_PD_FD will be positive if the averaged PD-maximized set contains more FD than the averaged random set, and negative if not. S_PD_FD will equal 100% if the PD-maximization strategy is optimal (i.e. if it also maximizes FD). We integrate S_PD_FD for each species pool across all deciles of species richness (Eqn. 1). Because there are many sets of species that can maximize PD or that can be chosen at random, we computed S_PD_FD based on the averaged FD over 1000 PD-maximized sets and 1000 random sets [16].

We find that selecting the most phylogenetically diverse sets of species within a given taxonomic family or within a given geographical location (large grid cells across the globe) captures, on average, 18% more FD than that of randomly chosen species (i.e. S_PD_FD = 18%, SD ± 6.5% across pools, see Figure 1). Although the surrogacy is generally positive, there was variation across species pools.
For example, the surrogacy of PD varies widely from a minimum of -85% to a maximum of 92%, meaning that selecting the most phylogenetically diverse sets of taxa can capture either 85% less (or 92% more) FD than that of randomly chosen taxa (Fig. 2-3 and Fig. S1-2). However, in 88% of the species pools, choosing sets of species according to PD captured more FD than would be expected at random (i.e., surrogacy values > 0 in 88% of the cases, see Fig. 2-3). This suggests that, on average, maximizing PD is a sound strategy to capture FD.
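
To make the surrogacy measure concrete, the following minimal Python sketch computes it for one species pool. Eqn. 1 is not reproduced in this section, so the form used here (summed FD gain of PD-maximized sets over random sets, divided by the summed gain of the FD-optimal set over random sets, across richness levels) is an assumption based on the verbal definition above. The function name `surrogacy` and its arguments are hypothetical.

```python
from statistics import mean


def surrogacy(fd_max_pd, fd_random, fd_max_fd):
    """Return surrogacy of PD for FD, in percent, for one species pool.

    Assumed inputs (one entry per richness level, e.g. per decile):
      fd_max_pd : list of lists, FD of each PD-maximized set
      fd_random : list of lists, FD of each random set
      fd_max_fd : list of floats, FD of the FD-optimal set
    The ratio-of-summed-gains form is an assumption, not Eqn. 1 verbatim.
    """
    if not (len(fd_max_pd) == len(fd_random) == len(fd_max_fd)):
        raise ValueError("all inputs need one entry per richness level")

    # Gain of the averaged PD-maximized set over the averaged random set.
    gain_pd = sum(mean(pd) - mean(rnd) for pd, rnd in zip(fd_max_pd, fd_random))
    # Gain of the FD-optimal set over the averaged random set.
    gain_opt = sum(opt - mean(rnd) for opt, rnd in zip(fd_max_fd, fd_random))

    if gain_opt == 0:
        raise ValueError("optimal and random FD are identical; surrogacy undefined")
    return 100.0 * gain_pd / gain_opt


# Toy values for two richness levels (illustrative only, not from the study).
toy = surrogacy(
    fd_max_pd=[[0.30, 0.34], [0.60, 0.64]],
    fd_random=[[0.20, 0.30], [0.50, 0.60]],
    fd_max_fd=[0.50, 0.80],
)
```

Under this form the value is positive whenever the averaged PD-maximized sets hold more FD than the averaged random sets, negative otherwise, and 100% when PD-maximized sets match the FD-optimal sets, which agrees with the properties stated in the text.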

However, even if in the majority of cases maximizing PD does, on average, better than an averaged random selection, this does not capture the reliability of its performance. The PD-maximization and the random selection strategies exhibit variation: simply by chance, random selection of species can capture very high (or, conversely, very low) FD, and the same may be true (to a previously unstudied degree) for PD. The extent of this variation is important: if it is less than the average difference, PD-maximization is a reliable strategy as it will always yield more FD, but if it is not, then PD-maximization could be unreliable for individual conservation interventions. To contrast these two situations, we measured the fraction of times that, within each species pool, the PD-maximization strategy yielded more FD than random selection (see Methods). PD-based selection was the best choice in 64% of cases (SD across species pools = 9%, see Supplementary Table 1 and Fig. S3), making it the better strategy but not a perfectly reliable one. Thus, while the PD-maximization strategy has a consistent positive effect (i.e. the average PD-maximization strategy yields more FD than the average random strategy), its effect is weak (i.e. the PD-maximization strategy still yields less FD than the random strategy in 36% of the trials within a species pool).
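
The reliability measure can be illustrated with a short Python sketch: every PD-maximized set is compared against every random set at a given richness level, and the resulting fractions are then averaged across levels. The function names and the toy numbers are hypothetical, and ties are counted as PD not winning, which is an assumption.

```python
from itertools import product
from statistics import mean


def fraction_pd_beats_random(fd_max_pd, fd_random):
    """Fraction of (PD-maximized set, random set) pairs in which the
    PD-maximized set holds strictly more FD (ties count against PD)."""
    wins = sum(1 for pd, rnd in product(fd_max_pd, fd_random) if pd > rnd)
    return wins / (len(fd_max_pd) * len(fd_random))


def pool_reliability(levels):
    """Average the pairwise fraction across richness levels of one pool.

    `levels` is a list of (fd_max_pd, fd_random) tuples, one per level.
    """
    return mean(fraction_pd_beats_random(pd, rnd) for pd, rnd in levels)


# Toy example: 2 PD-maximized sets against 3 random sets at one level.
frac = fraction_pd_beats_random([2.0, 4.0], [1.0, 2.0, 3.0])
```

With 1000 PD-maximized sets and 1000 random sets per level, the same function performs the one million pairwise comparisons described in the Methods.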

We next explored the drivers of surrogacy values across species pools. Surrogacy of PD appears to weaken as the species pool richness increases (on average, Spearman Rho between absolute surrogacies and species richness = -0.15), most clearly seen in the tropics and in species-rich families such as the Muridae (rats, mice and allies) and Columbidae (pigeons and allies) (Fig. 2-3). This is likely because our measure of FD (see Methods) rapidly saturates as the number of selected species increases and species from these large pools harbor high functional redundancy, such that a random prioritization scheme performs relatively well, or at least no worse than other strategies (Fig. S4). In contrast, FD can be greatly increased by prioritization of species using PD from species-poor assemblages or clades [see also 28]. This is particularly the case in spatial assemblages containing multiple taxonomic orders, which are both phylogenetically and ecologically divergent from one another. Interestingly, the PD-FD relationship was not consistent across taxonomic scale: we found that, in contrast to patterns at the family level, for certain mammalian and avian orders (which are older than the families described above), using PD to select species is much worse for capturing FD than choosing species at random (see, for example, the Afrosoricidae, Chiroptera, and Charadriiformes in Fig. S5).

We explored whether it is possible to explain this variability within and between datasets, and in particular, why for some assemblages/clades, a PD-prioritization strategy fails to capture more FD than random choice. It is often implicitly assumed that phylogenetic signal (i.e. the degree to which closely related species tend to harbor similar sets of traits) can be used [...] underlying traits (Fig. S6-7, on average Spearman Rho = 0.17). Similarly, tree imbalance, which is known to affect surrogacy in simulations [16], did not explain surrogacy in these empirical data (Fig. S6-7).

For mammals, regions where PD did worse than random were located in the Sahara, southwestern Patagonia, southern Africa including parts of Madagascar, and New Guinea (Figure 2). These latter two in particular are of concern, since they are global conservation priorities on account of species endemism and habitat loss. We suggest two historical reasons for such idiosyncratic poor performance of PD. First, there is a tendency for a large carnivore species, either a top predator (e.g., cheetahs in the Sahara or foxes in Patagonia) or a large scavenger (e.g., the hyena in South Africa) to co-occur with a close relative with distinct traits in these areas (e.g., a desert cat with the cheetah or the aardwolf with the hyena, see Fig. S8).

Only one of these closely related species will tend to be selected under prioritization schemes that maximize PD, thus reducing the volume of the convex hull on average when the functionally distinct one (the large predator or scavenger) is not selected. This also seems to drive the low surrogacy of PD in Charadriiformes (especially Larus and Sterna; see Figure S8).

Second, lineages in which traits evolve very slowly will contribute little to FD, even over long [...] being distantly related in the phylogenies we used (Fig. S8). As such, they will be selected in all PD-maximizing sets, but will not contribute greatly to FD.

In summary, while in specific cases maximizing PD actually captures less FD than a random set, in the majority of cases PD performs well (at least, better than random) as a surrogate (in 88% of the species pools the mean surrogacy value is ≥ 0). This represents an important and necessary test of the motivations of conservation planning activities that incorporate PD. However, we simplistically and implicitly assume that chosen species will either be saved or will go extinct and we have not linked our various scenarios to any particular policy [...] and therefore hope that our results generalize beyond the species we study here.

The spatial scale of our analysis reflects the scale of available data appropriate for [...] systems. Our analysis suffers from a similar data limitation. We chose these traits because they are frequently collected in ecological studies, not because we know they are ecologically important. Our assumption is that their phylogenetic distribution is typical of those traits that are most desirable for the purpose of conservation and that our primary results are therefore widely applicable. We urge others to expand our simple test to other clades and traits in order to test the generality of our findings.

Prioritizing the most phylogenetically diverse set of taxa in a region or clade will result in an average gain of 18% functional diversity relative to applying the same conservation effort without considering phylogeny, but this gain will decrease as species richness increases. This suggests that PD is a reasonable conservation prioritization strategy, especially in species-poor clades or regions, or in the absence of meaningful data on functional traits. However, we note two important drawbacks of this strategy. First, in cases of either recent trait divergence or, alternatively, very strong trait conservatism, a PD prioritization scheme can capture less FD than a random scheme. Second, we found that while this strategy, on average, captures FD well, it is also somewhat unreliable, and 36% of the time will not capture more FD than random [...]

[...] [40]. Species composition was then extracted from grid cells of 5° x 5°, corresponding to approximately 555 x 555 km at the equator [41]. This grain size of the grid was chosen because it represents a good compromise between the desired resolution and the geographical density of information.

[...] practice as the number of combinations of selected species was too high (e.g., $10^{71}$ possible sets for all mammal assemblages). To rapidly and efficiently find the set of species that aims to maximize FD, we developed a novel (at least in ecology) greedy algorithm. In brief, our approach iteratively (starting with two species) selects the species that is the furthest from the centroid of the already selected set. To avoid selecting two species that are far from the centroid but close to each other, we penalized the distance to the centroid by the distance to the closest neighbour in the already selected set. Here we present in detail the greedy algorithm we used to find the set of species that maximizes FD:

Step 1. Select the two species with the highest trait distance.
Step 2. Compute the centroid of these two selected species.
Step 3. Compute distances between species not in the set and this 'set centroid'.
Step 4. Penalize these distances by adding the following factor f (Eq. 4):

$f = K \times e^{L \times \mathrm{minD}}$ (Eq. 4)

with K and L being penalizing factors and minD the distance between a given candidate species and the nearest species already in the selected set.
Step 5. Select the species that maximizes the penalized distance.
Step 6. Go back to step one with this new set of species until the desired number of [...]

In tests of subsets of the data for which finding the true maxFD was feasible, we found our approach to adequately approximate the true maxFD and to produce a very good [...]

[...] independently, the number of cases where FD_random > FD_maxPD across the 1000 random x 1000 maxPD set combinations (i.e. $10^{6}$ comparisons). We then averaged these numbers across % of selected species and report statistics across datasets (Supp. Table 1).
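
The greedy selection described in Steps 1 to 6 can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation. It assumes Euclidean distances in trait space, takes the repeat in Step 6 to mean recomputing the centroid of the enlarged set, and uses hypothetical negative defaults for K and L (their values are not given in this section), so that the added factor f lowers the score of candidates lying close to an already selected species. The function name `greedy_max_fd` is also hypothetical.

```python
import math
from itertools import combinations


def greedy_max_fd(traits, n_select, K=-1.0, L=-1.0):
    """Greedily pick `n_select` species that spread out in trait space.

    traits : dict mapping species name -> tuple of trait coordinates
    K, L   : penalizing factors of Eq. 4 (default values are assumptions)
    """
    names = list(traits)
    if not 2 <= n_select <= len(names):
        raise ValueError("n_select must be between 2 and the pool size")

    # Step 1: start with the two species that are furthest apart.
    first, second = max(
        combinations(names, 2),
        key=lambda pair: math.dist(traits[pair[0]], traits[pair[1]]),
    )
    selected = [first, second]
    n_dims = len(traits[first])

    while len(selected) < n_select:  # Step 6: repeat until enough species
        # Step 2: centroid of the currently selected set.
        centroid = [
            sum(traits[s][d] for s in selected) / len(selected)
            for d in range(n_dims)
        ]
        best_name, best_score = None, -math.inf
        for cand in names:
            if cand in selected:
                continue
            # Step 3: distance from the candidate to the set centroid.
            d_centroid = math.dist(traits[cand], centroid)
            # Step 4: add f = K * exp(L * minD), minD = nearest selected species.
            min_d = min(math.dist(traits[cand], traits[s]) for s in selected)
            score = d_centroid + K * math.exp(L * min_d)
            if score > best_score:
                best_name, best_score = cand, score
        # Step 5: keep the candidate with the highest penalized distance.
        selected.append(best_name)
    return selected


# Toy two-trait pool (illustrative values only).
pool = {"a": (0, 0), "b": (10, 0), "c": (5, 8), "d": (9.5, 0.5), "e": (5, 0.1)}
picked = greedy_max_fd(pool, 3)
```

In this toy pool the third pick is the species far from both the centroid and the two starting species, rather than one sitting next to an already selected species.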