Phylogenetic measures of biodiversity and neo- and paleo-endemism in Australian Acacia

Mishler, Brent D.; Knerr, Nunzio; González-Orozco, Carlos E.; Thornhill, Andrew H.; Laffan, Shawn W.; Miller, Joseph T.

doi:10.1038/ncomms5473

Article
Published: 18 July 2014

Nature Communications volume 5, Article number: 4473 (2014)

Abstract

Understanding spatial patterns of biodiversity is critical for conservation planning, particularly given rapid habitat loss and human-induced climatic change. Diversity and endemism are typically assessed by comparing species ranges across regions. However, investigation of patterns of species diversity alone misses out on the full richness of patterns that can be inferred using a phylogenetic approach. Here, using Australian Acacia as an example, we show that the application of phylogenetic methods, particularly two new measures, relative phylogenetic diversity and relative phylogenetic endemism, greatly enhances our knowledge of biodiversity across both space and time. We found that areas of high species richness and species endemism are not necessarily areas of high phylogenetic diversity or phylogenetic endemism. We propose a new method called categorical analysis of neo- and paleo-endemism (CANAPE) that allows, for the first time, a clear, quantitative distinction between centres of neo- and paleo-endemism, useful to the conservation decision-making process.

Introduction

Biodiversity is not just species—instead it is the full set of nested clades representing phylogenetic relationships among organisms at all levels. Species are, at best, only one level of clades among thousands, smaller and larger¹. Unfortunately, biodiversity is most often studied solely at the species level, which misses both the full richness of patterns that can be inferred from the full tree of life, and the analytical power that comes from a phylogenetic approach. Our perception of biodiversity patterns becomes more complete when phylogenetic methods are added to traditional species-based methods^2,3.

Likewise, endemism is not just about species, even though virtually all endemism studies focus solely at the species level. Clades at all levels can be endemic and all levels are relevant to discovery and evaluation of centres of endemism. Endemism, rather than being species-centric, should be more broadly defined to mean ‘the geographic rarity of that portion of a phylogenetic tree found in a given area’. This phylogenetically based definition encompasses clades that are at the traditional species level, but also takes into account clades larger than or smaller than named species, and so provides a more complete picture of endemism.

The relevance of phylogeny to ecology and evolution is widely recognized and has revolutionized those fields^4,5,6,7,8; however, the relevance of phylogeny to biodiversity assessment and conservation remains generally underappreciated despite groundbreaking steps in this direction^9,10. Phylogenetic measures of biodiversity were pioneered by Faith¹¹, who developed the concept of phylogenetic diversity (PD), which has been increasingly explored in recent years^12,13,14,15. Faith et al.¹² and Rosauer et al.¹⁶ then established phylogenetic concepts of endemism. Faith et al.’s approach was to identify what parts of a phylogenetic tree are absolutely restricted to a given region, an approach that could be called ‘absolute phylogenetic endemism’. Rosauer et al.’s approach considered the relative breadth of geographic distribution of parts of a phylogenetic tree that are found in a given region, an approach that could be called ‘weighted phylogenetic endemism’.

Rosauer et al.’s definition (which is applied throughout this paper and referred to as PE) is directly analogous to weighted endemism for species (or other terminal taxa in a phylogeny, abbreviated WE¹⁷). The range of either a branch or species can be measured using various units, for example, the number of grid cells it occurs in, and the range of a branch is the union of ranges of terminal taxa descended from it. PE for a region is the length of a branch multiplied by the proportion of its range which occurs in that region (the inverse of the range for a single-cell case), summed over all the branches found in that region, just as species endemism (WE) for a region is one multiplied by the proportion of a species’ range which occurs in that region, summed over all species in the region¹⁶.
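
The two definitions above can be illustrated with a minimal sketch. The toy tree, taxon names and grid cells below are hypothetical and exist only for illustration; ranges are measured as numbers of grid cells, which is one of the units the text mentions.

```python
# Hypothetical toy data: branch id -> (branch length, terminal taxa descended from it).
branches = {
    "a": (1.0, {"sp1"}),
    "b": (1.0, {"sp2"}),
    "ab": (2.0, {"sp1", "sp2"}),
    "c": (3.0, {"sp3"}),
}
# Hypothetical grid cells occupied by each terminal taxon.
occurrences = {
    "sp1": {"cell1"},
    "sp2": {"cell1", "cell2"},
    "sp3": {"cell2", "cell3", "cell4"},
}


def branch_range(tips):
    """Range of a branch: the union of the ranges of its descendant terminal taxa."""
    cells = set()
    for tip in tips:
        cells |= occurrences[tip]
    return cells


def weighted_endemism(cell):
    """WE for one cell: each taxon present contributes 1 / (its range size)."""
    return sum(1.0 / len(cells) for cells in occurrences.values() if cell in cells)


def phylogenetic_endemism(cell):
    """PE for one cell: each branch present contributes length / (its range size)."""
    total = 0.0
    for length, tips in branches.values():
        cells = branch_range(tips)
        if cell in cells:
            total += length / len(cells)
    return total


print(weighted_endemism("cell1"), phylogenetic_endemism("cell1"))
```

In this toy case the internal branch "ab" contributes to PE in both cell1 and cell2, which is the information that a species-only measure such as WE cannot capture.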

It has long been recognized that there are two kinds of endemic species: neo-endemics, recently diverged species that are endemic because of lack of dispersal/migration out of their ancestral area; and paleo-endemics, old species that were perhaps more widespread in the past and are now restricted to a local region^18,19,20. This traditional taxonomic formulation is suboptimal for two reasons. The first is theoretical: this formulation only deals with species, yet clades at all levels can be endemic. The second is methodological: a rigorous analytical approach has so far been lacking to separate the two kinds of endemism in practice. This paper aims to solve both issues by presenting and illustrating a general approach to studying endemism at all phylogenetic scales. It provides the first quantitative measure to clearly distinguish centres of neo-endemism from centres of paleo-endemism. Our approach also allows the discovery of areas that are centres of both neo- and paleo-endemism; we call such areas ‘centres of mixed-endemism’, while centres with extremely high values of both we call ‘centres of super-endemism’.

Another important step forward was the development of methods to examine differences among regions in PD: ‘PD-dissimilarity’²¹ or ‘phylogenetic beta-diversity’²², and to apply these to conservation concerns, for example, ‘PD-complementarity’¹². These methods use a pairwise distance matrix among regions as a basis for cluster analyses and ordinations, but instead of standard distance metrics based on the proportion of shared species, they use a phylogenetically based metric on the basis of the proportion of shared branches.
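
A branch-based dissimilarity of this kind might be sketched as follows. The Sørensen-style formula is an assumption chosen for illustration; this section does not state which formula the cited methods use, and the toy tree and taxon names are hypothetical.

```python
# Hypothetical toy tree: branch id -> (branch length, terminal taxa descended from it).
branches = {
    "a": (1.0, {"sp1"}),
    "b": (1.0, {"sp2"}),
    "ab": (2.0, {"sp1", "sp2"}),
    "c": (3.0, {"sp3"}),
}


def branches_in(region_tips):
    """Ids of branches with at least one descendant terminal taxon in the region."""
    return {bid for bid, (_, tips) in branches.items() if tips & region_tips}


def total_length(branch_ids):
    """Summed length of a set of branches."""
    return sum(branches[bid][0] for bid in branch_ids)


def phylo_dissimilarity(tips_a, tips_b):
    """Assumed Sorensen-style form: 1 - 2 * shared length / (length in A + length in B)."""
    in_a = branches_in(tips_a)
    in_b = branches_in(tips_b)
    denom = total_length(in_a) + total_length(in_b)
    if denom == 0:
        return 0.0
    return 1.0 - 2.0 * total_length(in_a & in_b) / denom


print(phylo_dissimilarity({"sp1", "sp2"}, {"sp2", "sp3"}))
```

Computing this value for every pair of regions yields the pairwise distance matrix that cluster analyses and ordinations take as input.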

Australia presents the best current opportunity for studying large-scale patterns of PD and PE in plants because of the nearly complete digitization of herbarium collections by Australia’s Virtual Herbarium (http://avh.ala.org.au/). Here we take advantage of this rich source of distributional data, and the generation of new DNA sequence data gathered for phylogenetic purposes, to study one of the most diverse clades of Australian plants, the legume genus Acacia. Over 1,000 species have been described within the clade of Australian Acacia²³, <1% of which occur beyond Australia²⁴. It is estimated that this clade diverged from its closest relatives around 25 Myr ago and has spread into most Australian climatic areas including the monsoonal tropics, the arid interior and the Mediterranean climates of southern Australia²⁴. Acacia has diversified into a vast array of vegetative forms during this radiation and this has resulted in a complicated morphologically based taxonomic classification²⁵. Basic patterns of species richness (SR) and endemism in Acacia across the Australian continent are known²⁶; however, little is known about the spatial distribution of Acacia in a phylogenetic context.

Our goals were to: (1) map patterns of PD and PE in Acacia across the Australian continent; (2) explore properties of a new index (relative phylogenetic diversity or RPD), designed to identify and distinguish areas of phylogenetic overdispersion and clumping that reflect signals of biogeographic history and ecological processes; (3) explore properties of another new index (relative phylogenetic endemism or RPE), within a novel framework called Categorical Analysis of Neo- And Paleo-Endemism (CANAPE), designed to identify and distinguish centres of neo-endemism from centres of paleo-endemism in a rigorous way; (4) develop novel hypothesis tests for these measures using appropriate null models; (5) examine similarities and differences among the identified centres of PE with respect to implications for conservation.

We found that, while SR and PD are generally correlated, there are regions with much more PD or much less PD than expected given our hypothesis test. The new RPD index works well to distinguish these regions and gives insight into ecological and biogeographic processes. Likewise, while WE and PE are generally correlated, there are regions with much more PE or much less PE than expected given our two-step CANAPE hypothesis test using the new RPE index, corresponding to centres of paleo-endemism and neo-endemism, respectively. When comparing the discovered centres of endemism using a phylogenetic beta-diversity analysis, we found interesting biogeographic patterns of similarity in the parts of the phylogenetic tree shared among areas, and were able to identify areas of particular conservation concern where parts of the phylogeny remain unprotected.

Results

Phylogenetic analyses

The final molecular data set had 4,044 aligned nucleotides across six loci. The maximum likelihood tree topology recovered is shown in Supplementary Fig. 1 and the data set and tree are lodged in TreeBase (ID 13659, http://treebase.org/treebase-web/search/study/summary.html?id=13659).

Basic biodiversity analyses

Maps of SR, WE, PD and PE are shown in Fig. 1. Bivariate plots and linear regression analysis examining the relationships among these variables revealed that they are significantly positively correlated, but with variable scatter. For example, while PD is significantly related to SR, there is still reasonable scatter (r²=0.876; Supplementary Fig. 2), and no sign of a plateau at the highest levels of richness found in this study (at very high richness, a decline in the rate of increase of PD would be expected as most of the tree becomes represented). PE is also significantly, but less closely, related to SR (r²=0.400; Supplementary Fig. 3). PD is significantly related to PE, but again with much scatter (r²=0.475; Supplementary Fig. 4).

**Figure 1: Maps showing basic biodiversity patterns in Australian *Acacia*.**

Development of null hypotheses

It is important to look at the expected values of these variables in light of appropriate null hypotheses. We therefore developed two new metrics, RPD and RPE, as the basis for null hypotheses to be tested statistically using a randomization approach (see Methods for details about these two derived metrics and the hypothesis test).

Randomization tests

Randomization-based significance tests of PD, RPD, PE and RPE are shown in Fig. 2. Areas of significantly high PD include southwestern Australia, Tasmania and the southern coast of South Australia; areas of significantly low PD are scattered broadly across most of the remainder of the continent (Fig. 2a). Areas of significantly high RPD include southwestern Australia, central Australia and the southern coast of South Australia, with a few cases in northern and eastern Australia; areas of significantly low RPD include many locations in the eastern Great Dividing Range and southeastern Australia (Fig. 2b). Areas of significantly high PE include southwestern and western Australia and scattered areas along the east coast and Tasmania; areas of significantly low PE are scattered broadly across the interior of the continent (Fig. 2c). Areas of significantly high RPE include southwestern Australia, the Pilbara Region, central Australia, wet tropic sites in Queensland and Tasmania, while areas of significantly low RPE are mostly found in the southeast part of the continent; interestingly the northern region of the Monsoonal tropics is underrepresented for both (Fig. 2d).

**Figure 2: Maps showing significance levels resulting from a randomization test in Australian *Acacia*.**

Figure 3a shows the results of the two-step CANAPE described in the Methods, while Fig. 3b shows a bivariate plot comparing the numerator and denominator of RPE, used in CANAPE, to help understand the classification of centres of endemism shown in Fig. 3a. Areas with grid cells dominated by paleo-endemism include southwestern Australia, the Gascoyne Region, central Australia, wet tropic sites in Queensland and Tasmania. Grid cells dominated by neo-endemism are restricted to the coast of New South Wales. Areas of mixed endemism are mainly in southwestern Australia and along the southeast coast, with super-endemic sites largely confined to the Southwest.

**Figure 3: CANAPE, a two-step procedure described in text.**

Identifying and comparing areas of endemism

Two hundred and forty-six grid cells of significantly high endemism were identified from the results of the CANAPE test. The cluster analysis using PD-dissimilarity (Fig. 4) revealed that these grid cells tend to cluster geographically. The southeast (blue), southern South Australia (turquoise) and many central locations (brown) are more similar in terms of the parts of the phylogenetic tree they share as compared with the southwest (shades of green) and western central areas (shades of red and purple). Interestingly, the greatest diversity of phylo-clusters is present in central-west Western Australia, and there is a major biogeographic break observable between the southwest and areas immediately north in the Wheat Belt and central Western Australia coast.

**Figure 4: Map (a) and cluster analysis (b) showing phylogenetic similarity relationships among centres of endemism for Australian *Acacia*.**

Discussion

Investigating the phylogenetic patterns of biodiversity and endemism adds significantly to the traditional approach that considers species diversity alone. For example, 13 out of 21 previously recognized centres of raw species endemism in Acacia (that is, comparable to the measure shown in Fig. 1b) are located in the southeastern and southwestern temperate regions of Australia²⁶. Many of the centres of PE found in this study are located in the same regions, but add new localities previously unidentified by the traditional species-based metrics (Fig. 3a). These results provide critical information that can guide conservation planning because they locate biodiversity centres in terms of evolutionary history and potential refugia.

The null hypothesis for testing the phylogenetic measures employed here requires a tree, since PD and PE are only defined given a tree. One tree that can be used is the actual tree, and indeed the basic hypothesis test of PD and PE we applied (shown in Fig. 2a,c) simply compares the observed value with what one would expect if the same number of taxa were randomly drawn from the actual tree, similar to the relative phylogenetic diversity (PD_rel) measure of Davies et al.²⁷ This allows one to infer whether the measure is significantly high or low for a given number of terminal taxa drawn from that tree, but this test is entirely dependent on the particular tree at hand, and not comparable to other studies underlain by a different tree. One could attempt to generalize solely based on the number of terminal taxa present, but that would not be sound reasoning without basing the expectation for PD and PE on a generalizable comparison tree giving relationships among the terminal taxa.
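
The basic test described above, drawing the same number of taxa at random from the actual tree, might look like the following sketch. It is a simplification under stated assumptions: the toy tree, the function names and the number of iterations are hypothetical, and the randomization actually used in the study is the one specified in the Methods.

```python
import random

# Hypothetical toy tree: branch id -> (branch length, terminal taxa descended from it).
branches = {
    "a": (1.0, {"sp1"}),
    "b": (1.0, {"sp2"}),
    "ab": (2.0, {"sp1", "sp2"}),
    "c": (3.0, {"sp3"}),
    "d": (3.0, {"sp4"}),
    "cd": (1.0, {"sp3", "sp4"}),
}
all_tips = ["sp1", "sp2", "sp3", "sp4"]


def pd(tips):
    """PD: summed length of every branch with a descendant among the given taxa."""
    return sum(length for length, below in branches.values() if below & tips)


def pd_randomization(observed_tips, n_iter=999, seed=0):
    """Return observed PD and the fraction of random draws with lower PD.

    Each draw samples the same number of taxa, without replacement, from the tree.
    """
    rng = random.Random(seed)
    observed = pd(observed_tips)
    k = len(observed_tips)
    null = [pd(set(rng.sample(all_tips, k))) for _ in range(n_iter)]
    fraction_lower = sum(value < observed for value in null) / n_iter
    return observed, fraction_lower


observed, fraction_lower = pd_randomization({"sp1", "sp2"})
print(observed, fraction_lower)
```

A fraction near 1 would suggest significantly high PD for that number of taxa, and a fraction near 0 significantly low PD; the cut-offs are a matter for the hypothesis test defined in the Methods.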

Therefore, to have a more general, useful and completely phylogenetic null model, we developed two derived metrics that are new to this study, RPD and RPE, both ratios that compare the PD or PE observed on the actual tree in the numerator to that observed on a comparison tree in the denominator. Several comparison tree topologies were explored in this study, but only one was employed for the analyses presented here, as it represents the most generalized null model for our purposes. This comparison tree gives the expectation for PD and PE if all branches on the actual tree topology (interior and exterior) were equal in terms of branch length. This tree is equivalent to one commonly used early approach to measurement of PD that counted nodes on a tree^9,11, and is equivalent to a punctuational model of evolution. The hypothesis test of RPD or RPE tells one how much observed PD or PE differs from that null expectation, for example, asking ‘is PD significantly high or low compared to what I would expect with that number of terminal taxa randomly selected from my tree if all its branches were equal in length?’ The expectation of the ratio is 1, and significant departure from the expected allows us to determine whether there is an over-representation of long branches or short branches on the actual tree, an important innovation that is useful for addressing several key biological questions, as detailed below.
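
As a rough illustration of the RPD ratio, the sketch below computes PD on a toy tree and on the same topology with all branches set to equal length. Expressing each PD as a fraction of its tree's total length is an assumption made here so that the two values are comparable; this section does not state the scaling, and the tree and names are hypothetical.

```python
# Hypothetical toy tree: branch id -> (branch length, terminal taxa descended from it).
branches = {
    "a": (1.0, {"sp1"}),
    "b": (1.0, {"sp2"}),
    "ab": (2.0, {"sp1", "sp2"}),
    "c": (4.0, {"sp3"}),
}


def relative_pd(tips):
    """RPD sketch: PD on the actual tree divided by PD on an equal-branch-length tree.

    Both PDs are expressed as a fraction of total tree length (an assumption).
    """
    present = [length for length, below in branches.values() if below & tips]
    total_actual = sum(length for length, _ in branches.values())
    pd_actual = sum(present) / total_actual
    # On the comparison tree every branch has the same length, so PD is a branch count.
    pd_comparison = len(present) / len(branches)
    return pd_actual / pd_comparison


print(relative_pd({"sp3"}))          # one long branch
print(relative_pd({"sp1", "sp2"}))   # three shorter branches
```

Under these assumptions a value above 1 reflects longer-than-average branches in the sample and a value below 1 shorter ones. RPE would follow the same pattern with each branch length first divided by its range size.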

Significantly high RPD indicates an area where there is an over-representation of long branches. This could have several alternative explanations. One possibility is historical biogeography: the area is refugial, containing relicts from past climate change²⁸. Another possibility is ecology: the result of competition that prohibits close relatives from co-occurring in the same communities (that is, phylogenetic overdispersion⁸). Separating these two possible causes would be assisted by mapping ecologically significant variables to the tree.

Significantly low RPD indicates an area where there is an over-representation of short branches. This pattern also could have alternative explanations, including evolutionary: the area is a place of recent divergence of lineages. Another possibility is ecological: the result of habitat filtering based on phylogenetically conserved traits that result in close relatives co-occurring in the same communities (that is, phylogenetic clustering⁸). Adding ages on branches would help separate these explanations as would mapping ecologically significant variables to the tree.

This comparison tree is particularly useful for the purpose of distinguishing centres of neo-endemism and paleo-endemism. Since PE is simply the PD of a range-weighted tree (that is, where each branch has been divided by its range size), then when RPE is significantly greater than 1 it must mean there is an over-representation of rare long branches and when it is significantly less than 1 it must mean there is an over-representation of rare short branches. This is because rare long branches, whether terminal branches or deeper, in the actual tree are longer than the null expectation and vice-versa for rare short branches.

However, since RPE is a ratio, if the purpose is identifying centres of significant endemism, it is important to realize that spurious conclusions are possible when interpreting the significance of RPE. It is possible to have a significantly high or low RPE ratio when both the numerator and denominator are quite small, and hence when there is not a significant amount of endemism present. Thus we realized that a two-step process is necessary for finding areas of significant PE; we need to first establish that there is a significant amount of endemism in a grid cell, then use the RPE ratio to parse the significant centres of endemism into those dominated by rare long branches (paleo-endemism), those dominated by rare short branches (neo-endemism) and those with rare branches of mixed lengths. This is the two-step CANAPE test described in the Methods.
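
The two-step logic can be sketched as a small classifier. Everything numeric here is a placeholder: the inputs are assumed to be, for each grid cell, the fraction of randomizations falling below the observed value, and the `alpha` and `super_alpha` thresholds and the rule for the ‘super’ label are illustrative assumptions rather than the criteria defined in the Methods.

```python
def canape(pe_actual_rank, pe_comparison_rank, rpe_rank, alpha=0.05, super_alpha=0.01):
    """Classify one grid cell with a two-step, CANAPE-style procedure (sketch).

    Each *_rank argument is the fraction (0..1) of randomizations whose value
    is lower than the observed one. Thresholds are assumed, not from the source.
    """
    # Step 1: is there a significant amount of endemism at all?
    high_numerator = pe_actual_rank > 1 - alpha
    high_denominator = pe_comparison_rank > 1 - alpha
    if not (high_numerator or high_denominator):
        return "not significant"

    # Step 2: use the RPE ratio (two-tailed) to split significant cells.
    if rpe_rank > 1 - alpha / 2:
        return "paleo"   # rare long branches dominate
    if rpe_rank < alpha / 2:
        return "neo"     # rare short branches dominate
    if pe_actual_rank > 1 - super_alpha and pe_comparison_rank > 1 - super_alpha:
        return "super"   # very high endemism on both trees
    return "mixed"


print(canape(0.99, 0.50, 0.99))
```

The first check is what prevents a cell with a striking RPE ratio but little endemism from being reported as a centre of endemism.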

By comparing Fig. 2d with Fig. 3a, it is possible to see the need for the two-step approach: Fig. 2d shows some grid cells that are significantly high or low in RPE that are not actually centres of PE and thus are not significant in Fig. 3a. The scatter plot in Fig. 3b helps to show what is going on: the randomized values are grey, and most are clustered in the lower left corner along with the nonsignificant actual values (beige coloured). Of the significant actual values, the centres of paleo-endemism (blue) occupy space in the upper left of the distribution, where PE on the actual tree is larger than PE expected on the comparison tree (indicating the rare branches must be longer than expected), while the centres of neo-endemism (red) occupy space in the lower right of the distribution, where PE on the actual tree is less than PE expected on the comparison tree (indicating the rare branches must be shorter than expected). The centres of mixed endemism tend to occur in the upper right of the distribution, with the highly significant values (here termed super-endemism) in the far upper right.

In this way, CANAPE is able to distinguish different types of centres of endemism, and can thus give insights into different evolutionary and ecological processes that may be responsible for these patterns. The centres of paleo-endemism indicate places where there is an over-representation of long branches that are rare across the landscape. This pattern seems to be a clear indication of refugial areas where clades that are present may have suffered high extinction and range contraction in past eras. Note that there could be centres of paleo-endemism superimposed geographically in an area that are caused by climatic or geological events at different times in the earth’s history. This would be indicated if the rare long branches of an area group into two or more different age categories in a dated phylogeny. We identified several areas of paleo-endemism in Acacia using the CANAPE test (Fig. 3a). These areas include the wet tropics in northern Queensland, central alpine areas of Tasmania, southwest Western Australia, the Gascoyne region in Western Australia and scattered areas in the arid centre of the continent.

The centres of neo-endemism indicate areas where there is an over-representation of short branches that are rare on the landscape. This could, for example, indicate a place where peripheral isolates tend to diversify, thus enabling studies of speciation. Using the CANAPE test, we identified only a few areas of neo-endemism in Acacia, all in southeastern Australia, including the Greater Sydney Basin.

Centres of a third type of endemism were identified by CANAPE in the southwest and southeast (Fig. 3a): complex centres containing a mixture of both paleo-endemism and neo-endemism. The most highly significant of these sites we here term ‘super-endemic’ sites; such sites are mostly restricted to the mega-diverse southwest. The two main areas of super-endemism are north of Perth in the Wheat Belt area and along the Albany coast.

The cluster analysis, using PD-dissimilarity to compare only those grid cells that were determined to be significant centres of endemism (Fig. 4), gives insights into relationships among them based on shared branches of the phylogeny. The temperate region of Australia is subdivided into mainly western and mainly eastern clusters. The Southwestern Australian Floristic Region²⁹ is recognized as one of the world’s biodiversity hotspots. We found a cluster specific to that zone (I, J, K, L and M in Fig. 4); there is interesting geographic substructure in this region with a distinctive SW–NE gradient. These gradients are well documented in the literature and mainly reflect the high rainfall zone on the western regions, a semi-arid transitional rainfall zone towards the north east and a southeastern zone with relatively high rainfall²⁹. Clusters E and F consist of sites scattered in the interior and north. Cluster H groups sites in the Eremaean biome in the centre of the continent and South Australia, while cluster A groups scattered sites in the centre of the continent and the southern coast of Victoria and Western Australia. Cluster D groups a distinctive set of sites on the southern coast of South Australia. The Southeast temperate biome (including Tasmania), contains areas of mountains with a combination of tropical, subtropical and Mediterranean climates; it is represented by cluster B. Wet tropical sites in coastal northern Queensland (cluster C) are quite distinct, but group with cluster B rather than with the sites in the northern tropical Monsoonal biome (cluster E), which includes all of the northern regions from the Kimberley to Cape York Peninsula. The Gascoyne cluster (cluster G) is located on the western side of the Eremaean biome in an area of topographical complexity and interestingly groups with the monsoonal and central Western Australia clusters E and F rather than the nearby southwestern cluster (I, J, K, L and M), marking a major biogeographic break.

Conservation prioritization can be evaluated from Figs 3a and 4. For example, the three most important large areas of paleo-endemism to conserve in terms of complementarity with each other would be southwest Western Australia, the Gascoyne region and Tasmania. Reserves located in central-west Western Australia would capture more PD than any others. By overlaying our results with the current protected areas database³⁰ we found 25 cells that do not intersect with any currently protected areas. These cells fell into seven of the clusters (A, E, F, G, H, I and K) and are indicated with black borders on the map in Fig. 4. The clusters with the poorest current protection are E and F; their unprotected grid cells are pointed out in Fig. 4.

Much future work is needed for the continent of Australia (and elsewhere) to add comparable analyses of PD and PE in other groups with different phylogenetic time-depths and biological attributes. The methods proposed here allow, for the first time, a quantitative distinction between centres of neo-endemism and centres of paleo-endemism, and enable meta-analyses across groups to identify general patterns in the biota for ecological and evolutionary explanation and for overall conservation assessment. These methods are valuable additions to the conservation decision-making process; reserve design can be guided by assessment of phylogeny rather than species counts alone and can identify complementary areas of biodiversity¹² that have unique evolutionary histories and traits in need of conservation.

Methods

Assembly of geographic data

We extracted all Acacia records from Australia’s Virtual Herbarium database³¹, totalling 218,388 records. These were corrected as outlined in González-Orozco et al.²⁶ To ensure a standard taxonomy in the analyses, we only used species names accepted by the Australian Plant Census³². Varieties and subspecies were included at the species level. A total of 171,758 records remained following the correction process, comprising the 1,020 species of Acacia occurring in Australia. A subset of 132,295 records was then generated, containing the data for the 508 species that are sampled in the phylogenetic analysis. This data set is available from the Dryad digital repository: http://doi.org/10.5061/dryad.dv4qk.

Assembly of molecular data

The sampling consisted of 510 taxa, representing single specimens of 508 Acacia species, and two outgroup taxa, Pararchidendron pruinosum and Paraserianthes lophantha subsp. lophantha, that were selected based on results of previous studies^33,34,35,36,37. Each Acacia species in the sample set was chosen from a larger set of 1,152 sequenced samples of the same 508 species in the following way. In the majority of cases, multiple specimens of a single species were monophyletic and the specimen with the best DNA sequence coverage of the six DNA loci was used to represent that species. Where multiple specimens representing a species were polyphyletic, the representative specimen was chosen by (1) belonging to the largest clade of specimens for that species and (2) by reference to the Flora of Australia²³. DNA was extracted from fresh leaf samples that were collected either in the field or from cultivated plants of known provenance, and where no other material was available, from herbarium specimens. Six regions were amplified and sequenced, which included four plastid: psbA-trnH intergenic spacer, trnL-F intron and intergenic spacer, rpl32-trnL intergenic spacer, and a portion of the matK intron, and two nuclear: ETS and ITS. Details of the procedures can be found in Miller et al.³⁸ All DNA sequences are deposited in GenBank; accession codes for sequences newly generated for this study are provided in the Accession codes section below (see also Supplementary Table 1).