Elevated Extinction Rates as a Trigger for Diversification Rate Shifts: Early Amniotes as a Case Study

Tree shape analyses are frequently used to infer the location of shifts in diversification rate within the Tree of Life. Many studies have supported a causal relationship between shifts and temporally coincident events such as the evolution of “key innovations”. However, the evidence for such relationships is circumstantial. We investigated patterns of diversification during the early evolution of Amniota from the Carboniferous to the Triassic, subjecting a new supertree to analyses of tree balance in order to infer the timing and location of diversification shifts. We investigated how uneven origination and extinction rates drive diversification shifts, and use two case studies (herbivory and an aquatic lifestyle) to examine whether shifts tend to be contemporaneous with evolutionary novelties. Shifts within amniotes tend to occur during periods of elevated extinction, with mass extinctions coinciding with numerous and larger shifts. Diversification shifts occurring in clades that possess evolutionary innovations do not coincide temporally with the appearance of those innovations, but are instead deferred to periods of high extinction rate. We suggest such innovations did not cause increases in the rate of cladogenesis, but allowed clades to survive extinction events. We highlight the importance of examining general patterns of diversification before interpreting specific shifts.


a) Formation of the supertree
A large number of methods exist for combining phylogenetic hypotheses into a supertree. "Matrix representation with parsimony" (MRP), developed independently by Baum 1 and Ragan 2 is the most frequently used and has been subjected to extensive analyses regarding its performance and biases 3-7 . This method treats each node in each source tree as a character. If a species is a descendant of a particular node in a particular source tree, it is scored as "1" for the character representing that node. If it is not a descendant of the node it is scored as "0". Species not included in that source tree are scored as "?". These "characters" are combined into a matrix which may be subjected to a parsimony analysis, rooted on an all-0 outgroup.
All published hypotheses of phylogenetic relationships created using computer algorithms (rather than manually generated) containing 3 or more amniote or diadectomorph taxa from the time period under study were considered as source trees for the analysis. In order that the supertree input data was "accountable" (a concern raised by Gatesy et al. (8)), published trees which did not include full details of their method e.g. did not include a character matrix, details of algorithms and outgroups used, were rejected.
In order to reduce instances of tree non-independence (another issue raised by Gatesy et al. 8 ; including many trees based on the same character list would bias the supertree towards topologies based on those characters) the following procedure was followed (modified from 9 ): (1) if one study uses a character list and a taxon list that is identical to, or a subset of, another analysis, then only the more inclusive study was included as a source tree; (2) if one study uses a character list which is identical to, or a subset of, another analysis, but the taxon lists are not identical nor is one a subset of the other, then a mini supertree (using the MRP method) was constructed of the two or more trees. This mini supertree was included as a source tree; (3) if a study uses different methods to analyse the same dataset e.g. analysing the dataset using both parsimony and Bayesian methods, a mini supertree was constructed from the results of each analysis, and this mini supertree was included as a source tree.
After pruning the list of published phylogenetic analyses in this way, 177 source trees remained (see below). These trees then needed to be standardised with respect to the taxonomic level: different phylogenetic analyses study phylogeny at different levels: the family, genus or species level. This is a problem when combining trees, as the greatest resolution is wanted, but not all studies are undertaken at the species level. The following procedures were carried out to standardise the level of the source trees: (1) If a taxon is not studied at species level in any included analysis, then it is included at the genus or family level in all source trees e.g., Sphenacodon spp., Mesosauridae (2) If the paper specifies that their coding for a higher level taxon above the level of species is based primarily on a particular species, then that species is used to replace the higher-level taxon in the source tree.
(3) If a taxon is included at the genus level or higher in one or more studies, and one or more different studies use more than one different species of that taxon, then all species included in a phylogenetic analysis are included as a polytomy replacing the higher-level taxon e.g. if Ophiacodontidae is used as a terminal taxon in a source tree, it is replaced in this analysis with a polytomy containing Ophiacodon mirus, O.retroversus, Archaeothyris florensis Varanosaurus acutirostris and Stereophallodon ciscoesnsis. As well as standardising the taxonomic level of the source trees, the taxonomy was updated; taxa now considered nomina dubia were removed.
Having been modified in this way, all source trees were incorporated into an MRP matrix using the program Supertree085b 10 (see Supplementary Data 1 for the data supplied to this program, including all source trees in phylip format). The Baum and Regan method was used, with all nodes and trees given equal weight. The MRP matrix (Supplementary Data 2) was subjected to parsimony analysis using the Willi Hennig Society edition of TNT 11 , using the sectorial search, parsimony ratchet and drift algorithms. The minimum length was searched for 100 times. However the MRP matrix of all 177 trees could not be analysed due to the poor overlap between many of the trees; more trees were produced in a single round of searches than could be stored in the memory of TNT. In order to deal with this problem, the list of source trees was divided into 8 categories, based on universal agreement of monophyly: Synapsida, Parareptilia, Archosauromorpha, Lepidosauriformes, Sauropterygia, Ichthyopterygia, and "Basal". The source trees were divided between these categories based on which clade they were representing the relationships of. Those in the "Basal" category include studies examining the relationships of multiple clades relative to each other and those including basal amniotes and basal eureptiles. An MRP matrix was produced for each category, and a supertree created for each clade, using the programs described above. Because of the uncertainty surrounding the position of turtles (either within parareptiles or lepidosauromophs), the categories Parareptilia, Sauropterygia and Basal were combined, and a single supertree of the taxa in the categories produced in order to test which of these relationships was best supported. The supertrees produced in each of these separate analyses were combined, again with MRP.
The final supertree, after collapsing all nodes containing no descendant taxa from the time interval under study and removing post hoc several taxa whose position could not be resolved, contained 686 species (Supplementary Data 3). It should be noted that the lack of resolution of the position of those taxa which needed to be removed was sometimes due to controversy surrounding their relationships, but it could also be due simply to the fact that a species had not been tested against a wide enough sample of taxa for the MRP method to resolve its position e.g. the assignment to Nothosaurus of N. haasi, N. jagisteus, N. edingerae, N. marchicus, N. winterswijkensis, N. youngi, N. juvenilis, N. tchernovi, N. winkelhorsti, N. yangiuanensis is not currently controversial and was supported in a recent study 12 . However, since this study employed few outgroup taxa and no other study has included any Nothosaurus species other than the type and N. giganteus, the MRP methods could not resolve the position of all these species relative to other sauropterygians.
It should be noted that molecular analyses have produced an alternative position of turtles not identified by morphological studies: as the sister to Archosauria 13, 14 . Since molecular studies are not included in this analysis (which only includes tree analysing taxa present in the Palaeozoic-early Mesozoic), this potential position of turtles is not tested. While this could potentially affect an analysis of diversification patterns in amniotes later in the Mesozoic or Cenozoic, it is unlikely to affect this analysis; only four turtle taxa in the supertree are identified as present and only three have ghost lineages extending into the Triassic. Their presence in the Lepidosauromorpha in the supertree has not driven any diversification rate shifts in this clade (only two lepidosauromorph shifts have been identified, one at the base of Sauropterygia, one within Ichthyopterygia). If they were placed as the sister to Archosauria, as suggested by the molecular analyses, one of two outcomes could be observed: 1) the diversification rate shift identified at the base of the Archosauria would move, instead occurring in the clade containing Archosauria+Chelonia; 2) the shift would be identified in the same location, and Archosaurs would be shown to have experienced a higher diversification rate than Chelonia during the Triassic.

b) List of source trees used in the formation of the supertree
For references in which more than one dataset was analysed, the tree used in the study is identified by the figure in which the results were presented.
Archosauromorpha it. The zero-length branches may be shared equally among the ancestral branch 18 or, if a morphological character matrix exists, adjusting branch lengths to represent the proportion of morphological change occurring along each branch 19 . In this case the former method is used due to the lack of character data.
100 of the MPTs found in the MRP analysis were randomly selected for the analysis.
While SymmeTREE does include the option to randomly resolve polytomies, some random resolutions will be suboptimal (less parsimonious) 20 . In order to deal with the uncertainty surrounding the ages of certain taxa, 100 sets of first and last appearences were drawn for each taxon from a uniform proability distribution covering the full possible range of ages for each taxon 21 .

e. Sensitivity analyses
Two possible sources of error were investigated: poorly supported relationships and missing taxa. Ideally, a supertree analysis should represent a summary of researchers' analyses into clade relationships. Although in theory an MRP supertree should not contain any relationships that have never before been suggested, unsupported relationships can appear, albeit rarely 5 . For these reasons, it is necessary to provide a support measure indicating to what extent the source trees support the relationships shown in the supertree. In this study, the V measure was used 22 . Each node is assigned a value between -1 and 1, representing the proportion of source trees supporting the node in question relative to the source trees that do not support it (see Supplementary Data 9 for the support values assigned to each node). In order to ensure that poorly supported relationships do not unduly affect the inferences, the analyses with SymmeTREE were repeated after all nodes with negative support were collapsed into a polytomy, which were then randomly resolved 100 times.
The second potential source of error is the impact of missing taxa. While this supertree is the most comprehensive representation of early amniote relationships available, there are taxa which have not yet been included in any phylogenetic analyses and higher taxa which have not been included in analyses below genus or family level e.g. Sphenacodon, Mesosauridae. One could address this by replacing all higher taxa in the supertree with their constituant species in a polytomy, and then randomly resolving those polytomies in the analysis. However, in the case of early amniotes, this presents a problem; a great many clades have not had their species level taxonomy examined in a cladistic context; in fact a great many of the species level taxonomies have not been questioned since the 1960s or even earlier 23 , a time when classification was often based on stratigraphy, location or body size. In order to assess the impact of the use of higher taxa, an expanded supertree was subjected to the same set of analyses in SymmeTREE. Terminal taxa above species level in the supertree were replaced with a polytomy containing all species currently considered valid, provided this species level taxonomy had been confirmed in recent study (more recently than 25 years). Higher taxa whose species-level taxonomy had not been examined since 1990 were kept as higher taxa. These polytomies were randomly resolved 100 times. The expanded supertree contained 742 taxa and is presented in Supplementary Data 4.
We found that neither uncertainty surrounding taxon age nor poorly supported relationships substantially affect the results. When nodes with negative support are collapsed into a polytomy, the timing and location of substantial and significant diversification shifts identified by SymmeTree are unchanged from those found using the original supertree, with one exception; a diversification rate shift within the Kannemeyeriiformes is not identified when poorly supported nodes are collapsded (Supplementary Data 7). When missing taxa are included, three extra diversification rate shifts are identified. Two of these occur within the therapsid clade Gorgonopsia (at the base of Rubidginae and at the base of Gorgonopsidae), due to the fact that many speciose gorgonopsian genera are only included at the genus level in the basic supertree. The third new diversification rate shift was identified at the base of the ichthyosaur clade Thunnosauria, again due to the large number of speciose taxa only included at the genus level in the original supertree.
The impact of using different methods of time slicing was investigated. Ruta et al. 19 argued for pruning the tree to include only those taxa present in a particular time slice, since Symmetree is designed for use on ultrametric trees. Tarver & Donoghue 24 suggested an alternative in which the tree was grown through time, with new taxa added in each substage, and retaining taxa from preceding time slices. The rationale behind this method is that each speciation event is retained in the tree, allowing the identification of uneven rates of origination or extinction as the driver behind a particular shift 24 . We carried out the shift analyses using both time-slicing methods, hereafter referred to as the Ruta and the Tarver methods.
The use of different time-slicing methods has a greater effect on shift distributions than age (Supplementary Data 7). The Ruta method identified 22 clades experiencing substantial diversification shifts, while the Tarver method identified 28. Of these clades, only 14 were identified by both methods simultaneously. The timing of these 14 shifts differed in most cases; in only five of these 14 clades was the timing of the diversification shift found to be the same when both methods of time slicing are used. Eight clades show shifts only when the Ruta method was used, and 14 clades show shifts only when the Tarver method was used.
The results presented in the main text are those using the ages that take into account uncertainties in dating, as well as the Ruta method of time-slicing. The Tarver method is rejected as it reintroduces the problem that Ruta et al. 19 were trying to solve: the inclusion of extinct taxa means not all lineages will have had equal time to diversify and the assumptions of the Yule model are violated. It has been argued that the Ruta method of time slicing is misleading as the time sliced trees do not include all cladogenetic events. However we would argue that this is not a serious problem when investigating shifts in the rate of diversification.
If less cladogenetic events are observed in one lineage than in its sister, this could imply increased cladogenesis in the sister, or it could imply increased extinction in the lineage having "removed" the speciation events from the observed tree. However, both of these possibilities imply a diversification rate shift. It is true that under the Ruta method one cannot distinguish between these two possibilities, but one can investigate which is more likely by comparing the number and magnitude of shifts to origination and extinction rates, as is done in this paper.

f. Inferring Origination and Extinction Rates from Phylogenies
The inference of origination and extinction rates is central to the analyses presented here and is key to many other macroevolutionary investigations. However, as in all fossilbased evolutionary studies, such analyses have to contend with the issue of sampling biases in the fossil record. First documented occurrences of species in the fossil record are unlikely to be synchronous with the time of origination of those species. It follows that quality and intensity of sampling in any time interval will dictate whether a speciation event occurred in the same time interval where a specimen has been observed, or in an earlier interval. Thus, the better the quality of the fossil record in the time bin immediately preceding the time of first appearance of any given taxon, the less likely the chance that the taxon in question was actually present in earlier time bins than that where it is observed. The same reasoning can be extended to estimates of extinction rates; the last documented occurrence of a lineage in the fossil record is unlikely to represent its very last individual, and the likelihood that the lineage actually went extinct in the same time bin as its observed disappearance will depend on the quality of sampling in later time bins.
A wide range of methods have been devised to incorporate estimates of sampling into origination and extinction rate estimates [25][26][27][28] . However, the vast majority of such methods may not be appropriate for early amniotes, certainly due to the fact that many of the proposed methods rely on a random and extensive sample of multiple individuals (different invertebrate groups offer notable examples) for which dating is relatively unproblematic. Most available methods rely on observed distributions of taxa within and across adjacent time intervals, including whether such taxa cross the bottom and/or top boundaries of an interval, and whether they occur in adjacent intervals. Moreover, such methods do not include singleton taxa (i.e., taxa found exclusively within an interval) 25,26 , which are extensively represented among early amniotes.
Here, a new method is proposed that seeks to correct for sampling in origination and extinction rates, using a time-calibrated phylogeny as a starting point. The per-lineage origination and extinction rates are calculated from the tree, the former by counting the number of cladogenetic events in each time bin, the latter by counting the number of lineages terminating in that time bin.
The origination rate for a given time bin is then multiplied by an estimate of sampling probability from the previous time bin. If the sampling probability in the previous time bin is low, then we can be less sure that the taxa observed in the time bin of interest actually originated in that bin, and therefore our origination rate needs to be lowered. Multiplying the observed extinction rate by the sampling probability (a value between 0 and 1) means that the lower the sampling probability, the more the origination rate will be lowered, representing the fact that one can be less sure that the origination events observed in a particular time bin actually occurred in that bin. A similar correction applies in the case for extinction rates; thus, the estimated extinction rate of the time bin of interest is multiplied by an index that gives a measure of sampling from the following time bin.
The shape of the phylogeny offers a way to estimate sampling probability in each time bin. For each time bin, the number of observed lineages (those represented by fossil occurrences) is divided by the total number of lineages (observed plus ghost lineages) in that time bin. As ghost lineages represent portions of the fossil record that may be inferred to be present but have not been sampled, the proportion of observed lineages relative to combined observed and ghost lineages provides an operational estimate (albeit one that can be constantly refined by novel taxon discoveries) of sampling in that bin (R code for calculating the origination and extinction rates using these methods are supplied in Supplementary Data 5 and 6).
In order to ensure that the proposed method provides reliable estimates of origination and extinction, simulations were carried out. The simulations were undertaken using the simFossilTaxa function in the R paleotree package 29 . This function produces a stochastic birth-death model which allows anagenetic, cladogenetic, and budding speciation based upon user-defined probabilities (for the analysis presented here, the probability of each mode of speciation was set to be equal). A second paleotree function, taxa2cladogram, was used to produce a cladogram using information from the output of the birth-death model. This cladogram takes into account resolutions of ancestor-descendant relationships based upon current phylogenetic methods (i.e., ancestors are resolved either as sister taxa to, or in a polytomy with, descendants, depending on the model of speciation).
These two functions together produce a list of taxa, their temporal ranges, and a cladogram summarising patterns of relationships based upon initial parameters (e.g. origination and extinction rates). The simulated time period was spilt into equal-length bins, and for each taxon in each bin we assigned a sampling probability (the simulations were run with sampling probabilities from 0.1 to 0.9, with equal increments of 0.1). Once the sampling filter had been applied, the tree (with unsampled taxa pruned) was time calibrated using the new age ranges (potentially ranges might be reduced by sampling). The per-lineage origination and extinction rates were calculated using two methods: 1) a raw count, which consists in dividing the observed number of cladogenetic events in each time bin by the observed number of lineages in that bin in order to get the origination rate (for extinction rates, we divided the observed number of lineage terminations in each time bin by the observed number of lineages in that bin); and 2) an inferred count (the method described above), whereby we correct for sampling using the proportion of ghost lineages in each time bin. Both of these methods were compared to the 'true' origination and extinction rate curves obtained from the simulated data before applying the sampling filter, using the Spearman's rank correlation test. These simulations were repeated 1000 times at each sampling probability.
The results of the simulations (Supplementary Table 1) suggest that, at low sampling levels, the new method proposed here provides a close correspondence to the true origination and extinction rates, and outperforms the raw count method. Unsurprisingly, both methods perform worse at low levels of sampling, but the drop in performance of the new method is considerably less pronounced; even at a sampling probability of 0.2, the mean Spearman's rho value (comparing the rates produced by the new method to the 'true' rates generated by simulations) is greater than 0.6, but only about 0.3 for the raw count.
Interestingly, at high levels of sampling (probability of sampling greater than 0.6), the raw count method outperforms the new method. A possible explanation for this finding can be found in the treatment of ancestors in the phylogeny. As discussed above, our current phylogenetic methods resolve ancestors either as the sister to, or in a polytomy with, their descendants. This means that a ghost lineage is inferred where one may not exist. At higher sampling levels, it is considerably more likely that both an ancestor and its descendant will be sampled in the same time bin, a false ghost lineage will be inferred and the sampling estimate will be less reliable. This shows that our new method for estimating origination and extinction rates may be inappropriate for very well sampled clades. However, the new method offers much promise, particularly in studies of Palaeozoic terrestrial vertebrates, for which the quality of the fossil record is often regarded as poor 23,30-31 .