General statistical model shows that macroevolutionary patterns and processes are consistent with Darwinian gradualism

Macroevolution posed difficulties for Darwin and later theorists because species’ phenotypes frequently change abruptly, or experience long periods of stasis, both counter to the theory of incremental change or gradualism. We introduce a statistical model that accommodates this uneven evolutionary landscape by estimating two kinds of historical change: directional changes that shift the mean phenotype along the branches of a phylogenetic tree, and evolvability changes that alter a clade’s ability to explore its trait-space. In mammals, we find that both processes make substantial independent contributions to explaining macroevolution, and are rarely linked. ‘Watershed’ moments of increased evolvability greatly outnumber reductions in evolutionary potentials, and large or abrupt phenotypic shifts are explicable statistically as biased random walks, allowing macroevolutionary theory to engage with the language and concepts of gradualist microevolution. Our findings recast macroevolutionary phenomena, illustrating the necessity of accounting for a variety of evolutionary processes simultaneously.

The scenarios described in Supplementary Figure 1 pose a challenge to the Fabric model. For example, a directional change will alter the mean difference between the species in the dark grey clade and the species in the other two clades. This will increase the phenotypic variance between the dark grey clade and its sister clade (and even to some extent with the outgroup clade). The model might respond to this by increasing the Brownian variance at the node marked by the asterisk, rather than by putting a directional effect along the purple branch. That is, what could look like an increased variance from the perspective of the asterisk-node is actually a directional change that can be explained as a biased random walk with no change to the Brownian variance.
The signal that discriminates these two scenarios is the phenotypic variance within the dark grey clade: if it conforms to that expected from the Brownian variance, then the mean phenotypic change is almost certainly the result of a directional effect and not a change to evolvability. If the phenotypic variance within the dark grey clade is increased relative to that expected from the Brownian variance, then the apparent phenotypic change might be the result of a change in evolvability.
Similarly, a change to evolvability acting at the green circle could be mistaken for a directional change because altering the variance within the dark grey clade can, by chance, change its mean value relative to the other clades. In this case, the model might mistakenly place a directional effect along the purple branch leading to the dark grey clade.
We analysed each of the 2600 simulated datasets to see if the model could locate and separate the two kinds of effect. Supplementary Table 1 reports the correlations between the simulated and MCMC estimated parameter values across all 2600 parameter combinations, using the same MCMC model as described in the main text. The estimated directional effects closely tracked the simulated values, independently of any changes to evolvability affecting the descendant clade.
For the evolvability parameters, the correlation was slightly lower but again indicate that the estimated clade-wise phenotypic variances tracked the simulated values, independently of the mean offset of the clade owing to any directional effect. In a multiple regression including both simulated effects and their interaction, and used to predict either the estimated directional effects (R = 0.977) or the estimated evolvability multipliers (R = 0.836), the correlation is unchanged (to three decimal places) over the simple bivariate correlations, and only the relevant main effect was significant in each case.
In a separate set of simulations, we inserted 37 directional and 11 evolvability changes in random places on a background of Brownian motion using the Carnivore clade from the Time Tree of Life (n=257 species). Thirty-seven directional effects and 11 evolvability changes on a tree of this size mimics the ratio of tips to parameters, and their division into the two classes, that we found in the mammal tree. We repeated this 1000 times randomly varying the position in the tree and the magnitudes of the directional and evolvability changes from one simulation to the next, drawing values of both sets of parameters randomly from the priors we used for the analysis model reported in the main text. We used a Brownian variance of 0.0031, similar to that we found across all mammals (see Table 1, main text). Random directional shifts drawn from the prior were randomly assigned to be positive or negative.
The 1000 independent sets of simulated data each contained information on 48 independent parameters --37 directional and 11 evolvability changes. We analysed each set of data with the Fabric model using the same settings as we used for the mammal data, running each Markov chain to stationarity before sampling 1000 iterations for the posterior sample, and estimated effects as the average of their values over this posterior. The model did not know a priori the location or value of the parameters.
The correlations we report in Supplementary Table 2 analyse the value of the parameter the model estimated at the correct position in the tree against the true value of the parameter at that position. The directional changes are well estimated across the range of their prior and in the presence of other directional and evolvability changes. Confining the analysis to estimated directional effects with coefficients of variation < 0.5, indicative of effects that left strong signals in the simulated downstream data, the correlation improves slightly. a) the 37 directional and 11 evolvability changes on a tree of 257 species matches the density and division of parameters found on the full mammal tree; the analysis model is blind to the location and magnitude of the simulated effects; b) estimated directional and evolvability parameters with coefficients of variation in the posterior sample of < 0.5; c) sample restricted to affected clade sizes of 10 or more species.

Supplementary
The evolvability effects are less well estimated (Supplementary Table 2) but the lower correlation between the estimated and true values is still highly significant. This lower correlation is expected from the probability density of the (eq. 7, Methods, main text and dashed curves Figure 3d), whose variance increases steeply as the number of affected taxa gets smaller. Confining the analysis to estimated that affect 10 or more taxa, the correlation and slope improve considerably.

Identifiability.
When two or more alternative sets of parameters in a statistical model predict identical probability distributions of outcomes, the model cannot find a unique set of parameters to explain the outcome or response variable and the model becomes non-identifiable.
It is useful for studying identifiability of the Fabric model to express it as a multiple linear regression. Let X be a × + 1 design matrix, where the n rows of X correspond to the n species in the phylogeny and the k+1 columns correspond to a column of 1s, followed by k additional columns of dummy codes. The dummy codes correspond to the number of directional effects in the model.
A directional effect falling anywhere in the tree affects all of its downstream descendants' (as defined by the phylogeny) trait values by an amount × (see main text). Let / be the kth directional effect (the ordering of these k effects is arbitrary). Its dummy code in the matrix X will then have a '1' for each of these downstream species, and a '0' for all others.
With X defined this way we can write the model of equation 3 (main text) as: where Y is now a vector of the n species' trait values, X is the design matrix, is the vector of k directional effects, and e is a vector of normally distributed random errors as defined in the main text (equation 3). Then, is the usual ordinary least squares estimator of the directional effects, where V is the variancecovariance matrix as given by the phylogeny under the assumption of Brownian motion, including any changes to the variance as a result of evolvability scalars (main text). Where V takes the form of an identity matrix scaled by a common variance -as in most non-phylogenetic settings -the equation for simplifies to the familiar = ( 5 ) 7 ( 5 ). For simplicity, the vector is here equivalent to a vector of individual × such as we report in the main text. To retrieve the individual on their own, the dummy codes in the design matrix X must be scaled by the appropriate branch lengths.
Least squares regression models are identifiable so long as 4 5 7 9 7 has an inverse (non-singular matrix). We used this property to study the identifiability of the Fabric model for the mammal dataset used in this paper, and in simulated data. In the current context, the design matrix X is not known in advance. Rather, Markov chain Monte Carlo can be seen as a methodology for discovering design matrices that, in this instance, retrieve historical effects, something our simulations confirm. At stationarity, the Markov chain yields a set of parameters that fluctuate around some average or stationary number. We ran six separate Fabric model replicates (the combined model of Table 1, main text) to stationarity on the mammal data, sampling 1000 widely-spaced iterations of the model from the end of the stationary chain, constructing X matrices for each iteration. We found that on average 91.7% of the design matrices yielded identifiable models, ranging from 91.5 to 92.8 across the six independent chains.
We then generated 1000 sets of 37 directional effects and 11 evolvability effects, placing the effects in random locations on the Carnivore clade of the Time Tree of Life (as used for the simulations in Supplementary Table 2). As with the previous simulations, this number of parameters mimics the density of directional and evolvability parameters we found in the mammal tree. On average 90.1% The point corresponding to the density of directional effects we found for the mammal data is labelled (dashed yellow line). For these simulations the number of evolvability scalars was held constant at 11, but this doesn't affect the shape of these curves as the two sets of parameters can be added, here yielding 48 parameters. Dark purple curve: same relationship but having deleted from the sets of 1000 replicates instances in which the scenario in Supplementary Figure 2b arises (see also 2b text). b. Hypothetical and simplified example of non-identifiability that confronts all models of macroevolution concerned with trait evolution. Assume the species in clade 1, and the individual species A and B have trait values drawn from the same statistical distribution such that they differ only by random variation, save for the directional differences indicated in species A and B. Assume all branches are of length = 1. The design matrix X codes for two alternative sets of directional change parameters, { ; , = } and { ? , @ } that predict identical outcomes in species A and B (coloured rows of X matrix). The set { ; , = } envisions a directional change leading to the common ancestor of species A and B, followed by a slight reduction along the branch leading to species A. The set { ? , @ } describes independent changes leading to species A and B from a common ancestral value. Without further information as to the value of the trait in the common ancestor to species A and B, these two explanations of the species' values produce identical outcomes, and the matrix (X'V -1 X) -1 does not have an inverse (there is an exact linear dependency between = and ? ). In this hypothetical case, the set of parameters { ; , = } is more likely given the prior beliefs about the distribution of directional effects (see Methods, main text) and the MCMC will favour it over the set { ? , @ }. If instead of species A and B both differing from the expected distribution, only one of them does, then only a single change would be needed along the branch leading to that species and the parameters would become identifiable. Source data are provided as a Source Data file.
Nearly all cases of non-identifiability take the form of a problem in ancestral state or ancestral evolutionary process (e.g., directional change, evolvability change) reconstruction that all macroevolutionary-comparative models must confront. Two hypothetical species A and B in Supplementary Figure 2b have trait values arbitrarily set at +2 and +3 units larger than expected under a simple Brownian model. The parameter set { ; , = } expresses the belief the trait had increased by the time of their common ancestor, followed by a slight reduction in the branch leading to species 2 (assume all branches are of length = 1). The parameter set { ? , @ } suggests the two increases in the value of the trait occurred independently in the two final branches. These two sets of parameters, { ; , = } and { ? , @ }, yield identical predictions, and are formally non-identifiable (Supplementary Figure 2b). Other values could be assigned to these two sets of parameters that will yield equivalent outcomes, but all such sets will by definition be non-identifiable (if instead of species A and B both having increased values, only one of them does, then only a single change would be required along the branch leading to that species and the model would become identifiable).
Without outside or extra information about the value of the trait in the past, deciding which set of changes is more likely requires assumptions about how evolution progresses. Parsimony and minimum evolution methods algorithmically resolve questions of identifiability by favouring scenarios that require fewer changes or less evolution. Here, a minimum evolution method would assign an intermediate value to the common ancestor followed by an increase in one species and a decrease in the other. The Fabric model decides between the scenarios in Supplementary Figure 2b by asking which is more likely given the prior assumptions the model makes about the distribution of directional effect sizes. The set { ; , = }, requiring one large change and one small change is more probable than the two intermediate-sized changes of { ? , @ } and the MCMC will favour it.
Approximately half of the branches of a phylogeny lead to species, making this hypothetical scenario likely when effects are placed on a tree at random locations. Deleting from the sets of 1000 replicates cases in which this scenario occurred yields the upper curve in Supplementary Figure 2a; now nearly all cases are identifiable within the density of directional effects we find.
Fossil evidence for abrupt and early large-magnitude changes. Alroy's 1 database of fossil mammals records n=1109 pairs of putative ancestor-descendant estimated body sizes, and the length of time separating the two fossils. We calculated the absolute change in log10(mass) for each pair, and normalised them by calculating an expected phenotypic variance using the Brownian variance from the directional change model (σ = = 0.004, main text, Table 1) and the elapsed time separating the two fossils. We chose those pairs with z-scores > 2, as changes that might be indicative of a strong directional effect. This yielded n=588 pairs of changes (note: restricting the sampled pairs to the upper 10% of the z-scores yields 59 pairs and gives the same qualitative results as shown in Supplementary Figure 3).

Supplementary Figure 3. Changes in size in ancestral-descendant pairs of mammalian fossils. a
The absolute change in log10size for 588 ancestral-descendant pairs of mammalian fossils. The dashed curve is the prior used on the directional effects in the mammal data (main text), and the solid curve is a best-fitting log-normal; b Changes plotted against the length of time over which the change was measured (data from Alroy 1 ). The curved lower limit of absolute change versus time indicates that changes of that magnitude or less fail to surpass the two-standard deviation criterion to be included as a directional change given their elapsed time; c Same data plotted against time of first appearance, appearing to show that a disproportionate number of large-magnitude changes occurred early in mammalian evolution, and smaller changes closer to the present, paralleling Figures 5a,b (main text); d After an initially slow period, directional effects accumulate broadly linearly with time (r 2 = 0.958), in contrast to Figure 6 (main text) possibly reflecting idiosyncrasies of this fossil dataset of ancestral-descendant pairs. Source data are provided as a Source Data file.
The histogram of absolute log10 changes for these 588 pairs (Supplementary Figure 3a) resembles that of Figure 3a (main text), albeit with a smaller mean. The absolute changes are relatively independent of time ( Supplementary Figure 3b), similar to Figure 3b (main text). Elapsed times for the fossil data are approximately 1/10 th of those in the mammal tree, and many of the largest changes occur over the shortest time spans. This supports the idea that the changes inferred from the model (see main text) occur over far shorter time periods than the length of the branch on the phylogenetic tree implies, and that the relationship in Figure 3b (main text) is not an artefact of long phylogenetic branches that might conceal extinction events. Figure 5b (main text) shows that the size of directional phenotypic changes ( . . , × ) the model detects early in mammalian evolution are large (absolute value) and decline closer to the present (slope=-0.0106, p<0.0001, n=417). We questioned whether that relationship is a limitation of the comparative method: perhaps only strong signals from the distant past will survive in the contemporary data.
Two lines of evidence are informative about this question. One is that the fossil data also show this decline (Supplementary Figure 3c), although with a shallower slope (slope=-0.0012, p<0.0009, n=588), leaving open the possibility that at least part of the decline in the reconstructed directional effects arises because small effects from the distant past get erased by later events.
The simulated data that we reported in the previous section provide a second line of evidence. We find that across all values of the simulated directional effects, the slope of the estimated effect against time since the common ancestor of the tree is not different from zero: slope = 0.0004988 (p=0.41). On the other hand, restricting ourselves to branches that have coefficients of variation < 0.5 for their associated , the slope is -0.0061 (p<0.0001). This suggests that investigators should be aware that the declining size of phenotypic changes such as we observed in Figure 5b (main text) could be influenced by a limitation of the comparative method.

Magnitude of directional changes is compatible with a biased random walk.
Under the null model of no directional effects in trait evolution, the expected amount of change that occurs along branches is zero, with a variance equal to = , where = is the Brownian variance and t is the length of the branch in which they are found. Converting the variances ( = ) to standard deviations yields changes in units of the trait (here, log10 body mass). The red (lower) curve in Supplementary Figure 4 plots the size of a two-standard-deviation unbiased random walk for a given branch length, and assuming a Brownian variance of σ = = 0.004 per million years. When summed over many O + 1 -O segments in a branch of total length t, the first term on the right above is just the variance of incremental Brownian motion, = , the second term records the contribution to the variance of the many individual displacements , and the third term has an expectation of zero because the individual displacements of the random walk are unbiased and centred around zero. The variance of the trait undergoing this biased walk over longer time period is then

Supplementary
The green, purple and tan curves in Supplementary Figure 4 correspond to cases of a two-standard deviation random walk, but where ct takes the value of one, two or five standard deviations of the variance, that is, the bias had led to the trait changing by one, two or five standard deviations over the length of the branch, on top of any variance of change from the random walk.
Equation S1 can be used to calculate the variances of change in the trait values brought about by the directional parameters, where now takes the role of . Because the record the change along the length of the entire branch, as inferred from the actual differences among the species at the tips of the phylogeny, they include the variance of the random walk. Thus, I = for these inferred changes is Nearly all the 417 observed directional changes fall above the red two-standard deviation unbiased Brownian random walk curve and below the biased random walk corresponding to a five standard deviation change per million years. We want to know whether these directional effects require any special evolutionary explanation. Previous reviews of field studies of wild populations suggest that three to five standard deviation changes to morphological traits can be achieved in as few as 50 generations 2,3 . But a non-trivial amount of the phenotypic variance that selection has acted on in these field population studies will have already existed in the study populations, and so they do not address the question of whether the supply of new mutational variance that enters the population each generation is sufficient to fuel directional selection occuring over the many millions of years represented by some of the branch lengths of a phylogenetic tree.
The critical question is whether typical within-population phenotypic variances, Z = , are large enough to support sufficient amounts of per generation genetic change over long time periods to produce effects of the sizes observed in Supplementary Figure 4. The average within-population phenotypic variance in body size for twenty-three mammal species in one study 4 was 0.010 (log-transformed data) or 0.005 removing outliers (Supplementary Table 2). In a study of sixteen bird species 5 which reported the variance of the morphological trait with the largest within-population variance for each species, the value was 0.011 (log-transformed data, excluding two-outliers). Estimated variances using logtransformed snout-vent length measurements in nineteen rattlesnake populations (Crotalus viridis) average 0.002 for females, and 0.003 for males 6 ; including both sexes the average variance is 0.004.
These observed within-population variances are roughly comparable to the estimated Brownian variance per million years in this study, illustrating the slow realised pace of Mammalian morphological evolution. The per generation mutational variance component of the within-population variance is on the order of 10 -2 to 10 -4 of the environmental variance, which itself can be estimated at one-half or more of the phenotypic variance 7,8 . Using these values as guides, the within-population mutational variances [ = estimated from the phenotypic variances reported above for mammals are on the order of ~2.5 × 10 -5 to 10 -7 (Supplementary Table 2) The median of the n=417 variances of change per million years calculated from supplementary equation 2 for the directional changes in the mammals, is σ I(W) = / = 0.036 (95% 0.007 − 0.889). Adopting a conservative assumption of around 250,000 generations per million years for the average mammal (many mammals will have a million or more generations in this time), the 0.036 figure translates to a per generation variance of change of ≤ 1.5 × 10 7i (Supplementary Table 3). This falls at the smaller end of the range of mutational variances, and is consistent with previous work comparing between and within-species variances 5,7 . The figure for the changes the model deemed compatible with neutral drift (light purple dots in Supplementary Figure 4 is ~1.6 × 10 7j , an order of magnitude smaller. This analysis, adapting a methodology developed by Lynch 7 , and using conservative assumptions, points to the mutational variance being an adequate source of new variation to sustain long-term directional selection. It yields the expectation that the directional shifts we observe in the mammals are explicable as biased random walks against a constant-variance background, agreeing with previous work on this question 7,9 ; indeed the range of directional phenotypic changes we observe is compatible with neutral drift 7,9 . Continued work on estimating within-population variances can usefully refine these conclusions. It might even be possible to use the estimated within-population mutational variance to define a prior on the directional effects in the Markov chain.

Supplementary
The plotted points in Figure 4  Six early evolvability changes are not part of an evolutionary trajectory. Supplementary Figure 5 shows the six earliest changes to evolvability on the tree (main text, Figure 2). Red dots identify the locations of increased evolvability ( > 1, an increase to the Brownian variance), blue dots are the reverse ( < 1). The dotted circular-line identifies the approximate position of the K-T extinction event. Although one evolvability effect < 1 is nested within an earlier event of increased evolvability, it is not part of a general trajectory of a narrowing of the evolutionary variance in the mammals. The other two blue dots occur after the two early red dots but are not descendant taxa. Even if these changes were part of a general evolutionary trajectory, this is opposite to the pattern expected from an OU process. That process expects that the realised differences among species are small early on, gradually rising to an equilibrium value. For later evolvability changes (see Figures 2, 4 main text).
Supplementary Figure 5. The mammalian tree used in this study. The position and timing of the first six evolvability multipliers, showing that the progression from enhanced (red) to reduced (blue) evolvability is unlikely to be part of a trend because they are only partly nested. Dashed line is the K-T boundary.