Siddall claims that "minimizing gaps is our optimality criterion" when using likelihood. Forey, in criticizing likelihood, objected to the "baggage" of probability analyses. I explore both criticisms here with empirical examples, using two likelihood approaches for evaluating hypothesized stratigraphic gaps.
Does likelihood minimize stratigraphic gaps?
A parsimony tree of fossil and extant crocodiles (ref. 1) implied a sum of gaps totaling 743 million years (Ma). This debt distributed over 62 taxa implies a sampling intensity (R) of 0.022 finds per million years (ref. 2). The stratigraphic ranges of the analyzed taxa reveal that the vast majority of taxa are known from single localities, suggesting a low value of R (ref. 3). In fact, the most likely R given these data (0.011 finds per Ma; ref. 4) is lower than implied by the parsimony tree.
As a result, trees positing >1200 Ma of gaps are more likely given the stratigraphic data than is the parsimony tree (Fig. 1a). Because ln L [R = 0.022|ranges] = -0.2, stratigraphy would not reject the parsimony tree.
Note that a tree positing no gaps (and a minimum R = 0.97) has a log likelihood of -22. Stratigraphic data would reject such a tree.
Many fossil hyaenids are known from numerous localities and have ranges of more than two Mammal Neogene (MN) zones (ref. 5). Hyaenid trees positing no debt are considerably more likely than the parsimony tree, which posits 50 units of debt. However, trees positing 6-7 units of stratigraphic debt are the most likely hypotheses (Fig. 1b). This still leaves one gap for every three species.
Testing the appropriate hypothesis
Likelihood analyses of continuous stratigraphic data do estimate the most likely duration to be the observed range (refs 6-8). These methods use probabilities of observations given a hypothesis that a taxon is 'already' (or 'still') present. A taxon certainly is present once it is observed and the likelihood is high that the taxon exists immediately prior to its first observation, but the certainty is less than 1.0 (Fig. 2a). This implies that the most likely origination is the first appearance.
Figure 2 Likelihoods of hypothesized gaps given continuous stratigraphic data.
An alternative approach tests a slightly different (but I think more appropriate) hypothesis: what is the likelihood of a particular duration given a specific sampling? This requires estimating R as finds per sampling opportunity ('horizons'). Suppose there are X finds over Y horizons for a species. Because the actual duration is almost always greater than the observed duration, R = X/Y usually overestimates R.
However, setting aside the first and last horizons, which yield finds by definition, we also know that the species is found X-2 times over Y-2 horizons. This gives an estimate, R = (X-2)/(Y-2), which is unbiased (that is, as likely to overestimate as underestimate; refs 3, 9). The likelihood of a hypothesized duration given X finds and R = (X-2)/(Y-2) can now be calculated. A duration greater than Y will be more likely than a duration of Y unless the taxon is sampled from every horizon over a range of greater than three horizons.
The Ordovician gastropod Lophospira serrulata is sampled 17 times over 91 horizons (ref. 10). This implies R = 0.169. Parsimony suggests a duration that spans an additional 30 horizons. This hypothesis is much less likely than a gap of 5-6 horizons (Fig. 2a). However, the parsimony hypothesis is considerably more likely under this approach than when the 'presence' hypothesis is tested.
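The sampling estimate quoted for L. serrulata can be checked with a few lines of Python. The sketch below computes both the naive estimate X/Y and the endpoint-corrected estimate (X-2)/(Y-2). It then evaluates a simple discrete version of the 'presence' hypothesis, in which the likelihood of a gap of g horizons is taken to be (1-R)^g. That gap formula and all function names are assumptions made for illustration; they are not necessarily the calculations behind Fig. 2a.

```python
def naive_rate(finds: int, horizons: int) -> float:
    """Finds per horizon over the observed range, R = X/Y."""
    return finds / horizons


def endpoint_corrected_rate(finds: int, horizons: int) -> float:
    """R = (X-2)/(Y-2): drop the first and last horizons, which always yield finds."""
    if horizons <= 2:
        raise ValueError("need more than two horizons")
    return (finds - 2) / (horizons - 2)


def presence_gap_likelihood(rate: float, gap: int) -> float:
    """Assumed model: chance of missing a present taxon in `gap` consecutive horizons."""
    return (1.0 - rate) ** gap


# Values from the text: 17 finds over 91 horizons.
X, Y = 17, 91
r = endpoint_corrected_rate(X, Y)
print("naive R:", round(naive_rate(X, Y), 3))
print("corrected R:", round(r, 3))

# Under the assumed 'presence' model the likelihood only declines as the gap grows,
# so a gap of zero is always the most likely value.
for gap in (0, 5, 30):
    print(gap, presence_gap_likelihood(r, gap))
```

Under this assumed model a 30-horizon gap is always less likely than a short one, which is why the 'presence' hypothesis is the harsher test of the parsimony duration.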
Maximum likelihood (ML) gaps approximate 50% confidence intervals (ref. 6) because both estimate "typical" gaps. (Prior workers have used 50% confidence intervals as most likely durations; ref. 11.) Poorly-sampled taxa have larger ML gaps than well-sampled taxa (Fig. 2b). A tree positing no gap for a taxon would have a likelihood less than the maximum (unless the taxon is known from every horizon within its range and is sampled more than three times).
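The link between sampling and gap size can be illustrated with the classical confidence-interval formula for range extensions under uniformly random sampling. In that formula the extension at one end of a range is a fraction (1 - C)^(-1/(H-1)) - 1 of the observed range, for H finds and confidence level C. The Python sketch below assumes that formula and its continuous-data setting. It is not necessarily the calculation behind Fig. 2, and the function name is hypothetical.

```python
def range_extension_fraction(finds: int, confidence: float = 0.5) -> float:
    """One-sided range extension, as a fraction of the observed range.

    Assumes finds are scattered uniformly at random through the true range.
    """
    if finds < 2:
        raise ValueError("a range needs at least two finds")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must lie in [0, 1)")
    return (1.0 - confidence) ** (-1.0 / (finds - 1)) - 1.0


# Sparser sampling gives a larger 50% extension relative to the observed range.
for finds in (2, 3, 5, 17):
    print(finds, round(range_extension_fraction(finds), 3))
```

For 17 finds the 50% extension is roughly 4% of the observed range, whereas for a taxon known from only two finds it equals the whole observed range. This matches the qualitative point that poorly-sampled taxa carry larger typical gaps.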
Does likelihood assume speciation patterns?
Siddall presents an example (his fig. 5) in which three species are derived from a single morphospecies ('budding cladogenesis'). Siddall then claims that likelihood does not consider such trees. However, not only does likelihood consider these possibilities, but likelihood trees for both hyaenids (ref. 2) and lophospiroid gastropods (ref. 12) suggest multiple examples of the budding cladogenic pattern. For example, likelihood derives L. serrulata (and several other lineages) from a long-lived species, L. perangulata (ref. 13). Likelihood would consider Siddall's fig. 5 for a poorly-sampled clade such as crocodiles, but it (probably) would reject such trees for failing to posit enough gaps.
Likelihood trees for hyaenids and lophospiroids also imply some bifurcation. Likelihood reconstructs apparent speciation patterns from character and stratigraphic data, not through initial assumptions. This is not unique to likelihood: other approaches can explicitly identify budding, bifurcating or anagenetic patterns without assuming any one a priori (refs 10, 14, 15).
The baggage of probability versus the baggage of philosophy
Forey decried the "baggage" of probability. Forey presumably refers to the assumptions entailed when calculating probability. However, the explicit statement of assumptions allows probability to be structured as predicate logic. In predicate logic, the assumptions of each proposition (hypothesis) can themselves be phrased as hypotheses and tested. Ultimately, there can be a hierarchy of increasingly specific propositions implicit in the initial hypothesis that are tested quantitatively.
Non-probabilistic approaches such as parsimony rarely test the assumptions of the initial proposition (for example, that maximum character congruence reflects phylogeny). Applying simple propositional logic to real-world problems results in single all-encompassing assumptions. I consider that to be excess baggage.
In Fig. 2a, a homogeneous R was assumed. Homogeneity is itself an unknown that can be tested (refs 16-18). If tests reject a homogeneous R, then ML gaps can be calculated using separate values of R for the initial and/or final intervals. L. serrulata is sampled 13 times in the first 50 opportunities but only four times in the final 41. In this case, homogeneity of R is rejected*.
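One way to test homogeneity with the counts given for L. serrulata is a likelihood-ratio (G) test comparing a single pooled R against separate values of R for the two intervals. The Python sketch below is an illustration under stated assumptions: each horizon is treated as an independent binomial trial, the raw counts are used without endpoint correction, and the chi-square approximation with one degree of freedom is applied despite the small counts. It is not necessarily the test the text refers to, and the function names are hypothetical.

```python
import math


def binom_loglik(finds: int, horizons: int, rate: float) -> float:
    """Binomial log-likelihood without the combinatorial term, which cancels in the ratio."""
    ll = 0.0
    if finds > 0:
        ll += finds * math.log(rate)
    if horizons - finds > 0:
        ll += (horizons - finds) * math.log(1.0 - rate)
    return ll


def homogeneity_g_test(f1: int, h1: int, f2: int, h2: int) -> tuple[float, float]:
    """Return (G, p) for the hypothesis that both intervals share one sampling rate."""
    pooled = (f1 + f2) / (h1 + h2)
    ll_one_rate = binom_loglik(f1, h1, pooled) + binom_loglik(f2, h2, pooled)
    ll_two_rates = binom_loglik(f1, h1, f1 / h1) + binom_loglik(f2, h2, f2 / h2)
    g = max(2.0 * (ll_two_rates - ll_one_rate), 0.0)
    # Survival function of a chi-square variable with one degree of freedom.
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Counts from the text: 13 finds in the first 50 horizons, 4 in the final 41.
g_stat, p_value = homogeneity_g_test(13, 50, 4, 41)
print("G =", round(g_stat, 2), "p =", round(p_value, 3))
```

With these counts the sketch rejects a single R at the 5% level, in line with the conclusion stated above, although an exact test would be safer for counts this small.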
Likelihoods of hypothesized originations and extinctions now can be recalculated (Fig. 2b). The most likely origination is now closer to the first appearance, but the most likely extinction is much further from the last appearance.
An important point here is that the violated assumptions of the initial test overestimated the likelihood of the parsimony hypothesis in this particular case. The situation might have been reversed. However, because we clearly stated our premises, we could have recognized this and adjusted our test accordingly.
Likelihood cannot be stereotyped
Hypothetical examples such as Siddall's cannot be made without detailed hypothetical stratigraphic data any more than hypothetical examples about the performance of parsimony can be made without detailed hypothetical character data. The claim that probability entails excessive assumptions ignores the fact that these assumptions are usually testable and that they are less broad than the assumptions made by philosophical methods.
Hypotheses about historical patterns make predictions about numerous aspects of data sets, not simply about character congruence. Probabilistic or not, tests of a hypothesis are always based on assumptions. Devising tests for those assumptions is and will remain an important methodological concern for palaeontologists.
Peter Wagner
Field Museum of Natural History, Chicago, USA