Main

Five decades ago, Zuckerkandl and Pauling published two seminal papers in which they proposed the concept of the molecular evolutionary clock1,2; that is, that the rate of evolution at the molecular level is approximately constant through time and among species. The idea arose when the pioneers of molecular evolution compared protein sequences (haemoglobins, cytochrome c and fibrinopeptides) from different species of mammals1,3,4 and observed that the number of amino acid differences between species correlated with their divergence time based on the fossil record. The field of molecular evolution was revolutionized by this hypothesis, albeit not without controversy5,6,7,8 (Box 1), and biologists took on the task of using the molecular clock as a technique for inferring the dates of major species divergence events in the Tree of Life9.

From the outset, the molecular clock was not perceived as a perfect timepiece but rather as a stochastic clock in which mutations accumulate at random intervals, albeit at approximately the same rate in different species, thus keeping time as a clock does. Initial statistical clock dating methodology that was based on distance and maximum likelihood methods assumed a perfectly constant rate of evolution (the 'strict' clock) and used fossil-age calibrations as point values (even though the fossil record can never provide a precise date estimate for a clade).

Subsequent tests of the molecular clock10,11 showed that it is often 'violated'; that is, the molecular evolutionary rate is not constant, except in comparisons of closely related species, such as the apes. Multiple factors might influence the varying molecular evolutionary rates among species (such as generation time, population size, basal metabolic rate and so on); however, the exact mechanisms of rate variation and the relative importance of these factors are still a matter of debate7,12,13. When the clock is violated, methods for dealing with rate variation include the removal of species that exhibit unusual rates from the analyses14, as well as the so-called local-clock models, which arbitrarily assign branches to rate classes15,16.

Sophisticated statistical models that take into account uncertainty in the fossil record as well as variation in evolutionary rate — and thus enable the strict clock assumption to be 'relaxed' — were not developed until the advent of Bayesian methods in the late 1990s and early 2000s. It is now generally acknowledged that the molecular clock cannot be applied globally or to distantly related species. However, for closely related species, or in the analysis of population data, the molecular clock is a good approximation of reality (Box 2).

Next-generation sequencing technologies and advances in Bayesian phylogenetics over the past decade have led to a dramatic increase in molecular clock dating studies. Examples of recent applications of the molecular clock include the rapid analysis of the 2014 Ebola virus outbreak17, the characterization of the origin and spread of HIV18 and influenza19,20, ancient DNA studies to reconstruct a timeline for the origin and migration patterns of modern humans21,22,23, the use of time trees to infer macroevolutionary patterns of speciation and extinction through time24,25, and the co-evolution of life and the Earth26,27. Knowledge of the absolute times of species divergences has proved critically important for the interpretation of newly sequenced genomes23,28. Exciting new developments in Bayesian phylogenetics include: relaxed clock models to accommodate the violation of the clock29,30,31; modelling of fossil preservation and discovery to generate prior probability distributions of divergence times to be used as calibrations in molecular clock dating32; and the integration of morphological characters from modern and extinct species in a combined analysis with sequencing data33,34.

In this Review we discuss the history, prospects and challenges of using molecular clock dating to estimate the timescale for the Tree of Life, particularly in the genomics era, and trace the rise of the Bayesian molecular clock dating method as a framework for integrating information from different sources, such as fossils and genomes. We do not discuss non-Bayesian clock dating methods35,36,37,38, which typically do not adequately accommodate different sources of uncertainty in a dating analysis. These methods usually involve less computation and may thus be useful for analysing very large data sets for which the Bayesian method is still computationally prohibitive. A detailed review of non-Bayesian clock dating can be found elsewhere39.

Early attempts to estimate the time tree of life

Time trees, or phylogenies with absolute divergence times, provide incomparably richer information than a species phylogeny without temporal information, as they make it possible for species divergence events to be calibrated to geological time, from which correlations can be made to events in the Earth's history and, indeed, to other events in biotic evolution (that is, by calibrating independent but potentially interacting lineages to the same timescale), thus allowing for macroevolutionary hypotheses of species divergences and extinctions to be tested.

As the first protein and DNA sequences became available for a diversity of species, biologists started using the molecular clock as a simple but powerful tool to estimate species divergence times. Underlying the notion that molecules can act as a clock is the theory that the genetic distance between two species, which is determined by the number of mutations accumulated in genes or proteins over time, is proportional to the time of species divergence (Box 1). If the time of divergence between two species is known (from fossil evidence, from a geological event such as continental break-up or island formation, or from sample dates for bacteria and viruses), the genetic distance between these species can be converted into an estimate of the rate of molecular evolution, which can then be applied to all nodes on the species phylogeny to produce estimates of absolute geological times of divergence (Box 2). One of the first applications of this idea was by Sarich and Wilson40, who applied a molecular clock to immunological distances between albumins. By assuming a divergence time of 30 Ma between the apes and New World monkeys, they calculated the age of the last common ancestor of humans and African apes (chimpanzees and gorillas) as 5 Ma. This work ignited one of the first 'fossils versus molecules' controversies because, at the time, the divergence between humans and African apes was thought to be over 14 Ma on the basis of the ages of the fossils Ramapithecus and Sivapithecus41. The controversy was settled once it was recognized that these fossils are more closely related to the orang-utan than to the African apes.
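
The calibrate-then-extrapolate logic described above fits in a few lines. This is a minimal sketch under a strict clock, assuming a constant per-lineage rate r, so that two species that split t time units ago differ by d = 2rt; the distances and dates are hypothetical and are not Sarich and Wilson's albumin data.

```python
# Strict-clock dating from a single point calibration (hypothetical numbers).
# Assumption: constant per-lineage rate r, so two species that split t time
# units ago differ by d = 2 * r * t (both lineages accumulate changes).


def rate_from_calibration(d_cal: float, t_cal: float) -> float:
    """Per-lineage rate implied by a pair with a known divergence time."""
    return d_cal / (2.0 * t_cal)


def date_from_distance(d: float, rate: float) -> float:
    """Divergence time implied by distance d under the same strict clock."""
    return d / (2.0 * rate)


# A calibrated split at 30 Ma with distance 0.30 fixes the rate ...
rate = rate_from_calibration(0.30, 30.0)    # 0.005 per lineage per My
# ... which then converts any other distance into an absolute date.
t_other = date_from_distance(0.05, rate)    # 5 My
print(f"rate = {rate:.4f} per lineage per My; other split = {t_other:.1f} Ma")
```

Because every date scales with the calibration, an error in the calibration age propagates proportionally to all other estimated dates.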

In response to the expanding genetic sequence data sets that resulted from the PCR revolution in the late 1990s, molecular clock dating was applied to a broad range of species. These studies generated considerable controversy because the clock estimates were much older than the dates suggested by the fossil record, sometimes twice as old42, and many palaeontologists considered the discrepancy to be unacceptably large43. Examples include Mesoproterozoic estimates for the timing of the origin and diversification of the animal phyla relative to their Phanerozoic fossil record44, a Triassic origin of flowering plants relative to a fossil record beginning in the Cretaceous45, and a Jurassic or Cretaceous origin of modern birds and placental mammals relative to fossil evidence that is mostly confined to the period after the end-Cretaceous mass extinction46,47.

The early dating studies suffered from a number of limitations48,49. For example, many studies assumed a strict clock even for distantly related species, and most used point fossil calibrations without regard for their uncertainty25,47. Sometimes, secondary calibrations (that is, node ages estimated in previous molecular clock dating studies) were used48. Despite their limitations, these studies encouraged much discussion about the nature of the fossil record and the molecular clock49 and inspired the development of more sophisticated methods. The timescale for life on Earth that these early studies proposed has since been revised by newer genome-scale analyses24,50,51.

The Bayesian method of clock dating

The Bayesian method was introduced into molecular clock dating around the year 2000 in a series of seminal papers by Jeff Thorne and colleagues29,52,53. The method has been developed greatly since then30,31,54,55, emerging as the dominant approach to divergence time estimation owing to its ability to integrate different sources of information (in particular, fossils and molecules) while accommodating the uncertainties involved.

The Bayesian method is a general statistical methodology for estimating parameters in a model. Its main feature is the use of statistical distributions to characterize uncertainties in all unknowns. One assigns a prior probability distribution on the parameters, which is combined with the information in the data (in the form of the likelihood function) to produce the posterior probability distribution. In molecular clock dating, the parameters are the species divergence times (t) and the evolutionary rates (r). Given the sequence data (D), the posterior of times and rates is given by Bayes' theorem as follows:

$$f(t, r \mid D) = \frac{1}{z}\, f(t)\, f(r \mid t)\, L(D \mid t, r) \qquad (2)$$

Here, f(t) is the prior on divergence times, which is often specified using a model of cladogenesis (for example, a model of speciation and extinction54,56) and incorporates the fossil calibration information52,54; f(r|t) is the prior on the rates of branches on the tree, which is specified using a model of evolutionary rate drift29,30,31; L(D|t, r) is the likelihood, that is, the probability of the sequence data given the times and rates, which is calculated using standard algorithms11; and z is a normalizing constant that makes the posterior integrate to 1. Figure 1 illustrates Bayesian clock dating with equation (2) in a two-species case.
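
For the two-species case, the ingredients of equation (2) can be written out directly. The sketch below is a minimal illustration, assuming the Jukes–Cantor likelihood for the distance d = tr and independent gamma priors on t and r whose parameters are hypothetical (they are not the values behind Figure 1). It evaluates only the unnormalized log posterior, because z is a constant that does not depend on t or r.

```python
import math

# Two-species example: 90 differences at 948 sites (as in Figure 1).
N_SITES, N_DIFF = 948, 90
# Hypothetical gamma priors (shape, rate); time in units of 100 My.
T_SHAPE, T_RATE = 10.0, 70.0   # prior on divergence time t
R_SHAPE, R_RATE = 2.0, 3.0     # prior on rate r (substitutions/site per 100 My)


def log_gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Log density of a gamma distribution with the given shape and rate."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def log_likelihood(d: float) -> float:
    """Jukes-Cantor log-likelihood of distance d, up to an additive constant."""
    p = 0.75 - 0.75 * math.exp(-4.0 * d / 3.0)   # P(a site differs)
    return N_DIFF * math.log(p) + (N_SITES - N_DIFF) * math.log(1.0 - p)


def log_unnormalized_posterior(t: float, r: float) -> float:
    """log f(t) + log f(r|t) + log L(D|t, r); the constant log z is omitted.

    In this toy prior r is independent of t, so f(r|t) = f(r).
    """
    if t <= 0.0 or r <= 0.0:
        return -math.inf
    return (log_gamma_pdf(t, T_SHAPE, T_RATE)
            + log_gamma_pdf(r, R_SHAPE, R_RATE)
            + log_likelihood(t * r))


# The data constrain only d = t*r; its maximum likelihood estimate is
d_hat = -0.75 * math.log(1.0 - 4.0 / 3.0 * N_DIFF / N_SITES)
print(f"d_hat = {d_hat:.4f}")

# Two points on the ridge t*r = d_hat: same likelihood, different posterior,
# because only the priors distinguish them.
for t in (0.1, 0.2):
    r = d_hat / t
    print(f"t = {t}, r = {r:.3f}, log L = {log_likelihood(t * r):.4f}, "
          f"log posterior (unnormalized) = {log_unnormalized_posterior(t, r):.4f}")
```

The loop makes the unidentifiability visible: along the ridge the likelihood is flat, and only the priors decide where the posterior mass sits.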

Figure 1: Bayesian molecular clock dating.

We estimate the posterior distribution of divergence time (t) and rate (r) in a two-species case to illustrate Bayesian molecular clock dating. The data are an alignment of the 12S RNA gene sequences from humans and orang-utans, with 90 differences at 948 nucleotide sites. The joint prior (part a) is composed of two gamma densities (reflecting our prior information on the molecular rate and on the geological divergence time of human–orang-utan), and the likelihood (part b) is calculated under the Jukes–Cantor model. The posterior surface (part c) is the result of multiplying the prior and the likelihood. The data are informative about the molecular distance, d = tr, but not about t and r separately. The posterior is thus very sensitive to the prior. The blue line indicates the maximum likelihood estimates of t and r, which satisfy t̂r̂ = d̂, the maximum likelihood estimate of the molecular distance. When the number of sites is infinite, the likelihood collapses onto the blue line, and the posterior becomes one-dimensional62.

Direct calculation of the proportionality constant z in equation (2) is not feasible, because it requires integrating over all possible divergence times and rates. In practice, a simulation algorithm known as Markov chain Monte Carlo (MCMC) is used to generate a sample from the posterior distribution. MCMC is computationally expensive: a typical MCMC clock-dating analysis may take from a few minutes to several months for large genome-scale data sets. Methods that approximate the likelihood can substantially speed up the analysis29,57,58. For technical reviews of Bayesian and MCMC molecular clock dating, see Refs 59,60.
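
Because MCMC uses only ratios of posterior densities, z cancels and never has to be computed. The random-walk Metropolis–Hastings sampler below is a minimal sketch for the two-species case, assuming a Jukes–Cantor likelihood for the 90 differences at 948 sites of the Figure 1 example and hypothetical gamma priors on t and r. Real dating programs use many more proposal types and careful convergence diagnostics.

```python
import math
import random

N_SITES, N_DIFF = 948, 90   # data from the Figure 1 example


def log_gamma_pdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def log_post(t, r):
    """Unnormalized log posterior: hypothetical gamma priors + JC likelihood."""
    if t <= 0.0 or r <= 0.0:
        return -math.inf
    p = 0.75 - 0.75 * math.exp(-4.0 * t * r / 3.0)
    loglik = N_DIFF * math.log(p) + (N_SITES - N_DIFF) * math.log(1.0 - p)
    return log_gamma_pdf(t, 10.0, 70.0) + log_gamma_pdf(r, 2.0, 3.0) + loglik


def metropolis(n_iter=20000, step=0.1, seed=1):
    """Random-walk Metropolis-Hastings on log(t) and log(r)."""
    rng = random.Random(seed)
    t, r = 0.15, 0.7                  # arbitrary starting values
    lp = log_post(t, r)
    samples = []
    for _ in range(n_iter):
        # Multiplicative proposals keep t and r positive; their Hastings
        # ratio is (t_new * r_new) / (t * r).
        t_new = t * math.exp(step * rng.gauss(0.0, 1.0))
        r_new = r * math.exp(step * rng.gauss(0.0, 1.0))
        lp_new = log_post(t_new, r_new)
        # z cancels in this ratio, so it is never needed.
        log_ratio = lp_new - lp + math.log(t_new * r_new / (t * r))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            t, r, lp = t_new, r_new, lp_new
        samples.append((t, r))
    return samples[n_iter // 5:]      # drop the first 20% as burn-in


samples = metropolis()
mean_t = sum(t for t, _ in samples) / len(samples)
mean_d = sum(t * r for t, r in samples) / len(samples)
print(f"posterior means: t = {mean_t:.3f}, d = t*r = {mean_d:.4f}")
```

The posterior mean of d is pinned down by the data, whereas the posterior of t on its own remains spread out along the ridge, mirroring part c of Figure 1.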

Nearly a dozen computer software packages currently exist for Bayesian dating analysis (Table 1), all of which incorporate models of rate variation among lineages (the episodic or relaxed clock models envisioned by Gillespie)61. All of these programs can also analyse multiple gene loci and accommodate multiple fossil calibrations in one analysis.

Table 1 Sample of Bayesian programs that use the molecular clock to estimate divergence times*

Limits of Bayesian divergence time estimation

Estimating species divergence times on the basis of uncertain calibrations is challenging. The main difficulty is that molecular sequence data provide information about molecular distances (the product of times and rates) but not about times and rates separately. In other words, the time and rate parameters are unidentifiable. Thus, in Bayesian clock dating, the sequence distances are resolved into absolute times and rates through the use of priors. In a conventional Bayesian estimation problem, the prior becomes unimportant and the Bayesian estimates converge to the true parameter values as more and more data are analysed. However, such convergence does not occur in divergence time estimation. The use of priors to resolve times and rates has two consequences. First, as more loci or increasingly long sequences are included in the analysis while the calibration information stays the same, the posterior time estimates do not converge to point values but retain uncertainty31,54,62. Second, the priors on times and on rates have an important impact on the posterior time estimates even if a huge amount of sequence data is used62,63. Errors in the time prior and in the rate prior can lead to very precise but grossly inaccurate time estimates62,64. Great care must therefore be taken in the construction of fossil calibrations and in the specification of priors on times and on rates in a dating analysis65,66.
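
The first consequence can be made concrete for a two-species case. With infinite sequence data, d = tr is known exactly and only the priors apportion it between t and r; changing variables from (t, r) to (t, d) gives a limiting posterior for t proportional to f(t) f_r(d/t)/t. The sketch below evaluates this density on a grid, assuming hypothetical gamma priors and fixing d at the Jukes–Cantor estimate for the Figure 1 data (about 0.1015). Its standard deviation stays well above zero, however much sequence is added.

```python
import math


def log_gamma_pdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def limiting_posterior_t(d, t_max=0.6, n=6000):
    """Mean and SD of t when d = t*r is known exactly (infinite data).

    Density of t given d is proportional to f(t) * f_r(d / t) / t, where 1/t
    is the Jacobian of r = d / t. Hypothetical priors: t ~ Gamma(10, 70),
    r ~ Gamma(2, 3). Integrated with the midpoint rule on (0, t_max).
    """
    h = t_max / n
    ts = [(i + 0.5) * h for i in range(n)]
    log_w = [log_gamma_pdf(t, 10.0, 70.0) + log_gamma_pdf(d / t, 2.0, 3.0)
             - math.log(t) for t in ts]
    top = max(log_w)                      # rescale to avoid underflow
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    mean = sum(t * wi for t, wi in zip(ts, w)) / total
    var = sum((t - mean) ** 2 * wi for t, wi in zip(ts, w)) / total
    return mean, math.sqrt(var)


mean_t, sd_t = limiting_posterior_t(0.1015)
print(f"limiting posterior of t: mean {mean_t:.3f}, sd {sd_t:.3f} (x 100 My)")
```

Only narrower priors (in practice, more informative fossil calibrations) can shrink this limiting standard deviation.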

As the amount of sequence data approaches genome scale, the molecular distances or branch lengths on the phylogeny are determined essentially without uncertainty, as are the relative ages of the nodes. However, the absolute ages and absolute rates cannot be known without additional information (in the form of priors). The joint posterior of times and rates thus becomes one-dimensional. This reasoning has been used to determine the limiting posterior distribution when the amount of sequence data (that is, the number of loci or the length of the sequences) increases without bound31,54. An infinite-sites plot can be used to determine whether the amount of sequence data is saturated or whether including more sequence data is likely to improve the time estimates (Fig. 2). The theory has been extended to the analysis of large but finite data sets to partition the uncertainties in the posterior time estimates according to different sources: uncertain fossil calibrations and finite amounts of sequence data62,63. Application of the theory to the analysis of a few real data sets (including genome-scale data) has indicated that most of the uncertainty in the posterior time estimates is due to uncertain calibrations rather than to limited sequence data24,66.
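
An infinite-sites plot such as Figure 2 amounts to a regression of 95% CI width on posterior mean node age. The function below is a minimal sketch. It fits a line through the origin, an assumption motivated by the limiting case, in which all node ages are fixed multiples of one another so that widths scale in proportion to ages. It reports the squared Pearson correlation as R². The node ages in the example are hypothetical.

```python
def infinite_sites_fit(post_means, ci_widths):
    """Return (slope, r_squared) for CI width regressed on posterior mean.

    The slope is a least-squares fit through the origin (width = slope * age);
    r_squared is the squared Pearson correlation between the two columns.
    """
    n = len(post_means)
    slope = (sum(x * w for x, w in zip(post_means, ci_widths))
             / sum(x * x for x in post_means))
    mx = sum(post_means) / n
    mw = sum(ci_widths) / n
    sxw = sum((x - mx) * (w - mw) for x, w in zip(post_means, ci_widths))
    sxx = sum((x - mx) ** 2 for x in post_means)
    sww = sum((w - mw) ** 2 for w in ci_widths)
    return slope, sxw * sxw / (sxx * sww)


# Hypothetical node ages (Ma) whose CI widths lie exactly on a line, as they
# would with infinite sequence data; such points give R^2 = 1.
ages = [5.0, 10.0, 20.0, 35.0]
widths = [0.612 * a for a in ages]
slope, r2 = infinite_sites_fit(ages, widths)
print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```

With real MCMC output, points scattered well off the line (a low R²) would suggest that more sequence data could still sharpen the estimates.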

Figure 2: Infinite-sites plot for Bayesian clock dating of divergences among 38 cat species.

There are 37 nodes on the tree and 37 points in the scatter plot. The x axis is the posterior mean of the node ages and the y axis is the 95% posterior credibility interval (CI) width of the node ages. Here the slope (0.612) indicates that every million years of species divergence adds 0.612 million years of uncertainty in the posterior CI. When the amount of sequence data is infinite, the points will fall onto a straight line. Here, the high correlation (R² = 0.98) indicates that the amount of sequence data is very large, and the large uncertainties in the posterior time estimates are mostly due to uncertainties in the fossil calibrations; including more sequence data is unlikely to improve the posterior time estimates. Reproduced from Inoue, J., Donoghue, P. C. J. & Yang, Z. The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. Syst. Biol. 59 (1), 74–89 (2010), by permission of the Society of Systematic Biologists.

Relaxed clock models — the prior on rates

Unsurprisingly, divergence time estimation under the strict molecular clock is highly unreliable when the clock is seriously violated. In early studies it was common to remove genes and/or lineages that violated the clock from the analysis14, but this method does not make efficient use of the data and is impractical when the clock is violated by too many genes or species. Relaxed clock models have been developed to allow the molecular rate to vary among species. The first methods were developed under the penalized-likelihood and maximum-likelihood frameworks67,68. In Bayesian clock dating, such models are integrated into the analysis as the prior on rates.

Several types of relaxed clock models have been implemented, using either continuous or discrete rates. In the geometric Brownian motion model29,31,52 (also known as the autocorrelated-rates model) the logarithm of the rate drifts over time as a Brownian motion process (Fig. 3a). Let y0 = log(r0) and yt = log(rt), where r0 is the ancestral rate at time 0 and rt is the rate at time t later. Then:

$$y_t \mid y_0 \sim \mathrm{N}(y_0,\; t\nu)$$

Figure 3: Three relaxed clock models of rate drift.

The rate of molecular evolution among lineages (species) is described by a time-dependent probability distribution (plotted here for three time points, 1 My, 10 My and 100 My after the lineages diverged from a common ancestral rate r0 = 0.35 substitutions per site per 100 My, represented by the dashed line). a | The geometric Brownian motion process29,31,52 (here with drift parameter ν = 2.4 per 100 My). This model has the undesirable property that the variance increases with time and without bound, and at large times the mode of the distribution is pushed towards zero. b | The geometric Ornstein–Uhlenbeck process (here with ν = 2.4 per 100 My and dampening force f = 2 per 100 My) converges to a stationary distribution with constant variance when time is large. c | The independent log-normal distribution30,31 is a stationary process, and the variance of rate among lineages remains constant through time (here with log-variance σ² = 0.6, the same as the long-term log-variance of the Ornstein–Uhlenbeck process above). The branch length (the amount of evolution along the branch) under the rate-drift models of parts a and b is usually approximated in Bayesian dating software31,52; methods for exact calculation have recently been developed55.

That is, given y0 (or the ancestral rate r0), yt has a normal distribution with mean y0 and variance tν (equivalently, rt has a log-normal distribution). Thus, rates on descendant branches are similar to the rate of the ancestral branch, especially if the branches cover short timescales; furthermore, the variance of the rate increases with the passage of time. An unappealing property of Brownian motion is that it does not have a stationary distribution. Over a very long timescale, the log-rate can drift to very negative or very positive values, so that the rate becomes near zero or very large, and the variance of the rate grows without bound. This does not seem to be realistic. A model that does not have this property is the (geometric) Ornstein–Uhlenbeck model (Fig. 3b), in which the logarithm of the rate follows Brownian motion with a dampening force, leading to a stationary distribution. This model (and the related Cox–Ingersoll–Ross model)55 looks promising and merits further research. Notably, an early application of the Ornstein–Uhlenbeck model69 to clock dating inadvertently assumed that evolutionary rates drift to zero with time70. Another type of relaxed clock model assumes a small number of distinct rates on the tree and assigns branches to the rate classes through a random process71,72,73. It is also possible to assume that the rates for branches on the tree are uncorrelated, independent draws from a common distribution such as the log-normal30,31 (Fig. 3c).
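
The contrast between the three models in Figure 3 comes down to how the variance of the log-rate changes with time. The sketch below computes these variances using the parameters in the Figure 3 caption (time in units of 100 My). It assumes that the log-rate keeps mean log r0 under all three models (some implementations instead shift the mean so that the expected rate equals r0) and that the Ornstein–Uhlenbeck process is dy = -f(y - log r0)dt + sqrt(ν) dW, whose variance is ν(1 - exp(-2ft))/(2f).

```python
import math
import random

# Parameters from the Figure 3 caption; time in units of 100 My.
NU = 2.4        # Brownian-motion (diffusion) parameter nu
F = 2.0         # Ornstein-Uhlenbeck dampening force f
SIGMA2 = 0.6    # log-variance of the independent log-normal model
R0 = 0.35       # ancestral rate, substitutions per site per 100 My


def var_gbm(t):
    """Geometric Brownian motion: log-rate variance grows without bound."""
    return NU * t


def var_ou(t):
    """Geometric OU: variance saturates at NU / (2F) = 0.6."""
    return NU / (2.0 * F) * (1.0 - math.exp(-2.0 * F * t))


def var_iln(t):
    """Independent log-normal: the same variance at every time."""
    return SIGMA2


def rate_mode(var_log):
    """Mode of a log-normal rate whose log has mean log(R0) (an assumption)."""
    return math.exp(math.log(R0) - var_log)


def sample_rate(var_log, rng):
    """Draw one descendant rate given the log-rate variance."""
    return math.exp(rng.gauss(math.log(R0), math.sqrt(var_log)))


for t in (0.01, 0.1, 1.0):   # 1, 10 and 100 My
    print(f"{100 * t:5.0f} My  GBM var {var_gbm(t):.3f}  "
          f"OU var {var_ou(t):.3f}  ILN var {var_iln(t):.3f}  "
          f"GBM mode {rate_mode(var_gbm(t)):.3f}")

rng = random.Random(0)
print([round(sample_rate(var_ou(1.0), rng), 3) for _ in range(5)])
```

The shrinking GBM mode over time is the "pushed towards zero" behaviour described in the Figure 3 caption, whereas the OU and independent log-normal variances level off at 0.6.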

Fossil calibrations — the prior on times

Molecular clock analyses are most commonly calibrated using evidence from the fossil record74,75. Geological events such as the closure of the Isthmus of Panama or continental break-ups can also be used as calibrations, although such calibrations may also involve many uncertainties owing to assumptions about vicariance, species dispersal potential, and so on76. In Bayesian clock dating, calibration information is incorporated in the analysis through the prior on times.

It has long been recognized that the fossil record is incomplete (temporally, spatially and taxonomically) and that long time gaps may exist between the oldest known fossils and the last common ancestor of a group. The first known appearance of a fossil member of a group cannot be interpreted as the time and place of origination of the taxonomic group77. For example, during the 1980s the oldest known members of the human lineage were the Australopithecines, dating to around 4 Ma (Ref. 41), providing a minimum age for the divergence time between humans and chimpanzees. However, since 2000, several fossils belonging to the human lineage have been discovered in quick succession, including Ardipithecus (4.4 Ma), Orrorin (6 Ma) and Sahelanthropus (7 Ma), which pushed the age of the human–chimpanzee ancestor to over 7 Ma (Ref. 78). Some groups have no known fossil record, such as the Malagasy lemurs, for which only sub-fossils a few hundred years old are known79. The oldest fossil in their sister lineage (the galagos and lorises) dates to 38 Ma, indicating a gap of at least 38 My in the fossil record of lemurs80. Clearly, fossil ages provide good minimum-age bounds on clade ages, but assuming that clade ages are the same as those of their oldest fossils is unwarranted and incorrect81,82.

However, minimum-age bounds alone are insufficient for calibrating a molecular tree. Recent developments in Bayesian dating methodology have enabled soft bounds and arbitrary probability curves to be used as calibrations30,54,83. Soft bounds assign small probabilities (such as 5% or 10%) for the violation of the bounds54. These developments have motivated palaeontologists to formulate probabilistic densities for the true clade ages, rather than focusing on the minimum age. A programme has been launched in palaeontology to reinterpret the fossil record to provide both sharp minimum bounds and soft maximum bounds on clade ages84,85.
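
A soft-bounded calibration can be sketched as a density that is flat between the minimum and maximum bounds and decays smoothly outside them. In the sketch below the tail shapes (a power-law decay towards zero below the minimum and an exponential decay above the maximum, each joined continuously to the flat part) and the 2.5% probability in each tail are assumptions chosen for illustration; the bounds of 38 Ma and 66 Ma are hypothetical.

```python
import math


def _soft_bound_params(t_min, t_max, p_lower, p_upper):
    inner = (1.0 - p_lower - p_upper) / (t_max - t_min)   # flat density
    k = inner * t_min / p_lower    # lower-tail power (continuity at t_min)
    lam = inner / p_upper          # upper-tail decay rate (continuity at t_max)
    return inner, k, lam


def soft_bound_pdf(t, t_min, t_max, p_lower=0.025, p_upper=0.025):
    """Density with mass p_lower below t_min and p_upper above t_max."""
    inner, k, lam = _soft_bound_params(t_min, t_max, p_lower, p_upper)
    if t <= 0.0:
        return 0.0
    if t < t_min:
        return p_lower * k / t_min * (t / t_min) ** (k - 1.0)
    if t <= t_max:
        return inner
    return p_upper * lam * math.exp(-lam * (t - t_max))


def soft_bound_cdf(t, t_min, t_max, p_lower=0.025, p_upper=0.025):
    """Cumulative probability of the same soft-bounded density."""
    inner, k, lam = _soft_bound_params(t_min, t_max, p_lower, p_upper)
    if t <= 0.0:
        return 0.0
    if t < t_min:
        return p_lower * (t / t_min) ** k
    if t <= t_max:
        return p_lower + inner * (t - t_min)
    return 1.0 - p_upper * math.exp(-lam * (t - t_max))


# Hypothetical calibration: minimum 38 Ma, soft maximum 66 Ma.
for age in (30.0, 38.0, 50.0, 66.0, 70.0):
    print(age, round(soft_bound_pdf(age, 38.0, 66.0), 5),
          round(soft_bound_cdf(age, 38.0, 66.0), 4))
```

Unlike a hard bound, this density never assigns zero probability to ages beyond the bounds, so a calibration that turns out to be wrong can be overridden by strong evidence from the other calibrations and the sequence data.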

We envisage several strategies for generating fossil calibrations, each of which may be appropriate depending on the available data. First, one may use the absence of evidence (the lack of available fossil species in the rock record) as weak evidence of absence and thus construct soft maximum age bounds81,82. Together with hard or sharp minimum-age bounds, they can be used as calibrations. This procedure may involve some subjectivity. Second, fossil occurrences in the rock layers can be analysed using probabilistic models of fossil preservation and discovery to generate posterior distributions of node ages, which can be used in subsequent molecular dating studies32,56,86,87,88. Third, if morphological characters are scored for both modern and fossil species then they can be analysed using models of morphological character evolution to estimate node ages, which serve as calibrations in molecular clock dating. It is advisable to fix the phylogeny for modern species while allowing the placement of the fossil species to be determined by the data. Fossil remains are typically incomplete and their phylogenetic placement most often involves uncertainties89. It is also possible to analyse the fossil or morphological data and the molecular data in one joint analysis, as discussed below (known as total evidence dating)34.
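
As a simple illustration of the second strategy, the classical confidence interval on a stratigraphic range treats fossil horizons as independent, uniformly distributed draws within the lineage's true range and asks how far beyond the oldest find the true origin may lie. This is a much cruder, non-Bayesian model than those cited above; the sketch below assumes it, and the fossil ages and number of horizons in the example are hypothetical.

```python
def range_extension(observed_range, n_horizons, confidence):
    """Length by which the true range may extend beyond the oldest fossil.

    Assumes n_horizons fossil-bearing horizons distributed uniformly and
    independently over the true range; the extension is
    observed_range * ((1 - confidence) ** (-1 / (n_horizons - 1)) - 1).
    """
    if n_horizons < 2:
        raise ValueError("need at least two fossil horizons")
    alpha = (1.0 - confidence) ** (-1.0 / (n_horizons - 1)) - 1.0
    return observed_range * alpha


def soft_maximum_age(oldest, youngest, n_horizons, confidence=0.95):
    """Oldest fossil age plus the range extension (ages in Ma)."""
    return oldest + range_extension(oldest - youngest, n_horizons, confidence)


# Hypothetical lineage: 11 fossil horizons between 30 and 40 Ma.
print(round(soft_maximum_age(40.0, 30.0, 11), 2))
```

A sparse record (few horizons) yields a long extension and hence a loose maximum bound, which is the intuition behind treating the absence of fossils as only weak evidence of absence.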

Joint analysis of molecular and morphological data

Morphological characters from both fossil species (which have been dated) and modern species may be analysed jointly with molecular data under models of morphological character evolution to estimate divergence times33,34. The analysis is statistically similar to the analysis of serially sampled sequences in molecular dating of viral or ancient DNA and proteins (Box 3). A perceived advantage of this 'tip-calibration' approach is that it is unnecessary to use constraints on node ages (so-called node calibration). The approach also facilitates the co-estimation of time and topology. Recent applications of this strategy to insects34, arachnids90,91, fish92,93 and mammals94,95,96 have produced surprisingly ancient divergence times97.
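
Why dated tips carry time information is easiest to see with serially sampled sequences. Under a strict clock, the distance from the root to each tip grows linearly with the tip's sampling date, so a regression recovers both the rate (the slope) and the root date (where the fitted line reaches zero distance). The sketch below shows this crude, non-Bayesian root-to-tip regression, not the Bayesian tip-dating method discussed in the text; it assumes a strict clock, known root-to-tip distances and hypothetical sampling dates measured forward in time.

```python
def root_to_tip_regression(sample_dates, root_to_tip):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope, and the date at which the fitted
    line reaches zero distance. Dates run forward in time (e.g. years AD).
    """
    n = len(sample_dates)
    mx = sum(sample_dates) / n
    my = sum(root_to_tip) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(sample_dates, root_to_tip))
    sxx = sum((x - mx) ** 2 for x in sample_dates)
    rate = sxy / sxx
    intercept = my - rate * mx
    return rate, -intercept / rate


# Hypothetical samples evolving at 0.002 substitutions/site/year since 1990.
dates = [2000.0, 2005.0, 2010.0, 2015.0]
dists = [0.002 * (d - 1990.0) for d in dates]
rate, root_date = root_to_tip_regression(dates, dists)
print(f"rate = {rate:.4f} per year, root date = {root_date:.1f}")
```

For fossils, whose ages are usually given in Ma before present, the sign of the dates would have to be flipped before fitting.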

Although tip calibration offers a coherent framework for integrating information from molecules and fossils in one combined analysis, its current implementation involves a number of limitations, which may underlie these old date estimates. First, current models of morphological character evolution are simplistic and may not accommodate important features of the data well98. For example, morphological characters tend to be strongly correlated, but almost all current models assume independence. Furthermore, all recent tip-dating studies have analysed discrete morphological characters, but morphologists usually score only variable characters or parsimony-informative characters. Such ascertainment bias, even if correctly accommodated in the model98, greatly reduces the information about branch lengths and divergence times in the data. Whereas the removal of constant characters can be easily accommodated98, correcting for the removal of parsimony-uninformative characters is computationally demanding and is not properly accommodated by any current dating software. Second, a tip-calibrated analysis does not place any constraints on the ages of internal nodes on the tree and may thus be more sensitive than node-calibrated dating to the prior on divergence times, or to the branching process used to generate that prior. In a sense, although node dating uses node calibrations that may be subjective, it allows the palaeontologist's common sense to be injected into the Bayesian analysis. By contrast, tip calibration may be unduly influenced by arbitrary choices of priors implemented in the computer program. Third, molecular data are generally far more abundant than morphological characters; moreover, morphological characters may undergo convergent evolution in distantly related species and may evolve at much more variable rates than molecules6. Box 2 presents the case of cranial evolution within the hominoids, in which the rate in the human is about eight times as high as the rate in the chimpanzee. Such drastic changes in morphological evolutionary rate contrast sharply with the near-perfect clock-like evolution of the mitochondrial genome from the same species. Characters with drastically variable evolutionary rates, even if the rate variation is adequately accommodated in the model, will not provide much useful time information for the dating analysis. The small amount of morphological data and its low information content (owing to variable rates) mean that the priors on times and rates will remain important to the dating analysis. Finally, we note that most tip-calibrated studies have not incorporated any of the uncertainty associated with fossil dating97.
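
The information lost to ascertainment bias is starkest in a toy case with two taxa. The sketch below assumes a symmetric k-state model in which two taxa at distance d (expected changes per character) share a state with probability 1/k + (k - 1)/k × exp(-kd/(k - 1)). When constant characters are scored, the likelihood peaks near the best-fitting distance. If only variable characters are recorded and the likelihood is conditioned on variability, every observed character is simply "different", its probability divided by P(variable) is 1, and the data say nothing about d. Real matrices with more taxa retain some information, but less than the full data would.

```python
import math


def p_same(d, k):
    """Probability that two taxa share a state under a symmetric k-state model."""
    return 1.0 / k + (k - 1.0) / k * math.exp(-k * d / (k - 1.0))


def loglik_all_characters(n_same, n_diff, d, k=2):
    """Log-likelihood when constant characters are also scored."""
    ps = p_same(d, k)
    return n_same * math.log(ps) + n_diff * math.log(1.0 - ps)


def loglik_variable_only(n_diff, d, k=2):
    """Log-likelihood conditioned on every scored character being variable.

    Each character's probability is divided by P(variable) = 1 - p_same.
    With two taxa, every variable character shows the pattern "different",
    so the ratio is 1 and the log-likelihood is 0 for every d.
    """
    ps = p_same(d, k)
    log_p_diff = math.log(1.0 - ps)        # P(pattern "different")
    log_p_variable = math.log(1.0 - ps)    # P(character is variable)
    return n_diff * (log_p_diff - log_p_variable)


for d in (0.1, 0.5, 2.0):
    print(d, round(loglik_all_characters(80, 20, d), 3),
          loglik_variable_only(20, d))
```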

Resolving the timeline of the Tree of Life

The molecular clock is now serving as a framework for the integration of genomic and palaeontological data to estimate time trees. Advances in Bayesian clock dating methodology, increased computational power and the accumulation of genome-scale sequence data have provided us with an unprecedented opportunity to achieve this objective. However, considerable challenges remain. Although next-generation sequencing technologies99 now enable the cheap and rapid accumulation of genome data for many species100, much work still remains to be carried out to obtain a balanced sampling of biodiversity: some estimates place the fraction of living eukaryotic species that have been described at approximately 14%101, and sequence data are available for a much smaller and skewed fraction. More seriously, fossils are unavailable for most branches of the Tree of Life, and other sources of information (such as geological events76 or experimentally measured mutation rates23) are only rarely available102. The amount of information in fossil morphological characters may never match the information about sequence distances in the genomic data, placing limits on the degree of precision achievable in the estimation of ancient divergence times, because fossil information is essential for resolving sequence distances into absolute times and rates. This problem seems particularly severe in dating ancient divergences, such as the origins of animal phyla103, because at deeper divergences the quality of fossil data tends to be poor, and the evolutionary rates for both morphological characters and sequence data are highly variable among distantly related species.

Challenges also remain in the development of the statistical machinery necessary for molecular clock dating. Current models of morphological evolution are simplistic and should be improved to accommodate different types of data and to account for the correlation between characters. In the analysis of genomic-scale data sets under relaxed clock models, data partitioning is an important but poorly studied area. The rationale for partitioning the sequence data is that sites in the same partition are expected to share the same trajectory of evolutionary rate drift but those in different partitions do not, so that the different partitions constitute independent realizations of the rate-drift process (for example, geometric Brownian motion). Theoretical analysis suggests that the precision of posterior time estimates is mostly determined by the number of partitions rather than by the number of sites in each partition63. However, the different strategies for partitioning large data sets for molecular clock dating analysis are poorly explored. Furthermore, the prior model of rate drift for data of multiple partitions seems to be very important to Bayesian divergence time estimation53, but currently implemented rate models are highly unrealistic. All current dating programs assume independent rates among partitions, failing to accommodate the lineage effect — the fact that some evolutionary lineages or species tend to be associated with high (or low) rates for almost all genes in the genome13. Developing more realistic relaxed clock models for multi-partition data and evaluating their effects on posterior time estimation will be a major research topic for the next few years. Another issue that has been underappreciated in clock dating studies is the fact that speciation events are more recent than gene divergences104 (a result of the coalescent process of gene copies in ancestral populations), and ignoring this may cause important errors when estimating divergence times105.
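
The size of this effect can be gauged with a back-of-the-envelope calculation. Under the standard neutral coalescent, two gene copies that enter a constant-size, randomly mating diploid ancestral population of effective size Ne at the speciation event take on average 2Ne generations to coalesce, so the gene divergence predates the species divergence by about 2Ne × g years for generation time g. The sketch below assumes that model; the numbers in the example are hypothetical.

```python
def expected_gene_divergence(species_split, ne, generation_time):
    """Expected age (years) of the gene divergence for one copy per species.

    Assumes a neutral, panmictic diploid ancestral population of constant
    effective size ne: the two lineages coalesce 2 * ne generations before
    the species split on average.
    """
    return species_split + 2.0 * ne * generation_time


def relative_overestimate(species_split, ne, generation_time):
    """Fractional error from dating the gene divergence as the speciation."""
    gene = expected_gene_divergence(species_split, ne, generation_time)
    return (gene - species_split) / species_split


# Hypothetical example: split 6 Ma, ancestral Ne = 10,000, 20-year generations.
print(expected_gene_divergence(6e6, 1e4, 20.0))
print(round(relative_overestimate(6e6, 1e4, 20.0), 4))
```

Because the offset 2Ne × g is fixed while the split time varies, the relative error is largest for recent divergences with large ancestral populations.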

Despite the multitude of challenges, the prospect for a broadly reliable timescale for life on Earth is currently looking more likely than ever before. Genome-scale sequence data are now being applied to resolve iconic controversies between fossils and molecules. For example, Bayesian clock dating using genome-scale data has demonstrated that modern mammals and birds diversified after the K-Pg boundary24,50 in contrast to non-Bayesian estimates based on limited sequence data that had suggested pre-K-Pg diversification25,47. Similarly, Bayesian clock dating analysis of insect genomes has been used to elucidate the time of insect origination in the Early Ordovician51. We predict that the explosive increase in completely sequenced genomes, together with the development of efficient Bayesian strategies to analyse morphological and molecular data from both modern and fossil species, will eventually allow biologists to resolve the timescale for the Tree of Life. It seems that in reaching its half-century, the molecular clock has finally come of age.