The question of whether or not all life on Earth has an ultimate common origin is a subtle one, complicated by the phenomenon of lateral gene transfer. It has now been tackled with a formal statistical analysis.
Charles Darwin predicted, and biologists accept, that all extant life traces back to a common ancestor. But how can we formally test the idea? There is a compelling list of circumstantial evidence, for instance the 'universal' genetic code. However, addressing the question of common origin by applying formal statistical tests to the vast array of molecular sequences now available from all domains of life has long been a challenge. On page 219 of this issue, Theobald1 does just this, and concludes that the accepted view holds.
His approach starts with amino-acid sequences from 23 highly conserved proteins taken from groups that span the three domains of life (eukaryotes, bacteria and archaea). He then applies standard programs for inferring evolutionary trees (or networks) from the protein sequences. The third step is to compare the likelihood values of different models of sequence evolution, and thus of different ancestry hypotheses, while penalizing extra free parameters, because a model with more free parameters is expected to fit the data better whether or not it is closer to the truth. Even with that penalty applied, Theobald finds strong support for the unity of life over hypotheses with as few as two independent origins.
Perhaps the most interesting aspect of Theobald's work1 is not the conclusion, as common ancestry is already the default view in science, but the test itself: a formal test of evolution requires considerable ingenuity. Amino-acid sequence similarity alone does not imply common ancestry, because it might be due to convergent evolution. Lateral gene transfer between organisms and uncertainty about the best model of sequence evolution also confound statistical testing of common ancestry.
Theobald's paper reports strong support for the common-ancestry hypothesis over alternatives proposing that any one of the three domains of life had a separate origin (including, for example, some archaea that seem to be genetically and morphologically distinct from other life forms). The findings are in line with a phrase from the much-quoted final paragraph of On the Origin of Species that “probably all organic beings which have ever lived on this earth have descended from some one primordial form”.
Does this mean that life arose just once, more than 3.5 billion years ago? Not necessarily. Logically, it is possible that life arose more than once, but that only one of these original life forms has descendants that survive today2. It is also possible that more than one origin of life has surviving descendants. The claim is simply that all known life has at least one common ancestor, a last universal common ancestor (LUCA). Such a LUCA also need not have been the first organism on Earth. These subtleties concerning origins have recently been discussed by the philosopher Elliot Sober3.
Theobald's analysis1 is definitely not an argument for a 'tree of life' in place of a reticulate network that shows extensive lateral gene transfer, particularly in early life and in bacteria and archaea4,5. Indeed, Theobald considers networks, and 9 of the 23 proteins he analyses are thought to have undergone horizontal transfer early in evolution. There is nothing here that is new. Darwin himself always referred to his “theory of descent with modification”, a phrase that allows for gene transfer between an endosymbiotic organism (such as the mitochondrion precursor) and its host, or laterally between free-living organisms — it is the test of ultimate common origin that is the important part of the current paper.
For decades, biologists have been using DNA- and protein-sequence data to build phylogenetic trees and even a 'tree of life' that stretches across the eukaryotes, bacteria and archaea. It might be assumed that these trees directly demonstrate common ancestry. After all, the various parts of a tree are all connected, so all species will be descended from some ancestral point in the tree — a hypothetical 'root', the position of which may be unknown. The logical problem here is that tree-reconstruction methods will churn out a connected tree for any data, so we need more sophisticated arguments to test common ancestry.
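The logical problem can be made concrete with a small sketch. The code below is an illustration only, not a method from Theobald's paper: it uses a simple average-linkage (UPGMA-style) clustering written for this example, and feeds it purely random pairwise distances among six hypothetical taxa. The function names `upgma`, `leaves` and `random_distances` are assumptions introduced here.

```python
import random
from itertools import combinations


def upgma(names, dist):
    """Average-linkage clustering; returns a nested-tuple tree of taxon names."""
    # Each cluster is (subtree, list of leaf names under it).
    clusters = [(name, [name]) for name in names]

    def avg(a, b):
        # Mean leaf-to-leaf distance between two clusters.
        total = sum(dist[frozenset((x, y))] for x in a[1] for y in b[1])
        return total / (len(a[1]) * len(b[1]))

    while len(clusters) > 1:
        # Join the two closest clusters, whatever the data look like.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[j]
        merged = ((a[0], b[0]), a[1] + b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaves(tree):
    """List the taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def random_distances(names, seed=0):
    """Pairwise 'distances' with no evolutionary signal at all."""
    rng = random.Random(seed)
    return {frozenset(pair): rng.random() for pair in combinations(names, 2)}


names = ["A", "B", "C", "D", "E", "F"]  # hypothetical taxa
tree = upgma(names, random_distances(names))
print(tree)  # a single connected tree, despite the meaningless input
```

Every taxon ends up in one connected tree because the algorithm keeps merging until a single cluster remains, so connectedness is a property of the method rather than evidence about the data.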
More convincing evidence is the concordance of trees for the same set of taxa across different data sets. This was the basis of the first formal test, performed more than two decades ago, of the process of evolution from a common ancestor in the mammalian tree6. However, tree congruence can also be explained by other processes, and the use of model-selection methods such as the AIC (Akaike information criterion) has since been advocated as a way to test common ancestry7. This method, used by Theobald, makes it possible to compare the strength of support for different hypotheses across a range of models of sequence evolution. An AIC approach helps to adjust for the fact that, with enough free parameters in a complex model, we can explain just about any data.
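As a rough sketch of how such a comparison works, the AIC is computed as 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; lower values indicate better support. The example below assumes that a separate-origins hypothesis is modelled as independent trees, so that log-likelihoods and parameter counts simply add across groups. All numbers are invented for illustration and are not taken from Theobald's analysis; the helper `separate_origins` is a hypothetical name.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def separate_origins(groups):
    """Combine independently fitted groups of taxa.

    Assumption: groups with separate origins are statistically independent,
    so their log-likelihoods and parameter counts add.
    Each group is a (log_likelihood, n_params) pair.
    """
    total_lnl = sum(lnl for lnl, _ in groups)
    total_k = sum(k for _, k in groups)
    return total_lnl, total_k


# Invented values for illustration only.
common = (-1000.0, 20)                                  # one tree for all taxa
split = separate_origins([(-600.0, 15), (-450.0, 15)])  # two unrelated trees

aic_common = aic(*common)
aic_split = aic(*split)
delta = aic_split - aic_common  # positive values favour common ancestry
print(aic_common, aic_split, delta)
```

In this made-up case the two-origin model has both a worse likelihood and more parameters, so the difference favours common ancestry; with real data the two effects can pull in opposite directions, which is why the penalty term matters.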
So what is the signal in sequence data that provides the evidence for common ancestry? In essence, it is site-specific correlations in the amino acids between different species (Fig. 1). These correlations fall off as the coalescence between lineages in a tree becomes deeper in the past8, but if there are sufficient data, the correlations' cumulative significance becomes statistically strong. Conversely, if two lineages have completely separate origins, correlations between amino-acid site patterns in the corresponding two extant species vanish.
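The decay of this signal, and its recovery with more data, can be sketched under a deliberately simple model. The code below assumes 20 equally frequent amino acids and equal substitution rates between them (a Poisson-type model, far cruder than the models used in real analyses). Under that assumption, the expected fraction of identical sites between two related sequences separated by d substitutions per site is 1/20 + (19/20) exp(-20d/19), whereas unrelated sequences match at the chance level of 1/20. The `z_score` function uses a normal approximation to ask how many standard errors the excess identity represents for a given number of sites; both function names are introduced here for illustration.

```python
import math

N_STATES = 20  # assumption: 20 equally frequent amino acids


def expected_identity(d):
    """Expected fraction of matching sites after d substitutions per site."""
    base = 1.0 / N_STATES
    return base + (1.0 - base) * math.exp(-d * N_STATES / (N_STATES - 1))


def z_score(d, n_sites):
    """Excess identity over chance, in standard errors (normal approximation).

    The null hypothesis is independent origins: each site matches with
    probability 1/20, independently of the others.
    """
    base = 1.0 / N_STATES
    excess = expected_identity(d) - base
    return excess * math.sqrt(n_sites) / math.sqrt(base * (1.0 - base))


for d in (0.5, 1.0, 2.0, 3.0):
    print(d, round(expected_identity(d), 3),
          round(z_score(d, 100), 2), round(z_score(d, 10000), 2))
```

The excess identity shrinks exponentially with divergence, but the significance of a fixed excess grows with the square root of the number of sites, which is why a weak per-site signal can still add up across many proteins.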
As to how much the 'tree of life' is really a tree rather than a tangled network, the jury is still out. One can see evidence for a dominant tree-like signal by using network-based methods that do not force data onto a tree. By contrast, if we ask people to quantify their subjective distances between different colours and run these distances through phylogenetic network software we get a 'colour circle'— nothing like a tree. Yet the same method, applied to distances from many genetic data sets, produces highly tree-like networks, reflecting an underlying bifurcating evolutionary signal9.
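One simple way to quantify the contrast between a circle and a tree, offered here as an illustration rather than as the network method referred to above, is the four-point condition. For any four taxa on a tree, the two largest of the three pairwise distance sums are equal. The score below is 0 when that holds exactly and 1 when the data are as far from tree-like as possible. The hue angles and branch lengths are made-up values, and `delta_score`, `quartet_delta` and `arc` are names introduced for this sketch.

```python
def delta_score(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Tree-likeness of one quartet: 0 = perfectly tree-like, 1 = least."""
    sums = sorted([d_ab + d_cd, d_ac + d_bd, d_ad + d_bc], reverse=True)
    largest, middle, smallest = sums
    if largest == smallest:
        return 0.0  # star-like quartet: consistent with a tree
    return (largest - middle) / (largest - smallest)


def quartet_delta(points, dist):
    """Apply delta_score to four items using a distance function."""
    a, b, c, d = points
    return delta_score(dist(a, b), dist(c, d), dist(a, c),
                       dist(b, d), dist(a, d), dist(b, c))


def arc(x, y):
    """Shortest angular distance (degrees) between two hues on a circle."""
    diff = abs(x - y) % 360
    return min(diff, 360 - diff)


# Four hues evenly spaced around a colour circle (illustrative angles).
circle = quartet_delta([0, 90, 180, 270], arc)

# Distances read off a tree ((A,B),(C,D)) with tip branches of length 1
# and an internal branch of length 2 (illustrative values).
tree_d = {
    frozenset("AB"): 2, frozenset("CD"): 2,
    frozenset("AC"): 4, frozenset("AD"): 4,
    frozenset("BC"): 4, frozenset("BD"): 4,
}
tree_like = quartet_delta("ABCD", lambda x, y: tree_d[frozenset((x, y))])

print(circle, tree_like)
```

Real data fall between these extremes, and averaging such scores over many quartets gives one rough measure of how tree-like a data set is.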
Theobald's work1 is unlikely to be the last word on common ancestry. It is difficult to exclude all other explanations for correlations, and further work will probably address this problem. In the meantime, there is now strong quantitative support, by a formal test, for the unity of life.