Key Points
- Understanding phylogenetic relationships among organisms is a prerequisite of evolutionary studies, as contemporary species all share a common history through their ancestry.
- The wealth of sequence data generated by large-scale genome projects is transforming phylogenetics (the reconstruction of evolutionary history) into phylogenomics.
- Traditional sequence-based methods of phylogenetic reconstruction (supermatrix and supertree approaches) can also be used at the genome level.
- New methods based on whole-genome features are also currently being developed to infer phylogenomic trees.
- Recent studies have revealed the potential of phylogenomic methods for answering long-standing phylogenetic questions.
- The supermatrix approach, which analyses the concatenation of multiple gene sequences, is the best-characterized method. Its potential relies on the increased resolving power provided by the use of a large number of sequence positions, which reduces the sampling error.
- Including large amounts of data in phylogenomic analyses increases the possibility of obtaining highly supported but incorrect phylogenetic results that are due to inconsistency, that is, the convergence towards an incorrect solution as more data are added.
- Inconsistency arises because current phylogenetic reconstruction methods do not account for the full complexity of the molecular evolutionary process in their underlying assumptions.
- The risks of inconsistency in phylogenomic analyses can be reduced by the development of better models of sequence evolution, by the critical evaluation of data properties and by the use of only the most reliable characters.
- Corroboration of phylogenomic results is an important issue, as whole genomes represent the ultimate source of phylogenetically informative characters. Sources of corroboration include the congruence of results obtained using different phylogenomic methods, and their robustness to taxon sampling.
- The very nature of the evolutionary process and the limitations of current phylogenetic reconstruction methods imply that parts of the tree of life might prove difficult, if not impossible, to resolve with confidence.
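The concatenation step behind the supermatrix approach mentioned in the key points can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used in any of the studies: the function name `build_supermatrix`, the `?` missing-data symbol and the toy sequences are all hypothetical, and each gene is assumed to be already aligned.

```python
from typing import Dict, List, Tuple

# One gene alignment: taxon name -> aligned sequence (hypothetical toy format).
Alignment = Dict[str, str]


def build_supermatrix(
    genes: List[Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-gene alignments into a single supermatrix.

    Taxa that lack a gene are padded with the `missing` symbol so that
    every row has the same length. Returns the concatenated matrix and
    the half-open (start, end) column range of each gene partition.
    """
    # Union of all taxa seen in any gene, in a stable order.
    taxa = sorted({taxon for gene in genes for taxon in gene})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 0
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal length")
        n = lengths.pop()
        for taxon in taxa:
            # Pad with missing-data symbols when the taxon lacks this gene.
            rows[taxon].append(gene.get(taxon, missing * n))
        partitions.append((start, start + n))
        start += n
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


# Toy example: the second gene has no sequence for "mouse".
toy_genes = [
    {"human": "ACGT", "mouse": "ACGA", "yeast": "TCGA"},
    {"human": "GGC", "yeast": "GAC"},
]
toy_matrix, toy_partitions = build_supermatrix(toy_genes)
```

Keeping the partition boundaries alongside the matrix is a design choice of this sketch: it leaves room for later analyses that treat each gene separately, and the padded rows make the amount of missing data per taxon explicit.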
Abstract
As more complete genomes are sequenced, phylogenetic analysis is entering a new era — that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms on the basis of the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to several fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life might prove difficult, if not impossible, to resolve with confidence.
Acknowledgements
We thank N. Rodrigue, N. Rodríguez-Ezpeleta and E. Douzery for critical reading of early versions of the manuscript. Constructive comments from three anonymous referees also helped to make the manuscript more accurate. We apologize to our colleagues whose relevant work has not been cited because of space limitations. The authors gratefully acknowledge the financial support provided by Génome Québec, Canada, the Canadian Research Chair and the Université de Montréal, Canada.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
FURTHER INFORMATION
Assembling the Fungal Tree of Life
Assembling the Tree of Eukaryotic Diversity — Eu-Tree
Assembling the Tree of Life — Diptera
Assembling the Tree of Life — Early Bird
Blaxter Laboratory Nematode Genomics web site
Cyberinfrastructure for Phylogenetic Research (CIPRes) Project
EMBL Nucleotide Sequence Database
Green Plant Phylogeny Research Coordination Group — DeepGreen
Higher-Level Arthropod Phylogenomics from the Cunningham Laboratory
NCBI — National Center for Biotechnology Information
Nematode Genome Sequencing Center
PhyCom — a Phylogenetic Community
Glossary
- HOMOLOGOUS CHARACTERS
-
Homologous characters are those that are descended from a common ancestor.
- PRIOR PROBABILITY
-
The probability of a hypothesis (or parameter value) without reference to the available data. This can be derived from first principles, or based on general knowledge or previous experiments.
- NODE
-
Nodes of phylogenetic trees represent taxonomic units. Internal nodes (the branching points of the tree) refer to hypothetical ancestors, whereas terminal nodes (or leaves) generally correspond to extant species.
- INCONSISTENCY
-
A phylogenetic reconstruction method is statistically inconsistent if it converges towards supporting an incorrect solution with increasing confidence as more data are analysed.
- HOMOPLASY
-
Identical character states (for example, the same nucleotide base in a DNA sequence) that are not the result of common ancestry (not homologous), but that arose independently in different ancestors by convergent mutations.
- CONVERGENCE
-
The independent evolution of similar character states in evolutionarily distinct lineages.
- REVERSAL
-
The independent reacquisition of the ancestral character state in a given evolutionary lineage.
- HOMOLOGY
-
Two sequences are homologous if they share a common ancestor.
- ORTHOLOGY
-
Two sequences are orthologous if they share a common ancestor and originated by speciation.
- HEURISTIC
-
A method of inference that relies on educated guesses or simplifications that limit the parameter space over which solutions are searched. This approach is not guaranteed to find the correct answer.
- BREAKPOINT
-
In the context of phylogenetic methods that are based on gene-order comparison between genomes, a breakpoint is defined when a pair of genes are adjacent in one genome but not in the other.
- HORIZONTAL GENE TRANSFER
-
The transfer of genetic material between the genomes of two organisms, which usually belong to different species, that does not occur through parent–progeny routes.
- PARALLEL GENE LOSS
-
The independent loss of homologous genes in evolutionarily distinct lineages.
- SATURATION
-
Mutational saturation occurs when many changes at a given position have randomized the genuine phylogenetic signal.
- ROOT
-
The root of a phylogenetic tree represents the common ancestor of all taxa that are represented in the tree. The position of the root is often inferred using an outgroup taxon, which establishes the order of evolution in the group of taxa of interest.
- MONOPHYLY
-
Monophyletic taxa include all the species that are derived from a single common ancestor.
- STOCHASTIC OR SAMPLING ERROR
-
The error in phylogenetic estimates caused by the finite length of the sequences used in the inference. As the length of the sequences increases, the magnitude of the stochastic error decreases.
- SYSTEMATIC ERROR
-
The error in phylogenetic estimates that is due to the failure of the reconstruction method to fully account for the properties of the data.
- BOOTSTRAP ANALYSIS
-
A type of statistical analysis used to test the reliability of specific branches in an evolutionary tree. The non-parametric bootstrap proceeds by re-sampling the original data, with replacement, to create a series of bootstrap samples of the same size as the original data. The bootstrap percentage of a node is the proportion of times that node is present in the set of trees that is constructed from the new data sets.
- BAYESIAN POSTERIOR PROBABILITY
-
In Bayesian phylogenetics, the posterior probability of a particular node of a tree is the probability that the node is correct, which is conditional on the data and the model used in the analysis both being correct.
- HETEROTACHY
-
Variation through time in the evolutionary rate of a given position of a molecule.
- MARKOV CHAIN MONTE CARLO
-
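A computational technique that approximates a probability distribution, such as the posterior distribution of trees and model parameters, by drawing correlated samples from it, which avoids calculating the full distribution directly.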
- DISK-COVERING METHODS
-
A family of 'divide-and-conquer' algorithmic methods for large-scale tree reconstruction. They use graph theory to optimally partition the input dataset into small overlapping sets of closely related species, reconstruct phylogenetic trees from these subsets, and combine the subtrees into one tree for the entire set of species.
- COVARION MODEL OF MOLECULAR EVOLUTION
-
In this model, although some sites in a macromolecule are vital to function and can never change through time, most switch between being free to evolve in some species and being invariable in others.
- METAGENOMICS
-
The functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample of uncultured organisms.
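Two of the glossary definitions above (BOOTSTRAP ANALYSIS and BREAKPOINT) describe procedures that can be sketched in a few lines of code. The Python below is a minimal illustration under stated assumptions, not the implementation used by any phylogenetic package. The function names are hypothetical; `closest_pair_clade` is a toy stand-in for real tree inference that returns only the clade joining the two most similar sequences; and the breakpoint count assumes linear, unsigned gene orders over the same set of genes.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def closest_pair_clade(alignment):
    """Toy stand-in for tree inference (assumption): one clade, the closest pair."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    pair = min(
        combinations(sorted(alignment), 2),
        key=lambda p: hamming(alignment[p[0]], alignment[p[1]]),
    )
    return {frozenset(pair)}


def bootstrap_percentages(alignment, build_clades, n_replicates=100, seed=0):
    """Percentage of bootstrap replicates in which each clade is recovered.

    build_clades is any function mapping an alignment to a set of clades,
    each clade being a frozenset of taxon names.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        counts.update(build_clades(bootstrap_columns(alignment, rng)))
    return {clade: 100.0 * k / n_replicates for clade, k in counts.items()}


def breakpoint_count(genome_a, genome_b):
    """Count gene pairs that are adjacent in genome_a but not in genome_b."""
    adjacent_in_b = {frozenset(p) for p in zip(genome_b, genome_b[1:])}
    return sum(frozenset(p) not in adjacent_in_b for p in zip(genome_a, genome_a[1:]))


if __name__ == "__main__":
    toy_alignment = {
        "A": "AAAAAAAA",
        "B": "AAAAAAAA",
        "C": "CCCCCCCC",
        "D": "GGGGGGGG",
    }
    print(bootstrap_percentages(toy_alignment, closest_pair_clade))
    print(breakpoint_count([1, 2, 3, 4, 5], [1, 2, 4, 3, 5]))
```

In a real analysis, `build_clades` would be replaced by a full tree-reconstruction step applied to each resampled data set, and gene-order methods would typically also handle gene orientation and circular genomes, which this sketch ignores.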
Cite this article
Delsuc, F., Brinkmann, H. & Philippe, H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6, 361–375 (2005). https://doi.org/10.1038/nrg1603
DOI: https://doi.org/10.1038/nrg1603