Phylogeny estimation: traditional and Bayesian approaches

Key Points

  • Phylogenetic trees can improve the power of comparative sequence analyses by placing raw sequence differences into their historical context — offering an understanding of how the sequences that we see today were created.

  • Neighbour joining provides an extremely fast estimate of the phylogeny that is accurate if relatively little evolution has occurred between sequences.

  • Parsimony can be used effectively if the sampling of sequences is dense (so that long branches are avoided), but this can be difficult to guarantee.

  • Maximum-likelihood techniques use models of sequence evolution to allow for unseen events and account for forces such as variation in rate at different sites in a sequence. These models can improve tree inference when the sequences are not closely related.

  • Bootstrapping provides a robust (though potentially time-consuming) way to assess confidence in phylogenetic estimates.

  • Bayesian techniques rely on the specification of a prior probability and the likelihood (from the data and models of evolution) to assign a posterior probability to hypotheses.

  • Bayesian techniques can account for uncertainty in parameter estimates by marginalizing over ('integrating out') parameters. Marginalization makes the use of complex models of sequence evolution more robust.

  • Markov chain Monte Carlo is an algorithm that allows for efficient estimation of the posterior probability, making Bayesian phylogenetics feasible for most data sets.
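The last point can be illustrated with a minimal Metropolis sampler. This is a sketch under stated assumptions, not the algorithm of any particular phylogenetics program: it estimates the posterior of a single evolutionary distance between two sequences under the Jukes-Cantor model, whereas real analyses also sample tree topologies and many other parameters. The exponential prior, the step size, the function names and the example counts (100 differences in 1,000 sites) are all hypothetical choices made for illustration.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=10.0):
    """Unnormalized log posterior of a Jukes-Cantor distance d.

    The likelihood is written up to a constant that does not depend on d.
    The exponential prior (mean 1 / prior_rate) is an assumption.
    """
    if d <= 0:
        return -math.inf
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    log_like = n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)
    log_prior = -prior_rate * d
    return log_like + log_prior


def metropolis(n_sites, n_diff, steps=20000, burn_in=2000, step_sd=0.02, seed=1):
    """Sample d from its posterior with a symmetric random-walk proposal."""
    rng = random.Random(seed)
    d = 0.5  # arbitrary starting value
    current = log_posterior(d, n_sites, n_diff)
    samples = []
    for i in range(steps):
        # Reflecting at zero keeps the proposal symmetric and positive.
        proposal = abs(d + rng.gauss(0.0, step_sd))
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        if i >= burn_in:
            samples.append(d)
    return samples


samples = metropolis(n_sites=1000, n_diff=100)
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean distance: {posterior_mean:.3f}")
```

Only the ratio of posterior densities enters the acceptance step, so the normalizing constant of the posterior never has to be computed. That is what makes the approach practical when the space of hypotheses is large.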

Abstract

The construction of evolutionary trees is now a standard part of exploratory sequence analysis. Bayesian methods for estimating trees have recently been proposed as a faster way of incorporating the power of complex statistical models into the process. Researchers who rely on comparative analyses need to understand the theoretical and practical motivations that underlie these new techniques, and how they differ from previous methods. The ability of the new approaches to address previously intractable questions is making phylogenetic analysis an essential tool in an increasing number of areas of genetic research.

Figure 1: Contrast between marginal and joint estimation.

Acknowledgements

This manuscript was greatly improved by comments from three anonymous reviewers. The authors gratefully acknowledge the financial support provided by a grant from the Alfred P. Sloan Foundation/National Science Foundation awarded to P.O.L.

Related links

DATABASES

LocusLink

BRCA1

FURTHER INFORMATION

BAMBE

BLAST

MEGA

MrBayes

PAML

PAUP

PHYLIP

Phylogeny programs

Glossary

PHYLOGENETIC TREE

A graph depicting the ancestor–descendant relationships between organisms or gene sequences. The sequences are the tips of the tree. Branches of the tree connect the tips to their (unobservable) ancestral sequences.

SYSTEMATICS

The biological discipline that is devoted to characterizing the diversity of life and organizing our knowledge about this diversity (primarily through estimating the phylogenetic relationships between organisms).

BAYESIAN

A branch of statistics that focuses on the posterior probability of hypotheses. The posterior probability is proportional to the product of the prior probability and the likelihood.
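As a small illustration of this proportionality, the sketch below normalizes the product of prior and likelihood over a closed set of hypotheses. The three topology labels, the equal priors and the likelihood values are made-up numbers chosen only to show the arithmetic.

```python
def posterior(priors, likelihoods):
    """Return posterior probabilities for a finite set of hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical example: the three unrooted topologies for four sequences.
priors = {"((A,B),(C,D))": 1 / 3, "((A,C),(B,D))": 1 / 3, "((A,D),(B,C))": 1 / 3}
likelihoods = {"((A,B),(C,D))": 2e-5, "((A,C),(B,D))": 1e-5, "((A,D),(B,C))": 1e-5}

post = posterior(priors, likelihoods)
for topology, probability in post.items():
    print(topology, round(probability, 3))
```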

PARSIMONY

In systematics, parsimony refers to choosing between trees on the basis of which one requires the fewest possible mutations to explain the data.
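One common way to count the minimum number of changes that a single site requires on a given rooted binary tree is Fitch's algorithm. The sketch below assumes a tree written as nested tuples of tip names; the tip names and the site pattern are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree   -- a tip name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each tip name to its observed state at one site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state, so no extra change is needed.
        return shared, left_cost + right_cost
    # The children disagree, so one additional change is required.
    return left_set | right_set, left_cost + right_cost + 1


site = {"A": "A", "B": "A", "C": "G", "D": "G"}
_, cost_ab_cd = fitch((("A", "B"), ("C", "D")), site)
_, cost_ac_bd = fitch((("A", "C"), ("B", "D")), site)
print(cost_ab_cd, cost_ac_bd)
```

For this site the tree that groups A with B needs one change and the tree that groups A with C needs two, so parsimony prefers the first. A full analysis would sum these counts over all sites and search over trees.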

CONSENSUS METHOD

A summary of a set of trees in which branches that are not in most of the trees are collapsed to indicate uncertainty.
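A majority-rule version of this idea can be sketched as follows, assuming that each tree has already been reduced to the set of groupings (clades) that it contains. The representation and the example trees are hypothetical; phylogenetics packages offer several other consensus rules.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Keep the clades found in more than `threshold` of the trees."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, n in counts.items() if n / len(trees) > threshold}


# Each tree is a set of clades; each clade is a frozenset of tip names.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
consensus = majority_rule_clades(trees)
print(sorted("".join(sorted(clade)) for clade in consensus))
```

Clades that fall below the threshold are simply absent from the result, which corresponds to collapsing those branches in the summary tree.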

AGREEMENT SUBTREES

A tree containing the largest subset of sequences for which the relationships among sequences are invariant across all the phylogenies included.

LIKELIHOOD

The probability of the data given the model and tree hypothesis. The likelihood measures how well the data agree with the predictions made by the model and tree hypothesis.
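For the simplest possible case, two aligned sequences separated by a single branch, the likelihood can be written down directly. The sketch below assumes the Jukes-Cantor model (equal base frequencies and equal rates for all substitutions) and uses two short made-up sequences. It evaluates the log-likelihood on a grid of branch lengths and compares the best grid value with the analytical maximum-likelihood estimate.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences at distance d under Jukes-Cantor."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability that a base is unchanged
    p_other = 0.25 - 0.25 * decay  # probability of one specific different base
    total = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the stationary frequency of the base in the first sequence.
        total += math.log(0.25 * (p_same if a == b else p_other))
    return total


seq1 = "ACGTACGTAC"
seq2 = "ACGTACGAAC"  # differs from seq1 at one of ten sites

grid = [i / 100 for i in range(1, 101)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))

p_observed = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
analytic_d = -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)
print(best_d, round(analytic_d, 4))
```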

TRANSITION

A mutation between two pyrimidines (T↔C) or two purines (A↔G).

TRANSVERSION

A mutation between a pyrimidine and a purine (A↔C, A↔T, G↔C or G↔T).
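The two definitions above translate into a short classifier. This is an illustrative sketch for unambiguous DNA bases only; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(base1, base2):
    """Classify a pair of DNA bases as identical, transition or transversion."""
    pair = {base1.upper(), base2.upper()}
    if not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected unambiguous DNA bases (A, C, G or T)")
    if len(pair) == 1:
        return "identical"
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    kinds = [classify(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


print(classify("A", "G"), classify("A", "T"), count_ts_tv("ACGT", "GCTT"))
```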

PRIOR PROBABILITY

(The 'prior'). The probability of a hypothesis (or parameter value) without reference to the available data. Priors can be derived from first principles, or based on general knowledge or previous experiments.

BAYES FACTORS

The ratio of the posterior odds to the prior odds for two hypotheses of interest. Bayes factors attempt to measure how strongly the data support or refute a hypothesis.
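For two simple hypotheses, dividing the posterior odds by the prior odds cancels the priors and leaves the ratio of the two likelihoods. The sketch below checks this with made-up priors and likelihoods; with hypotheses that contain free parameters, the likelihoods would be replaced by marginal likelihoods.

```python
def bayes_factor(prior1, prior2, like1, like2):
    """Bayes factor for hypothesis 1 over hypothesis 2."""
    posterior_odds = (prior1 * like1) / (prior2 * like2)
    prior_odds = prior1 / prior2
    return posterior_odds / prior_odds


# Hypothetical values chosen only to show the arithmetic.
bf = bayes_factor(prior1=0.2, prior2=0.8, like1=3e-4, like2=1e-4)
print(round(bf, 6))
```

Changing the priors in this example changes the posterior odds but leaves the Bayes factor unchanged, which is why it is used as a measure of the support contributed by the data alone.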

Cite this article

Holder, M., Lewis, P. Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet 4, 275–284 (2003). https://doi.org/10.1038/nrg1044

  • DOI: https://doi.org/10.1038/nrg1044
