Key Points
- Phylogenetic trees can improve the power of comparative sequence analyses by placing raw sequence differences in their historical context, which offers an understanding of how the sequences that we see today were created.
- Neighbour joining provides an extremely fast estimate of the phylogeny that is accurate if relatively little evolution has occurred between sequences.
- Parsimony can be used effectively if the sampling of sequences is dense (so that long branches are avoided), but such dense sampling can be difficult to guarantee.
- Maximum-likelihood techniques use models of sequence evolution to allow for unseen events (such as multiple substitutions at the same site) and to account for forces such as variation in rate among sites in a sequence. These models can improve tree inference when the sequences are not closely related.
- Bootstrapping provides a robust (though potentially time-consuming) way to assess confidence in phylogenetic estimates.
- Bayesian techniques combine a prior probability with the likelihood (computed from the data and a model of evolution) to assign a posterior probability to hypotheses.
- Bayesian techniques can account for uncertainty in parameter estimates by marginalizing over ('integrating out') parameters. Marginalization makes the use of complex models of sequence evolution more robust.
- Markov chain Monte Carlo (MCMC) is a family of sampling algorithms that allow efficient approximation of the posterior probability, making Bayesian phylogenetics feasible for most data sets.
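Neighbour joining builds a tree by repeatedly merging the pair of nodes that minimizes a corrected distance criterion, then replacing the pair with a new internal node. A minimal Python sketch of the Saitou and Nei method in the form given by Studier and Keppler, assuming a complete symmetric distance matrix with zeros on the diagonal; the function name `neighbour_joining` and the internal-node labels `n1`, `n2`, ... are hypothetical.

```python
# Illustrative sketch; names are hypothetical, not taken from any library.
from itertools import combinations


def neighbour_joining(names, dist):
    """Neighbour joining on a symmetric distance matrix.

    names: taxon labels; dist: list of lists, dist[i][j] between names[i] and names[j].
    Returns the unrooted tree as (node, node, branch_length) edges; internal nodes
    get hypothetical labels "n1", "n2", ...
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    edges = []
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}  # net divergence
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"n{len(edges) // 2 + 1}"
        # Branch lengths from the new internal node u to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        # Distances from u to every remaining active node.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))  # join the last two nodes
    return edges


if __name__ == "__main__":
    # Illustrative additive distances for five taxa.
    names = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    for edge in neighbour_joining(names, dist):
        print(edge)
```

On an additive matrix like the illustrative one in the demo, the recovered branch lengths reproduce every input distance; on real, non-additive distances the result is an approximation. Ties in the criterion are broken here by input order.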
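Parsimony scores a tree by the minimum number of changes it requires. For a fixed rooted binary topology, Fitch's algorithm computes that minimum column by column: at each internal node, take the intersection of the children's state sets when it is non-empty; otherwise take their union and count one change. A minimal sketch, assuming the tree is written as nested 2-tuples and the sequences are gap-free and of equal length; the function names and the toy sequences are hypothetical.

```python
# Illustrative sketch; function names and data are hypothetical.


def fitch_site(tree, states):
    """Fitch pass for one alignment column.

    tree: a leaf name (str) or a 2-tuple of subtrees (rooted, binary).
    states: dict mapping leaf name -> character at this column.
    Returns (candidate_state_set, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch_site(left, states)
    s2, c2 = fitch_site(right, states)
    common = s1 & s2
    if common:                      # children can agree: no extra change
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1     # disagreement costs one change


def parsimony_score(tree, seqs):
    """Total minimum number of changes over all columns of a gap-free alignment."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )


if __name__ == "__main__":
    seqs = {"A": "AACT", "B": "AATT", "C": "GGCT", "D": "GGTT"}  # toy data
    print(parsimony_score((("A", "B"), ("C", "D")), seqs))  # 4
    print(parsimony_score((("A", "C"), ("B", "D")), seqs))  # 5
```

Parsimony would prefer ((A,B),(C,D)) here. The score does not depend on where the root is placed, so rooting the tree arbitrarily is harmless; finding the best tree still requires searching over topologies.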
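How a model "allows for unseen events" is easiest to see with two sequences. Under the Jukes-Cantor (JC69) model (equal base frequencies, one substitution rate), the maximum-likelihood distance is d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites, and d is always larger than p. A minimal sketch, assuming two aligned, gap-free DNA sequences of equal length; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a two-sequence alignment under JC69.

    d: expected substitutions per site separating the sequences (must be > 0).
    n_same, n_diff: counts of identical and differing aligned sites.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(site shows the same base after distance d)
    p_diff = 0.25 - 0.25 * e   # P(site shows one particular different base)
    # Base frequencies are all 1/4 under JC69.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def jc69_distance(seq1, seq2):
    """Closed-form ML distance under JC69; exceeds p because it allows for
    substitutions hidden by later changes at the same site."""
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    p = n_diff / len(seq1)
    if p >= 0.75:
        raise ValueError("too divergent for JC69 (p >= 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    d = jc69_distance("ACGTACGTAC", "ACGTTCGTAA")   # 2 of 10 sites differ
    print(round(d, 3))                               # 0.233
    print(jc69_log_likelihood(d, 8, 2) > jc69_log_likelihood(d + 0.05, 8, 2))  # True
```

The observed difference is 0.2, but the estimate is about 0.233 substitutions per site. Full tree likelihoods apply the same kind of per-branch probabilities to every branch of a tree (Felsenstein's pruning algorithm), which this sketch does not attempt.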
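The phylogenetic bootstrap (Efron's bootstrap as applied to trees by Felsenstein) resamples alignment columns with replacement, re-estimates the tree from each pseudo-replicate, and reports how often each group is recovered. To keep the sketch self-contained, the tree estimator is four-taxon parsimony. For four taxa, only sites of the pattern xxyy favour one tree over the others, so the most parsimonious tree is the split with the most such sites. The taxon labels w, x, y, z, the default of 1,000 replicates and the function names are assumptions for illustration.

```python
# Illustrative sketch; labels, defaults and function names are hypothetical.
import random
from collections import Counter


def best_quartet(columns):
    """Most-parsimonious unrooted tree for four taxa (w, x, y, z).

    Each 'xxyy'-type site costs one change on the tree it supports and two on
    the other two trees; all other site patterns cost the same on every tree.
    Returns 'wx|yz', 'wy|xz', 'wz|xy', or None when the best score is tied.
    """
    votes = Counter()
    for w, x, y, z in columns:
        if len({w, x, y, z}) != 2:
            continue
        if w == x and y == z:
            votes["wx|yz"] += 1
        elif w == y and x == z:
            votes["wy|xz"] += 1
        elif w == z and x == y:
            votes["wz|xy"] += 1
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def bootstrap_support(columns, replicates=1000, seed=1):
    """Fraction of bootstrap pseudo-replicates in which each split is recovered."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(replicates):
        # Resample alignment columns with replacement, keeping the original length.
        sample = rng.choices(columns, k=len(columns))
        wins[best_quartet(sample)] += 1
    return {split: n / replicates for split, n in wins.items() if split is not None}


if __name__ == "__main__":
    taxa = ["AAAAC", "AAAAC", "GGGGC", "GGGGC"]     # toy sequences for w, x, y, z
    columns = list(zip(*taxa))
    print(best_quartet(columns))                    # wx|yz
    print(bootstrap_support(columns))
```

Any tree-building method can be placed inside the same loop. Each replicate costs one full tree estimate, which is why bootstrapping can be time-consuming.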
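The posterior is proportional to prior × likelihood, and the normalizing constant (the marginal likelihood) is what remains once the parameter has been integrated out. A minimal sketch that makes both steps concrete for a single parameter rather than a tree: the JC69 distance d between two sequences. It assumes an exponential prior on d whose mean (0.1) is arbitrary, and it evaluates everything on a grid truncated at `d_max`. Function names and defaults are hypothetical.

```python
# Illustrative sketch; prior, grid settings and function names are assumptions.
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Pairwise JC69 log-likelihood for distance d > 0."""
    e = math.exp(-4.0 * d / 3.0)
    return (n_same * math.log(0.25 * (0.25 + 0.75 * e))
            + n_diff * math.log(0.25 * (0.25 - 0.25 * e)))


def grid_posterior(n_same, n_diff, prior_mean=0.1, d_max=3.0, steps=3000):
    """Posterior density of d on a grid under an exponential prior on d.

    Returns (grid, posterior_density, marginal_likelihood).
    """
    h = d_max / steps
    grid = [(k + 0.5) * h for k in range(steps)]   # midpoints, so d is never 0
    unnorm = []
    for d in grid:
        prior = math.exp(-d / prior_mean) / prior_mean
        like = math.exp(jc69_log_likelihood(d, n_same, n_diff))
        unnorm.append(prior * like)                  # prior x likelihood
    # Marginal likelihood P(data): d integrated out with the midpoint rule.
    marginal = sum(unnorm) * h
    posterior = [u / marginal for u in unnorm]       # now integrates to 1
    return grid, posterior, marginal


if __name__ == "__main__":
    grid, post, marginal = grid_posterior(n_same=8, n_diff=2)
    h = grid[1] - grid[0]
    print(sum(d * p for d, p in zip(grid, post)) * h)   # posterior mean of d
```

With only ten sites (eight identical, two different), the posterior mean lies between the prior mean of 0.1 and the maximum-likelihood value of about 0.233. This grid works in ordinary probability space, which underflows for long alignments. A tree has far too many parameters for a grid, which is why sampling methods such as MCMC are used instead.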
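Metropolis-Hastings MCMC never computes the normalizing constant. It proposes a new parameter value and accepts it with probability min(1, posterior ratio), so only unnormalized posteriors are needed. A minimal sketch for a toy problem: a single JC69 distance d between two sequences, with an exponential prior of mean 0.1 (an assumption) and a uniform random-walk proposal reflected at zero. Step size, chain length, starting value, burn-in and function names are illustrative choices, not recommendations.

```python
# Illustrative sketch; tuning values and function names are hypothetical.
import math
import random


def log_posterior(d, n_same, n_diff, prior_mean=0.1):
    """Unnormalised log posterior: JC69 pairwise log-likelihood + exponential log prior."""
    if d <= 0.0:
        return -math.inf
    e = math.exp(-4.0 * d / 3.0)
    log_like = (n_same * math.log(0.25 * (0.25 + 0.75 * e))
                + n_diff * math.log(0.25 * (0.25 - 0.25 * e)))
    return log_like - d / prior_mean          # log prior up to an additive constant


def metropolis_hastings(n_same, n_diff, n_iter=20000, step=0.1, start=0.1, seed=1):
    """Random-walk Metropolis-Hastings sampler for d; returns every visited state."""
    rng = random.Random(seed)
    d = start
    current = log_posterior(d, n_same, n_diff)
    samples = []
    for _ in range(n_iter):
        # Uniform step reflected at zero: the proposal stays symmetric,
        # so the Hastings ratio reduces to the posterior ratio.
        proposal = abs(d + rng.uniform(-step, step))
        candidate = log_posterior(proposal, n_same, n_diff)
        # Accept with probability min(1, ratio); the normalising constant cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        samples.append(d)
    return samples


if __name__ == "__main__":
    chain = metropolis_hastings(n_same=8, n_diff=2)
    kept = chain[2000:]                      # discard burn-in (illustrative length)
    print(sum(kept) / len(kept))             # posterior mean estimate
```

Samples kept after burn-in approximate the posterior: their mean, or the fraction falling in any range, estimates the corresponding posterior quantity. Phylogenetic MCMC programs such as MrBayes use the same accept/reject rule but also propose changes to tree topology and branch lengths.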
Abstract
The construction of evolutionary trees is now a standard part of exploratory sequence analysis. Bayesian methods for estimating trees have recently been proposed as a faster method of incorporating the power of complex statistical models into the process. Researchers who rely on comparative analyses need to understand the theoretical and practical motivations that underlie these new techniques, and how they differ from previous methods. The ability of the new approaches to address previously intractable questions is making phylogenetic analysis an essential tool in an increasing number of areas of genetic research.
Acknowledgements
This manuscript was greatly improved by comments from three anonymous reviewers. The authors gratefully acknowledge the financial support provided by a grant from the Alfred P. Sloan Foundation/National Science Foundation awarded to P.O.L.
Glossary
- PHYLOGENETIC TREE: A graph depicting the ancestor–descendant relationships between organisms or gene sequences. The sequences are the tips of the tree. Branches of the tree connect the tips to their (unobservable) ancestral sequences.
- SYSTEMATICS: The biological discipline that is devoted to characterizing the diversity of life and organizing our knowledge about this diversity (primarily through estimating the phylogenetic relationships between organisms).
- BAYESIAN: A branch of statistics that focuses on the posterior probability of hypotheses. The posterior probability is proportional to the product of the prior probability and the likelihood.
- PARSIMONY: In systematics, parsimony refers to choosing between trees on the basis of which one requires the fewest possible mutations to explain the data.
- CONSENSUS METHOD: A summary of a set of trees in which branches that are not in most of the trees are collapsed to indicate uncertainty.
- AGREEMENT SUBTREES: A tree containing the largest subset of sequences for which the relationships among sequences are invariant across all the phylogenies included.
- LIKELIHOOD: The probability of the data given the model and tree hypothesis. The likelihood measures how well the data agree with the predictions made by the model and tree hypothesis.
- TRANSITION: A mutation between two pyrimidines (T↔C) or two purines (A↔G).
- TRANSVERSION: A mutation between a pyrimidine and a purine (A↔C, A↔T, G↔C or G↔T).
- PRIOR PROBABILITY: (The 'prior'.) The probability of a hypothesis (or parameter value) without reference to the available data. Priors can be derived from first principles, or based on general knowledge or previous experiments.
- BAYES FACTORS: The ratio of the posterior odds to the prior odds for two hypotheses of interest. Bayes factors attempt to measure how strongly the data support or refute a hypothesis.
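The transition/transversion distinction can be checked mechanically: a substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. A small sketch that classifies single substitutions and tallies them between two aligned, gap-free sequences; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Return 'transition', 'transversion', or None if the two bases are equal."""
    a, b = a.upper(), b.upper()
    if not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not an unambiguous DNA base: {a!r} or {b!r}")
    if a == b:
        return None
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned, gap-free sequences."""
    counts = Counter(classify(a, b) for a, b in zip(seq1, seq2))
    return counts["transition"], counts["transversion"]


if __name__ == "__main__":
    print(count_ts_tv("AACGT", "GACTC"))    # (2, 1)
```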
Cite this article
Holder, M., Lewis, P. Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet 4, 275–284 (2003). https://doi.org/10.1038/nrg1044