The rapid accumulation of genome sequence data has made phylogenetics an indispensable tool to various branches of biology. However, it has also posed considerable statistical and computational challenges to data analysis.
Distance, parsimony, likelihood and Bayesian methods of phylogenetic analysis have different strengths and weaknesses. Although distance methods are good for large data sets of highly similar sequences, likelihood and Bayesian methods often have more power and are more robust, especially for inferring deep phylogenies.
Assessing phylogenetic uncertainty remains a difficult statistical problem.
Data partitioning may have an important influence on the phylogenetic analysis of genome-scale data sets.
Systematic biases, such as long-branch attraction, may be more important than random sampling errors in the analysis of genomic-scale data sets.
Phylogenies are important for addressing various biological questions such as relationships among species or genes, the origin and spread of viral infection and the demographic changes and migration patterns of species. The advancement of sequencing technologies has taken phylogenetic analysis to a new height. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages that are now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use.
This is a preview of subscription content, access via your institution
Open Access articles citing this article.
Maximum likelihood pandemic-scale phylogenetics
Nature Genetics Open Access 10 April 2023
Comparative phylogenomic insights of KCS and ELO gene families in Brassica species indicate their role in seed development and stress responsiveness
Scientific Reports Open Access 02 March 2023
Subscribe to this journal
Receive 12 print issues and online access
$189.00 per year
only $15.75 per issue
Rent or buy this article
Get just this article for as long as you need it
Prices may be subject to local taxes which are calculated during checkout
Maser, P. et al. Phylogenetic relationships within cation transporter families of Arabidopsis. Plant Physiol. 126, 1646–1667 (2001).
Edwards, S. V. Is a new and general theory of molecular systematics emerging? Evolution 63, 1–19 (2009).
Marra, M. A. et al. The genome sequence of the SARS-associated coronavirus. Science 300, 1399–1404 (2003).
Grenfell, B. T. et al. Unifying the epidemiological and evolutionary dynamics of pathogens. Science 303, 327–332 (2004).
Salipante, S. J. & Horwitz, M. S. Phylogenetic fate mapping. Proc. Natl Acad. Sci. USA 103, 5448–5453 (2006).
Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in pacific settlement. Science 323, 479–483 (2009).
Brady, A. & Salzberg, S. PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nature Methods 8, 367 (2011).
Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E. S. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003).
Pedersen, J. S. et al. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol. 2, e33 (2006).
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nature Genet. 43, 1031–1034 (2011).
Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
Paten, B. et al. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 18, 1829–1843 (2008).
Ma, J. Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. J. Comput. Biol. 18, 879–893 (2011).
Kingman, J. F. C. On the genealogy of large populations. J. Appl. Probab. 19A, 27–43 (1982).
Kingman, J. F. C. The coalescent. Stoch. Process. Appl. 13, 235–248 (1982).
Edwards, S. V., Liu, L. & Pearl, D. K. High-resolution species trees without concatenation. Proc. Natl Acad. Sci. USA 104, 5936–5941 (2007). This paper introduces a method for estimating the species tree despite the presence of conflicting gene trees.
Than, C. & Nakhleh, L. Species tree inference by minimizing deep coalescences. PLoS Comput. Biol. 5, e1000501 (2009).
Rannala, B. & Yang, Z. Phylogenetic inference using whole genomes. Annu. Rev. Genomics Hum. Genet. 9, 217–231 (2008).
Felsenstein, J. Phylogenies and the comparative method. Am. Nat. 125, 1–15 (1985). This paper introduces the bootstrap approach to phylogenetic analysis. This is the most commonly used method for assessing sampling errors in estimated phylogenies.
Yang, Z. in Handbook of Statistical Genetics (eds Balding, D., Bishop, M. & Cannings, C.) 377–406 (Wiley, New York, 2007).
Felsenstein, J. Inferring Phylogenies (Sinauer Associates, Sunderland, Massachusetts, 2004).
Yang, Z. Computational Molecular Evolution (Oxford Univ. Press, UK, 2006).
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
Jukes, T. H. & Cantor, C. R. in Mammalian Protein Metabolism (ed. Munro, H. N.) 21–123 (Academic Press, New York, 1969).
Kimura, M. A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120 (1980).
Hasegawa, M., Kishino, H. & Yano, T. Dating the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174 (1985).
Tavaré, S. Some probabilistic and statistical problems on the analysis of DNA sequences. Lect. Math. Life Sci. 17, 57–86 (1986).
Yang, Z. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39, 105–111 (1994).
Yang, Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10, 1396–1401 (1993).
Cavalli-Sforza, L. L. & Edwards, A. W. F. Phylogenetic analysis: models and estimation procedures. Evolution 21, 550–570 (1967).
Fitch, W. M. & Margoliash, E. Construction of phylogenetic trees. Science 155, 279–284 (1967).
Rzhetsky, A. & Nei, M. A simple method for estimating and testing minimum-evolution trees. Mol. Biol. Evol. 9, 945–967 (1992).
Desper, R. & Gascuel, O. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J. Comput. Biol. 9, 687–705 (2002).
Gascuel, O. & Steel, M. Neighbor-joining revealed. Mol. Biol. Evol. 23, 1997–2000 (2006).
Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
Bruno, W. J., Socci, N. D. & Halpern, A. L. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17, 189–197 (2000).
Fitch, W. M. Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20, 406–416 (1971).
Hartigan, J. A. Minimum evolution fits to a given tree. Biometrics 29, 53–65 (1973).
Swofford, D. L. PAUP*: Phylogenetic Analysis by Parsimony (and Other Methods)4.0 Beta (Sinauer Associates, Massachusetts, 2000).
Goloboff, P. A., Farris, J. S. & Nixon, K. C. TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786 (2008).
Felsenstein, J. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27, 401–410 (1978).
Huelsenbeck, J. P. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47, 519–537 (1998).
Swofford, D. L. et al. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50, 525–539 (2001).
Yang, Z. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11, 367–372 (1996).
Philippe, H. et al. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470, 255–258 (2011).
Zhong, B. et al. Systematic error in seed plant phylogenomics. Genome Biol. Evol. 3, 1340–1348 (2011).
Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981). This paper introduces the pruning algorithm for likelihood calculation on a tree. This approach forms the basis for modern likelihood and Bayesian methods of phylogenetic analysis.
Yang, Z. Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42, 294–307 (1996).
Felsenstein, J. Phylip: Phylogenetic Inference Program, Version 3.6. (Univ. of Washington, Seattle, 2005).
Adachi, J. & Hasegawa, M. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28, 1–150 (1996).
Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003).
Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006).
Zwickl, D. Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets Under the Maximum Likelihood Criterion. Thesis, Univ. Texas at Austin (2006).
Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).
Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109 (2004).
Blanquart, S. & Lartillot, N. A site- and time-heterogeneous model of amino acid replacement. Mol. Biol. Evol. 25, 842–858 (2008).
Goldman, N. Statistical tests of models of DNA substitution. J. Mol. Evol. 36, 182–198 (1993).
Zuckerkandl, E. & Pauling, L. in Evolving Genes and Proteins (eds Bryson, V. & Vogel, H. J.) 97–166 (Academic Press, New York, 1965).
Nielsen, R. & Yang, Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148, 929–936 (1998).
Yang, Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568–573 (1998).
Yang, Z. & Nielsen, R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908–917 (2002).
Huelsenbeck, J. P. & Rannala, B. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276, 227–232 (1997).
Whelan, S., Liò, P. & Goldman, N. Molecular phylogenetics: state of the art methods for looking into the past. Trends Genet. 17, 262–272 (2001).
Rannala, B. & Yang, Z. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43, 304–311 (1996).
Yang, Z. & Rannala, B. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo Method. Mol. Biol. Evol. 14, 717–724 (1997).
Mau, B. & Newton, M. A. Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo. J. Comput. Graph. Stat. 6, 122–131 (1997).
Li, S., Pearl, D. & Doss, H. Phylogenetic tree reconstruction using Markov chain Monte Carlo. J. Am. Stat. Assoc. 95, 493–508 (2000).
Larget, B. & Simon, D. L. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16, 750–759 (1999).
Huelsenbeck, J. P. & Ronquist, F. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006). This paper introduces a Bayesian MCMC algorithm (the BEAST program) to estimate rooted trees under relaxed-clock models.
Felsenstein, J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791 (1985).
Felsenstein, J. & Kishino, H. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42, 193–200 (1993).
Efron, B., Halloran, E. & Holmes, S. Bootstrap confidence levels for phylogenetic trees. Proc. Natl Acad. Sci. USA 93, 7085–7090 (1996); corrected article Proc. Natl Acad. Sci. USA 93, 13429–13434 (1996).
Berry, V. & Gascuel, O. On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Mol. Biol. Evol. 13, 999–1011 (1996).
Susko, E. First-order correct bootstrap support adjustments for splits that allow hypothesis testing when using maximum likelihood estimation. Mol. Biol. Evol. 27, 1621–1629 (2010).
Suzuki, Y., Glazko, G. V. & Nei, M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl Acad. Sci. USA 99, 16138–16143 (2002).
Lewis, P. O., Holder, M. T. & Holsinger, K. E. Polytomies and Bayesian phylogenetic inference. Syst. Biol. 54, 241–253 (2005).
Yang, Z. & Rannala, B. Branch-length prior influences Bayesian posterior probability of phylogeny. Syst. Biol. 54, 455–470 (2005).
Huelsenbeck, J. P. & Rannala, B. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst. Biol. 53, 904–913 (2004).
Brown, J. M., Hedtke, S. M., Lemmon, A. R. & Lemmon, E. M. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Syst. Biol. 59, 145–161 (2010).
Rannala, B., Zhu, T. & Yang, Z. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29, 325–335 (2012).
Zhang, C., Rannala, B. & Yang, Z. Robustness of compound Dirichlet priors for Bayesian inference of branch lengths. Syst. Biol. 10 Feb 2012 (doi: 10.1093/sysbio/sys030).
Suchard, M. & Rambaut, A. Many-core algorithms for statistical phylogenetics. Bioinformatics 25, 1370–1376 (2009).
Zierke, S. & Bakos, J. FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods. BMC Bioinform. 11, 184 (2010).
Bininda-Emonds, O. R. P. Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life (Kluwer Academic, the Netherlands, 2004).
de Queiroz, A. & Gatesy, J. The supermatrix approach to systematics. Trends Ecol. Evol. 22, 34–41 (2007).
Yang, Z. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42, 587–596 (1996).
Shapiro, B., Rambaut, A. & Drummond, A. J. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23, 7–9 (2006).
Ren, F., Tanaka, H. & Yang, Z. A likelihood look at the supermatrix–supertree controversy. Gene 441, 119–125 (2009).
Criscuolo, A., Berry, V., Douzery, E. J. & Gascuel, O. SDM: a fast distance-based approach for (super) tree building in phylogenomics. Syst. Biol. 55, 740–755 (2006).
Wiens, J. J. & Moen, D. S. Missing data and the accuracy of Bayesian phylogenetics. J. Syst. Evol. 46, 307–314 (2008).
Dwivedi, B. & Gadagkar, S. Phylogenetic inference under varying proportions of indel-induced alignment gaps. BMC Evol. Biol. 9, 1471–2148 (2009).
Rodrigue, N., Philippe, H. & Lartillot, N. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles. Proc. Natl Acad. Sci. USA 107, 4629–4634 (2010).
Pagel, M. & Meade, A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53, 571–581 (2004).
Nishihara, H., Okada, N. & Hasegawa, M. Rooting the Eutherian tree — the power and pitfalls of phylogenomics. Genome Biol. 8, R199 (2007).
Leigh, J. W., Susko, E., Baumgartner, M. & Roger, A. J. Testing congruence in phylogenomic analysis. Syst. Biol. 57, 104–115 (2008).
Higgins, D. G. & Sharp, P. M. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73, 237–244 (1988).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Löytynoja, A. & Goldman, N. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl Acad. Sci. USA 102, 10557–10562 (2005).
Löytynoja, A. & Goldman, N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320, 1632–1635 (2008).
Thorne, J. L., Kishino, H. & Felsenstein, J. An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114–124 (1991); erratum J. Mol. Evol. 34, 91 (1992).
Hein, J., Jensen, J. L. & Pedersen, C. N. Recursions for statistical multiple alignment. Proc. Natl Acad. Sci. USA 100, 14960–14965 (2003).
Redelings, B. D. & Suchard, M. A. Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54, 401–418 (2005).
Lunter, G., Miklos, I., Drummond, A., Jensen, J. L. & Hein, J. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 6, 83 (2005).
Thorne, J. L., Kishino, H. & Painter, I. S. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15, 1647–1657 (1998). This paper describes the first Bayesian MCMC method for dating species divergence using minimum and maximum bounds to incorporate fossil calibrations.
Kishino, H., Thorne, J. L. & Bruno, W. J. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol. 18, 352–361 (2001).
Rannala, B. & Yang, Z. Inferring speciation times under an episodic molecular clock. Syst. Biol. 56, 453–466 (2007).
Yang, Z. & Rannala, B. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23, 212–226 (2006).
Inoue, J., Donoghue, P. C. H. & Yang, Z. The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. Syst. Biol. 59, 74–89 (2010).
Tavaré, S., Marshall, C. R., Will, O., Soligos, C. & Martin, R. D. Using the fossil record to estimate the age of the last common ancestor of extant primates. Nature 416, 726–729 (2002).
Wilkinson, R. D. et al. Dating primate divergences through an integrated analysis of palaeontological and molecular data. Syst. Biol. 60, 16–31 (2011).
Knowles, L. L. Statistical phylogeography. Annu. Rev. Ecol. Syst. 40, 593–612 (2009).
Lemey, P., Rambaut, A., Drummond, A. J. & Suchard, M. A. Bayesian phylogeography finds its roots. PLoS Comp. Biol. 5, e1000520 (2009).
Lemey, P., Rambaut, A., Welch, J. J. & Suchard, M. A. Phylogeography takes a relaxed random walk in continuous space and time. Mol. Biol. Evol. 27, 1877–1885 (2010).
Takahata, N., Satta, Y. & Klein, J. Divergence time and population size in the lineage leading to modern humans. Theor. Popul. Biol. 48, 198–221 (1995).
Rannala, B. & Yang, Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 1645–1656 (2003). This study describes the multi-species coalescent model. This is the basis for carrying out comparative analyses of individual genomes and phylogeographic studies and for applying species tree methods.
Drummond, A. J., Nicholls, G. K., Rodrigo, A. G. & Solomon, W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161, 1307–1320 (2002).
Hey, J. Isolation with migration models for more than two populations. Mol. Biol. Evol. 27, 905–920 (2010).
Knowles, L. L. & Carstens, B. C. Delimiting species without monophyletic gene trees. Syst. Biol. 56, 887–895 (2007).
Yang, Z. & Rannala, B. Bayesian species delimitation using multilocus sequence data. Proc. Natl Acad. Sci. USA 107, 9264–9269 (2010). This paper describes a Bayesian MCMC method for delimiting species using sequence data from multiple loci under the multi-species coalescent model.
Rohland, N. et al. Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants. PLoS Biol. 8, e1000564 (2010).
Bos, K. I. et al. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478, 506–510 (2011).
Patterson, N., Richter, D. J., Gnerre, S., Lander, E. S. & Reich, D. Genetic evidence for complex speciation of humans and chimpanzees. Nature 441, 1103–1108 (2006).
Innan, H. & Watanabe, H. The effect of gene flow on the coalescent time in the human–chimpanzee ancestral population. Mol. Biol. Evol. 23, 1040–1047 (2006).
Becquet, C. & Przeworski, M. A new approach to estimate parameters of speciation models with application to apes. Genome Res. 17, 1505–1519 (2007).
Hobolth, A., Christensen, O. F., Mailund, T. & Schierup, M. H. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet. 3, e7 (2007).
Burgess, R. & Yang, Z. Estimation of hominoid ancestral population sizes under Bayesian coalescent models incorporating mutation rate variation and sequencing errors. Mol. Biol. Evol. 25, 1979–1994 (2008).
Becquet, C. & Przeworski, M. Learning about modes of speciation by computational approaches. Evolution 63, 2547–2562 (2009).
Yang, Z. A likelihood ratio test of speciation with gene flow using genomic sequence data. Genome Biol. Evol. 2, 200–211 (2010).
Reich, D. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010).
Sitnikova, T., Rzhetsky, A. & Nei, M. Interior-branch and bootstrap tests of phylogenetic trees. Mol. Biol. Evol. 12, 319–333 (1995).
Zhong, B., Yonezawa, T., Zhong, Y. & Hasegawa, M. The position of gnetales among seed plants: overcoming pitfalls of chloroplast phylogenomics. Mol. Biol. Evol. 27, 2855–2863 (2010).
Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007).
Kosakovsky Pond, S. L., Frost, S. D. W. & Muse, S. V. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Lartillot, N. & Philippe, H. Computing Bayes factors using thermodynamic integration. Syst. Biol. 55, 195–207 (2006).
Xie, W., Lewis, P. O., Fan, Y., Kuo, L. & Chen, M.-H. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Syst. Biol. 60, 150–160 (2011).
We thank the three referees for their constructive comments and M. Hasegawa and B. Zhong for providing the seed-plant phylogenies of Fig. 3. Z.Y. is supported by a UK Biotechnology and Biological Sciences Research Council grant and a Royal Society Wolfson Research Merit Award. B.R. is supported by a US National Institutes of Health grant.
The authors declare no competing financial interests.
The inference of phylogenetic relationships among species and the use of such information to classify species.
The description, classification and naming of species.
The process of joining ancestral lineages when the genealogical relationships of a random sample of sequences from a modern population are traced back.
- Gene trees
The phylogenetic or genealogical tree of sequences at a gene locus or genomic region.
- Statistical phylogeography
The statistical analysis of population data from closely related species to infer population parameters and processes such as population sizes, demography, migration patterns and rates.
- Species tree
A phylogenetic tree for a set of species that underlies the gene trees at individual loci.
- Systematic errors
Errors that are due to an incorrect model assumption. They are exacerbated when the data size increases.
- Random sampling errors
Errors or uncertainties in parameter estimates owing to limited data.
- Cluster algorithm
An algorithm of assigning a set of individuals to groups (or clusters) so that objects of the same cluster are more similar to each other than those from different clusters. Hierarchical cluster analysis can be agglomerative (starting with single elements and successively joining them into clusters) or divisive (starting with all objects and successively dividing them into partitions).
- Markov chain
A stochastic sequence (or chain) of states with the property that, given the current state, the probabilities for the next state do not depend on the past states.
Substitutions between the two pyrimidines (T↔C) or between the two purines (A↔G).
Substitutions between a pyrimidine and a purine (T or C↔A or G).
- Unrooted trees
Phylogenetic trees for which the location of the root is unspecified.
- Long-branch attraction
The phenomenon of inferring an incorrect tree with long branches grouped together by parsimony or by model-based methods under simplistic models.
- Likelihood ratio test
A general hypothesis-testing method that uses the likelihood to compare two nested hypotheses, often using the χ2 distribution to assess significance.
- Molecular clock
The hypothesis or observation that the evolutionary rate is constant over time or across lineages.
- Prior distribution
The distribution assigned to parameters before the analysis of the data.
- Posterior distribution
The distribution of the parameters (or models) conditional on the data. It combines the information in the prior and in the data (likelihood).
- Markov chain Monte Carlo algorithms
(MCMC algorithms). A Monte Carlo simulation is a computer simulation of a biological process using random numbers. An MCMC algorithm is a Monte Carlo simulation algorithm that generates a sample from a target distribution (often a Bayesian posterior distribution).
Groups of species that have descended from a common ancestor.
- Graphical processing units
(GPU). Specialized units that are traditionally used to manipulate output on a video display and have recently been explored for use in parallel computation.
Rights and permissions
About this article
Cite this article
Yang, Z., Rannala, B. Molecular phylogenetics: principles and practice. Nat Rev Genet 13, 303–314 (2012). https://doi.org/10.1038/nrg3186
This article is cited by
Comparative phylogenomic insights of KCS and ELO gene families in Brassica species indicate their role in seed development and stress responsiveness
Scientific Reports (2023)
Maximum likelihood pandemic-scale phylogenetics
Nature Genetics (2023)
From Sequence Analysis to Application