  • Letter

Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous

Abstract

All inferences in comparative biology depend on accurate estimates of evolutionary relationships. Recent phylogenetic analyses have turned away from maximum parsimony towards the probabilistic techniques of maximum likelihood and Bayesian Markov chain Monte Carlo (BMCMC). These probabilistic techniques represent a parametric approach to statistical phylogenetics, because their criterion for evaluating a topology (the probability of the data, given the tree) is calculated with reference to an explicit evolutionary model under which the data are assumed to be identically distributed. Maximum parsimony can be considered nonparametric, because trees are evaluated on the basis of a general metric (the minimum number of character state changes required to generate the data on a given tree) without assuming a specific distribution [1]. The shift to parametric methods was spurred, in large part, by studies showing that although both approaches perform well most of the time [2], maximum parsimony is strongly biased towards recovering an incorrect tree under certain combinations of branch lengths, whereas maximum likelihood is not [3–6]. All these evaluations simulated sequences by a largely homogeneous evolutionary process in which data are identically distributed. There is ample evidence, however, that real-world gene sequences evolve heterogeneously and are not identically distributed [7–16]. Here we show that maximum likelihood and BMCMC can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change non-identically over time. Maximum parsimony performs substantially better than current parametric methods over a wide range of conditions tested, including moderate heterogeneity and phylogenetic problems not normally considered difficult.
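
The two tree-evaluation criteria contrasted in the abstract can be sketched on a four-taxon tree. The following is a minimal illustration and not the authors' implementation: it assumes Fitch's algorithm for the parsimony count and the Jukes-Cantor (JC69) model with Felsenstein's pruning algorithm for the likelihood. The model choice, the toy alignment, the branch lengths and every function name are assumptions made for the example.

```python
import math

STATES = "ACGT"

def quartet(pair1, pair2, terminal=0.1, internal=0.1):
    """Four-taxon tree rooted at the midpoint of its internal branch.
    A node is a taxon name or a pair of (child, branch_length) tuples."""
    half = internal / 2.0
    cherry1 = ((pair1[0], terminal), (pair1[1], terminal))
    cherry2 = ((pair2[0], terminal), (pair2[1], terminal))
    return ((cherry1, half), (cherry2, half))

def columns(alignment):
    """Yield each alignment column as a {taxon: nucleotide} dict."""
    taxa = list(alignment)
    for chars in zip(*(alignment[t] for t in taxa)):
        yield dict(zip(taxa, chars))

def fitch(tree, site):
    """Fitch's algorithm: return (candidate state set, minimum changes)."""
    if isinstance(tree, str):
        return {site[tree]}, 0
    (left, _), (right, _) = tree          # branch lengths are ignored
    left_set, left_changes = fitch(left, site)
    right_set, right_changes = fitch(right, site)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_length(tree, alignment):
    """Minimum number of state changes needed to explain the alignment."""
    return sum(fitch(tree, site)[1] for site in columns(alignment))

def jc69(a, b, t):
    """JC69 probability of state b after a branch of length t, given a."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def partials(tree, site):
    """Pruning: P(observed tips below this node | state at this node)."""
    if isinstance(tree, str):
        return {s: 1.0 if s == site[tree] else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, length in tree:
        below = partials(child, site)
        for s in STATES:
            result[s] *= sum(jc69(s, x, length) * below[x] for x in STATES)
    return result

def log_likelihood(tree, alignment):
    """Log probability of the data given the tree, assuming every site
    is drawn from the same JC69 process (identically distributed)."""
    total = 0.0
    for site in columns(alignment):
        p = partials(tree, site)
        total += math.log(sum(0.25 * p[s] for s in STATES))
    return total

# Toy data (hypothetical): two sites support AB|CD, one supports AC|BD.
alignment = {"A": "AAGA", "B": "AATA", "C": "CAGC", "D": "CATC"}
tree_ab = quartet(("A", "B"), ("C", "D"))
tree_ac = quartet(("A", "C"), ("B", "D"))

for name, tree in (("AB|CD", tree_ab), ("AC|BD", tree_ac)):
    print(name, parsimony_length(tree, alignment), log_likelihood(tree, alignment))
```

On this toy alignment the AB|CD tree requires 4 changes and the AC|BD tree requires 5, so parsimony prefers AB|CD; with all branch lengths equal, the JC69 likelihood ranks the two trees the same way. The parsimony count never looks at branch lengths, whereas the likelihood depends on them at every step, which is where a mismatch between the assumed and the true process can enter.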

Figure 1: Likelihood-based methods are less accurate than maximum parsimony (MP) under heterogeneous conditions.
Figure 2: Parsimony outperforms likelihood over a wide range of heterotachous conditions.
Figure 3: Maximum parsimony is more accurate than likelihood methods when techniques to improve phylogenetic performance are used.
Figure 4: Poor maximum likelihood performance is due to assuming homogeneous branch lengths.
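
The kind of heterogeneity named in the captions above can be illustrated with a small simulation in which two partitions of sites share one topology but have different branch lengths. This is only a sketch under stated assumptions: the JC69 substitution model, the branch-length values, the partition sizes and all function names are hypothetical choices for the example and are not taken from the paper.

```python
import math
import random

STATES = "ACGT"

def quartet(lengths, internal):
    """AB|CD tree rooted at the midpoint of the internal branch.
    A node is a taxon name or a pair of (child, branch_length) tuples."""
    half = internal / 2.0
    left = (("A", lengths["A"]), ("B", lengths["B"]))
    right = (("C", lengths["C"]), ("D", lengths["D"]))
    return ((left, half), (right, half))

def evolve(state, length, rng):
    """Evolve one nucleotide along a branch under JC69."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * length / 3.0)
    if rng.random() < p_same:
        return state
    return rng.choice([s for s in STATES if s != state])

def simulate_site(tree, state, rng, out):
    """Propagate a state from this node down to the tips."""
    if isinstance(tree, str):
        out[tree] = state
        return
    for child, length in tree:
        simulate_site(child, evolve(state, length, rng), rng, out)

def simulate_alignment(partitions, seed=0):
    """partitions: list of (tree, n_sites); each partition has its own
    branch lengths, so sites are not identically distributed."""
    rng = random.Random(seed)
    cols = []
    for tree, n_sites in partitions:
        for _ in range(n_sites):
            site = {}
            simulate_site(tree, rng.choice(STATES), rng, site)
            cols.append(site)
    taxa = sorted(cols[0])
    return {t: "".join(col[t] for col in cols) for t in taxa}

# Same topology, different long branches in each partition (assumed values).
part1 = quartet({"A": 0.75, "B": 0.05, "C": 0.75, "D": 0.05}, internal=0.05)
part2 = quartet({"A": 0.05, "B": 0.75, "C": 0.05, "D": 0.75}, internal=0.05)
alignment = simulate_alignment([(part1, 500), (part2, 500)], seed=1)

for taxon, seq in alignment.items():
    print(taxon, seq[:40])
```

An analysis that fits a single set of branch lengths to such an alignment is treating the two partitions as if they came from one process, which is the homogeneity assumption that the Figure 4 caption identifies as the source of poor maximum likelihood performance.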

References

  1. Sanderson, M. J. & Kim, J. Parametric phylogenetics? Syst. Biol. 49, 817–829 (2000)

  2. Hillis, D. M., Huelsenbeck, J. P. & Cunningham, C. W. Application and accuracy of molecular phylogenies. Science 264, 671–677 (1994)

  3. Felsenstein, J. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27, 401–410 (1978)

  4. Kuhner, M. K. & Felsenstein, J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11, 459–468 (1994)

  5. Huelsenbeck, J. P. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47, 519–537 (1998)

  6. Gaut, B. S. & Lewis, P. O. Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. 12, 152–162 (1995)

  7. Huelsenbeck, J. P. Testing a covariotide model of DNA substitution. Mol. Biol. Evol. 19, 698–707 (2002)

  8. Miyamoto, M. M. & Fitch, W. M. Testing the covarion hypothesis of molecular evolution. Mol. Biol. Evol. 12, 503–513 (1995)

  9. Lopez, P., Casane, D. & Philippe, H. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1–7 (2002)

  10. Fitch, W. M. The molecular evolution of cytochrome c in eukaryotes. J. Mol. Evol. 8, 13–40 (1976)

  11. Pollock, D. D., Taylor, W. R. & Goldman, N. Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287, 187–198 (1999)

  12. Pupko, T. & Galtier, N. A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes. Proc. R. Soc. Lond. B 269, 1313–1316 (2002)

  13. Inagaki, Y., Susko, E., Fast, N. M. & Roger, A. J. Covarion shifts cause a long-branch attraction artifact that unites Microsporidia and Archaebacteria in EF-1α phylogenies. Mol. Biol. Evol. 21, 1340–1349 (2004)

  14. Lockhart, P. J. et al. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol. Biol. Evol. 15, 1183–1188 (1998)

  15. Misof, B. et al. An empirical analysis of mt 16S rRNA covarion-like evolution in insects: site-specific rate variation is clustered and frequently detected. J. Mol. Evol. 55, 460–469 (2002)

  16. Philippe, H. & Lopez, P. On the conservation of protein sequences in evolution. Trends Biochem. Sci. 26, 414–416 (2001)

  17. Donaldson, T. S. Robustness of the F-test to errors of both kinds and the correlation between the numerator and denominator of the F-ratio. J. Am. Stat. Assoc. 63, 660–676 (1968)

  18. Sullivan, J. & Swofford, D. L. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated? Syst. Biol. 50, 723–729 (2001)

  19. Hillis, D. M. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47, 3–8 (1998)

  20. Chang, J. T. Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math. Biosci. 134, 189–215 (1996)

  21. Rokas, A., Williams, B. L., King, N. & Carroll, S. B. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425, 798–804 (2003)

  22. Russo, C. A., Takezaki, N. & Nei, M. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13, 525–536 (1996)

  23. Delarbre, C., Gallut, C., Barriel, V., Janvier, P. & Gachelin, G. Complete mitochondrial DNA of the hagfish, Eptatretus burgeri: the comparative analysis of mitochondrial DNA sequences strongly supports the cyclostome monophyly. Mol. Phylogenet. Evol. 22, 184–192 (2002)

  24. Naylor, G. J. & Brown, W. M. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst. Biol. 47, 61–76 (1998)

  25. Tuffley, C. & Steel, M. Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull. Math. Biol. 59, 581–607 (1997)

  26. Swofford, D. L. PAUP*: Phylogenetic Analysis Using Parsimony and Other Methods, v.4.0b10 (Sinauer Associates, Sunderland, Massachusetts, 1998)

  27. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997)

  28. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001)

  29. Posada, D. & Crandall, K. A. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818 (1998)

  30. Swofford, D. L. et al. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50, 525–539 (2001)

Acknowledgements

We thank J. Conery for support and for general and programming advice. P. Phillips, R. DeSalle and S. Proulx provided comments and discussion. We benefited from discussions of mixed model methods with D. Zwickl. B.K. was supported by an NSF IGERT training grant in Evolution, Development and Genomics to the University of Oregon.

Author information

Correspondence to Joseph W. Thornton.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

About this article

Cite this article

Kolaczkowski, B., Thornton, J. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431, 980–984 (2004). https://doi.org/10.1038/nature02917

  • DOI: https://doi.org/10.1038/nature02917
