Inferring ancient divergences requires genes with strong phylogenetic signals

Journal name: Nature
Volume: 497
Pages: 327–331
DOI: 10.1038/nature12130

Abstract

To tackle incongruence, the topological conflict between different gene trees, phylogenomic studies couple concatenation with practices such as rogue taxon removal or the use of slowly evolving genes. Phylogenomic analysis of 1,070 orthologues from 23 yeast genomes identified 1,070 distinct gene trees, which were all incongruent with the phylogeny inferred from concatenation. Incongruence severity increased for shorter internodes located deeper in the phylogeny. Notably, whereas most practices had little or negative impact on the yeast phylogeny, the use of genes or internodes with high average internode support significantly improved the robustness of inference. We obtained similar results in analyses of vertebrate and metazoan phylogenomic data sets. These results question the exclusive reliance on concatenation and associated practices, and argue that selecting genes with strong phylogenetic signals and demonstrating the absence of significant incongruence are essential for accurately reconstructing ancient divergences.
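
The observation that genes with high average internode support improve robustness can be illustrated with a minimal filter. This sketch assumes that bootstrap values for the internodes of each gene tree are already available and uses their mean as a stand-in for "average internode support"; the gene identifiers, the function name and the 70% threshold are hypothetical and do not reproduce the study's exact criterion.

```python
from statistics import mean
from typing import Dict, List


def select_strong_genes(gene_support: Dict[str, List[float]], threshold: float) -> List[str]:
    """Return, sorted by name, the genes whose gene trees have an average
    bootstrap support (over their internodes) at or above `threshold`.

    `gene_support` maps a hypothetical gene identifier to the bootstrap values
    of the internodes in that gene's tree; genes with no values are skipped.
    """
    return sorted(
        gene
        for gene, values in gene_support.items()
        if values and mean(values) >= threshold
    )


if __name__ == "__main__":
    support = {"gene1": [100, 90, 80], "gene2": [40, 60, 50], "gene3": [70, 70]}
    print(select_strong_genes(support, threshold=70.0))  # ['gene1', 'gene3']
```

The retained genes could then be analysed individually or concatenated; raising the threshold trades the number of genes kept against their average signal.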

Figures

    Figure 1: The yeast species phylogeny recovered from the concatenation analysis of 1,070 genes disagrees with every gene tree, despite absolute bootstrap support.

    a, The yeast species phylogeny recovered from concatenation analysis of 1,070 genes using maximum likelihood. Asterisks denote internodes that received 100% bootstrap support in the concatenation analysis. Values near internodes correspond to gene-support frequency (GSF) and internode certainty (IC), respectively. The scale bar is in units of amino-acid substitutions per site. b, The distribution of the agreement between the bipartitions present in the 1,070 individual gene trees and the concatenation phylogeny, as well as the distribution of the agreement between the bipartitions present in 1,000 randomly generated trees of equal taxon number and the concatenation phylogeny, measured using the normalized Robinson–Foulds tree distance. Average distances between the 1,070 gene trees and the concatenation phylogeny, between the 1,070 gene trees themselves, and between 1,000 randomly generated trees with equal taxon numbers are also shown. The phylogeny of the 23 yeast species analysed in this study is unrooted and contains 20 non-trivial bipartitions; because the divergence of the Saccharomyces and Candida lineages is well established, the phylogeny is shown mid-point rooted for easier visualization.
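Both quantities in this caption can be sketched in Python, assuming each tree is given as nested tuples of taxon names (a stand-in for a parsed Newick string) and that all trees share the same taxon set. Each non-trivial bipartition is stored as the side that excludes a fixed reference taxon, so identical bipartitions compare equal. The normalized Robinson–Foulds distance is the size of the symmetric difference of two bipartition sets divided by its maximum, 2(n − 3), for fully resolved unrooted trees of n taxa; GSF is taken here as the fraction of gene trees that contain a given bipartition of the species phylogeny. The function names are illustrative and are not taken from the authors' software.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# A tree is a taxon name (leaf) or a tuple of subtrees (internal node).
Tree = Union[str, Tuple["Tree", ...]]
Split = FrozenSet[str]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the taxon names below `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[Split]:
    """Non-trivial bipartitions of the unrooted tree, each stored as the side
    that excludes a fixed reference taxon so equal bipartitions compare equal."""
    taxa = leaves(tree)
    ref = min(taxa)
    result: Set[Split] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if ref in clade else clade
        # Skip the root (empty side) and trivial single-taxon splits.
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
        return clade

    visit(tree)
    return result


def normalized_rf(t1: Tree, t2: Tree) -> float:
    """Robinson-Foulds distance divided by its maximum, 2(n - 3)."""
    taxa = leaves(t1)
    if taxa != leaves(t2):
        raise ValueError("trees must share the same taxon set")
    n = len(taxa)
    if n < 4:
        raise ValueError("need at least four taxa")
    return len(splits(t1) ^ splits(t2)) / (2 * (n - 3))


def gene_support_frequency(species_tree: Tree, gene_trees: List[Tree]) -> Dict[Split, float]:
    """Fraction of gene trees containing each bipartition of the species tree."""
    gene_splits = [splits(g) for g in gene_trees]
    return {
        s: sum(s in gs for gs in gene_splits) / len(gene_trees)
        for s in splits(species_tree)
    }


if __name__ == "__main__":
    species = ((("A", "B"), "C"), ("D", "E"))
    gene = ((("A", "C"), "B"), ("D", "E"))
    print(normalized_rf(species, gene))  # 0.5
```

For the 23-taxon yeast phylogeny, the denominator would be 2 × 20 = 40.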

    Figure 2: Differences in yeast phylogenies inferred from different phylogenomic practices.

    For each phylogenomic practice tested (Treatment), the figure reports the average gene-support frequency (GSF) of the internodes of the yeast phylogeny; the tree certainty (TC) of the yeast phylogeny; the numbers of internodes in which GSF increases or decreases by more than 3% (GSF increases and GSF decreases); and the numbers of internodes in which internode certainty (IC) increases or decreases by more than 0.03 (IC increases and IC decreases). Because the maximum value of internode certainty for a given internode is 1, the maximum value of tree certainty for a given phylogeny is its number of internodes, which for a fully resolved unrooted tree equals K − 3, where K is the number of taxa used (20 for the 23 yeast species). In the analyses concerned with the removal of poorly aligned genes, only genes whose alignment length after gap removal was greater than or equal to a certain percentage, x, of the original alignment length were used. In the analyses concerned with the use of bipartitions, only those bipartitions that displayed bootstrap support greater than or equal to 60%, 70% or 80% in the bootstrap consensus trees of the 1,070 genes were used to construct extended majority-rule consensus (eMRC) phylogenies, which were then compared with the default analysis. NA, not applicable.
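The internode and tree certainty values can be sketched as follows. This is a minimal sketch assuming the entropy-based definition in which an internode's IC is computed from its frequency among gene trees (f1) and the frequency of the most prevalent conflicting bipartition (f2): with P1 = f1/(f1 + f2) and P2 = f2/(f1 + f2), IC = 1 + P1 log2 P1 + P2 log2 P2, so IC is 1 when no gene tree conflicts and 0 when the two bipartitions are equally frequent. TC is the sum of IC over all internodes, which is why its maximum is K − 3. Bipartitions are assumed to be pre-computed from each gene tree and stored as frozensets of the side excluding a shared reference taxon; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import FrozenSet, Iterable, Set

# A bipartition, stored as the side that excludes a shared reference taxon.
Split = FrozenSet[str]


def count_splits(gene_tree_splits: Iterable[Set[Split]]) -> Counter:
    """Count how many gene trees contain each bipartition."""
    counts: Counter = Counter()
    for tree_splits in gene_tree_splits:
        counts.update(tree_splits)
    return counts


def compatible(a: Split, b: Split) -> bool:
    """With both sides excluding the reference taxon, two bipartitions can
    coexist on one tree iff one side is nested in the other or they are disjoint."""
    return a <= b or b <= a or not (a & b)


def internode_certainty(split: Split, counts: Counter) -> float:
    """Assumed entropy-based IC of `split` versus its most frequent conflicting
    bipartition; assumes `split` is at least as frequent as that competitor."""
    f1 = counts.get(split, 0)
    f2 = max((c for s, c in counts.items() if not compatible(s, split)), default=0)
    total = f1 + f2
    if total == 0:
        raise ValueError("no gene tree supports or conflicts with this bipartition")
    ic = 1.0  # log2(2): two competing bipartitions
    for f in (f1, f2):
        p = f / total
        if p > 0:  # 0 * log2(0) is taken as 0
            ic += p * math.log2(p)
    return ic


def tree_certainty(species_splits: Iterable[Split], counts: Counter) -> float:
    """Sum of IC over the internodes of the species phylogeny (maximum K - 3)."""
    return sum(internode_certainty(s, counts) for s in species_splits)
```

With three gene trees supporting a bipartition and one supporting its strongest competitor, IC is 1 + 0.75 log2 0.75 + 0.25 log2 0.25, roughly 0.19, well below the maximum of 1.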

    Figure 3: Incongruence is more prevalent in shorter internodes located deeper on the phylogeny.

    The correlation (Pearson’s r) of a measure of internode support (GSF) with internode length and depth was measured for each internode present in three data sets that show lower (vertebrates, 1,086 genes), intermediate (yeasts, 1,070 genes) and higher (metazoans, 225 genes) levels of sequence divergence. a, GSF is positively correlated with internode length in yeasts and metazoans. b, GSF is positively correlated with root-to-internode length in all three lineages, indicating that internodes placed deeper in the phylogeny typically have lower GSF. c, GSF is positively correlated with the product of internode length and root-to-internode length in all three lineages.
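The correlations in panels a–c can be sketched with a plain Pearson coefficient. The sketch assumes a rooted tree stored as a hypothetical parent map (node → parent, length of the branch leading to the node), takes each internode's length as the length of the branch leading to its node, and takes root-to-internode length as the path length from the root to that node, including the internode itself; the study's exact convention may differ, and the GSF values are assumed to be computed beforehand.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical rooted tree: node -> (parent, length of the branch leading to node).
# The root maps to (None, 0.0).
Parents = Dict[str, Tuple[Optional[str], float]]


def root_to_node_length(node: str, parents: Parents) -> float:
    """Path length from the root down to `node` (includes the node's own branch)."""
    total = 0.0
    parent, length = parents[node]
    while parent is not None:
        total += length
        parent, length = parents[parent]
    return total


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length, non-constant samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gsf_correlations(gsf: Dict[str, float], parents: Parents) -> Dict[str, float]:
    """Correlate each internode's GSF with its length, its root-to-internode
    length, and the product of the two (cf. panels a-c)."""
    nodes: List[str] = sorted(gsf)
    support = [gsf[node] for node in nodes]
    length = [parents[node][1] for node in nodes]
    depth = [root_to_node_length(node, parents) for node in nodes]
    product = [ln * dp for ln, dp in zip(length, depth)]
    return {
        "length": pearson_r(length, support),
        "depth": pearson_r(depth, support),
        "length_x_depth": pearson_r(product, support),
    }
```

Under this convention, a positive correlation with depth means that internodes closer to the root (smaller root-to-internode length) tend to have lower GSF, matching the reading of panel b.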

Author information

Affiliations

  1. Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235, USA

    • Leonidas Salichos
    • Antonis Rokas

Contributions

L.S. and A.R. conceived and designed experiments; L.S. carried out experiments; L.S. and A.R. analysed data and wrote the paper.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Tables 1–2 and Supplementary Figures 1–17.
