Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain, contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
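As a rough illustration of how phylogenetic signal could be tallied per site and per gene, the sketch below assumes that per-site log-likelihoods have already been computed for two alternative topologies (here called T1 and T2) on a concatenated matrix. The function names, the input layout, the toy numbers and the use of log-likelihood differences as the measure of signal are assumptions made for illustration; they are not taken from the text above.

```python
from typing import Dict, List, Tuple


def sitewise_signal(lnl_t1: List[float], lnl_t2: List[float]) -> List[float]:
    """Per-site difference in log-likelihood (T1 minus T2).

    Positive values favour T1, negative values favour T2.
    """
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("both topologies must be scored on the same sites")
    return [a - b for a, b in zip(lnl_t1, lnl_t2)]


def genewise_signal(
    site_delta: List[float], gene_bounds: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Sum the per-site differences within each gene.

    gene_bounds maps a gene name to a half-open (start, end) site range.
    """
    return {
        gene: sum(site_delta[start:end])
        for gene, (start, end) in gene_bounds.items()
    }


def drop_strongest_gene(gene_delta: Dict[str, float]) -> Tuple[str, float, float]:
    """Return the gene with the largest absolute signal, the total signal,
    and the total signal once that gene is removed."""
    total = sum(gene_delta.values())
    top = max(gene_delta, key=lambda g: abs(gene_delta[g]))
    return top, total, total - gene_delta[top]


# Hypothetical toy input: six sites split evenly across three genes.
lnl_t1 = [-10.0, -12.5, -8.0, -9.0, -20.0, -7.5]
lnl_t2 = [-10.5, -12.0, -8.5, -9.5, -12.0, -7.0]
bounds = {"geneA": (0, 2), "geneB": (2, 4), "geneC": (4, 6)}

site_delta = sitewise_signal(lnl_t1, lnl_t2)
gene_delta = genewise_signal(site_delta, bounds)
top_gene, total, total_without_top = drop_strongest_gene(gene_delta)

if __name__ == "__main__":
    print("per-gene signal:", gene_delta)
    print("strongest gene:", top_gene)
    print("total signal:", total, "| without strongest gene:", total_without_top)
```

In this toy input the summed signal favours T2 only because of one gene; removing that gene flips the sign of the total, which is the kind of dependence on a tiny subset of the data that the abstract describes.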
We thank members of the Rokas laboratory, and in particular X. Zhou, for discussions and comments. We also thank M. Chen for providing the animal phylogenomic data matrix and J. Leebens-Mack for providing further information about the plant data matrix. This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, of the UW-Madison Center for High Throughput Computing, and of the CIPRES Science Gateway. This work was supported by the National Science Foundation (DEB-1442113 to A.R.; DEB-1442148 to C.T.H.), in part by the DOE Great Lakes Bioenergy Research Center (DOE Office of Science BER DE-FC02-07ER64494), the USDA National Institute of Food and Agriculture (Hatch project 1003258 to C.T.H.), and the National Institutes of Health (NIAID AI105619 to A.R.). C.T.H. is a Pew Scholar in the Biomedical Sciences, supported by the Pew Charitable Trusts.
The authors declare no competing financial interests.
Cite this article
Shen, X.-X., Hittinger, C. T. & Rokas, A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat. Ecol. Evol. 1, 0126 (2017). https://doi.org/10.1038/s41559-017-0126