Abstract
Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
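The idea of gene-wise and site-wise phylogenetic signal can be illustrated with a toy calculation. This is a minimal sketch, not the authors' pipeline: it assumes that per-site log-likelihoods under two competing topologies (called T1 and T2 here) have already been computed with fixed model parameters by a maximum likelihood program. The function names, gene names and numbers are hypothetical, and the sketch ignores the re-estimation of branch lengths and model parameters that a real reanalysis would perform. Each site contributes lnL(T1) minus lnL(T2). A gene's signal is the sum over its sites, and in a concatenated analysis the matrix-wide preference is the sum over genes, so a single gene with an outsized value can decide the sign.

```python
"""Toy sketch of gene-wise and site-wise likelihood signal between two topologies.

Assumes per-site log-likelihoods under two fixed alternative topologies (T1, T2)
were computed beforehand; all names and numbers are hypothetical.
"""
from typing import Dict, List, Tuple


def site_and_gene_signal(lnl_t1: List[float], lnl_t2: List[float]) -> Tuple[List[float], float]:
    """Per-site difference lnL(T1) - lnL(T2) and the gene-level total (their sum)."""
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("site log-likelihood vectors must have equal length")
    per_site = [a - b for a, b in zip(lnl_t1, lnl_t2)]
    return per_site, sum(per_site)


def drop_top_gene(gene_delta: Dict[str, float]) -> Tuple[str, float, float]:
    """Return the gene with the largest |delta|, the total delta, and the total without it.

    A positive total supports T1 and a negative total supports T2, so a sign change
    after dropping one gene means that gene alone decides the preferred topology
    (under the simplifying assumption that model parameters stay fixed).
    """
    total = sum(gene_delta.values())
    top = max(gene_delta, key=lambda g: abs(gene_delta[g]))
    return top, total, total - gene_delta[top]


def count_support(gene_delta: Dict[str, float]) -> Tuple[int, int]:
    """Number of genes supporting T1 (delta > 0) and T2 (delta < 0); ties are ignored."""
    t1 = sum(1 for d in gene_delta.values() if d > 0)
    t2 = sum(1 for d in gene_delta.values() if d < 0)
    return t1, t2


if __name__ == "__main__":
    # Hypothetical three-site gene: lnL under T1 and under T2 for each site.
    sites, gene = site_and_gene_signal([-10.0, -12.0, -8.0], [-10.5, -12.0, -8.5])
    print(sites, gene)  # [0.5, 0.0, 0.5] 1.0

    # Hypothetical per-gene deltas for a four-gene concatenated matrix.
    deltas = {"g1": 0.5, "g2": -0.25, "g3": 24.0, "g4": -1.5}
    print(drop_top_gene(deltas))  # ('g3', 22.75, -1.25)
    print(count_support(deltas))  # (2, 2)
```

In this made-up matrix, two genes support each topology and T1 is preferred overall only because of g3; dropping that one gene flips the summed preference to T2, which is the kind of single-gene dependence described in the abstract.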
Acknowledgements
We thank members of the Rokas laboratory, and in particular X. Zhou, for discussions and comments. We also thank M. Chen for providing the animal phylogenomic data matrix and J. Leebens-Mack for providing further information about the plant data matrix. This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, of the UW-Madison Center for High Throughput Computing, and of the CIPRES Science Gateway. This work was supported by the National Science Foundation (DEB-1442113 to A.R.; DEB-1442148 to C.T.H.), in part by the DOE Great Lakes Bioenergy Research Center (DOE Office of Science BER DE-FC02-07ER64494), the USDA National Institute of Food and Agriculture (Hatch project 1003258 to C.T.H.), and the National Institutes of Health (NIAID AI105619 to A.R.). C.T.H. is a Pew Scholar in the Biomedical Sciences, supported by the Pew Charitable Trusts.
Author information
Contributions
X.X.S. and A.R. conceived and designed the study. X.X.S., C.T.H. and A.R. were responsible for acquisition of data, and analysis and interpretation of data. The manuscript was drafted by X.X.S. and A.R., with critical revision by X.X.S., C.T.H. and A.R.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figures
Supplementary Figures 1–65 (PDF 30504 kb)
Supplementary Tables
Supplementary Tables 1–10 (XLSX 4619 kb)
About this article
Cite this article
Shen, XX., Hittinger, C. & Rokas, A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat Ecol Evol 1, 0126 (2017). https://doi.org/10.1038/s41559-017-0126
DOI: https://doi.org/10.1038/s41559-017-0126