Article

Contentious relationships in phylogenomic studies can be driven by a handful of genes

Abstract

Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain, contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
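The abstract does not spell out how the distribution of signal is computed. As a minimal sketch, assume that phylogenetic signal for a branch is expressed as the difference in per-site log-likelihood between two competing topologies (here called T1 and T2), and that gene-wise signal is the sum of those differences over the sites of each gene. The function names, gene names and toy numbers below are hypothetical, chosen only to illustrate how a single gene can carry the support for one topology in a concatenated matrix.

```python
from typing import Dict, List, Tuple


def signal_by_site(lnl_t1: List[float], lnl_t2: List[float]) -> List[float]:
    """Per-site log-likelihood difference (T1 minus T2); positive favours T1."""
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("both topologies must be scored on the same sites")
    return [a - b for a, b in zip(lnl_t1, lnl_t2)]


def signal_by_gene(
    site_delta: List[float], gene_bounds: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Sum site-wise differences within each gene; bounds are [start, end)."""
    return {
        gene: sum(site_delta[start:end])
        for gene, (start, end) in gene_bounds.items()
    }


def drop_strongest_gene(gene_delta: Dict[str, float]) -> Tuple[str, float]:
    """Return the gene with the largest absolute signal and the summed
    signal of the remaining genes once that gene is removed."""
    strongest = max(gene_delta, key=lambda g: abs(gene_delta[g]))
    remaining = sum(v for g, v in gene_delta.items() if g != strongest)
    return strongest, remaining


# Toy (made-up) per-site log-likelihoods under two alternative topologies.
lnl_t1 = [-2.0, -3.0, -1.0, -4.0, -2.5, -3.5]
lnl_t2 = [-2.5, -2.5, -9.0, -3.5, -2.0, -3.0]
genes = {"geneA": (0, 2), "geneB": (2, 4), "geneC": (4, 6)}

site_delta = signal_by_site(lnl_t1, lnl_t2)
gene_delta = signal_by_gene(site_delta, genes)
strongest, remaining = drop_strongest_gene(gene_delta)

print("total signal:", sum(gene_delta.values()))
print("strongest gene:", strongest)
print("signal after removing it:", remaining)
```

In this toy matrix the summed signal favours T1, yet it changes sign once the single strongest gene is removed, which is the kind of pattern the abstract describes for some contentious branches. In a real analysis the per-site log-likelihoods would come from a maximum likelihood program rather than being typed in by hand.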

Author information

Affiliations

  1. Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235, USA.

    • Xing-Xing Shen & Antonis Rokas
  2. Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.

    • Chris Todd Hittinger

Contributions

X.X.S. and A.R. conceived and designed the study. X.X.S., C.T.H. and A.R. were responsible for acquisition of data, and analysis and interpretation of data. The manuscript was drafted by X.X.S. and A.R., with critical revision by X.X.S., C.T.H. and A.R.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Antonis Rokas.

Supplementary information

PDF files

  1. Supplementary Figures 1–65

Excel files

  1. Supplementary Tables 1–10