Contentious relationships in phylogenomic studies can be driven by a handful of genes


Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain, contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
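As a rough illustration of what quantifying phylogenetic signal among genes and sites can look like in practice, the sketch below takes per-site log-likelihood scores for two competing topologies (here called T1 and T2), takes their difference at each site, and sums those differences within each gene. It then asks whether dropping the single most influential gene changes which topology the data favour overall. The function names, the half-open partition format and the toy values are assumptions made for illustration and are not taken from the article; in a real analysis the per-site scores would come from a maximum likelihood program.

```python
from typing import Dict, List, Tuple


def sitewise_delta(lnl_t1: List[float], lnl_t2: List[float]) -> List[float]:
    """Per-site difference in log-likelihood, T1 minus T2 (positive favours T1)."""
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("both topologies must be scored on the same sites")
    return [a - b for a, b in zip(lnl_t1, lnl_t2)]


def genewise_delta(site_delta: List[float],
                   partitions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Sum the per-site differences within each gene.

    `partitions` maps a gene name to a half-open (start, end) range of
    0-based site indices; this format is a hypothetical convention.
    """
    return {gene: sum(site_delta[start:end])
            for gene, (start, end) in partitions.items()}


def favoured(total: float) -> str:
    """Name the topology favoured by a summed log-likelihood difference."""
    if total > 0:
        return "T1"
    if total < 0:
        return "T2"
    return "tie"


def drop_strongest_gene(gene_delta: Dict[str, float]) -> Tuple[str, float]:
    """Return the gene with the largest absolute signal and the total without it."""
    strongest = max(gene_delta, key=lambda g: abs(gene_delta[g]))
    remaining = sum(v for g, v in gene_delta.items() if g != strongest)
    return strongest, remaining


# Toy per-site log-likelihoods (made-up numbers, six sites in three genes).
lnl_t1 = [-2.0, -3.5, -1.5, -4.0, -2.5, -1.0]
lnl_t2 = [-2.5, -2.5, -1.75, -3.75, -10.5, -1.25]
partitions = {"geneA": (0, 2), "geneB": (2, 4), "geneC": (4, 6)}

site_delta = sitewise_delta(lnl_t1, lnl_t2)
gene_delta = genewise_delta(site_delta, partitions)
total = sum(gene_delta.values())
strongest, remaining = drop_strongest_gene(gene_delta)

print("gene-wise signal:", gene_delta)
print("all genes favour:", favoured(total))
print("without", strongest, "the data favour:", favoured(remaining))
```

In this toy matrix a single gene carries nearly all of the signal, so removing it flips the favoured topology from T1 to T2, which is the kind of sensitivity the abstract describes for some contentious branches.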


Figure 1: A schematic representation of our approach for quantifying and visualizing phylogenetic signal in a phylogenomic data matrix.
Figure 2: Distributions of phylogenetic signal for 17 contentious branches in plant, animal and fungal phylogenomic data matrices.
Figure 3: Quantification of the effect of the removal of tiny amounts of data on the branch’s topology for 17 contentious branches in plant, animal and fungal phylogenomic data matrices.
Figure 4: Tiny amounts of data exert decisive influence in the resolution of certain contentious branches in phylogenomic studies.
Figure 5: The distribution of phylogenetic signal for three alternative topological hypotheses on the earliest-branching metazoan lineage.




We thank members of the Rokas laboratory, and in particular X. Zhou, for discussions and comments. We also thank M. Chen for providing the animal phylogenomic data matrix and J. Leebens-Mack for providing further information about the plant data matrix. This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, of the UW-Madison Center for High Throughput Computing, and of the CIPRES Science Gateway. This work was supported by the National Science Foundation (DEB-1442113 to A.R.; DEB-1442148 to C.T.H.), in part by the DOE Great Lakes Bioenergy Research Center (DOE Office of Science BER DE-FC02-07ER64494), the USDA National Institute of Food and Agriculture (Hatch project 1003258 to C.T.H.), and the National Institutes of Health (NIAID AI105619 to A.R.). C.T.H. is a Pew Scholar in the Biomedical Sciences, supported by the Pew Charitable Trusts.

Author information




X.X.S. and A.R. conceived and designed the study. X.X.S., C.T.H. and A.R. were responsible for acquisition of data, and analysis and interpretation of data. The manuscript was drafted by X.X.S. and A.R., with critical revision by X.X.S., C.T.H. and A.R.

Corresponding author

Correspondence to Antonis Rokas.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

Supplementary Figures 1–65 (PDF 30504 kb)

Supplementary Tables

Supplementary Tables 1–10 (XLSX 4619 kb)


About this article


Cite this article

Shen, XX., Hittinger, C. & Rokas, A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat Ecol Evol 1, 0126 (2017).
