Gene gain and loss across the metazoan tree of life


Although recent research has revealed high genomic complexity in the earliest-splitting animals and their ancestors, the macroevolutionary trends orchestrating gene repertoire evolution throughout the animal phyla remain poorly understood. We used a phylogenomic approach to interrogate genome evolution across all animal phyla. Our analysis uncovered a bimodal distribution of recruitment of orthologous genes, with most genes gained very ‘early’ (that is, at deep nodes) or very ‘late’, representing lineage-specific acquisitions. The emergence of animals was characterized by high rates of gene birth and duplication. Deuterostomes, ecdysozoans and xenacoelomorphs were characterized by no gene gain but rampant differential gene loss. Genes considered animal hallmarks, such as Notch/Delta, were convergently duplicated in all phyla and at different evolutionary depths. Genes duplicated in all nodes from Metazoa to phylum-specific levels were enriched in functions related to the neural system, suggesting that this system has been continuously and independently reshaped throughout evolution across animals. Our results indicate that animal genomes evolved by unparalleled gene duplication followed by differential gene loss, and provide an atlas of gene repertoire evolution throughout the animal tree of life to navigate how, when and how often each gene in each genome was gained, duplicated or lost.
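To make the ‘early’ versus ‘late’ gain pattern concrete, the following is a minimal Python sketch. It assumes a simple Dollo-style rule in which an orthogroup is gained at the most recent common ancestor of the species that carry it. The toy tree, the species names and the orthogroup table are all hypothetical, and this is an illustration only, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical toy tree encoded as child -> parent; "Metazoa" is the root.
PARENT = {
    "Bilateria": "Metazoa",
    "sponge": "Metazoa",
    "Protostomia": "Bilateria",
    "Deuterostomia": "Bilateria",
    "fly": "Protostomia",
    "snail": "Protostomia",
    "human": "Deuterostomia",
    "urchin": "Deuterostomia",
}

# Hypothetical orthogroups: name -> set of species that carry a member gene.
ORTHOGROUPS = {
    "OG1": {"sponge", "fly", "human"},
    "OG2": {"fly", "snail"},
    "OG3": {"human"},
    "OG4": {"fly", "urchin"},
    "OG5": {"snail"},
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def gain_node(species):
    """Assumed node of gain: most recent common ancestor of `species`."""
    species = list(species)
    shared = set(path_to_root(species[0]))
    for sp in species[1:]:
        shared &= set(path_to_root(sp))
    # Walking upward from any member, the first shared node is the MRCA.
    return next(n for n in path_to_root(species[0]) if n in shared)


# Tally how many orthogroups are assigned to each node of the toy tree.
gains_per_node = Counter(gain_node(sp) for sp in ORTHOGROUPS.values())
```

Under this rule, orthogroups found in a single species are assigned to that tip (a ‘late’, lineage-specific gain), whereas widely shared orthogroups are assigned to internal nodes near the root (an ‘early’ gain). Because later losses can make a gain look shallower or deeper than it was, the rule is only a first approximation.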


Fig. 1: Gene gain, loss and duplication ratios show a similar pattern across animal phyla.
Fig. 2: Gene gain and duplication ratios are high at deeper nodes, and gene loss at shallower ones.
Fig. 3: Genes related to the neural system are among the most highly duplicated.
Fig. 4: The core gene repertoire of metazoans includes genes from a plethora of KEGG pathways that have undergone different degrees of duplication.
Fig. 5: Pairwise gene loss is pervasive across phyla.

Data availability

All data, code and supplementary information are available in the manuscript. The supplementary materials are deposited in the Harvard Dataverse repository. The accession numbers for all taxa included in each analysis are indicated in Supplementary Mat. 1. All phylomes can be accessed at PhylomeDB 4.0 under the phylome numbers 431, 462, 747, 778, 782, 812, 819, 824, 875, 888, 937, 950 and 953.




Acknowledgements

We are grateful to M. Marcet-Houben and I. Julca for multiple discussions that greatly improved this study. R.F. was funded by a Juan de la Cierva-Incorporación Fellowship (Government of Spain) and a Marie Skłodowska-Curie Fellowship (747607). T.G.’s group receives funding from the Spanish Ministry of Economy, Industry, and Competitiveness (MEIC) grants ‘Centro de Excelencia Severo Ochoa 2013-2017’ SEV-2012-0208 and BFU2015-67107, co-funded by the European Regional Development Fund (ERDF); from the CERCA Programme/Generalitat de Catalunya; from the Catalan Research Agency (AGAUR) SGR857; and from the European Union’s Horizon 2020 research and innovation programme under grant agreement ERC-2016-724173 and the Marie Skłodowska-Curie grant agreement no. H2020-MSCA-ITN-2014-642095.

Author information

R.F. and T.G. developed the overall conceptual approach and analysis. R.F. compiled and analysed the data. R.F. and T.G. wrote the manuscript. T.G. supervised the study.

Correspondence to Toni Gabaldón.

Ethics declarations

Competing interests

The authors declare no competing interests.


Cite this article

Fernández, R., Gabaldón, T. Gene gain and loss across the metazoan tree of life. Nat Ecol Evol 4, 524–533 (2020).
