As part of the Open Tree of Life project (http://opentreeoflife.org), we surveyed publications covering all domains of life and found that most phylogenetic trees and nucleotide alignments from the past two decades have been irrevocably lost.
Of the 6,193 papers we surveyed across more than 100 peer-reviewed journals, only 17% make their trees and alignments (the data used to infer relatedness) accessible. Requests to lead authors for data sets succeeded only 19% of the time. DNA sequences were deposited in GenBank for almost all of these studies, but it is the actual character alignments that are pivotal for reproducing phylogenetic analyses. We estimate that more than 64% of existing alignments or trees are permanently lost.
This problem will increasingly hinder phylogenetic inference as the use of whole-genome data sets becomes common. Journals need to reinforce a policy of online data deposition, either as supplementary material or in repositories such as TreeBASE (http://treebase.org) or Dryad (http://datadryad.org) — including for data sets based on previously published sequences. Ecologists, evolutionary biologists and others will then have access to rigorous phylogenetics for testing their hypotheses.
Cite this article
Drew, B. Missing data mean holes in tree of life. Nature 493, 305 (2013). https://doi.org/10.1038/493305f