Data archiving is a good investment

We have found that sustained financial investment in data-archiving infrastructure yields an impressive scientific return, and we believe that research funding agencies should support it wholeheartedly (see, for example, http://go.nature.com/nzftf3).

We used Dryad (see http://datadryad.org), an international, open, cost-effective data repository for the biological sciences, to estimate the cost of archiving data from more than 10,000 publications. We found that these could be curated and the data preserved at an annual cost of about US$400,000.

As an example of how much research is typically published per grant dollar, core grants in population and community ecology from the US National Science Foundation averaged 3–4 publications per $100,000 of grant between 2000 and 2005 (S. Reyes, A. Tessier and S. Mazer, unpublished results). That is, $400,000 invested in original research resulted in about 12 to 16 papers.

Dryad cannot yet tell us how effective data archives are in facilitating primary research publications, but the Gene Expression Omnibus (GEO) database at the US National Center for Biotechnology Information offers some insight. To estimate data reuse, we searched the full text of articles in PubMed Central for mention of any of the 2,711 data sets deposited in GEO in 2007. We excluded articles whose authors' names overlapped with those depositing the data set. Extrapolating the 338 hits in PubMed Central to all of PubMed, we estimate that the GEO 2007 data sets made third-party contributions to more than 1,150 published articles by the end of 2010, and reuse continues to accumulate rapidly (H. A. Piwowar, T. J. Vision and M. C. Whitlock Dryad Digital Repository doi:10.5061/dryad.j1fd7; 2011).
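The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is a back-calculation from the numbers quoted above, not the method used in the cited analysis: the scaling factor from PubMed Central to all of PubMed is not stated in this letter, so it is inferred here from the 338 hits and the estimate of 1,150 articles, and all variable and function names are illustrative.

```python
# Figures quoted in the text.
GEO_DATASETS_2007 = 2_711          # data sets deposited in GEO in 2007
PMC_HITS = 338                     # third-party mentions found in PubMed Central
ESTIMATED_PUBMED_ARTICLES = 1_150  # lower bound quoted for all of PubMed


def implied_scale_factor(hits: int, estimated_total: int) -> float:
    """Back-calculated PubMed Central -> PubMed multiplier (an inference, not a published value)."""
    return estimated_total / hits


def reuse_per_dataset(estimated_total: int, datasets: int) -> float:
    """Estimated third-party articles per deposited data set."""
    return estimated_total / datasets


scale = implied_scale_factor(PMC_HITS, ESTIMATED_PUBMED_ARTICLES)
rate = reuse_per_dataset(ESTIMATED_PUBMED_ARTICLES, GEO_DATASETS_2007)

print(f"Implied scaling factor: {scale:.2f}")          # 3.40
print(f"Reuse articles per data set: {rate:.3f}")      # 0.424
```

On these numbers, each deposited data set contributed to roughly 0.42 third-party articles by the end of 2010. Because 1,150 is quoted as a lower bound, both values are lower bounds as well.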

Assuming that Dryad has a comparable rate of reuse and collects at least 2,500 data sets annually, an investment of $400,000 in one year should contribute to more than 1,000 papers in the next four years, far more than the same sum yields when spent directly on original research.
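The comparison can be reproduced as a rough calculation. The sketch below assumes, as the text does, that Dryad's reuse rate matches the GEO-derived one (at least 1,150 articles from 2,711 data sets) and that the grant yield of 3–4 publications per $100,000 applies; the function names are illustrative, and the result is an order-of-magnitude estimate rather than a forecast.

```python
# Inputs taken from the text; everything else is derived from them.
ANNUAL_ARCHIVE_COST_USD = 400_000
DATASETS_PER_YEAR = 2_500
GEO_REUSE_PER_DATASET = 1_150 / 2_711      # lower-bound GEO rate, assumed to carry over to Dryad
PUBS_PER_100K_LOW, PUBS_PER_100K_HIGH = 3, 4


def papers_from_archiving(datasets: int, reuse_rate: float) -> float:
    """Third-party papers expected from one year's deposits."""
    return datasets * reuse_rate


def papers_from_grants(budget_usd: float, pubs_per_100k: float) -> float:
    """Papers expected when the same budget funds original research."""
    return budget_usd / 100_000 * pubs_per_100k


archive_papers = papers_from_archiving(DATASETS_PER_YEAR, GEO_REUSE_PER_DATASET)
grant_low = papers_from_grants(ANNUAL_ARCHIVE_COST_USD, PUBS_PER_100K_LOW)
grant_high = papers_from_grants(ANNUAL_ARCHIVE_COST_USD, PUBS_PER_100K_HIGH)

print(f"Archiving: about {archive_papers:.0f} papers")            # about 1060
print(f"Grants: {grant_low:.0f} to {grant_high:.0f} papers")      # 12 to 16
print(f"Ratio vs. high grant yield: {archive_papers / grant_high:.0f}x")  # 66x
```

Even against the upper grant figure of 16 papers, the archiving estimate is more than 60 times larger, which is the gap the closing sentence describes.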

Corresponding author

Correspondence to Heather A. Piwowar.

Ethics declarations

Competing interests

Heather Piwowar and Todd Vision receive research support from the Dryad data repository project.

About this article

Cite this article

Piwowar, H., Vision, T. & Whitlock, M. Data archiving is a good investment. Nature 473, 285 (2011). https://doi.org/10.1038/473285a
