Data curation: Act to staunch loss of research data

Journal name: Nature
Volume: 520
Page: 436
DOI: 10.1038/520436c

Never before have scientists had the ability to generate and collect so much data: recent estimates suggest that global scientific output is doubling roughly every decade (see L. Bornmann and R. Mutz, preprint at http://arxiv.org/abs/1402.4578v3; 2014, and go.nature.com/nzejwh). It is alarming, therefore, that the odds of data being lost are estimated to increase by 17% in every year after publication (T. H. Vines et al. Curr. Biol. 24, 94–97; 2014). And this does not include the 80% or so of research data that are inaccessible or unpublished (B. P. Heidorn Libr. Trends 57, 280–299; 2008).
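To give a sense of what these rates imply when compounded, the arithmetic can be sketched as below. This is an illustration only, not the method of the cited studies: the function names are hypothetical, the starting odds are an arbitrary assumption, and the sketch assumes that the 17% yearly increase applies multiplicatively to the odds and that a doubling per decade reflects steady exponential growth.

```python
def compound_odds(baseline_odds: float, yearly_increase: float, years: int) -> float:
    """Odds after `years`, assuming they grow by `yearly_increase` each year."""
    return baseline_odds * (1.0 + yearly_increase) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


def annual_growth_from_doubling(doubling_years: float) -> float:
    """Yearly growth rate implied by a given doubling time."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


if __name__ == "__main__":
    # Hypothetical starting point: odds of loss of 0.25 (a 20% probability).
    # This baseline is an assumption for illustration, not a reported figure.
    baseline = 0.25
    for years in (0, 5, 10, 20):
        odds = compound_odds(baseline, 0.17, years)
        print(years, round(odds, 3), round(odds_to_probability(odds), 3))

    # Growth rate implied by output doubling every 10 years.
    print(round(annual_growth_from_doubling(10.0), 4))
```

Under these assumptions, a 17% yearly increase multiplies the odds of loss nearly fivefold over a decade, whereas a doubling of output per decade corresponds to growth of about 7% a year.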

Information is lost when researchers fail to store, archive or share their data, for example, or when storage technology ages and data-storage devices become corrupted. A culture of systematic data curation is needed to stem this loss, but it is not yet in place across research fields, even though curation costs a fraction of the funding used to generate the data in the first place. Standardized protocols would ensure that data are shared and properly curated worldwide.

Global networks such as the Confederation of Open Access Repositories can support research institutions in storing their data. National data services are already providing generic support to researchers (see, for example, go.nature.com/uns6zy). Now, different fields need to converge on common formats for data storage and preservation if such measures are to be effective.

Author information

Affiliations

  1. McGill University, Montreal, Canada.
    • Andrew Gonzalez

  2. University of Quebec in Montreal, Canada.
    • Pedro R. Peres-Neto
