Open data: curation is under-resourced

Journal name: Nature
Volume: 538
Page: 41
DOI: 10.1038/538041d

Science funders and researchers need to recognize the time, resources and effort required to curate open data (see Nature 537, 138; 2016). Although organizations such as the US National Science Foundation and the European Commission are aiming to make data repositories financially self-sustaining, this is unlikely to happen within one or two funding cycles.

There is no reliable business model to finance the curation and maintenance of data repositories. Databases therefore often restrict access to subscribers (see, for example, go.nature.com/2dzc59o), curtailing opportunities for interoperability and collaboration.

Curation is not fully automated for most data types. This means that, in the life sciences for example, many popular databases must resort to time-consuming manual curation to check data quality, reliability, provenance, format and metadata (S. Leonelli, Data-Centric Biology, Chicago Univ. Press; 2016).

Crowdsourcing models are promising in this respect because data producers ensure that the deposited data are accurate and reusable, but these models are still not widely deployed (see go.nature.com/2d6p9kc).

To make open data effective as a research tool, computational and field-specific skills need to mesh. This will help to keep data infrastructures user-friendly and resilient in the face of rapid developments.

Author information

Affiliations

  1. University of Exeter, UK.

    • Sabina Leonelli
