Box 2: Data-sharing best practices for long-tail data
Discoverable. Data must be modeled and hosted so that they can be discovered through search. Many data, particularly those in dynamic databases, are considered to be part of the 'hidden web'; that is, they are opaque to search engines such as Google. Authors should make their metadata and data understandable and searchable (for example, use recognized standards when possible and avoid special characters and non-standard abbreviations), ensure the integrity of all links and provide a persistent identifier (for example, a DOI).
Accessible. When discovered, data can be interrogated. Data and related materials should be available through a variety of methods including download and computational access via the Cloud or web services. Access rights to data should be clearly specified, ideally in a machine-readable form.
Intelligible. Data can be read and understood by both humans and machines. Sufficient metadata and context description should be provided to facilitate reuse decisions. Standard nomenclature should be used, ideally derived from a community or domain ontology, to make the data machine readable.
Assessable. The reliability of data sources can be evaluated. Authors should ensure that repositories and data links contain sufficient provenance information so that a user can verify the source of the data.
Useable. Data can be reused. Authors should ensure that the data are actionable, for example, that they are in a format in which they can be used without conversion or that they can readily be converted. In general, PDF is not a good format for sharing data. Licenses should make data available with as few restrictions as possible for researchers. Data in the laboratory should be managed as if they are meant to be shared; many research libraries now have data-management programs that can help.
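As a rough illustration of how several of these practices might look in a small laboratory workflow, the following Python sketch writes a table as CSV rather than PDF, checks column names for special characters, and builds a machine-readable metadata record carrying a persistent identifier, a license and basic provenance. Everything specific here is an assumption made for the example: the column names, the made-up values, the placeholder DOI, the choice of CC0-1.0 as the license identifier and the metadata field names are not taken from any particular repository or standard.

```python
import csv
import hashlib
import io
import json
import re

# Assumed naming convention: lowercase letters, digits and underscores only.
SAFE_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def unsafe_column_names(header):
    """Return column names that contain special characters or spaces."""
    return [name for name in header if not SAFE_NAME.match(name)]


def rows_to_csv(header, rows):
    """Serialize a table as CSV text, a format that needs no conversion."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def sha256_of(text):
    """Checksum that lets a user verify the file matches its description."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_metadata(title, identifier, license_id, csv_text, source):
    """Assemble a machine-readable record (field names are hypothetical)."""
    return {
        "title": title,
        "identifier": identifier,   # persistent identifier, e.g. a DOI
        "license": license_id,      # access rights in machine-readable form
        "format": "text/csv",
        "provenance": {"source": source, "sha256": sha256_of(csv_text)},
    }


# Made-up illustrative values, not real measurements.
header = ["subject_id", "days_post_injury", "locomotor_score"]
rows = [["r01", 7, 12], ["r02", 7, 9]]

csv_text = rows_to_csv(header, rows)
metadata = build_metadata(
    title="Example locomotor recovery scores",
    identifier="doi:10.0000/example",  # placeholder, not a registered DOI
    license_id="CC0-1.0",
    csv_text=csv_text,
    source="Hypothetical laboratory notebook export",
)

print(unsafe_column_names(header))
print(json.dumps(metadata, indent=2))
```

In practice the record would follow whatever metadata schema the chosen repository or community requires; the point of the sketch is only that identifier, license and provenance travel with the data in a form that software can read.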
Brain and Spinal Injury Center, Department of Neurological Surgery, University of California at San Francisco, San Francisco, California, USA.
- Adam R Ferguson &
- Jessica L Nielson
Directorate for Biological Sciences, National Science Foundation, Arlington, Virginia, USA.
- Melissa H Cragin
Center for Research in Biological Structure, University of California at San Diego, San Diego, California, USA.
- Anita E Bandrowski &
- Maryann E Martone
Department of Neuroscience, University of California at San Diego, San Diego, California, USA.
- Maryann E Martone
Competing financial interests
M.E. Martone is the principal investigator of the Neuroscience Information Framework. A.E. Bandrowski is the NIF Project Leader. A.R. Ferguson, J.L. Nielson and M.H. Cragin are not affiliated with NIF.