Researchers willing to share data are not always appropriately credited by those evaluating research. Credit: Getty

At times, it seems there’s an unstoppable momentum towards the principle that data sets should be made widely available for research purposes (also called open data). Research funders all over the world are endorsing the open data-management standards known as the FAIR principles (which ensure data are findable, accessible, interoperable and reusable). Journals are increasingly asking authors to make the data underlying their papers accessible to their peers. Data sets are assigned digital object identifiers (DOIs) so that they can be easily found and cited. And this citability helps researchers to get credit for the data they generate.

But reality sometimes tells a different story. The world’s systems for evaluating science do not (yet) value openly shared data in the same way that they value outputs such as journal articles or books. Funders and research leaders who design these systems accept that there are many kinds of scientific output, and many even reject the idea that a hierarchy exists among those outputs.

In practice, however, those in powerful positions in science tend not to regard open data sets in the same way as publications when making hiring and promotion decisions, awarding memberships of important committees or assessing work in national evaluation systems. The open-data revolution will stall unless this changes.

This week, Richard Bethlehem at the University of Cambridge, UK, Jakob Seidlitz at the University of Pennsylvania in Philadelphia and their colleagues publish research describing brain development ‘charts’ (R. A. I. Bethlehem et al. Nature https://doi.org/10.1038/s41586-022-04554-y; 2022). These are analogous to the growth charts that record height and weight over the course of a person’s life, and both researchers and clinicians can access them.

Work on this scale has not been done before: studies in neuroscience are typically based on relatively small data sets. To create a more globally representative sample, the researchers aggregated some 120,000 magnetic resonance imaging scans from more than 100 studies. Not all of these data sets were openly available for the researchers to use; in some cases, formal data-access agreements constrained how the data could be shared.

Some of the scientists whose data were originally proprietary became active co-authors on the paper. By contrast, researchers whose data were accessible from the start are credited in the paper’s citations and acknowledgements, as is the convention in publishing.

Such a practice is neither new nor confined to a specific field. But the result tends to be the same: authors of openly shared data sets risk not being given credit in a way that counts towards promotion or tenure, whereas those who are named as authors on the publication are more likely to reap benefits that advance their careers.

Such a situation is understandable as long as authorship on a publication is the main way of getting credit for a scientific contribution. But if open data were formally recognized in the same way as research articles in evaluation, hiring and promotion processes, research groups would lose at least one incentive for keeping their data sets closed.

Universities, research groups, funding agencies and publishers should, together, start to consider how they could better recognize open data in their evaluation systems. They need to ask: how can those who have gone the extra mile on open data be credited appropriately?

There will always be instances in which researchers cannot be given access to human data. Data from infants, for example, are highly sensitive and need to pass stringent privacy and other tests. Moreover, making data sets accessible takes time and funding that researchers don’t always have. And researchers in low- and middle-income countries have concerns that their data could be used by researchers or businesses in high-income countries in ways that they have not consented to.

But crediting all those who contribute their knowledge to a research output is a cornerstone of science. The prevailing convention, whereby those who make their data open for others to use must make do with an acknowledgement and a citation, needs a rethink. As long as authorship on a paper is valued significantly more than data generation, researchers will have little incentive to make their data sets open. The sooner this changes, the better.