5 rules for good data management in the life sciences
Good data partnerships depend on mutual respect, understanding each other’s roles, and strong communication.
22 October 2021
1. Learn the basics
Researchers and data stewards need to speak the same working language to collaborate efficiently, but because they often come from different disciplines, they rarely do. ‘Data steward’ is a relatively new job description that involves managing research data, including backing it up, controlling the metadata, and ensuring reusability.
Even departments and labs that don’t have a formally titled data steward need someone who performs the role – the tech-savvy person who pitches in on data management, often a bioinformatician or an analytical chemist.
The data steward requires a basic understanding of the principles of the sub-discipline in which they are working, while the researcher needs to understand the basics of data management.
A successful partnership of researcher and data steward establishes a fruitful dialogue, rather than a one-way communication, to preserve and get the most out of the data. For example, when the life sciences researcher asks the data steward to format data in a certain way, both need to understand the goal of the process.
Meanwhile, the data ally should never think “I have no idea if this makes sense biologically speaking, but this is what you asked for”.
2. Define roles and expectations
The finalization of datasets for analysis is often a critical moment in an academic study. The data needs to be preserved for short- and long-term usage, but also analyzed quickly to generate new insights.
Here, the data steward and the researcher play different roles. The data steward documents and preserves the data and source code, while the researcher analyzes and interprets the data.
Often, this is an iterative process, with data continually reused, reanalyzed, and combined with different datasets. Insufficient data management hampers this process or makes it impossible.
Defining roles clearly and early results in a stable, trusting relationship between a data ally and a researcher instead of a build-up of frustration. So, we recommend sketching a simple yet binding workflow diagram to codify who does what, and when.
3. Think across disciplines
Interdisciplinarity has become the linchpin of modern life science. Bioinformatics, in particular, is increasingly prominent in biology, demanding greater data-management expertise from life scientists. A great way to get into the interdisciplinary mindset is to be exposed to different disciplines, techniques, methodologies, and ways of thinking.
Consider organizing consulting activities to bring data allies and researchers together. For researchers, these provide an opportunity to become accustomed to research data management terms and practices.
For instance, when you use a new technique, say microscopy, you can consult with a data steward upon dataset acquisition on issues such as how to archive the resulting data, and how to create the metadata and make it accessible during and after the project. Exposure to different disciplines helps both the researcher and the data steward as long as jargon-free language, understandable to both, is used.
This not only creates a trusting interdisciplinary atmosphere, but also helps to position the data steward as the local ‘control tower’ for the researchers.
They can then, for instance, direct researchers with a question about data licenses to librarians and those asking about storage capacity to the IT department.
4. Support one another
Open Science is now a research priority across the world and in the EU in particular. Excellent open science is fueled by high-quality data.
Establishing and extending the profession of data steward can support high data-quality standards and reward researchers who invest time and energy in data management.
To ensure that they can implement data standards and policies in their department, data stewards need formal support. In the Netherlands, for example, national policy developments urge researchers to make their data FAIR (Findable, Accessible, Interoperable and Reusable). Data stewards can also gain formal recognition by publishing stand-alone datasets. They should encourage researchers to publish data on its own before formal publication or before the researcher leaves their current position, and ask that their own name be added to the publication.
Journals such as Springer Nature’s Scientific Data or Elsevier’s Data in Brief are options to consider.
Because there is often insufficient funding to hire a designated data steward, researchers might have to perform data management tasks themselves, which can take up a lot of time. Researchers can then support research data stewardship in general by acknowledging data management efforts during scientific meetings and by asking their superiors for time and credit for data management work.
In addition, data-related funding opportunities are increasingly available under Open Science schemes, which can help to build awareness of research data management.
5. Train the next generation
There is currently little formal education available for data stewards, and finding trained personnel can be challenging. Therefore, we need to work to identify and train emerging professionals.
Data stewards can and should support scientists by training them in data management (the ‘train-the-trainer’ principle).
They should also emphasize the importance of data-management skills in combination with data-analysis skills when their institute hires new staff, and press for hiring a data steward if their institute doesn’t have one. After all, a research community with an embedded data steward has a competitive advantage, as funding opportunities now often require good data management. For example, the Dutch Research Council (NWO) requires a Data Management Plan for its major funding schemes (e.g. VENI for early-career researchers).
Become a data steward
Many biologists are already doing data management without being aware of it. If you’re one of them and seek to build upon these skills, we encourage you to become a data steward. This new role has plenty of job opportunities and is being further professionalized.
There are three types of data stewards: those who focus on policy (for example, compliance with EU data rules), on research (providing guidance to research colleagues), or on infrastructure (building and maintaining the tools required to work with data, such as data deposition services).
As a data steward, you can continue in academic research without following the classical academic pathway. In Europe, the ELIXIR platform offers training, and in the Netherlands, data steward communities offer skill development, advice, and support.
This article has been contributed by a member of the Nature Index community.
This article was initiated in the Dutch Techcentre for Life Sciences ‘Data Stewards Interest Group’. The authors would particularly like to thank Petra Bleeker (Associate Professor, Swammerdam Institute for Life Sciences, University of Amsterdam) and the members of the Interest Group Esther Plomp and Mijke Jetten for their feedback.
Marc Galland works as a data scientist and data steward at the department of Plant Physiology at the Swammerdam Institute for Life Sciences (University of Amsterdam, the Netherlands).
Frederike Schmitz has a PhD in life sciences and works as a freelancer in science communication. She is actively involved in the data stewardship community and Open Science movement.