WORLD VIEW
12 December 2023

How to make data open? Stop overlooking librarians

Digital archivists are already experts at tackling the complex challenges of making research data open and accessible. We can help to smooth the transition.

Jessica Farrell

Jessica Farrell is a community facilitator at the Educopia Institute in Atlanta, Georgia.

The ‘Year of Open Science’, as declared by the US Office of Science and Technology Policy (OSTP), is now wrapping up. This followed an August 2022 memo from OSTP acting director Alondra Nelson, which mandated that data and peer-reviewed publications from federally funded research should be made freely accessible by the end of 2025. Federal agencies are required to publish full plans for the switch by the end of 2024.

But the specifics of how data will be preserved and made publicly available are far from being nailed down. I worked in archives for ten years and now facilitate two digital-archiving communities, the Software Preservation Network and BitCurator Consortium, at Educopia in Atlanta, Georgia. The expertise of people such as myself is often overlooked. More open-science projects need to integrate digital archivists and librarians, to capitalize on the tools and approaches that we have already created to make knowledge accessible and open to the public.

Making data open and ‘FAIR’ — findable, accessible, interoperable and reusable — poses technical, legal, organizational and financial questions. How can organizations best coordinate to ensure universal access to disparate data? Who will do that work? How can we ensure that the data remain open long after grant funding runs dry?

Many archivists agree that technical questions are the most solvable, given enough funding to cover the labour involved. But they are nonetheless complex. Ideally, any open research should be testable for reproducibility, but re-running scripts or procedures might not be possible unless all of the required coding libraries and environments used to analyse the data have also been preserved. Besides the contents of spreadsheets and databases, scientific-research data can include 2D or 3D images, audio, video, websites and other digital media, all in a variety of formats. Some of these might be accessible only with proprietary or outdated software.

Librarians have many tools that can help, such as ReproZip, created by Rémi Rampin and supported by Vicky Rampin at New York University in 2013. This software brings together into one package all the data files, libraries, environmental variables and options needed to reproduce research. The open-source software BitCurator has supported digital archiving work since 2011. Thanks to years of work by many archivists, the US Library of Congress and the UK National Archives both maintain registries of file formats and what software is needed to open them.
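To make concrete what "everything needed to reproduce research" can mean, here is a minimal sketch in Python of a manifest that records data files, installed libraries and selected environment variables. It is an illustration only and does not reproduce how ReproZip or BitCurator work internally; the function names, the manifest fields and the default choice of environment variables are assumptions made for this example.

```python
import hashlib
import json
import os
import platform
import sys
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, env_keys=("PATH", "LANG")) -> dict:
    """Collect what a later reader might need to re-run an analysis.

    The field names are hypothetical, not taken from any standard.
    """
    files = [
        {
            "path": p.relative_to(data_dir).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    # Installed Python packages, pinned to the versions present right now.
    packages = sorted(
        f"{dist.metadata.get('Name') or 'unknown'}=={dist.version}"
        for dist in metadata.distributions()
    )
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": packages,
        "environment": {k: os.environ[k] for k in env_keys if k in os.environ},
        "files": files,
    }


if __name__ == "__main__":
    # Print a manifest for the current directory as JSON.
    print(json.dumps(build_manifest(Path(".")), indent=2))
```

A manifest like this only describes an environment; it does not preserve it. Keeping the listed libraries and software runnable over the long term is the harder part, which is where preservation tools and archivists' expertise come in.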

Legal and organizational barriers are trickier. For example, in the United States, under the 1998 Digital Millennium Copyright Act, a library couldn’t break a digital lock on software, even for preservation or research. A long-lost password, a defunct authentication server or a broken dongle could render data inaccessible. Thanks to advocacy by the Software Preservation Network, updated rules allow libraries to break those locks to preserve software in their collections, ensuring long-term access to data. The Software Preservation Network continues to press for policy changes that enable the preservation of and access to software.

There is also no one body to provide oversight for ensuring data are open. Funders should consider how they could support the formation of organizations that do this, made up of both scientists and information scientists, to help to coordinate across projects and avoid duplications.

All of this requires people to overcome outdated misconceptions of librarianship. If you’re a scientist who has never thought about archivists before, there might be cultural reasons for that. Information science is a feminized field, and archivists are often underpaid and perceived as administrative support staff, not co-creators in the knowledge-production process. Archives are often imagined as boxes of dusty papers, but most archives today maintain vast amounts of digital data. Information management is an academic discipline and should be treated as such.

Fortunately, there are examples of fruitful partnerships between researchers and archivists. NASA’s Year of Open Science and the Scientific Information Service at CERN near Geneva, Switzerland, co-hosted an open-science summit in July. My colleague Paul Gignac, a vertebrate palaeontologist at the University of Arizona in Tucson, sought out the expertise of digital archivists when setting up the NSF-funded Non-Clinical Tomography Users Research Network. The project is investigating how to preserve 3D-imaging data sets and how to track important contextual information, such as where the data came from and notes on reproducibility. Gignac found that using information-science tools and standards — such as including metadata about how materials were preserved — helped to ensure that data were FAIR without reinventing the wheel. He also collaborates with the Data Curation Network, a community hub hosted by the University of Minnesota in Minneapolis, which anyone can join.
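As a rough illustration of tracking contextual information alongside data, the sketch below writes a small metadata file next to a data file and later checks that the data have not changed. The record fields (source, preservation note, reproducibility note) and the ".provenance.json" naming are hypothetical choices for this example; they are not drawn from Gignac's project or from any particular archival standard, and real projects would normally adopt an established metadata schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ProvenanceRecord:
    """Contextual metadata stored beside a data file (illustrative fields)."""
    file_name: str
    sha256: str
    source: str
    preservation_note: str
    reproducibility_note: str


def file_digest(path: Path) -> str:
    # Reads the whole file into memory; large imaging files would need chunked reads.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def describe(path: Path, source: str, preservation_note: str,
             reproducibility_note: str) -> ProvenanceRecord:
    """Build a record that ties the notes to the file's current contents."""
    return ProvenanceRecord(path.name, file_digest(path), source,
                            preservation_note, reproducibility_note)


def write_sidecar(record: ProvenanceRecord, data_path: Path) -> Path:
    """Save the record as JSON next to the data file and return its path."""
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(asdict(record), indent=2), encoding="utf-8")
    return sidecar


def is_unchanged(data_path: Path, sidecar: Path) -> bool:
    """Return True if the data file still matches its recorded checksum."""
    recorded = json.loads(sidecar.read_text(encoding="utf-8"))["sha256"]
    return file_digest(data_path) == recorded
```

Storing a checksum with the descriptive notes means a future user can confirm that the file they open is the one the notes describe.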

Many digital archivists and scientists share a vision of a world in which reliable open data are maintained, quality scientific information is accessible regardless of income or location and — as has recently become important — large language models can be trained on well-curated open data instead of on data of unverified quality used without permission. The expertise of digital archivists can help scientists and society to extract maximum benefit from the transition to open access.

Nature 624, 227 (2023)

doi: https://doi.org/10.1038/d41586-023-03935-1

Competing Interests

J.F. is an employee at Educopia Institute, which fiscally hosts some, but not all, of the communities and projects mentioned in this piece. Educopia is a nonprofit research institute.
