In January 2023, the US National Institutes of Health (NIH) will begin requiring most of the 300,000 researchers and 2,500 institutions it funds annually to include a data-management plan in their grant applications — and to eventually make their data publicly available.
Researchers who spoke to Nature largely applaud the open-science principles underlying the policy — and the global example it sets. But some have concerns about the logistical challenges that researchers and their institutions will face in complying with it. Namely, they worry that the policy might exacerbate existing inequities in the science-funding landscape and could be a burden for early-career scientists, who do the lion’s share of data collection and are already stretched thin.
The mandate, in part, aims to tackle the reproducibility crisis in scientific research. Last year, a US$2-million, eight-year attempt to replicate influential cancer studies found that fewer than half of the assessed experiments stood up to scrutiny. Efforts to tally the cost of irreproducible research in the United States have found that $10 billion to $50 billion is spent on studies that use deficient methods, a cost that is mostly borne by public funding agencies.
Irreproducible studies not only waste taxpayers’ money, says Lyric Jorgenson, the acting associate director for science policy at the NIH, but also undermine public trust in science. “We want to make sure that we’re making good on the nation’s investment and fostering transparency and accountability in research,” she says.
Joseph Ross, a health-policy researcher at Yale School of Medicine in New Haven, Connecticut, says the mandate’s effects will be felt far beyond US borders because the NIH is the world’s largest public funder of biomedical research. Ensuring that the policy sets the right tone is important, Ross says, because it will signal to scientists all over the world how biomedical research should be done.
A seismic shift
Under the new policy, which will go into effect on 25 January, all NIH grant applications for projects that collect scientific data must include a ‘data management and sharing’ (DMS) plan that contains details about the software or tools needed to analyse the data, when and where the raw data will be published and any special considerations for accessing or distributing those data.
Such a seismic shift in practice has left some researchers worried about the amount of work that the mandate will require when it becomes effective.
Jenna Guthmiller, an immunologist at the University of Chicago in Illinois, can attest that more work will probably be required. She is one of a handful of researchers funded through a US National Institute of Allergy and Infectious Diseases (NIAID) programme that has enacted a policy similar to the NIH-wide plan, she says. For Guthmiller, that meant tracking down information on long-gone reagents and experimental conditions for a project that’s been running for four years. That took 15 hours, she says, “and I was fortunate enough to work with a data manager”.
Because the vast majority of laboratories and institutions don’t have data managers who organize and curate data, the policy, although well intentioned, will probably put a heavy burden on trainees and early-career principal investigators, says Lynda Coughlan, a vaccinologist at the University of Maryland School of Medicine in Baltimore, who has been leading a research team for less than two years and is worried about what the policy will mean for her.
Jorgenson says that, although the policy might require researchers to spend extra time organizing their data, it’s an essential part of conducting research, and the potential long-term boost in public trust for science will justify the extra effort.
Others worry that data-management activities will further sap funds from under-resourced labs. Although the policy outlines certain fees that researchers can add to their proposed budgets to offset the costs of compliance with the mandate, it doesn’t specify what criteria the NIH will use to grant these requests.
Ross says that, for the policy to succeed, the NIH needs to be clear about how it will award these resources, especially to early-career researchers and underfunded institutions, so as not to exacerbate existing inequities in the research community.
Jorgenson responds that the agency is evaluating the costs of compliance and hopes to prepare more guidance and information.
As part of the data-sharing policy, when a research project is complete or when its grant expires, whichever comes first, NIH programme officers will review the DMS plan to ensure that researchers have adhered to it. The policy stipulates that, at that time, researchers must share any ‘scientific data’ needed to “validate and replicate research findings, regardless of whether the data are used to support scholarly publications”, although it makes an exception in cases where data sharing would pose a significant legal, ethical or technical burden. The NIH recommends that these data be shared only in a reputable repository, but researchers will ultimately decide where to upload the information.
The broad term ‘scientific data’ has left some researchers confused about exactly what information they’ll be required to share. It’s hard to predict which data might be useful for other researchers, or whether those data will ever be accessed by anyone, Coughlan says.
In response to an early draft of the policy, the Association of American Universities, an organization based in Washington DC that represents 66 universities, wrote in 2020 that the NIH’s definition of scientific data needed to be narrowed, and suggested that the agency limit it to include only data underlying scholarly publications.
Jorgenson says that data collected when experiments don’t work, which therefore do not appear in publications, are just as important to communicate, because they include information that could help other researchers to understand the full context of an experiment’s success. The ambiguity in the policy offers researchers flexibility in determining which data are truly necessary to reproduce research findings, she says.
Brian Nosek, executive director of the Center for Open Science, based in Charlottesville, Virginia, points out that it will be a major challenge for the NIH to ensure that all relevant data have been shared at the conclusion of a project. Although the policy is an “important milestone of maturing the open-science movement beyond just thinking about open access”, Nosek worries that some applicants might not take it seriously if there are no consequences for non-compliance. Jorgenson responds that if the policy is not followed, future funding awards for researchers or institutions could be jeopardized.
Despite its potential pitfalls, Ross thinks that the policy will have a ripple effect that will persuade smaller funding agencies and industry to adopt similar changes. “This policy establishes what people expect from clinical research,” he says. “It’s essentially saying the culture of research needs to change.”