Before the coronavirus pandemic struck, researchers were often cautious about sharing their data — if not outright unwilling. “Historically, it’s been seen as a challenge because researchers put a lot of work into collecting data and want to make sure that they receive enough credit,” says Mahsa Shabani, who studies privacy law and bioethics at Ghent University in Belgium.
But the pandemic has prompted a new and more urgent interest in sharing and mining existing data, and in pooling resources. “We’ve seen an increase in submissions across disciplines and we know that’s happening at other repositories as well,” says Daniella Lowenberg, product manager for the data-sharing platform Dryad and a research data specialist at the California Digital Library in Oakland.
Most scientists agree, at least in principle, that data sharing is a moral obligation, says Georgina Humphreys, clinical data-sharing manager at the biomedical-research funder Wellcome in London. The more data researchers can access, the more quickly they can understand the virus and develop therapies and vaccines.
In March, Wellcome partnered with financial-services company Mastercard and the Bill and Melinda Gates Foundation in Seattle, Washington, to set up the COVID-19 Therapeutics Accelerator, a fund that supports swift evaluation of drugs and treatments to tackle the pandemic, and hopes to address other pathogens eventually. Sharing data as widely and rapidly as possible has been a key aim of the initiative, says Humphreys.
More than 2,800 observational clinical trials for COVID-19 treatments are currently listed by the global Cochrane COVID-19 Study Register; Humphreys says that a switch to greater sharing is key to developing a successful treatment by the end of 2020. “Researchers are more worried now about making sure their data are available, so their profile is raised as opposed to their not getting credit for it,” she says.
Humphreys and others hope that the rush to share COVID-19 data will turn into a marathon with a lasting impact. “The importance of data sharing hasn’t changed; COVID-19 highlights how important it is,” says Lowenberg. On the launch of the COVID-19 Therapeutics Accelerator, Mark Suzman, chief executive of the Gates Foundation, said in a statement, “If we want to make the world safe from outbreaks like COVID-19, then we need to find a way to make research and development move faster.”
Marco Liverani, a health-policy researcher with the London School of Hygiene and Tropical Medicine who works largely in southeast Asia, says that most data are underused. “It’s certainly possible to generate valuable knowledge using secondary data sets,” he says. “There are plenty of data to sink one’s teeth into, not only in historical research but also in just the last few years, when there has been a huge volume across disciplines.” Some organizations and funders have designed initiatives that encourage data sharing.
Although the practice is in vogue, it is complicated and requires an understanding of legal, ethical and scientific considerations. Here are six ways to avoid common data-sharing mistakes.
Curate data contributions. Make sure you provide enough metadata — information about the data, including how they were collected — as well as any code that is necessary to process and analyse the data. A file in a format that is not accessible to analytics programs — or that doesn’t have descriptive, machine-readable column headings, ‘readme’ files or usage notes to help other researchers to understand it — is less useful, says Lowenberg. She suggests that researchers seek help from university librarians to determine the most appropriate repository for specific data and what’s needed to make a data set reusable.
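As a rough illustration of the kind of check described above, the following Python sketch flags column headings that a program would struggle to parse and assembles a plain-text 'readme' from a data dictionary. The heading convention (lowercase words joined by underscores), the function names and the example columns are all assumptions made for illustration; they are not requirements of Dryad or of any other repository.

```python
import csv
import io
import re

# Assumed convention for this sketch: lowercase snake_case headings such as
# "age_years". Repositories and disciplines may prefer other conventions.
HEADER_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_headers(csv_text, data_dictionary):
    """Return a list of problems found in the header row of a CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    problems = []
    for name in headers:
        if not HEADER_PATTERN.match(name):
            problems.append(f"header not machine-readable: {name!r}")
        elif name not in data_dictionary:
            problems.append(f"header missing from data dictionary: {name!r}")
    return problems


def build_readme(title, collection_method, data_dictionary):
    """Assemble a minimal plain-text readme describing each column."""
    lines = [title, "", "How the data were collected:", collection_method, "", "Columns:"]
    for name, description in data_dictionary.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical example data, invented for this sketch.
example_csv = "sample_id,Temp (C),age_years\n1,37.2,54\n"
example_dictionary = {
    "sample_id": "Identifier assigned to each sample by the study team",
    "age_years": "Participant age in whole years at enrolment",
}
for problem in check_headers(example_csv, example_dictionary):
    print(problem)
print(build_readme("Example data set", "Collected at a single clinic.", example_dictionary))
```

A check like this catches only formatting problems. Whether the metadata are sufficient for reuse remains a judgement call, which is where a librarian or data curator can help.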
Anonymize personal information. When depositing data concerning human study participants, make sure you have the appropriate ethical and legal approvals, says Lowenberg. The data must be properly anonymized and de-identified. “A majority of submissions of COVID-19 data have required major revisions because they contained tons of personally identifying information,” she says — such as patients’ names and entire health records. This information cannot be shared, and must be removed.
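One first step towards removing such information can be sketched in Python: drop columns that identify a person directly and replace the participant identifier with a salted hash. The column names and the `deidentify` function are hypothetical, and the sketch is not a complete anonymization procedure. Combinations of remaining fields, such as age, postcode and admission date, can still identify individuals, so the output would need review against the study's ethical and legal approvals.

```python
import hashlib
import secrets

# Hypothetical column names; a real list would come from the study protocol
# and the applicable ethical and legal approvals.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "date_of_birth"}


def deidentify(records, id_field, salt):
    """Drop direct identifiers and replace the participant ID with a salted hash.

    `records` is a list of dicts, `id_field` names the identifier column and
    `salt` is a bytes value that is kept secret and never deposited.
    """
    cleaned = []
    for record in records:
        row = {
            key: value
            for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS and key != id_field
        }
        digest = hashlib.sha256(salt + str(record[id_field]).encode("utf-8")).hexdigest()
        row["pseudonym"] = digest[:16]  # shortened for readability
        cleaned.append(row)
    return cleaned


# Invented example record, for illustration only.
salt = secrets.token_bytes(16)
records = [{"patient_id": 101, "name": "Example Person", "age_years": 54}]
print(deidentify(records, "patient_id", salt))
```

The same salt yields the same pseudonym for a given participant, so records can still be linked within the data set without exposing the original identifier.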
Take care when using data. “Make sure you understand the context in which the data was collected — not just the raw data, but the protocols, how and where the data was collected, and the initial reason,” says Humphreys. If it’s not clear from the accompanying documentation, get in touch with the team that generated the data. “Make it a collaboration, if possible,” says Shabani. Some pharmaceutical companies and data repositories such as the UK BioBank or the UK Data Archive — which hosts results from social science and population research — have staff members who can answer queries and are keen for people to use data they list as available, adds Humphreys.
Check your team’s statistical ability. Some scientists don’t have the expertise necessary to work with multiple, complex data sets, says Humphreys. Analysing a single study is very different from conducting analyses on pooled data from various sources, she notes, and a data-access request might be rejected if the researchers making it do not demonstrate the required technical abilities.
Be aware of legal obligations. Particularly in the frenzied early stages of a disease outbreak, some researchers might use smartphone apps to collect data on disease spread, Shabani says. But any future use of those data could be subject to country-specific requirements, such as informed consent. Researchers who collect or use such data, especially from people, should check with their institution’s oversight or ethics committees to ensure that their research protocol is in order and that there are no questions about integrity.
Some databases, such as those in genomics, require data-sharing agreements to safeguard against the misuse of personal data and to avoid compromising privacy. Data users provide background information to authenticate their research institution and to hold it responsible if the agreement is breached, says Shabani. Those who wish to access such data should quickly circulate any agreements to the legal teams at their institution, says Humphreys.
Acknowledge the data generators. If you publish a paper based on shared data, check whether the people who generated the data should be listed as authors; some journals, for example, require authorship for anyone who contributed to the design, data collection or analysis behind a paper’s intellectual content. Shabani says that all papers should acknowledge the data generators and include relevant information about the original data, as should any presentations based on the papers. Without that level of collegiality, data generators might have little incentive to continue sharing, she says.
Researchers should also consider global inequality concerns. “When it comes to data sharing, research organizations in developing areas of Africa or South America may share data, but the benefits often accrue to academics in high-income countries,” says Liverani. Authors should ensure that colleagues in developing regions get any authorship credit or acknowledgements they deserve, he says.