“I was up all last night,” says Nnaemeka Ndodo, a molecular bioengineer at the Nigeria Centre for Disease Control (CDC) in Abuja. He sequences coronavirus genomes during the day, and then analyses and uploads the results to an online database at night, working tirelessly alongside his colleagues. “We don’t know Saturday, we don’t know Sunday,” he says.
Researchers around the world are racing to spot variants of the coronavirus SARS-CoV-2 so that they can determine whether the mutated viruses will evade vaccines or make COVID-19 deadlier. Like many scientists, Ndodo shares SARS-CoV-2 genome sequences in a popular data repository, GISAID, that requires users to sign in and to credit those whose data they analyse.
But a growing faction of scientists, mostly from wealthy nations, argues that sequences should be shared on databases with no gatekeeping at all. They say this would allow huge analyses combining hundreds of thousands of genomes from different databases to flow seamlessly, and therefore deliver results more rapidly.
The debate has caught the attention of the US National Institutes of Health (NIH) — which runs its own genome repository, called GenBank — and the Bill & Melinda Gates Foundation, which has considered encouraging grantees to share on sites without such strong protections, Nature has learnt.
But many researchers — particularly those in resource-limited countries — are pushing back. They tell Nature that they see potential for exploitation in this no-strings-attached approach — and that GISAID’s gatekeeping is one of its biggest attractions because it ensures that users who analyse sequences from GISAID acknowledge those who deposited them. The database also requests that users seek to collaborate with the depositors.
Fears of inequitable data use are amplified by the fact that only 0.3% of COVID-19 vaccines have gone to low-income countries. “Imagine Africans working so hard to contribute to a database that’s used to make or update vaccines, and then we don’t get access to the vaccines,” says Christian Happi, a microbiologist at the African Centre of Excellence for Genomics of Infectious Diseases in Ede, Nigeria. “It’s very demoralizing.”
GISAID is the most popular repository for SARS-CoV-2 genome sequences, holding 1.4 million sequences as of 4 May. Researchers from under-resourced laboratories say it gives them a chance to participate in big-data analyses or do their own, because of the platform’s terms on acknowledgement and collaboration. Without those, researchers like Ndodo worry that the fruits of their fieldwork and lab work will be scooped up by computer scientists who aren’t burdened with such tasks. Big-data analyses can result in top-tier journal publications — and that, in turn, might lead to lucrative grants and patents for technologies, such as diagnostic tests and vaccines.
Continental Africa and South America more than doubled the number of SARS-CoV-2 sequences they contributed to GISAID between January and April this year. For researchers at the National Institute for Biomedical Research (INRB) in Kinshasa, Democratic Republic of the Congo, the decision to share those sequences was initially fraught. While working in Guinea during the Ebola virus outbreak of 2014–16, one senior scientist was alarmed to learn that all of the specimens collected by African researchers were being shipped out of the country. Most of the scientific papers and patents on those samples were authored by scientists from wealthy countries. Labs in Guinea didn’t sustainably benefit from that work and today remain unable to sequence samples.
So researchers at INRB were wary of sharing SARS-CoV-2 genome data, says Eddy Kinganda-Lusamaki, a microbiologist at the institute. But after reviewing GISAID’s crediting and collaboration requirements, Kinganda-Lusamaki says, they decided to share their data prior to publication.
But such caution runs contrary to the growing open-source movement. As of 4 May, an online letter calling for researchers to put genome sequences in the public domain had been signed by 778 scientists at universities and pharmaceutical companies, 99% of them based in Europe, the United States and Canada. Rolf Apweiler, co-director of the European Bioinformatics Institute near Cambridge, UK, the group that posted the letter in late January, tells Nature, “Sequencing is not for enriching the career of individual researchers, but for fighting a pandemic.”
Tulio de Oliveira, director of the KwaZulu-Natal Research and Innovation and Sequencing Platform in Durban, South Africa, agrees. But he counters that the most immediate goal for those sequencing SARS-CoV-2 is guiding their own country’s outbreak response, and that governments listen most often to their own scientists.
Apweiler’s letter recently caught the attention of NIH director Francis Collins. In a 21 April e-mail to dozens of international scientists, shared anonymously with Nature, Collins links to the letter, along with news articles in Nature and Science about complaints over GISAID’s data-sharing policies. He says global health funders, such as the NIH, are best positioned to set standards on sharing, and requests a meeting to discuss how to improve data access while protecting the interests of the scientists depositing data. Glenda Gray, president of the South African Medical Research Council in Cape Town, replied in the e-mail chain that if an open-access requirement comes to fruition, many scientists will no longer share their data rapidly. “If one is not careful,” she writes, “one will go back to the model of depositing data only after publication, which can take months or even years.”
Collins did not respond to a request for comment from Nature.
The Gates Foundation is also talking about data sharing. It has told the Africa Centres for Disease Control and Prevention that, in the future, it might encourage grant recipients to share their results on open-access databases, says Yenew Kebede Tebeje, a microbiologist at the agency in Addis Ababa. A representative of the Gates Foundation says that GISAID or any accessible database suffices for sharing genome sequences, but did not answer Nature’s question about future requirements.
An anonymous editorial posted 4 May on the South African online news outlet IOL argues that a push from wealthy countries for open data is suspect, given how often scientists in the global south go unacknowledged. “A neocolonial mentality has long permeated the scientific community,” the editorial says.
Fears of exploitation haven’t changed Apweiler’s mind, however. “The focus on low- and middle-income countries is bizarre because their amount of data is relatively little,” he says. Africa has uploaded around 13,000 sequences to GISAID, and South America has uploaded 14,000 sequences, for instance, compared with about 380,000 from the United Kingdom alone.
But others note that, as COVID-19 rates drop in Europe and the United States, dangerous variants are more likely to pop up in low- and middle-income countries with few vaccines. Sequences from these places will therefore be in demand, says Nuno Faria, a computational virologist at the Institute for Tropical Medicine at the University of São Paulo in Brazil and Imperial College London. Because Brazilian researchers have shared data on GISAID, Faria points out, the P.1 variant, which appears to make vaccines slightly less effective [1], is now known to account for 82% of all coronavirus genomes sequenced in the country. And, in Peru, where P.1 is also spreading, researchers say that if GISAID didn’t offer data depositors protection, there would probably be less sharing.