SARS-CoV-2, the virus that causes COVID-19, could have spilled from animals to people multiple times, according to a preliminary analysis of viral genomes sampled from people infected in China and elsewhere early in the pandemic.
If confirmed by further analyses, the findings would add weight to the hypothesis that the pandemic originated in multiple markets in Wuhan, China, and make the hypothesis that SARS-CoV-2 escaped from a laboratory less likely, say some researchers. But the data need to be verified, and the analysis has not yet been peer reviewed.
The earliest viral sequences, taken from people infected in late 2019 and early 2020, are split into two broad lineages, known as A and B, which have key genetic differences.
Lineage B has become the dominant lineage globally and includes samples taken from people who visited the Huanan seafood market in Wuhan, which also sold wild animals. Lineage A spread within China, and includes samples from people linked to other markets in Wuhan.
A crucial question is how the two viral lineages are related. If viruses in lineage A evolved from those in lineage B, or vice versa, that would suggest that the progenitor of the virus jumped just once from animals to people. But if the two lineages have separate origins, then there might have been multiple spillover events.
Dagger in the heart
The latest analysis — posted on the virological.org discussion forum — adds weight to the second possibility by questioning the existence of genomes linking the lineages.
The finding could be the “dagger into the heart” of the hypothesis that SARS-CoV-2 escaped from a lab, rather than originating from the wildlife trade, says Robert Garry, a virologist at Tulane University in New Orleans, Louisiana. But others say that more research is needed, especially given the limited genomic data from early in the pandemic.
“It is a very significant study,” says Garry. “If you can show that A and B are two separate lineages and there were two spillovers, it all but eliminates the idea that it came from a lab.”
The findings are “consistent with there being at least two introductions of SARS-CoV-2 into the human population”, says David Robertson, a virologist at the University of Glasgow, UK.
Lineages A and B are defined by two key nucleotide differences. But some of the earliest genomes have a combination of these differences. Researchers previously thought that these genomes could be those of viruses at intermediate stages of evolution linking the two lineages.
But the researchers behind the latest analysis looked at them in detail and noticed some problems.
They analysed 1,716 SARS-CoV-2 genomes that had been collected before 28 February 2020 and deposited in GISAID, a popular online genome repository. They identified 38 such ‘intermediate’ genomes.
But when they looked at the sequences more closely, they found that many of these also contained mutations in other regions of their genomes. And they say that these mutations are definitively associated with either lineage A or lineage B — which discredits the idea that the corresponding viral genomes date to an intermediate stage of evolution between the two lineages.
The authors suggest that a laboratory or computational error probably occurred when one of the two key nucleotides was sequenced in these ‘intermediate’ genomes. “The more we dug, the more it looked like, maybe we can’t trust any of the ‘transitional’ genomes,” says study co-author Michael Worobey, an evolutionary biologist at the University of Arizona in Tucson.
Such sequencing errors are not unusual, say researchers. Software can sometimes fill in gaps in the raw data with incorrect sequences, and viral samples can become contaminated, notes Richard Neher, a computational biologist at the University of Basel in Switzerland. “Such mishaps are not surprising,” he says. “Especially early in the pandemic, when protocols weren’t very established and people tried to generate data as fast as they could.”
Several researchers who sequenced samples included in the study told Nature it is unlikely that their sequences include errors in the two key nucleotides.
But the study authors counter that even if some of the genomes were sequenced correctly, other parts of the same genomes, or the locations from which the samples were collected, still clearly indicate that they belong to only one or the other lineage.
“It is very unlikely” that any of the ‘intermediate’ genomes are actually transitional genomes, says study co-author Joel Wertheim, a molecular epidemiologist at the University of California, San Diego.
Xiaowei Jiang, an evolutionary biologist at Xi’an Jiaotong–Liverpool University in Suzhou, China, says that the team behind the study must verify the findings by getting “the original raw sequencing data for as many genomes as possible”.
If the virus did jump from animals to people on several occasions, the fact that lineages A and B are linked to people who visited different markets in Wuhan suggests that several animals carrying a progenitor of SARS-CoV-2, from one or more species, could have been transported across the city, infecting people in at least two locations.
A study published in June¹ found that live animals susceptible to SARS-CoV-2, such as raccoon dogs and mink, were sold in numerous markets in Wuhan. Previous studies² of the virus that caused severe acute respiratory syndrome (SARS) have concluded that it, too, probably jumped multiple times from animals to people.
The latest study, if verified, would mean that the scenario of a researcher accidentally being infected in a lab, and then spreading the virus to the population at large, would have had to happen twice, says Garry. It’s much more likely that the pandemic had its origins in the wildlife trade, he says.
To gather more evidence, the team behind the latest analysis now plans to run computer simulations to test how well multiple spillovers would fit with the diversity of known SARS-CoV-2 genomes.