Machine-learning software that uses genetic information about deadly viruses such as Ebola can predict which groups of animals the viruses are likely to circulate in.
Identifying these animal ‘reservoirs’ could help to prevent future outbreaks in humans, says the team behind the software, led by disease ecologist Daniel Streicker of the University of Glasgow, UK. The results are reported on 1 November in Science [1].
“Until you know what the reservoir is, it’s difficult to gauge risk, and it’s difficult to do anything to stop a disease from emerging,” says Streicker. Limiting human exposure to these animals, or even vaccinating them, might stop outbreaks from occurring, he adds.
When scientists connect an emerging virus in humans to its animal reservoir — a species that can host the invader without becoming overly ill itself — they are generally using circumstantial evidence. Take Ebola viruses, for example, which many scientists suspect circulate naturally in certain bat species, on the basis of ecological and molecular data.
Bats are common in the remote sub-Saharan African forests where most Ebola outbreaks begin, and field studies have found antibodies against Ebola viruses [2], and even genetic sequences from them [3], in these bat populations. However, the specific viruses known to be responsible for outbreaks in humans have not been recovered from wild bats in live, replicating form. (In July, researchers said they had found, in bats in Sierra Leone, a new Ebola-virus species that could potentially infect humans, but evidence for this statement has yet to be published.)
And even when a virus is detected in an animal, it isn’t clear whether the species is a host reservoir or just an animal that developed the infection, notes Streicker. For instance, Ebola can infect chimpanzees and gorillas, but these apes are not likely to host the virus between human outbreaks, because they contract an often fatal disease similar to that seen in humans.
To better identify the animals important to a virus’s transmission, Streicker’s team gathered epidemiological and genetic data on several hundred viruses from families that can infect humans and whose hosts are already known.
The researchers then used machine learning to build a computer model that could predict which of 11 groups of animals — such as primates and rodents — is most likely to host a virus, using information in a virus’s RNA genome.
The model was based on the inference that genetically related viruses tend to be hosted by similar animals, and it also took into account signals indicating that a virus has adapted its genome to its host.
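One commonly used measure of genome composition is dinucleotide bias: how often each pair of adjacent bases appears compared with what the individual base frequencies would predict. The minimal Python sketch below computes it for an RNA sequence. It is an illustration only; the function name `dinucleotide_bias` is hypothetical, and nothing here should be read as the feature set used by Streicker’s team.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"

def dinucleotide_bias(seq):
    """Return the observed/expected ratio for each of the 16 dinucleotides.

    Illustrative assumption: this is one generic compositional signal,
    not the published model's actual features.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    mono = Counter(b for b in seq if b in BASES)
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_pair = sum(pairs.values())
    bias = {}
    for x, y in product(BASES, repeat=2):
        if n_mono == 0 or n_pair == 0:
            bias[x + y] = 0.0  # sequence too short to measure anything
            continue
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = pairs[x + y] / n_pair
        bias[x + y] = observed / expected if expected > 0 else 0.0
    return bias
```

A ratio well below 1 for a given pair means the genome avoids it, and a ratio well above 1 means the pair is over-represented. Patterns of this kind are one type of signal a classifier could learn to associate with a host group.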
When tested on the viruses excluded from the building of the model, the software predicted a virus’s host accurately 72% of the time.
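To show what testing on excluded viruses means in practice, the sketch below fits a deliberately simple stand-in classifier and scores it on sequences it never saw. The sequences, host labels and function names are all made up for illustration, and the nearest-neighbour rule is an assumption standing in for the team’s machine-learning model, which the article does not describe in detail.

```python
from collections import Counter
from itertools import product
from math import dist

BASES = "ACGU"

def kmer_profile(seq, k=2):
    """Frequency of every possible k-mer in the sequence, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]

def predict_host(seq, training):
    """Return the host label of the most compositionally similar training virus."""
    profile = kmer_profile(seq)
    nearest = min(training, key=lambda item: dist(profile, kmer_profile(item[0])))
    return nearest[1]

def held_out_accuracy(training, held_out):
    """Fraction of held-out viruses whose known host is predicted correctly."""
    hits = sum(predict_host(seq, training) == host for seq, host in held_out)
    return hits / len(held_out)

# Synthetic toy data: these are NOT real viral sequences or real host records.
training = [
    ("GCGCGGCCGCGC", "bat"),
    ("GGCCGCGCGGCC", "bat"),
    ("AUAUUAAUAUUA", "rodent"),
    ("UUAAUAUAAUAU", "rodent"),
]
held_out = [
    ("GCGGCCGCGCGG", "bat"),
    ("AUUAAUAUAUUA", "rodent"),
    ("AUAUAUGCAUAU", "bat"),  # resembles the rodent examples, so it is missed
]

accuracy = held_out_accuracy(training, held_out)
print(f"{accuracy:.0%}")  # 2 of 3 correct on this toy data
```

Because the held-out viruses play no part in building the classifier, the score estimates how it would perform on a virus whose host is genuinely unknown.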
When Streicker’s team applied the model to viruses whose animal hosts are not yet known — the real value of such a tool — the predictions made sense in many cases, he says.
For example, the software indicated that all four Ebola viruses it examined were probably hosted by a suborder of bats, called Pteropodiformes, that includes the fruit bats in which partial Ebola-virus sequences and antibodies have been found.
Surprisingly, the model also predicted that two Ebola-virus species identified in Uganda and Côte d’Ivoire were equally likely to have reservoirs in primates. Streicker is eager for others to test his predictions in field studies.
Sarah Cleaveland, a veterinary epidemiologist at the University of Glasgow, is preparing to do just that. Streicker’s model predicted that the virus responsible for Crimean–Congo haemorrhagic fever, an infection that can be deadly in humans and is seen in Africa, the Middle East, Asia and the Balkans, might be hosted by livestock or bats, rather than ticks — the arachnids typically cited as the virus’s host.
“It has gotten us to question the assumption that’s in the textbooks,” says Cleaveland, whose team now plans to look for Crimean–Congo haemorrhagic fever in livestock in Tanzania.
Peter Daszak, president of the environmental non-profit EcoHealth Alliance in New York City, says that there has been a lot of interest in using computer models to better understand emerging infections.
The study by Streicker’s team is useful, he says, because it predicts reservoirs on the basis of viral sequences — data readily available for most outbreaks. “I think it will go down as a key paper because it’s the first of what will become a future approach to emerging-infectious-disease surveillance,” he says.
But not everyone is convinced that the approach will be practical. Evolutionary virologist Edward Holmes of the University of Sydney, Australia, says that the animal groups predicted by the model are too broad to be useful. “There are a huge number of different species in each of these groups,” says Holmes. “I’m really not sure what practical use it is predicting that a virus comes from a primate compared to a rodent.”
Moreover, Holmes thinks that the links between viruses and hosts that were used to develop the model aren’t so clear-cut, and that the discovery of previously unknown viruses in rarely sampled hosts will alter our understanding of the animal reservoirs responsible for past human outbreaks.
Streicker and his colleagues hope to improve the precision of their predictions, but they say that right now, any hint is useful. “We would love to know the species,” Streicker says. “But this is a way to hopefully get to the species faster.”