Since the rise of web search engines in the 1990s, scientists have scoured the Internet for up-to-the-minute clues about the next big pandemic. For the most part, this type of data-mining has remained an academic exercise: researchers have retrospectively shown that Twitter posts and Google search terms can be used to detect and monitor disease outbreaks as accurately as traditional tracking methods can, but faster and at lower cost. Now, however, designers of such online systems say it's time for the technologies to be used as first-line tools by public health agencies worldwide.

“Confidence in these systems is growing,” says John Brownstein, an epidemiologist at Children's Hospital Boston. Last month, Brownstein and his colleagues reported that a data-mining platform called HealthMap could have been used to estimate the dynamics of the 2010 cholera outbreak in Haiti up to two weeks earlier than reports made by health workers on the ground (Am. J. Trop. Med. Hyg. 86, 39–45, 2012). “There's broader acceptance of data-mining as a legitimate method,” Brownstein says, “especially as we produce more papers on the utility of these services.”

Nigel Collier, a computer scientist at Japan's National Institute of Informatics in Tokyo who created a disease detection service called BioCaster, agrees. He points out that data collected from informal media offer many benefits over official reporting structures, including geographically and linguistically wider coverage and lower overhead costs. What's more, the systems “show good performance in a range of countries and diseases, not just in the Internet-saturated developed world,” he says.

Already, HealthMap, BioCaster and similar disease tracking services are changing the way agencies such as the World Health Organization (WHO) and the US Centers for Disease Control and Prevention (CDC) detect and react to outbreaks. “If someone calls us and says, for example, that there have been major school closures in a county where we're not already aware of a flu outbreak, the first thing we'll do is look at HealthMap,” says Lyn Finelli, chief of the CDC's Surveillance and Outbreak Response Team in Atlanta. According to Finelli, the CDC considers the information gleaned from digital disease detection programs reliable and actionable, but only as a source of big-picture insights. “They don't replace what we already have,” she says, “but they give good hints about where to concentrate our efforts.”

Mapping out disease: Internet-mining platforms such as HealthMap track outbreaks in real time. Credit: HealthMap

“The data-mining systems are counting outbreaks, not cases,” notes Marc Lipsitch, an epidemiologist at the Harvard School of Public Health in Boston. He says that the platforms provide complementary tools but cannot yet supplant traditional tracking mechanisms.

The failure to extract important disease information from the daily chatter of the Internet is another reason public health officials are wary of replacing standard epidemiological assessment with web-based protocols. For example, the CDC picked up the 2009 H1N1 influenza outbreak by traditional case reporting, not by any online tracking in place at the time, notes Lawrence Madoff, an infectious disease specialist at the University of Massachusetts Medical School in Worcester. Madoff edits the Program for Monitoring Emerging Diseases (ProMED), one of the largest open-source outbreak reporting systems in the world.

“I think they've gotten much better, but there are still limitations to what data-mining systems can pick up,” Madoff says. “It requires judgment to know when something is important.”