Technology to advance infectious disease forecasting for outbreak management

Forecasting is beginning to be integrated into decision-making processes for infectious disease outbreak response. We discuss how technologies could accelerate the adoption of forecasting among public health practitioners, improve epidemic management, save lives, and reduce the economic impact of outbreaks.

push to use clinical trials to confirm that Ebola vaccines could be safe and efficacious (J. Asher, personal communication). Real-time forecasts generated during the outbreak highlighted challenges for the design of the planned clinical trials. These studies showed, based on forecasted incidence rates of EVD, that there was a strong possibility that the trials being proposed during September 2014 would not have sufficient case numbers to demonstrate significant results. This forecasting sped up discussions among senior leaders to pursue more productive, alternative trial designs (J. Asher, personal communication).
In this Comment, we discuss major limitations of the current set of tools used in forecasting outbreaks and highlight existing and emerging technologies that have the potential to significantly enhance forecasting capabilities. We focus on forecasting for outbreak management, specifically the capacity to predict short-term (i.e., days to weeks) trends of disease activity or incidence (i.e., the number and location of new cases) in an ongoing outbreak. We do not address the prediction of outbreak emergence, which is a separate endeavor with its own opportunities 6 and challenges 7 , nor do we consider projecting multi-year trends of disease burden 8 .
From a data science perspective, the forecasting workflow encompasses three general categories: data, analytics, and communication (Fig. 1). Each step in the process has challenges and opportunities.

Data collection
Effective data collection and curation is essential for analytics and efficient outbreak management. Yet, for infectious disease forecasting, data quantity, quality and timeliness persist as significant challenges. Few epidemiological data are consistently reported, broadly shared, and available for decision-making during outbreak responses, especially early in outbreaks. Data collection can be a slow process, particularly in low-resource settings lacking sufficiently trained staff, with sporadic communications, limited healthcare systems, and inconsistent electrical power. Improving collection systems and advancing forecasting approaches that address these limitations and leverage existing surveillance data are necessary.
Improving diagnostic capabilities at scale should be a priority area of development. Recent advances have introduced the capacity to collect and share near real-time diagnostic results. For example, Quidel's Sofia platform 9 and BioFire's FilmArray multiplex PCR 10 both provide rapid diagnostic tests for respiratory pathogens that are wirelessly connected to cloud-enabled databases. These early examples demonstrate how rapid, aggregated, and geo-coded diagnostic test results could improve real-time tracking of population health trends. Additionally, they could enable timely and targeted clinical trial recruitment. Determining how to scale these capabilities could provide a significant source of data to improve forecasts.

Data cleaning
Collected data are usually not in a form amenable to immediate analysis that could support decision making, and must first be processed and cleaned. Data cleaning has been largely a manual, ad hoc process in outbreak forecasting efforts. Therefore, technologies to clean data would be particularly valuable for forecasting.
Technologies that translate raw, unprocessed data into structured formats would be particularly useful. For instance, software could extract data from line lists of cases or clinical notes in electronic health records, or convert data stored in non-standard formats into machine-readable data. Digitizing handwritten text reliably, quickly and securely from clinical or epidemiological records will be a persistent need for the foreseeable future.
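As a minimal sketch of the kind of conversion described above, the following Python example turns a small line list with inconsistent date formats into structured records and daily case counts. The sample rows, the column names (`case_id`, `onset_date`, `district`), and the list of accepted date formats are all hypothetical assumptions chosen for illustration; they do not come from any real outbreak data set or reporting standard.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical raw line list (one row per case); values are invented for illustration.
RAW_LINE_LIST = """case_id,onset_date,district
001,2014-09-03,District A
002,03/09/2014,District A
003,4 Sep 2014,District B
004,unknown,District B
"""

# Assumed date formats; a real system would need a format list agreed with data providers.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def parse_onset_date(text):
    """Return a date if one of the assumed formats matches, otherwise None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_line_list(raw_text):
    """Split rows into cleaned records (ISO dates) and rejected rows needing manual review."""
    cleaned, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        onset = parse_onset_date(row["onset_date"])
        if onset is None:
            rejected.append(row)
            continue
        cleaned.append({
            "case_id": row["case_id"],
            "onset_date": onset.isoformat(),
            "district": row["district"].strip(),
        })
    return cleaned, rejected


def daily_incidence(cleaned):
    """Count cases per (onset date, district)."""
    return Counter((r["onset_date"], r["district"]) for r in cleaned)


cleaned, rejected = clean_line_list(RAW_LINE_LIST)
counts = daily_incidence(cleaned)
```

Rows that cannot be parsed are set aside rather than silently dropped or guessed at, so that an analyst can review them; this reflects one possible design choice, not an established practice described in this Comment.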

Data sharing
Although tools are improving, epidemiological data sharing remains a problem. Public health agencies provide data via their websites and situational reports 11,12 . These efforts are critical for supplying information to the public, but the formats often cause challenges for quantitative analysts. Typically, these reports are provided with a considerable time lag, and are neither machine-readable nor provided in standard formats with metadata. This impedes sharing and use of these data.
There have been instances where epidemiological data are available via informal networks of people sharing spreadsheets (D. B. George, personal communication); secure CSV file transfers 13 ; or unofficial APIs 14,15 . These approaches should be lauded, but they are not long-term, enterprise solutions.
Open-science approaches to sharing data have shown promise in recent outbreaks. Epidemiologists and modelers have begun using publicly available repositories, such as GitHub, to aggregate and share digitized data in standardized formats [16][17][18] . This paradigm shift resulted in a rapid improvement in data-sharing capability during the 2014-2015 West Africa Ebola outbreak (D. B. George, personal communication). A team of influenza forecasters in the U.S. also has used GitHub to share forecast data to facilitate the creation of multi-model ensemble forecasts 19,20 . The shift from informal means of sharing data to robust technologies using standardized, machine-readable formats enables more rapid and meaningful engagement of a broader group of analysts. Structured open-science approaches to data sharing that are specifically tailored to forecasting applications should be further supported and explored.
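To illustrate why standardized, machine-readable forecast files make multi-model ensembles easier to build, the sketch below combines point forecasts from several models with a simple equal-weight mean. The CSV layout, model names, and case counts are hypothetical assumptions, and the equal-weight average is only one possible combination rule; it is not presented as the method used by the influenza forecasting groups cited above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical shared forecast file: one row per model, location, and target week.
SHARED_FORECASTS = """model,location,target_week,predicted_cases
model_a,Region 1,2015-W02,120
model_b,Region 1,2015-W02,90
model_c,Region 1,2015-W02,150
model_a,Region 1,2015-W03,100
model_b,Region 1,2015-W03,80
"""


def equal_weight_ensemble(csv_text):
    """Average the point forecasts of all models for each (location, target week)."""
    by_target = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["location"], row["target_week"])
        by_target[key].append(float(row["predicted_cases"]))
    # Each model that submitted a forecast for a target gets the same weight.
    return {key: mean(values) for key, values in by_target.items()}


ensemble = equal_weight_ensemble(SHARED_FORECASTS)
```

Because every contributor uses the same columns, adding a new model only requires appending rows to the shared file; no model-specific parsing code is needed.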

Analytics: training models
Over the past several years, academic research on infectious disease forecasting has grown and models have successfully generated predictions for pathogens such as influenza [19][20][21] , dengue 13 , Zika 22 , and Ebola 2 . But scaling academic research to support public health decision-makers in real-time has received little attention and relatively scarce resources.
The U.S. Department of Health and Human Services has built models for recent outbreaks using a combination of extramural and internal analytical resources. However, the federal government and state and local public health agencies find it difficult to recruit and retain scientists capable of developing, interpreting, and communicating quantitative results. Formalized training in "outbreak science" for public health practitioners will be a vital component in ensuring that the public and private sector workforce can respond quickly in case of an emerging epidemic threat 23,24 . Even when scientists are available in public health agencies, the long and bureaucratic processes for acquiring and securing software and data technologies present significant challenges to using current and emerging data science tools.
Fig. 1 The forecasting workflow: Generating infectious disease forecasting results that will be useful for managing outbreaks follows a workflow with three main strata: data (blue circles), analytics (green circles), and communication (gray circle). Taken together, these pieces build a workflow that uses analytics to provide decision-makers with information that could be used to plan response activities.

Analytics: forecasting
The U.S. government wisely spent decades developing weather forecasting capabilities and continues to invest in advancing the personnel, infrastructure, data, analytics and decision frameworks necessary for supporting these activities 25 .

Visualization and communication
Forecasting results must be communicated effectively to ensure they produce actionable insights. Visualizations play a key role. Academic groups have built data visualization tools to communicate forecasts 29 , but these largely rely on customized code. Analysts who develop forecast models typically have limited time to spend on visualization and lack advanced design skills. This can lead to hard-to-understand visualizations and misinterpretation of results when used to support decision making. However, recent work by CDC has progressively refined information from forecasting results on seasonal influenza and translated that information into actionable risk communications 4 . Such efforts should be encouraged and supported.

Conclusions
Experience from the successful application of analytical technologies across multiple industries can inform the development of technologies for infectious disease forecasting and outbreak science. Improving technologies across the forecasting workflow will significantly advance forecasting capabilities, enable involvement from multiple stakeholders (e.g., industry, government, and academia), and allow the field to develop a robust forecasting architecture. Such advances will improve public health response to outbreaks, mitigate economic losses, and save lives.