Main

Everything that happens twice will surely happen a third time

Paulo Coelho — The Alchemist

In late 2013 and early 2014, a lethal haemorrhagic fever spread throughout forested Guinea (Guinée forestière), undiagnosed for months. By the time it was reported to be Ebola, the virus had spread to three countries (Ref. 1) and was likely past the point at which case-level control measures, such as isolation and infection control, could have contained the nascent outbreak. In 2015, a new dengue-like illness was implicated in a dramatic increase in Brazil's microcephaly cases; one year later, analyses revealed that the Zika virus had been sweeping through the Americas, unnoticed by existing surveillance systems, since late 2013 (Refs 2,3,4).

Although public health surveillance systems have evolved to meet the changing needs of our global population, we continue to dramatically underestimate our vulnerability to pathogens, both old and new5. Indeed, the recent events in West Africa and Brazil highlight the gaps in existing infectious disease surveillance systems, particularly when dealing with novel pathogens or pathogens whose geographic range has extended into a new region. Despite the lessons learned from previous outbreaks6, such as the severe acute respiratory syndrome (SARS) epidemic in 2002–2003 and the 2009 influenza pandemic — particularly the need for enhanced national surveillance and diagnostic capacity — infectious threats continue to surprise and sometimes overwhelm the global health response.

The cost of these epidemics demands that we take action: with fewer than 30,000 cases, the Ebola outbreak ultimately resulted in over 11,000 deaths, left nearly 10,000 children without parents (Ref. 7) and caused cumulative gross domestic product losses of more than 10% (Ref. 8). As with prior crises, in the wake of Ebola, multiple commissions have offered suggestions for essential reforms (Refs 8,9). Most focus on systems-level change, such as funding research and development or creating a centralized pandemic preparedness and response agency. However, they also call for enhanced molecular diagnostic and surveillance capacity coupled to data-sharing frameworks. This hints at an emerging paradigm for rapid outbreak response, one that employs new tools for pathogen genome sequencing and epidemiological analysis (Fig. 1) and that can be deployed anywhere. In this model, portable, in-country genomic diagnostics are targeted to key settings for routine human, animal and environmental surveillance or rapidly deployed to a setting with a nascent outbreak. Within our increasingly digital landscape, wherein a clinical sample can be transformed into a stream of data for rapid analysis and dissemination in a matter of hours, we have a tremendous opportunity to respond more proactively to disease events. However, the potential benefits of such a system are not guaranteed, and many obstacles remain.

Figure 1: A genomics-informed surveillance and outbreak response model.

Portable genome sequencing technology and digital epidemiology platforms form the foundation for both real-time pathogen and disease surveillance systems and outbreak response efforts, all of which exist within the One Health context, in which surveillance, outbreak detection and response span the human, animal and environmental health domains.

Here, we review recent advances in genomics-informed outbreak response, including the role of real-time sequencing in both diagnostics and epidemiology. We outline the opportunities for integrating sequencing with the One Health and digital epidemiology fields, and we examine the ethical, legal and social issues that must be addressed if we are to move towards an era of genomics-informed pathogen surveillance.

Genomics in rapid-response diagnostics

Next-generation sequencing (NGS) platforms have recently moved from proof-of-concept studies to routine use in the clinical microbiology laboratory (Ref. 10). Most NGS services rely on bench-top instruments and sequencing from culture. However, when trying to proactively detect emerging infections or in many rapid outbreak responses, the aetiological agent behind a cluster is often unknown. Even if the agent is known, both the limited culture capacity in a field laboratory and the need for diagnostic turnaround times in hours, not days, preclude sequencing from culture. Sequencing directly from a sample using a portable sequencing platform is therefore more relevant in the field. Similarly, the need for a sequencer that can withstand being shipped and operated under rough field conditions, coupled with the need for rapid turnaround, makes small, portable sequencers an attractive option.

Clinical metagenomics. With its untargeted approach to sequencing, clinical metagenomics can cross disciplines in a way that clinical microbiology struggles to — identifying viral, bacterial, fungal and other eukaryotic pathogens in a single assay11 and coupling pathogen detection to pathogen discovery. Given the current high cost of the technique — conservatively estimated at several thousand dollars — it is most often used when dealing with potentially lethal infections that fail the conventional diagnostic paradigm, such as the recent diagnosis of an unusual case of meningoencephalitis caused by the amoeboid parasite Balamuthia mandrillaris12 or the diagnosis and treatment of neuroleptospirosis in a critically unwell teenager13. In the latter case, despite a high index of suspicion for infection, Leptospira santarosai was not detected by culture or PCR, as the diagnostic primer sequences were eventually found to be a poor match to the genome of the pathogen. Intravenous antibiotic therapy resulted in rapid recovery. In such an example, the costs are easily justified, particularly when offset against the cost of a stay in an intensive treatment unit. However, routine diagnostic metagenomics is currently limited to a handful of clinical research laboratories worldwide; it is therefore regarded as a 'test of last resort' and kept in reserve for vexing diagnostic conundrums.

Substantial practical challenges hinder the adoption of metagenomics for diagnostics (Fig. 2) (reviewed in depth in Ref. 11). Chief among these is analytic sensitivity, which depends on pathogen factors (for example, genome size, ease of lysis and life cycle); analytic factors (for example, the completeness of reference databases and the potential to mistake a target for a close genetic relative); and sample factors (for example, pathogen abundance within a sample and contaminating background DNA). As an example of a problematic sample, during Zika surveillance, attempts to perform untargeted metagenomics sequencing on blood yielded few, or in some cases zero, reads owing to low viral titres14. Target-enrichment technologies (reviewed in Ref. 15) such as bait probes can be employed, but even these were unsuccessful at recovering whole Zika genomes, necessitating PCR enrichment14. In addition to sensitivity, universal pathogen detection through clinical metagenomics is complicated by specificity issues arising from misclassification or contaminated reagents, the challenge of reproducing results from a complex clinical workflow, nucleic acid stability under varying assay conditions, ever-changing bioinformatics workflows and cost.
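The dependence of analytic sensitivity on pathogen abundance can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that reads are sampled in proportion to each organism's share of the nucleic acid in the library and uses a Poisson approximation; the function names, the read depth of one million and the pathogen fractions are illustrative assumptions, not values taken from the studies cited above.

```python
import math


def expected_pathogen_reads(total_reads: int, pathogen_fraction: float) -> float:
    """Expected pathogen read count, assuming reads are drawn in proportion
    to the pathogen's share of nucleic acid in the sequenced library."""
    return total_reads * pathogen_fraction


def prob_at_least(min_reads: int, total_reads: int, pathogen_fraction: float) -> float:
    """Poisson approximation to P(at least `min_reads` pathogen reads)."""
    lam = expected_pathogen_reads(total_reads, pathogen_fraction)
    # P(X < min_reads) is the sum of the Poisson probabilities for 0..min_reads-1.
    p_below = sum(
        math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical pathogen fractions, from abundant to very scarce.
    for fraction in (1e-4, 1e-6, 1e-8):
        p = prob_at_least(10, 1_000_000, fraction)
        print(f"pathogen fraction {fraction:.0e}: P(>=10 reads) = {p:.3f}")
```

Under these assumptions, a pathogen that contributes one part per million of the library is expected to yield a single read per million sequenced, which illustrates why low-titre samples can return few or no reads without enrichment.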

Figure 2: Challenges to in-field clinical metagenomics for rapid diagnosis and outbreak response.

A mobile medical unit deploying a portable clinical metagenomics platform has been established at the epicentre of an infectious disease outbreak, but the team faces challenges throughout the diagnostic process and epidemiological response. For example, in the case of Zika virus, low viral titres in samples such as blood, a small genome of <11 kb and transient viraemia (Ref. 120) combine to complicate detection of viral nucleic acid with a strictly metagenomic approach. Furthermore, obtaining sufficient viral nucleic acid for genome sequencing beyond simple diagnostics requires a tiling PCR and amplicon sequencing approach (Ref. 14). Other challenges include, for example, access to a reliable Internet connection, the ability to collect sample metadata and translating genomic findings into real-time, actionable recommendations.

Given these issues, could metagenomics replace conventional microbiological and molecular tests for infection? Recent studies have used metagenomics in common presentations, including sepsis16, pneumonia17, urinary tract infections18 and eye infections19. These have generally yielded promising results, albeit typically at a lower sensitivity than conventional tests and at a much greater cost. Despite these problems, two factors will drive sequencing to eventually become routine clinical practice. First, the ever-decreasing cost of sequencing coupled with the potential for cost savings achieved by using a single diagnostic modality versus tens or hundreds of different diagnostic assays — each potentially requiring specific instrumentation, reagents, validation and labour — is attractive from a laboratory operations perspective. Second, and perhaps most compelling, is the additional information afforded by genomics, including the ability to predict virulence or drug resistance phenotypes, the ability to detect polymicrobial infections and phylogenetic reconstruction for outbreak analysis.

Novel technologies: portable sequencing. Given that outbreaks of emerging infectious diseases (EIDs) most often occur in settings with minimal laboratory capacity, where routine culture and bench-top sequencing are simply not feasible, the need for a portable diagnostic platform capable of in situ clinical metagenomics and outbreak surveillance is evident. A trend towards smaller and less expensive bench-top sequencing instruments was seen with the 454 Genome Sequencer Junior system (which has since been discontinued), the Ion Torrent Personal Genome Machine (PGM) system and the Illumina MiSeq system, which were released in close succession20. Each of these instruments costs <$150,000 and puts NGS capability into the hands of smaller laboratories, including clinical settings. In 2014, the MinION from Oxford Nanopore Technologies was released to early access users21, heralding the potential for highly portable 'lab-in-a-suitcase' sequencing. The MinION is pocket-sized and is controlled and powered through a laptop USB connection. It is provided under a model whereby the hardware is free but the consumer pays a premium for the reagent and flow cell consumables. Compared with bench-top instruments, the absence of a rolling service contract or regular engineer visits makes it theoretically possible to scale this platform out to potentially unlimited numbers of laboratories. Importantly, the MinION has been used in field situations, including in diagnostic tent laboratories during the Ebola epidemic22,23 and in a roving bus-based mobile laboratory in Brazil as part of the ZiBRA project3,24. Others have taken the MinION to more extreme environments where even the smallest traditional bench-top sequencer could not go, including the Arctic25 and Antarctic26, a deep mine27 and zero gravity aboard the reduced-gravity aircraft (nicknamed the 'Vomit Comet')28 and the International Space Station29.

However, this technology is not yet a panacea; remaining challenges include high DNA or RNA input requirements (currently hundreds of nanograms), which often necessitate PCR-based amplification approaches; a flow cell cost of $500, keeping the cost per sample high despite multiplexing approaches; and high error rates, which require that genomes are sequenced to high coverage for single nucleotide polymorphism-based analysis and analysed at the signal level. Moreover, although the long reads produced by the MinION overcome a number of challenges in assembling eukaryotic microbial pathogen genomes, such as the presence of discrete chromosomes or long repetitive regions, the upstream nucleic acid extraction steps required to obtain genomic DNA vary across microbial domains and might necessitate reagents and equipment far less portable than the MinION.
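A simple calculation suggests why high per-read error rates call for high coverage. The sketch below assumes that errors are independent between reads and, pessimistically, that all erroneous reads at a site agree on the same wrong base, so that a majority-vote consensus fails whenever more than half of the reads are wrong. The 10% per-read error rate is a hypothetical value chosen for illustration. Real errors can be systematic rather than independent, which this toy model does not capture.

```python
from math import comb


def majority_error_bound(coverage: int, per_read_error: float) -> float:
    """P(a strict majority of reads at one site are erroneous).

    Assumes independent errors with the same probability in every read.
    """
    threshold = coverage // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1.0 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-read error rate of 10%.
    for coverage in (1, 5, 15, 30):
        bound = majority_error_bound(coverage, 0.10)
        print(f"coverage {coverage:>2}x: P(consensus wrong) <= {bound:.2e}")
```

Under these assumptions the chance of a wrong consensus call falls steeply as coverage grows, which is the intuition behind sequencing to high depth before calling single nucleotide polymorphisms.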

Genomic epidemiology

From transmission to epidemic dynamics. Genomics is capable of informing not just pathogen diagnostics but also epidemiology. Pathogen sequencing has been used for decades to understand transmission in viral outbreaks, from early studies of hantavirus in the United States of America30 to human immunodeficiency virus (HIV) in the United Kingdom31; more recently, the approach has been successfully extended to include bacterial pathogens (reviewed in Ref. 32) and has come to be known as genomic epidemiology, a term encompassing everything from population dynamics to the reconstruction of individual transmission events within outbreaks32. Most transmission-focused investigations to date have been retrospective, with only a subset unfolding in real time, as cases are diagnosed33,34,35,36,37.

In transmission-focused investigations, genetic variants are used to identify person-to-person transmission events (Fig. 3), either through manual interpretation of the variants shared between outbreak cases38 or via model-based approaches39, with the result being a transmission network. Epidemic investigations are very different — only a subset of the epidemic cases are sequenced. Thus, the goal is to use the population structure of the pathogen to understand the overall dynamics of the epidemic. Here, phylodynamic approaches are used to infer epidemiological parameters of interest.

Figure 3: Inferring transmission events from genomic data.

Genomic approaches to identifying transmission events typically involve four steps. In the first step, outbreak isolates, and often non-outbreak control isolates, are sequenced and their genomes either assembled de novo or mapped against a reference genome. Next, the genomic differences between the sequences are identified; depending on the pathogen and the scale of the outbreak, these may include features such as genetic variants, insertions and deletions or the presence or absence of specific genes or mobile genetic elements. In the third step, these features are examined to infer the relationships between the isolates in which they were found; a variant common to a subset of isolates, for example, suggests that those cases are epidemiologically linked. Finally, the genomic evidence for epidemiological linkages is reviewed in the context of known epidemiological information, such as social contact between two cases or a common location or other exposure. Recently, automated methods for inferring potential epidemiological linkages from genomic data alone have been developed, greatly facilitating large-scale genomic epidemiological investigations (Ref. 121).
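The second to fourth steps can be sketched in a few lines of code. The example below is a deliberately minimal illustration, not one of the published methods: it assumes the genomes have already been aligned to equal length, counts single-nucleotide differences between each pair, proposes links below a SNP threshold and then keeps only links supported by a known contact. The sequences, case identifiers, threshold and function names are all hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions with ambiguous or missing calls (anything other than A, C, G
    or T) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def candidate_links(aligned: dict, max_snps: int) -> list:
    """Return (case, case, distance) for every pair within `max_snps`."""
    links = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(aligned.items()), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= max_snps:
            links.append((id_a, id_b, distance))
    return links


def supported_links(links: list, epi_contacts: set) -> list:
    """Keep only the genomic links that match a known epidemiological contact."""
    return [
        (a, b, d) for a, b, d in links if frozenset((a, b)) in epi_contacts
    ]


if __name__ == "__main__":
    # Hypothetical ten-base alignment; real genomes are far longer.
    aligned = {
        "case_A": "ACGTACGTAC",
        "case_B": "ACGTACGTAT",
        "case_C": "ACGAACGTAT",
        "control": "TCGAACCTGT",
    }
    links = candidate_links(aligned, max_snps=2)
    contacts = {frozenset(("case_A", "case_B"))}
    print("genomic links:", links)
    print("links with epidemiological support:", supported_links(links, contacts))
```

A fixed SNP threshold is the crudest possible rule; appropriate cut-offs depend on the pathogen and the time span of the outbreak, and model-based approaches replace the threshold with an explicit transmission model.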

First conceptualized in 2004 by Grenfell et al. as a union of “immunodynamics, epidemiology, and evolutionary biology” (Ref. 40), phylodynamics captures both epidemiological and evolutionary information from measurably evolving pathogens — those viruses and bacteria for which high mutation rates and/or a range of sampling dates contribute to a meaningful amount of genetic variation between sequences41,42 — in other words, enough genetic diversity to be able to infer an evolutionary history for a pathogen of interest, even if that history is only over the short time frame of an outbreak or epidemic. This is possible for most pathogens, particularly single-stranded DNA viruses, RNA viruses and many bacterial species42,43, but there are certain species for which the lack of a strict molecular clock and/or frequent recombination complicate both phylodynamics studies and attempts to infer transmission events42.

Phylodynamics relies on tools such as Bayesian evolutionary analysis sampling trees (BEAST)44, in which sequence data are used to build a time-labelled phylogenetic tree using a specific evolutionary process as a guide — often variations on a theme of coalescent theory45. From the tree, one can infer epidemiological parameters, including the basic reproductive number R0 (Ref. 46). While the insights that can be gained from genomic data alone are exciting, the utility of phylodynamic approaches is greatly extended when additional data are integrated into the models (reviewed in Ref. 47).
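As a rough illustration of how a quantity estimated from a time-labelled tree maps onto R0, consider an epidemic growth rate r, for instance a posterior estimate obtained under an exponential-growth coalescent model. Under the additional assumption of a simple SIR model with an exponentially distributed infectious period of mean D, the standard relation R0 = 1 + rD applies. The sketch below encodes only that relation; the numerical values are hypothetical, and the richer models used in published analyses will generally give different answers.

```python
import math


def r0_from_growth_rate(growth_rate_per_day: float, infectious_period_days: float) -> float:
    """R0 = 1 + r * D, valid under a simple SIR model with an
    exponentially distributed infectious period of mean D."""
    if infectious_period_days <= 0:
        raise ValueError("infectious period must be positive")
    return 1.0 + growth_rate_per_day * infectious_period_days


def doubling_time_days(growth_rate_per_day: float) -> float:
    """Time for case numbers to double under exponential growth."""
    if growth_rate_per_day <= 0:
        raise ValueError("doubling time is defined only for positive growth")
    return math.log(2) / growth_rate_per_day


if __name__ == "__main__":
    # Hypothetical inputs: growth rate of 0.1 per day, 5-day infectious period.
    r, d = 0.1, 5.0
    print(f"R0 = {r0_from_growth_rate(r, d):.2f}")
    print(f"doubling time = {doubling_time_days(r):.1f} days")
```

In practice the uncertainty in both r and D would be carried through, for example by applying the function to each sample from a posterior distribution rather than to point estimates.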

Genomic epidemiology in action: Ebola. The many genomic epidemiology studies from the Ebola outbreak (reviewed in Ref. 48) used bench-top and portable sequencing platforms to reveal outbreak-level events and epidemic-level trends. Real-time analyses published around the peak of the epidemic suggested the following: the outbreak probably arose from a single introduction into humans and not repeated zoonotic introductions49,50; sexual transmission had a previously unrecognized role in maintaining transmission chains51; and survivor transmission — another unrecognized phenomenon — contributed to disease flare-ups later in the outbreak52. The first sequencing efforts, all of which had an effect on the epidemiological response in real time, unfolded months into the epidemic. Had they been deployed earlier, we can only speculate as to their potential impact. Arguably, the most compelling use of early sequencing would have been to provide a definitive Ebola diagnosis in this previously unaffected region of West Africa. However, even after the outbreak was underway, sequencing could have benefited the public health response. For example, ruling out bush meat as a source of repeated viral introductions could have changed public health messaging campaigns from avoiding bush meat to the importance of hygiene and safe funeral practices53, potentially averting some cases. Portable sequencing and phylodynamic approaches are currently being deployed in the ongoing Zika epidemic; whether the real-time reporting of genomic findings is able to alter the course of a vector-borne epidemic remains to be seen.

Retrospective phylodynamic investigations are also useful for pandemic preparedness planning. A recent analysis of 1,610 Ebola virus genomes (approximately 5% of all cases) reconstructs the movement of the virus across West Africa and reveals drivers for its spread (Ref. 1). The authors deduce that Ebola importation was more likely to occur between regions of a country than across international borders and that both population size and distance to a nearby large urban centre were associated with local expansion of the virus. These findings may affect decision-making around border closures in future Ebola outbreaks and point to the need to develop surveillance, diagnostic and treatment capacity in urban centres.

The role of the environment

In deploying genomics for surveillance, diagnostics and epidemiological investigation, a key question remains: where? Many regions lack the diagnostic laboratory capacity to carry out basic surveillance, but continuous genomic surveillance in all of these settings would be impossible. Numerous projects have attempted to describe the pool of geographic hot spots and candidate pathogens from which the next epidemic or pandemic will arise. Determining these factors is key to predicting and preventing spillover events (Fig. 4), but huge gaps in our understanding of disease ecology remain. Woolhouse et al. describe 1,399 human pathogens, of which 87 — mostly viral — have emerged since 1980 (Ref. 54). Jones et al. extend this to include 335 new EIDs since 1940 (Ref. 55). They report an increasing number of events each decade, generally located in hot spots defined by specific environmental, ecological and socio-economic characteristics.

Figure 4: Emergence of infectious diseases.

In spillover, a pathogen previously restricted to animals gradually begins to move into the human population. During stage one (pre-emergence), as a result of changing demographics and/or land use, a pathogen undergoes a population expansion, extends its host range or moves into a new geographic region. During stage two (localized emergence), contact with animals or animal products results in spillover of the pathogen from its natural reservoir(s) into humans but with little to no onward person-to-person transmission. During stage three (pandemic emergence), the pathogen is able to sustain long transmission chains, that is, a series of disease transmission events, such as a sequential series of person-to-person transmissions, and its movement across borders is facilitated by human travel patterns65.

Most EIDs are zoonotic in origin, with the highest risk of spillover in regions with high wildlife diversity that have experienced recent demographic change and/or recent increases in farming activity55. A global biogeographic analysis of human infectious disease further supports the use of biodiversity as a proxy for EID hot spots56, and reviews focused on systems-level, rather than ecological, factors identify the breakdown of local public health systems as drivers of outbreaks, suggesting that surveillance ought to be targeted to settings where biodiversity and changing demographics meet inadequate sanitation and hygiene, lack of a public health infrastructure for delivering interventions and no or limited resources for control of zoonoses and vector-borne diseases57.

These analyses provide a shortlist of regions, including parts of eastern and southeastern Asia, India and equatorial Africa, on which genomic and other surveillance activities should be focused55,58. Within these regions, sewer systems and wastewater treatment plants could be important foci for sample collection, providing a single point of entry to biological readouts from an entire community. Indeed, proof-of-concept metagenomics studies have revealed the presence of antibiotic resistance genes59, human-specific viruses60 and other pathogens of interest in this readily accessible sample type. Other recent surveys offer insight into what such systems might need to look for. In 2013, Rosenberg et al. reported that viruses dominate the list of agents newly recognized to cause disease in humans61. Most were zoonotic in origin, and over one-quarter had been detected in non-human species many years before being identified as human pathogens. A later review reiterates this observation, noting that recent agents of concern — Ebola, Zika and chikungunya — had been identified decades before they achieved pandemic magnitude62. As a result of NGS technology, the pace of novel virus discovery is accelerating, with recent large-scale studies revealing 184 new viruses sampled from macaque faeces in a single geographic location63 and 1,445 new viruses discovered from RNA transcriptomic analyses of multiple invertebrate species64. However, understanding which of these new entities might pose a threat requires a new approach.

One Health. The emergence of a zoonotic pathogen proceeds in stages65 (Fig. 4); in an effort to better anticipate these transitions and more proactively respond to emerging threats, the One Health movement was launched in 2004. Recognizing that human, domestic animal and wildlife health and disease are linked to each other and that changing land-use patterns contribute to disease spread, One Health aims to develop systems-minded, forward-thinking approaches to disease surveillance, control and prevention66. By investing in infrastructure for human and animal health surveillance, committing to timely information sharing and establishing collaborations across multiple sectors and disciplines, the goal of the One Health community is an integrated system incorporating human, animal and environmental surveillance — a goal in which genomics can have an important role.

The One Health approach has been implemented through the PREDICT project, which is part of the Emerging Pandemic Threats (EPT) programme of the US Agency for International Development (USAID). PREDICT explores the spillover of selected viral zoonoses from particular wildlife taxa (Ref. 67), and early efforts have focused on developing non-invasive sampling techniques for wildlife (Ref. 68), estimating the breadth of mammalian viral diversity across nine viral families at a minimum of 320,000 undiscovered viral species (Ref. 69) and demonstrating that viral community diversity is at least a partially deterministic process, suggesting that forecasting community changes, which potentially signal spillover, is a possibility (Ref. 63). Although the goal of using integrated surveillance information to predict an outbreak is still many years away, One Health studies are already leveraging the tools and techniques of genomic epidemiology to understand current outbreaks.

Combining genomic data with data streams from enhanced One Health surveillance platforms presents an opportunity to detect the population expansions and/or cross-species transmissions that may precede a human health event. For example, genome sequences from a raccoon-associated variant of rabies virus (RRV), when paired with fine-scale geographic information and data from Canadian and US wildlife rabies vaccination programmes, demonstrated that multiple cross-border incursions were responsible for the expansion of RRV into Canada and sustained outbreaks in several provinces (Ref. 70); this finding led to renewed concern about and action against rabies on the part of public health authorities (Ref. 71). One of the first studies coupling detailed wildlife and livestock movement data with phylodynamic analysis of a bacterial pathogen revealed that cross-species jumps from an elk reservoir were the source of increasing rates of Brucella abortus infections in nearby livestock (Ref. 72); because brucellosis is the most common zoonosis of humans, its control programmes will benefit substantially from this sort of One Health approach (Ref. 73).

This model, in which diagnostic testing in reference laboratories triggers genomic follow-up, represents an effective near-term solution for integrating genomics into One Health surveillance efforts as the community explores solutions to the many challenges facing in situ clinical metagenomics surveillance of animal populations (reviewed in Ref. 74). Initial forays into this area have been successful; for example, metagenomics analysis of human diarrhoeal specimens and stools from nearby pigs revealed potential zoonotic transmission of rotavirus75. However, metagenomic sequencing across a range of animal species and environments yields more questions than answers. What is an early signal of pathogen emergence versus background microbial noise65? Which emerging agents are capable of crossing the species barrier and causing human disease74? What degree of sampling is required to capture potential spillovers67? Ultimately, a more efficient use of metagenomics in a One Health surveillance strategy might be scanning for zoonotic 'jumps' in selected sentinel human populations rather than a sweeping animal surveillance strategy62, with sentinels chosen according to EID hotspot maps and other factors65 and interesting genomic signals triggering follow-up sequencing in the relevant animal reservoirs. By combining genomic data generated through these targeted surveillance efforts with phylodynamic approaches, it will be possible to take simple presence or absence signals and derive useful epidemiological insights: signals of population expansion; evidence of transmission within and between animal reservoirs and humans; and epidemiological analysis of a pathogen's early expansion.

Digital epidemiology

Most modern surveillance systems use human, animal, environmental and other data76 to carry out disease-specific surveillance, in which a single disease is monitored through one or more data streams, such as positive laboratory test results or reportable communicable disease notifications. Despite marked advances over the preceding decades, testimony from multiple expert groups has repeatedly emphasized the need for improved surveillance capacity8,77, including the use of syndromic surveillance, a more pathogen-agnostic approach aimed at early detection of emerging disease78,79. Syndromic surveillance systems might leverage unique data streams such as school or employee absenteeism, grocery store or pharmacy purchases of specific items or calls to a nursing hotline as signals of illness in a population. Increasingly, digital streams are being used as an input to these systems, be they participatory epidemiology projects such as Flu Near You80, the automated analysis of trending words or phrases on social media sites, such as Twitter81,82, or Internet search queries83,84,85.

This new approach to surveillance is known as digital epidemiology and is also referred to as digital disease detection86. In digital epidemiology, information is first retrieved from a range of sources, including digital media, newswires, official reports and crowdsourcing; second, translated and processed, which includes extracting disease events and ensuring reports are not duplicated; third, analysed for trends; and fourth, disseminated to the community through media, including websites, email lists and mobile alerts87. At least 50 digital epidemiology platforms are currently operating88, and their flexible nature and cost-effective, real-time reporting make them effective tools for gathering epidemic intelligence, particularly in settings lacking traditional disease surveillance systems.
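The processing and trend-analysis stages of such a pipeline can be sketched as follows. This is a toy illustration under stated assumptions rather than a description of how any existing platform works: reports are assumed to have already been retrieved and reduced to a disease, a location and a date, duplicates are defined as reports that agree on all three, and an alert is raised when a weekly count reaches an arbitrary threshold. The `Report` class, function names and sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    source: str      # e.g. newswire, official report, crowdsourced post
    disease: str
    location: str
    day: date


def deduplicate(reports: list) -> list:
    """Keep the first report of each (disease, location, day) event."""
    seen = set()
    unique = []
    for report in reports:
        key = (report.disease.lower(), report.location.lower(), report.day)
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique


def weekly_counts(reports: list) -> Counter:
    """Count reports per (disease, location, ISO year, ISO week)."""
    counts = Counter()
    for report in reports:
        iso = report.day.isocalendar()
        counts[(report.disease.lower(), report.location.lower(), iso[0], iso[1])] += 1
    return counts


def alerts(counts: Counter, threshold: int) -> list:
    """Return the keys whose weekly count meets or exceeds the threshold."""
    return sorted(key for key, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    reports = [
        Report("newswire", "Fever", "District A", date(2016, 2, 1)),
        Report("social media", "fever", "district a", date(2016, 2, 1)),  # duplicate
        Report("official", "fever", "District A", date(2016, 2, 2)),
        Report("crowdsourced", "fever", "District A", date(2016, 2, 3)),
    ]
    unique = deduplicate(reports)
    for key in alerts(weekly_counts(unique), threshold=3):
        print("alert:", key)  # dissemination would happen here
```

Real systems face harder versions of each step, including translation, entity extraction from free text and baselines that account for seasonality, none of which this sketch attempts.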

Modelling drivers of infectious-disease emergence. The fields of One Health and digital epidemiology are increasingly overlapping. In the PREDICT consortium, the HealthMap system89 and local media surveillance were combined to identify 307 health events in five countries over a 16-week period90. PREDICT also suggested a role for digital epidemiology in not just event detection but also the identification of changing EID drivers. EIDs are driven by multiple factors, many of which have digital outputs and represent novel sources of surveillance data91. For example, human movement can be revealed by mobile phone data or by the patterns of lighted cities at night, hunting data collected by states can reveal interactions between humans and wildlife, and social media and digital news sources can reveal early signals of famine, war and other social unrest. A major challenge is that the number of digital data sets available for each driver varies substantially, from hundreds for surveying land use changes — many based on remote sensing data92 — to mere handfuls around social inequalities and human susceptibility to infection, with most data biased towards North America and Europe.

The digital and genomic epidemiology domains are also starting to overlap. In the Ebola outbreak, digital epidemiology revealed that drivers of infection risk included settings where households lacked a radio, rainfall was high and land cover was urban (Ref. 93), echoing the evidence from a genomic study suggesting that sites at which urban and rural populations mix contribute to disease (Ref. 1). During the Zika epidemic, Majumder et al. used HealthMap and Google Trends to estimate the basic reproductive number R0 to be 1.42–3.83 (Ref. 94); phylodynamic estimates from Brazilian genomic data gave similar ranges (1.29–3.85; Ref. 3), indicating that both types of data streams can be leveraged in calculating epidemiological parameters that help shape the public health response.

A digital pathogen surveillance era

Recent reports have called for the integration of genomic data with digital epidemiology streams92,95. When informed by a One Health approach, the epidemiological potential of this digital pathogen surveillance system is profound. Imagine parallel networks of portable pathogen sequencers deployed to laboratories and communities in EID hot spots — regions that are traditionally underserved with respect to laboratory and surveillance capacity — and processing samples collected from targeted sentinel wildlife species, insect vectors and humans (Fig. 5). Samples would be pooled for routine surveillance — either through targeted diagnostics or, if the issue of analytical sensitivity can be overcome, through metagenomics — with a full genomic work-up of individual samples should a pathogenic signal be detected. At the same time, existing Internet-based platforms such as HealthMap and new local participatory epidemiology efforts would be collecting data to both identify potential hotspot regions and detect EID events, enabling both prospective and rapid-response deployment of additional sequencers. Genome sequencing data coupled with rich metadata would then be released in real time to web-based platforms, such as Virological for collaborative analysis and Nextstrain for analysis and visualization96. These sites — already used in the Ebola and Zika responses — would act as the nexus for a global network of interested parties contributing to real-time phylodynamic and epidemiological analyses and looking for signals of spillover, pathogen population expansion and sustained human-to-human transmission. Results would be immediately shared with the One Health frontline — epidemiologists, veterinarians and community health workers — who would then implement evidence-based interventions to mitigate further spread.

Figure 5: A future model for surveillance and early outbreak response.

It is 2027, and our planet's changing climate and land-use patterns have meant that new emerging infectious diseases (EIDs) are spilling over into humans from wildlife reservoirs with increasing frequency. Building off EID hotspot maps developed in 2008 (Ref. 55), a global public health consortium has implemented an online surveillance tool that scans the digital output of citizens, news organizations and governments in those regions, including data from local retailers on key health-related products, such as tissues and over-the-counter cold remedies. In one such region, the syndromic surveillance system reports higher-than-average sales of a common medication used to relieve fever. Spatial analysis of the data from the pharmacies in the region suggests that the trend is unique to a particular district; a follow-up geographic information system (GIS) analysis using satellite data reveals that this area borders a forest and is increasingly being used for the commercial production of bat guano. An alert is triggered, and the field response team meets with citizens in the area. Nasopharyngeal swabs are taken from humans and livestock with fever as well as from guano and bat tissue collected in the area. The samples are immediately analysed using a portable DNA sequencer coupled to a smartphone. An app on the phone reports the clinical metagenomic results in real time, revealing that in many of the ill humans and animals, a novel coronavirus makes up the bulk of the microbial nucleic acid fraction. The sequencing data are immediately uploaded to a public repository as they are generated, tagged with metadata about the host, sample type and location and stored according to a pathogen surveillance ontology. The data release triggers an announcement via social media of a novel sequence, and within minutes, interested virologists have created a shared online workspace and open lab notebook to collect their analyses of the new pathogen.
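In its simplest form, the pharmacy-sales alert in the scenario above could be a per-district threshold rule. The sketch below flags a district whose current sales exceed its own historical mean by more than a chosen number of standard deviations. The district names, sales figures, threshold and function name are all hypothetical, and an operational system would also need to account for seasonality, reporting delays and population changes.

```python
from statistics import mean, stdev


def flag_excess_sales(history, current, threshold=3.0):
    """Return True if `current` exceeds the historical mean by more than
    `threshold` sample standard deviations (a simple z-score rule)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase is flagged
    return (current - mu) / sigma > threshold


# Hypothetical weekly unit sales of a fever medication, by district.
baseline = {
    "district_a": [40, 42, 38, 41, 39, 43, 40, 41],
    "district_b": [55, 52, 57, 54, 56, 53, 55, 54],
}
this_week = {"district_a": 44, "district_b": 83}

alerts = [d for d in baseline if flag_excess_sales(baseline[d], this_week[d])]
print(alerts)  # ['district_b']
```

Comparing each district against its own baseline is what lets the rule localize a signal to one district, as in the scenario, rather than reporting a regional average.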

The pathway to such a reality is not without its roadblocks. Apart from technical and implementation challenges, a series of larger concerns surrounds the rollout of genomics-based rapid outbreak response, ranging from the uptake of a new, disruptive technology to effecting systems-level change on a global scale.

Ethical, legal and social issues. Sequencing-based diagnostics, particularly clinical metagenomics approaches, are still straddling the boundary between research and clinical use. In this realm, uncertainty is a certainty, be it uncertainty inherent to the technology itself or informational uncertainty, such as how accurate, complete and reliable results actually are97. Early adopters of genomics in the academic domain are used to uncertainty, often acknowledging and appraising it, but routine clinical use requires meeting the evidentiary thresholds mandated by a range of stakeholders, from regulators to the laboratories implementing new sequencing-based tests. Decision criteria that influence whether a new genomic test is adopted include the ability of the assay to differentiate pathogens from commensals, the correlation of pathogen presence with disease, the sensitivity and specificity of the test, its reproducibility and robustness across sample types and settings and a cost comparable to that of existing platforms98.

Validation (defining the conditions needed to obtain reliable results from an assay, evaluating the performance of the assay under said conditions and specifying how the results should be interpreted, including outlining limitations99) is also critical. Much can be learned from the domain of microbial forensics, where sequencing is playing a large part100. Budowle et al. review validation considerations for NGS (Ref. 101), noting that this technology requires validating sample preparation protocols, including extraction, enrichment and library preparation steps; sequencing protocols; and downstream bioinformatics analyses, including alignment and assembly, variant calling, the underlying reference databases and software tools and the interpretation of the data. Complete validation of a sequencing assay may not always be possible, particularly for emerging pathogens. Therefore, just as the West African Ebola virus outbreak triggered a review of the ethical context for trialling new therapeutics and vaccines102, the scale-up of NGS in emerging epidemics will engender similar conversations. Rather than wait for this to happen, an anticipatory approach is best, outlining the exceptional circumstances under which unvalidated approaches might be used, selecting the appropriate approach and examining the benefits of a potentially untested approach in light of individual and societal interests.

If the social landscape surrounding the introduction of a new technology is not considered, prior experience suggests that the road to implementation will be difficult, with hurdles ranging from public mistrust to moratoria on research103. The enthusiasm of the scientific community for new technology must not lead to inflated claims of clinical utility and poor downstream decisions around the deployment of that technology. Howard et al. outline several principles for successfully integrating genomics into the public health system, and as we pilot digital pathogen surveillance, the community would do well to keep many of them in mind: ensuring that the instruments and processes used are reliable and that reporting is standardized and readily interpretable by end users; that the technology is used to address important health problems; that the advantages of the approach outweigh the disadvantages; and that economic evaluation suggests savings to the health care system and society104. It is also important to reconsider the role of the diagnostic reference laboratory in the new genomic landscape. As their mandates expand to include enhanced surveillance and closer collaboration with field epidemiologists, laboratory directors will face new challenges, from managing exploratory work alongside routine clinical care to hiring a new sort of technologist, one with basic genomics and epidemiology training.

The ethical, social and legal implications of digital pathogen surveillance are an emerging area of research (reviewed in Ref. 105). Chief among the issues that Geller et al. identify is the tension that exists when a new technology has the power to identify a problem but there is limited or no capacity to address the issue. Balancing the benefits and harms to both individuals and populations is challenging when the predictive insight offered by a genomic technology is variable — for example, using genomics to identify an individual as a 'super spreader' has important implications for quarantine and isolation, but that label may be predicated on a tenuous prediction. The problem is further compounded by the fact that many infectious disease diagnoses carry with them a certain amount of stigma and that an individual's right to privacy might be superseded by the need to protect the larger population105.

Data sharing and integration. A critical need for successful digital pathogen surveillance is the capacity for rapid, barrier-free data sharing, and arguments for such sharing are frequently rehashed after outbreaks and epidemics. Genomic epidemiology was born largely in the academic sphere, with early papers coming from laboratories with extensive histories in microbial genomics and bioinformatics. For this community, open access to genome sequences, software and, more recently, publications has tended to be the rule rather than the exception. Indeed, a 2004 National Research Council report described “the culture of genomics” as “unique in its evolution into a global web of tools and information” (Ref. 106). The same report includes a series of recommendations on access to pathogen genome data, including the statement that “rapid, unrestricted public access to primary genome sequence data, annotations of genome data, genome databases, and Internet-based tools for genome analysis should be encouraged” (Ref. 106).

As genomics has moved into the domain of clinical and public health practice, the notion of free and immediate access to genomic surveillance data has encountered several barriers: the siloing of critical metadata across multiple public health databases with no interoperability; balancing openness and transparency with patient privacy and safety; variable data quality, particularly in resource-limited settings; concerns over data reuse by third parties; a lack of standards and ontologies to capture metadata; and career advancement disincentives to releasing data107,108,109. Despite these challenges, the spirit of open access and open data remains strong in the community, with over 40 public health leaders from around the world recently signing a joint statement on data sharing for public health surveillance110. The Ebola and Zika responses in particular highlight the role of real-time sharing of data and samples, be it through the use of chat groups and a LabKey server to disseminate Zika data111 or GitHub to share Ebola data112.

In the wake of Ebola, Yozwiak et al.113 and Chretien et al.114 outline additional issues facing data sharing, from differing cultures and academic norms to complicated consent procedures and technical limitations. They note that we as a community must agree on standards and practices promoting cooperation — a conversation that could begin by examining how the Global Alliance for Genomics and Health (GA4GH) framework for responsible sharing of genomic and health-related data (Box 1) could be adapted for the digital pathogen surveillance community.

The future: the sequencing singularity?

Transformative change to public and global health is profoundly difficult. Complicating the existence of a rapid, open, transparent response is the fact that no matter the setting, there are often conflicting interests at work. In an outbreak scenario, conflict may result from governments wishing to keep an outbreak quiet and/or from the tension between lower-income and middle-income countries with few resources for generating and using data and the researchers or response teams from better-resourced settings115. Indeed, the conflicting values in outbreak responses meet the definition of a 'wicked' problem, where issues resist simple resolution and span multiple jurisdictions and where each stakeholder has a different perspective on the solution. Even the International Health Regulations (IHR), which ostensibly provide a legal instrument for global health security, fail to effect a basic surveillance and outbreak response. As of the most recent self-reporting, only 30% of the 196 member countries of the IHR are in compliance, meeting the prescribed minimum public health core capacities5. In these settings, digital pathogen surveillance must be within the purview of the larger global health community and its diverse group of non-state actors rather than being solely the responsibility of nations themselves116. This raises an important issue: if nations are willing to cede a certain amount of surveillance and diagnostic control to the global health community, the notion of reciprocity suggests that they should derive some corresponding local benefit. The 'trickle-down' effects of global genomic surveillance have yet to be fully articulated, but they are likely to be realized first in the zoonotic domain, where global surveillance efforts will feed back into improved animal health at a local level, in turn benefiting local farmers.

Outbreaks occur at the intersection of risk perception, governance, policy and economics117, and outbreak response is often based on political instinct rather than data5. Building a resilient and responsive public health system is therefore more than just enhancing surveillance and coupling it to novel technology — it is about engagement, trust, cooperation and building local capacity8, as well as a focus on pandemic prevention through development rather than pandemic response via disaster relief mechanisms57. Expert panels convened by Harvard and the London School of Hygiene and Tropical Medicine9 and by the National Academy of Medicine8 have called for a central pandemic preparedness and response agency and also underscored the need for deeper partnerships between formal and informal surveillance, epidemiology and academic and public health networks5. More recently, evolutionary biologist Michael Worobey wrote: “Systematic pathogen surveillance is within our grasp, but is still undervalued and underfunded relative to the magnitude of the threat” (Ref. 118). If we are to achieve the sequencing singularity — the moment at which pathogen, environmental and digital data streams are integrated into a global surveillance system — we require a community united behind a vision in which public health and the attendant data belong to the public and behind the idea that we are a better, healthier society when the public is able to access and benefit from the data being collected about us and the pathogens we share the planet with.
