Main

Over the last decade, advances in genomics have driven innovations in medicine on multiple fronts. Gene sequencing, genotyping arrays, and the subsequent development of high-throughput genomics have led to detailed catalogs of human genetic variation. The completion of the Human Genome Project1,2, HapMap3, and the 1000 Genomes Project4 has facilitated the development of the promising field of ‘precision medicine’ and spurred the creation of large-scale initiatives, such as the UK Biobank (http://www.ukbiobank.ac.uk/) and the US-based All of Us Research Program (https://allofus.nih.gov/). These projects aim to use the power of genomics and other technologies to advance human medicine beyond interventions based on population-level averages towards personalized treatment tailored to each individual5.

Genomic technologies are also transforming another area of human health: the precise response to infectious diseases6. The world is increasingly interconnected, which partly explains why recent years have seen several large-scale infectious disease epidemics, often from unexpected sources, including SARS and MERS coronaviruses7,8, H1N1/A influenza virus9, Ebola virus10,11, and Zika virus12. During many of these outbreaks, sequencing of virus genomes directly from infected individuals has helped to accurately elucidate the source, timing, transmission, and spread of disease. This new field of inquiry has been termed ‘genomic epidemiology’6. During the 2013–2016 Ebola epidemic in West Africa, for example, more than 1,600 patients with Ebola (>5% of confirmed cases) had virus genomes sequenced from their blood, and the resulting data provided valuable insights into how the epidemic started, spread, and evolved11,13,14.

Epidemiological approaches to infectious disease control have traditionally relied on case (incidence) data and interview-based contact tracing to estimate key epidemic parameters (for example, basic and effective reproduction numbers, incubation period, serial interval) and to reconstruct transmission chains. These data, however, can be limited by incomplete case reporting due to the labor-intensive nature of contact tracing or uncertain reporting due to the use of clinical symptoms to identify cases.

Although these traditional data sources still play critical roles in informing outbreak interventions, high-throughput and near-real-time pathogen genome sequencing is transforming infectious disease epidemiology6,11,12. By increasing both the scale and resolution of inference, genomic technologies are enabling a more targeted approach to infectious disease control at both the individual and population level, which we refer to, collectively, as ‘precision epidemiology’ (Table 1). We will briefly outline how genomic technologies are enabling precision epidemiology by allowing the design of better intervention strategies for individual patients and for affected populations as a whole (Fig. 1).

Fig. 1: Pathogen sequencing during infectious disease outbreaks can inform precise interventions.

Technological advances are enabling the broad application of pathogen genome sequencing in the response to outbreaks of infectious disease. Whole-genome sequencing of many pathogens can now be done directly from clinical samples and in near real time during an outbreak. By analyzing these genomes and their metadata in the context of other sequences generated from the same outbreak, as well as previously characterized variants, researchers can inform individual- and population-level intervention strategies to minimize the burden of infectious diseases. We term this collective approach (sequencing, analysis, and response) precision epidemiology.

Precision epidemiology in the clinic

The driving principle behind precision medicine is that one size does not, in fact, fit all15. To date, the field has primarily focused on the use of patients’ own genomic information to make personalized decisions about disease treatment5. During infectious disease outbreaks, however, genomic sequence information from the pathogen is arguably more important than an individual’s genomic data for designing appropriate treatment and intervention strategies16.

The practice of utilizing pathogen genotypic information for the diagnosis and treatment of infectious diseases is not new, but technological advances, most notably in the targeted enrichment of pathogen nucleic acids17,18,19 and next-generation sequencing20, have greatly improved the prospect of broadly applying this approach in the clinic. In the past, practical applications of pathogen genotyping were limited by the slow pace of sequencing and its focus only on specific genes—or even portions of genes. Today, in contrast, researchers can characterize entire viral and bacterial genomes from infected individuals in near real time6. Given enough sequence coverage, they can also characterize minor genetic variants in pathogen genomes present within an individual patient, which can be critically relevant in directing clinical care21,22.
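As a rough illustration of the minor-variant idea described above, the following Python sketch flags sites where a non-consensus base reaches a frequency threshold at sufficient read depth. The function name, the thresholds (`min_depth`, `min_freq`), and the toy read counts are illustrative assumptions rather than values from the cited studies; real pipelines also model sequencing error, strand bias, and other artifacts.

```python
from typing import Dict, List, Tuple

# Hypothetical per-site base counts from aligned reads: position -> {base: read count}.
Pileup = Dict[int, Dict[str, int]]


def minor_variants(pileup: Pileup, min_depth: int = 100,
                   min_freq: float = 0.05) -> List[Tuple[int, str, float]]:
    """Return (position, base, frequency) for non-consensus bases that
    reach min_freq at sites covered by at least min_depth reads."""
    hits = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to trust a low-frequency call
        consensus = max(counts, key=counts.get)
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != consensus and freq >= min_freq:
                hits.append((pos, base, freq))
    return hits


# Toy data (assumed for illustration only).
toy = {
    10: {"A": 900, "G": 100},  # 10% G at depth 1,000: reported
    11: {"C": 995, "T": 5},    # 0.5% T: below the frequency threshold
    12: {"T": 30, "C": 20},    # 40% C, but depth 50 is below min_depth
}
print(minor_variants(toy))
```

The depth filter reflects the point made in the text: low-frequency variants can only be called with confidence when sequence coverage is high enough.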

Although not typically presented as precision medicine, pathogen genomic information has been used successfully to assess drug sensitivity and/or resistance on a patient-by-patient basis for several significant human pathogens, including HIV23, influenza virus21, and Mycobacterium tuberculosis24. This information can be used—in a manner analogous to human genotypes—to guide the design of individualized drug regimens (for example, antibiotics and antivirals) (Fig. 1). Applying genomic technologies during the development and usage of immunotherapeutics (for example, monoclonal antibody cocktails25) and vaccines can also provide insights into pathogen strategies for immune response evasion26,27 and mechanisms of virulence28,29. By characterizing longitudinal samples from the same patients, pathogen sequencing also provides the potential for identifying genetic components involved in driving disease progression, thus providing novel drug targets30.

Point-of-care molecular tests tailored to individual pathogens have dramatically increased the speed and specificity of infectious disease diagnosis, though there is still considerable room for improvements in sensitivity31. One advantage of genomic approaches is that molecular diagnostics can be modified in light of pathogen sequence information generated during an outbreak6. This, for example, was achieved during the 2013–2016 Ebola epidemic, when rapidly generated virus genome sequences were used to update PCR-based diagnostics so that they more closely matched the Makona variant of Ebola virus responsible for the epidemic32.
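One simple way to picture how new sequence data can prompt a diagnostic update is to count mismatches between an existing primer and a newly sequenced genome. The sketch below is a minimal, ungapped comparison; the function name and the toy sequences are hypothetical, it checks only the forward strand (a reverse primer would first need to be reverse-complemented), and it is not the procedure used in the cited Ebola work.

```python
from typing import Optional, Tuple


def best_primer_match(primer: str, genome: str) -> Optional[Tuple[int, int]]:
    """Slide the primer along the genome and return (start, mismatches)
    for the first window with the fewest mismatches (no gaps allowed)."""
    best = None
    for start in range(len(genome) - len(primer) + 1):
        window = genome[start:start + len(primer)]
        mismatches = sum(1 for p, g in zip(primer, window) if p != g)
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best  # None if the genome is shorter than the primer


# Toy sequences (assumed for illustration only).
primer = "ACGTTGCA"
reference_genome = "GGACGTTGCATT"  # the primer was designed against this
outbreak_genome = "GGACGTCGCATT"   # one substitution inside the binding site

print(best_primer_match(primer, reference_genome))
print(best_primer_match(primer, outbreak_genome))
```

A nonzero mismatch count against circulating genomes is a signal that the assay may need redesigning; which mismatches actually matter (for example, those near the 3′ end of a primer) depends on the assay chemistry.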

In addition to the utility of genomic technologies for improving traditional diagnostic tests, metagenomic next-generation sequencing—in which all genomic information, including microbial material, is sequenced in an untargeted manner—holds great promise as a general approach for the detection and characterization of pathogens without the need for a priori knowledge of the potential causative agent33,34. Because metagenomic approaches do not target particular pathogens, they are equally applicable to the detection of expected pathogens as they are to the detection of novel pathogens—such as the emergences of SARS7 and MERS8—or to the detection of known pathogens in new places, as was illustrated by Ebola virus in West Africa during the 2013–2016 epidemic14. The combination of highly multiplexed target capture and next-generation sequencing is particularly promising, as it increases both sensitivity and specificity. Such an approach is feasible because it is now possible to multiplex millions of individual pathogen-specific probes, each of which can enrich for highly divergent nucleic acids (up to ~40% divergence)19.

Precision epidemiology informs outbreak response

Pathogen genomes can also be used to inform population-level intervention strategies for infectious disease outbreaks. In contrast to the design of individual-level treatment strategies, in which the functional roles of host and/or pathogen mutations are critical, outbreak-scale genomic analyses use pathogen mutations as markers of transmission events. Genomic epidemiology exploits the rapid evolution of pathogens, which often accumulate mutations on the same timescale as their epidemiological spread35, to reconstruct outbreak dynamics from genomic data. With sufficient sampling, relevant metadata (such as location and date) and an appropriate statistical framework, pathogen genomes can reveal patterns of epidemic transmission at a fine-scale resolution, thus enabling the design of targeted interventions that are more precise than those based on traditional epidemiological data alone.
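To make the idea of mutations as markers of transmission concrete, the sketch below counts pairwise differences between aligned genomes; cases separated by few differences are candidates for close epidemiological linkage. The sequences and case labels are toy values invented for illustration, and a real analysis would use a full alignment, sampling dates, and a statistical phylogenetic framework rather than raw distances.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences, skipping
    positions where either sequence has a gap or an ambiguous base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x in valid and y in valid and x != y)


# Toy aligned genomes (assumed for illustration only).
genomes = {
    "case_A": "ACGTACGTAC",
    "case_B": "ACGTACGTAT",
    "case_C": "ACGAACNTAT",  # N marks a site with no confident base call
}

distances = {
    (p, q): snp_distance(genomes[p], genomes[q])
    for p, q in combinations(sorted(genomes), 2)
}
for pair, d in distances.items():
    print(pair, d)
```

In this toy data, case_B sits one difference from both case_A and case_C, which is the pattern expected if it lies between them in a transmission chain, although distance alone cannot establish direction or rule out unsampled intermediates.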

Table 1 Examples of precision epidemiology

One application of precision epidemiology during outbreaks is the identification of causal pathogens and their modes of transmission. Large-scale virus genome sequencing efforts during the 2013–2016 Ebola epidemic, for example, showed that it resulted from a single cross-species ‘spillover’ event of Zaire ebolavirus, from an animal reservoir to humans, followed by sustained human-to-human transmission11. However, while human-to-human transmission typically occurs through direct contact with bodily fluids from a symptomatic individual, genomic epidemiology also demonstrated the potential for sexual transmission of Ebola virus from persistently infected asymptomatic individuals36. This mode of dissemination played a critical role in prolonging the Ebola epidemic in West Africa, and as a result of genomic studies, the World Health Organization (WHO) immediately changed its guidance for Ebola survivors and recommended repeated diagnostic testing of semen samples until two consecutive negative results are obtained37. In contrast, genomic epidemiological studies of Lassa fever, which is endemic in West Africa38, showed that human cases of Lassa fever are the result of multiple independent spillovers from a Mastomys natalensis rodent reservoir, with limited human-to-human transmission38,39.

One of the most advanced population-level applications of precision epidemiology is food safety, where it is used for pathogen identification and source attribution. Genome sequencing of foodborne bacterial pathogens now forms part of many surveillance systems, and outbreak investigations in the United States are routinely performed by the Food and Drug Administration’s GenomeTrackr Network. In recent years, this network has grown into an international collaboration among 63 government, private, and academic research laboratories40,41. Through near-real-time genome sequencing and public data deposition of clinical, environmental, and food-related bacterial isolates, this network is streamlining the process of recognizing, investigating, and reducing the impact of foodborne disease outbreaks42,43. The success of this approach was demonstrated recently through a broad investigation of several foodborne Listeria monocytogenes outbreaks across the United States44.

Phylogenetic analysis of pathogen genomes can also be used to elucidate the spatial and temporal scales of transmission, which are critical for the design of effective public health interventions. HIV sequences, for example, have been used to reconstruct transmission networks in detail, with the goal of focusing the use of antiretroviral drugs, along with screening and prevention education messages, in a targeted manner to interrupt community spread45,46. Likewise, Zika virus genomes have been used to determine the relative contributions of local vector-borne transmission and repeated reintroductions by travelers to sustaining Zika outbreaks in the Americas47,48. Phylogenetic investigations have also been critical for disentangling the roles of community- and hospital-based transmission of bacterial pathogens49. In one example, whole-genome sequences of methicillin-resistant Staphylococcus aureus (MRSA) indicated that a persistently infected healthcare worker in Cambridge, UK likely played a key role in sustaining transmission within a particular hospital unit50. This analysis directly led to infection control interventions, including targeted pathogen decolonization efforts.

Genomically informed transmission trees are also used to directly estimate key epidemic parameters (such as the basic and effective reproduction numbers of an outbreak), either independently or in combination with incidence data51. Such analyses can provide rapid estimates of pandemic potential and are used to evaluate the effectiveness of interventions51,52. Genomic data can even provide information on within-outbreak population structure (that is, differences in transmission dynamics between geographic locations or risk groups)53 and the proportion of unreported cases54.
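As a deliberately simplified picture of how a transmission tree relates to a reproduction number, the sketch below counts secondary cases per infector in a reconstructed tree and averages them. The case labels, the who-infected-whom pairs, and the choice of which cases count as "resolved" are all hypothetical; published analyses rely on statistical phylodynamic models that account for sampling and uncertainty rather than direct counting.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def secondary_case_counts(cases: Iterable[str],
                          transmissions: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map each case to the number of people it infected, given
    (infector, infectee) pairs from a reconstructed transmission tree."""
    offspring = Counter(infector for infector, _ in transmissions)
    return {case: offspring.get(case, 0) for case in cases}


# Toy transmission tree (assumed for illustration only).
cases = ["p1", "p2", "p3", "p4", "p5", "p6"]
transmissions = [("p1", "p2"), ("p1", "p3"),
                 ("p2", "p4"), ("p2", "p5"),
                 ("p3", "p6")]

counts = secondary_case_counts(cases, transmissions)

# Recent cases may not yet have had time to transmit, so averaging over
# everyone biases the estimate downward. Here we assume, hypothetically,
# that only p1-p3 have completed their infectious period.
resolved = ["p1", "p2", "p3"]
r_all = mean(counts[c] for c in cases)
r_resolved = mean(counts[c] for c in resolved)
print(counts)
print(round(r_all, 2), round(r_resolved, 2))
```

The gap between the two averages shows why the timing of sampling and the completeness of the tree matter when reading a reproduction number off genomic data.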

Finally, sequencing allows us to monitor genetic changes over time in pathogen populations, an understanding of which is critical for the design of effective diagnostics and countermeasures. Vaccines, for example, are our primary line of defense against seasonal influenza. However, influenza viruses evolve quickly to evade immune responses to previously circulating variants or prior vaccinations. Genetic sequencing and large-scale bioinformatic analysis provide powerful tools for tracking the evolution of influenza viruses in real time55 and for predicting the strains likely to be most prevalent each year. The seasonal influenza vaccine can then be regularly updated to reflect projected changes in the global population of influenza strains56.

Challenges for precision epidemiology during outbreaks

Advances in sequencing technologies are enabling the development and use of innovative genomic approaches for the treatment and prevention of infectious diseases. Adoption of genomic epidemiology into effective outbreak responses, however, will require the establishment of improved mechanisms for coordination between academic researchers and public health agencies. This includes changes to research practice that recognize the benefits of rapid and open sharing of data and results, as well as a focus on building capacity for sequencing and analysis within public health agencies and the regions most severely impacted by infectious disease57,58.

Comprehensive and carefully organized sampling of pathogen genomes from patients, along with rich sets of metadata (Box 1), is required to improve the accuracy and resolution of outbreak transmission patterns reconstructed using genomic epidemiology. Sampling is typically performed or coordinated by local hospitals and departments of health, national entities such as the US Centers for Disease Control and Prevention (CDC), or international groups like the WHO and Médecins Sans Frontières. Expertise in genome sequencing, bioinformatics, and phylogenetic analysis, in contrast, is typically concentrated within academic and government research laboratories. Therefore, for precision epidemiology to be implemented successfully at present, researchers and public health agencies must work in close coordination. Such collaborations were critical during responses to the recent Ebola and Zika epidemics; however, the approach to establishing these partnerships was largely unsystematic and, in many cases, delayed because of the need to establish relationships during the course of public health emergencies59.

One important approach to accelerating responses in the future is to build genome sequencing and analysis capabilities within public health agencies and hospitals as well as in developing countries disproportionately impacted by infectious disease outbreaks. Several such efforts are currently underway, including the Association of Public Health Laboratories (APHL)–CDC bioinformatics fellowship program (https://www.aphl.org/fellowships/pages/bioinformatics.aspx) and the H3Africa initiative, which is backed by the US National Institutes of Health and the UK Wellcome Trust60. Genomics programs within public health agencies and at individual hospitals would streamline the process of integrating genomic data into outbreak response efforts. Genomic epidemiology, however, is a rapidly evolving field with a strong theoretical foundation, and owing to differences in priorities, academic research groups will likely continue to be at the forefront of tool development and implementation. Therefore, it is imperative that researchers develop a framework of norms and rules governing research conduct during and between outbreaks61, establish diverse networks of technical response teams, and produce action plans. This framework needs to be implemented in advance of an outbreak and coordinated through international organizations, such as the WHO, and oversight committees within the United Nations59.

It is critical that data and analyses are shared openly during infectious disease outbreaks to ensure the most comprehensive and efficient response possible, while ethical constraints also receive close attention. This includes the public release of raw genome sequence data as well as analysis results, which should be provided in a format that conveys to nonspecialists the complexities and uncertainties associated with interpretation. Further development of portable instruments6 for in-country sequencing and online analysis platforms62,63 will continue to advance the rapid generation and open dissemination of data, analyses, and actionable insights. However, concerns regarding the perceived career benefits of slower or more limited public access to outbreak data remain a barrier to open science within the research community. Despite this, there are signs of progress. During recent outbreaks, many researchers made data and analyses available and participated in open discussions via online repositories and forums, such as GitHub and Virological.org, with complete manuscripts often made available prior to publication via preprint servers such as bioRxiv64. We hope that the successes of the research collaborations that followed this approach will help to increase participation in the future. These movements towards making outbreak data more openly available are also supported by several major public health agencies, including the WHO, which recently called for data relevant to public health emergencies to be distributed immediately and freely upon generation65,66.

With the current capabilities, cost, and speed of sequencing technologies, the field has finally reached a point where rapid genomic surveillance and analysis can start to become a standard part of the response to infectious disease outbreaks. Just as broadscale human genome sequencing revolutionized the treatment of many noncommunicable diseases, pathogen genome data are poised to drive a similar revolution in the response to infectious diseases.