Pathogen surveillance strategies have varying aims, ranging from identifying species and strains to detecting antibiotic resistance genes or plasmids. In this Perspective, we focus on the potential for the application of metagenomics to improve pathogen surveillance for public health decision-making. This differs from metagenomics for clinical diagnostics, in which the primary aim is to determine the presence of disease-causing pathogens in a sample (or samples) from an individual for the purpose of clinical management. Pathogen surveillance for public health is routinely carried out in hospitals, water-purification plants and farms, and is specifically applied in locations thought to be a probable source of emergent disease. The overall aims of public health pathogen surveillance are to monitor known and yet-to-emerge pathogens and to drive early risk mitigation programmes to protect human and animal health.

Traditional pathogen surveillance for public health

Existing public health pathogen surveillance methods include syndrome-based infection event monitoring, active surveillance for specific pathogens, for example, methicillin-resistant Staphylococcus aureus, and laboratory-based surveillance1. Most laboratory-based surveillance efforts focus on monitoring for clinically relevant features, such as antimicrobial resistance (AMR) and disease-causing serotypes in cultured isolates. In addition, specific rooms or wards in hospitals, for example, augmented care units, are routinely surveyed for a predetermined list of pathogens using culture-based methods2,3,4,5. When environmental fomites are epidemiologically implicated in investigations of outbreaks, targeted sampling of air, water and surfaces is recommended to establish potential sources5. Cultivated isolates from investigations of outbreaks are usually subjected to various typing schemes, for example, multilocus sequence typing (MLST), to establish disease transmission. As a result, a wide variety of laboratory techniques, equipment, reagents and associated specialized skills are required to carry out surveillance of a subset of known human pathogens in hospitals.

Pathogen surveillance in communities is mainly carried out in hospitals, water-purification plants and farms, and a variety of strategies is used. These conventional approaches are summarized below.

Prevention and management of outbreaks

Public health and sentinel laboratories support infectious disease surveillance and investigations of outbreaks by participating in disease notification, monitoring of known human pathogens and detection of pathogens in patient samples6 (Table 1). In addition to pathogen identification by culturing and molecular testing, surveillance can include strain typing using different assays, for example, serotyping for Neisseria meningitidis, MLST for S. aureus and antibiotic resistance testing. Traditional surveillance protocols often require a wide range of laboratory capabilities and assays (Fig. 1), which makes it challenging to integrate information at a local or national level, let alone globally, despite attempts to standardize processes (for example, WHO/CDS/CSR/ISR/99.2)6. Furthermore, techniques such as serotyping require manipulation of viable cultures of bacteria, some of which pose substantial risks of laboratory-acquired infections (LAIs), for example, N. meningitidis or Salmonella enterica serotype Typhi. The use of whole-genome sequencing (WGS) for bacteria, viruses and, to a certain extent, fungi, is gaining traction but has not become the standard for most sentinel laboratories. Although WGS is well suited to this purpose, upfront resource and human capital requirements have been a barrier to adoption, particularly in low- and middle-income countries.

Table 1 Microbiological surveillance strategies
Fig. 1: Current pathogen surveillance workflows.

Existing laboratory-based surveillance efforts focus on monitoring characteristics of culture isolates, such as AMR, serotype and virulence factors. Typically, the culturing step is followed by various typing and phenotypic assays that require specialized equipment and techniques as well as significant time and resources. The time taken to obtain a pure isolate varies with the growth rate of each species (from two days up to several weeks), and the resultant isolate is subjected to downstream phenotypic and molecular analysis based on the specific requirements of each sector and species. BSL-3, biosafety level 3.

Food and water safety

Traditional microbiological safety monitoring for food relies on risk assessment and subsequent targeted detection of pathogens and hygiene or safety indicator organisms in food7,8. A wide variety of culture methods are required to enrich and select for specific species from complex food matrices (Table 1). The monitoring of foodborne AMR is based on susceptibility testing of selected bacterial foodborne pathogens9,10. For drinking water, monitoring of microbial contamination is similarly risk based11. Although monitoring of indicator bacterial organisms (for example, Escherichia coli) is widely adopted, waterborne viral and protozoan pathogens are not consistently addressed11. Since the early 2010s, WGS has also been effectively deployed in high-income countries to track foodborne pathogens in investigations of outbreaks12. Of note, although the burden of foodborne disease is highest in low-income countries13, these countries often lack the resources to adopt WGS for mitigating outbreaks.

Human–animal–ecosystem interface

Livestock has been implicated in various zoonotic infectious disease outbreaks, including those caused by avian influenza, swine influenza and Nipah virus14,15. With the intensification of livestock farming, surveillance will probably play an increasingly important part in our preparedness against epidemics and pandemics. Surveillance for zoonotic diseases requires human health, animal health and agricultural sectors to effectively communicate about disease trends, as well as taking part in collaborative testing of samples from human, animal (wildlife, livestock and companion animals), environment (both soil and water), vector and food sources16. For example, zoonotic influenza A viruses, such as the highly pathogenic avian influenza strain H5N1, which is panzootic among poultry and wild birds, are priority pathogens. The Global Influenza Surveillance and Response System (GISRS) supports detection and whole-genome characterization of influenza in human clinical samples around the world17, and these data are integrated by the World Health Organization (WHO) to monitor influenza activity and to guide vaccine composition. For influenza viruses in animals, the Global Early Warning System for Major Animal Diseases (GLEWS) combines and coordinates alert mechanisms of the WHO, the Food and Agriculture Organization of the United Nations (FAO) and the World Organization for Animal Health (OIE) for surveillance of wildlife, livestock, poultry and the environment. However, the coverage remains variable and is not adequately resourced in most countries (Table 1). The public health threat posed by zoonotic and emerging diseases is further highlighted by the ongoing coronavirus disease 2019 (COVID-19) pandemic, in which a low barrier to cross-species transmission is evidenced by the increasing number of animal species reported to be infected by reverse zoonosis18.

Community health

Monitoring of wastewater (sewage) treatment systems has frequently been mooted as a promising way to assess community health, and has been adopted in polio eradication programmes in various countries such as Egypt, India, Israel, Pakistan, Afghanistan and Nigeria19 (Table 1). The COVID-19 pandemic has accelerated efforts to improve sample processing and sensitivity of assays20, as well as serving to further highlight the usefulness of wastewater monitoring to uncover and interrupt outbreak clusters21 (Fig. 1). However, virus isolation from wastewater requires specialized laboratory setup and skilled personnel. In the case of high-risk pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), manipulation of samples beyond routine clinical diagnostics necessitates biosafety level 3 containment, which is only accessible in well-resourced centres with highly skilled personnel.

Healthcare-associated infections

In addition to surveillance for clinically relevant bacterial and fungal isolates, healthcare environments, particularly air and water, are routinely sampled for pathogens and other microbial contamination22,23 (Fig. 1). For example, in augmented care units, periodic culture of potable water for Legionella species is performed24, and water used for haemodialysis is monitored for total viable bacterial cell counts and endotoxin levels22. Some healthcare guidelines advocate monitoring for Pseudomonas aeruginosa if the water used is in direct contact with patients in augmented care25. Routine surveillance of water used in healthcare settings can be technically demanding and requires a variety of culture media and culture conditions as well as trained personnel (Table 1).

AMR surveillance

The global antimicrobial resistance surveillance systems (GLASS) network has fostered AMR surveillance and enabled the collection, integrated analysis and sharing of standardized human bacterial pathogen AMR data by participating countries around the world26. AMR surveillance in its current state relies on the successful culture and isolation of the pathogens of interest (Table 1). The culture isolate is then subjected to standardized antimicrobial susceptibility testing for a predetermined panel of antibiotics. AMR present in the food chain, crops, animals and environment can be mobilized into bacterial pathogens of humans and animals through horizontal gene transfer and thereby affect human and animal health. AMR surveillance in these non-healthcare domains, however, is less established and standardized.

Integrated pathogen surveillance

Current initiatives for integrated surveillance, including guidelines, laboratory methods, data collection and analyses, continue to be segregated by One Health sectors (for example, human health, animal health and plant health). Given the diversity of sample types and microbial species targeted, a range of processing and culture methods are required for optimal recovery and isolation. As known human pathogens are prioritized and selected for during the culture process, any novel, emerging or atypical pathogens and their associated AMR determinants will probably not be detected. The overall expense, lack of consistent workflows and outcomes, targeted nature of many assays (in which you only find what you are looking for) and paucity of cost-effective options that are less biased27 remain substantial hurdles to success. Correspondingly, pathogen surveillance, particularly outside healthcare sectors, remains under-resourced, and the complexity and requirements for multiple assays will probably contribute to the overall suboptimal surveillance of pathogens.

The availability of inexpensive, high-throughput sequencing technologies has paved the way for the development of a new generation of pathogen surveillance methods28. Metagenomics—the interrogation of genetic material from a sample containing multiple species—could enable culture-free methods and workflows for the detection of known and unknown viruses, bacteria and fungi. Metagenomic studies are often either done using a targeted amplicon approach that is based on marker genes (for example, 16S rRNA gene sequencing) or shotgun metagenomic sequencing to probe the entire genetic content of a sample. Metagenomic sequencing of samples can also enable the detection of virulence and resistance determinants outside their genome contexts. With improvements in ease of analysis and cost, the majority of biosurveillance efforts across various sectors would benefit from the adoption of metagenomic approaches. Despite its promise, there are technical and analytical complexities in metagenomic workflows that either need to be addressed or accounted for before integration into routine laboratory-based surveillance.

Next, we review the challenges in sample preparation, sequencing and bioinformatic analyses in metagenomics, examine examples of real-world applications of metagenomics in pathogen surveillance and evaluate the advances needed to enable adoption of these methods. We also discuss the potential for metagenomics-based surveillance to accelerate and transform global efforts to detect risks to human health within a One Health framework.

Metagenomics for pathogen surveillance

Metagenomic workflows for pathogen surveillance29 are unified by the use of sequencing as a read-out. Long-read and short-read sequencing technologies have both been used for metagenomics, in which sequence data can be sifted using computational approaches to reconstruct exquisitely detailed outputs that are not constrained by the availability of cultures or reference genomes.

Nevertheless, there are several considerations related to sample acquisition and nucleic acid processing that are invariably application-specific and affect downstream steps. Metagenomic workflows need to take into account surveillance objectives, resource availability, operational feasibilities and technical compatibilities, as highlighted in Table 2. The following sections outline the main components of a metagenomic workflow and we illustrate their application using recent case studies. Although some of the challenges for sample manipulation and processing (for example, dealing with PCR inhibitors) are applicable across molecular assays, different metagenomic sequencing technologies (particularly long-read sequencing) impose additional constraints (for example, DNA purity, molecular weight, among others).

Table 2 Steps to success in metagenomic workflows

Sample collection

This step has the greatest variability in metagenomic workflows and needs to account for a variety of sample matrices30 and their physical and chemical properties. These include swabs and tapes (for example, skin, nasal and environmental), filters (for example, for water), sludge and semi-solids (for example, stool, sewage, soil or sediment and food) and fluids (for example, saliva). Some samples, such as sludge, may need to be collected in large volumes to obtain sufficient biomass, and associated leakage concerns can pose logistical challenges to safe sampling. Other considerations include the need to prevent cross-contamination, a need for culture stocks and whether cold-chain storage or transport in stabilization medium is needed to minimize overgrowth or nucleic acid degradation (Table 2).

Nucleic acid extraction

The choice of nucleic acid extraction protocols is primarily determined by the trade-off between the ability to effectively lyse cells31 using harsh physical disruption, such as bead beating, versus the importance of preserving DNA integrity through treatment by enzyme cocktails such as MetaPolyzyme, which is a mixture of achromopeptidase, chitinase, lyticase, lysostaphin, lysozyme and mutanolysin. Certain sample types are inherently more challenging to process, for example, stool, food and soil samples, owing to PCR inhibitors and the need to separate nucleic acids from complex matrices with debris of varying sizes and consistency. Correspondingly, some sequencing protocols can be less forgiving, requiring suitable nucleic acid quantity and quality to enable optimal throughput32.

An additional aspect that affects this step is the relative proportion of microbial nucleic acids in the sample, a factor that influences the cost, sensitivity and feasibility of analysis33. Microbial enrichment can either be done before nucleic acid extraction (for example, through centrifugation, filtration, culture enrichment or flow cytometry) or as part of the protocol, for example, using selective lysis steps to deplete eukaryotic cells34 or using targeted probes to enrich and amplify microbial DNA35 (Table 2).
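To make the effect of the microbial fraction concrete, the following minimal Python sketch estimates how many reads from a target organism would be expected at a given sequencing depth, and the probability of observing at least a chosen number of such reads. It assumes reads are sampled uniformly at random and uses a simple Poisson model; the function names and all numbers are hypothetical illustrations rather than values from the studies cited here.

```python
import math

def expected_target_reads(total_reads, microbial_fraction, target_fraction_of_microbial):
    """Expected number of reads from the target organism (uniform sampling assumed)."""
    return total_reads * microbial_fraction * target_fraction_of_microbial

def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean); a simplifying assumption for read counts."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Hypothetical sample: 10 million reads, 1% microbial,
# target organism makes up 0.01% of the microbial reads.
raw = expected_target_reads(10_000_000, 0.01, 0.0001)

# Same sample after a hypothetical enrichment step that raises
# the microbial fraction to 50%.
depleted = expected_target_reads(10_000_000, 0.50, 0.0001)

# Chance of seeing at least 10 target reads before and after enrichment.
p_raw = prob_at_least(10, raw)
p_depleted = prob_at_least(10, depleted)
```

Under these assumptions, enrichment raises the expected number of target reads in proportion to the gain in microbial fraction, which is why depletion or enrichment steps can substitute for deeper (and costlier) sequencing.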

Library preparation and sequencing

Current sequencing technologies are either second-generation systems that provide short reads at low cost (~100 base pairs, for example, Illumina) or third-generation instruments with longer reads but with cost and accuracy trade-offs36 (for example, Pacific Biosciences and Oxford Nanopore Technologies). Applications in which accurate quantification or higher sensitivity is needed through deeper sequencing generally use short-read platforms, whereas de novo assembly is greatly improved by the inclusion of long reads. Other features that can affect the choice of a platform for surveillance include portability, cost of acquisition and maintenance, and time-to-answer (Table 2). Recent developments in real-time nanopore sequencing using a portable thumb-drive-sized device37 have massively expanded the feasibility of in-field sequencing-based surveillance programmes. Although library preparation workflows are largely determined by sequencing instruments, advancements in linked read38 and Hi-C39 protocols have broadly expanded our ability to generate high-quality genomes from metagenomic data.

Bioinformatic analyses

The analysis of metagenomic data can be done at various resolutions, from identifying and quantifying specific taxa using taxonomic classifiers to analyses of genes and pathways40,41,42,43, and finally to whole-genome reconstruction, annotation and phylogenetic analyses. This is facilitated by a range of specialized tools, such as taxonomic classifiers, read mappers, pathway reconstruction tools and genome assembly programs40,41,42,43. In Table 3, we highlight some of the more popular pipelines and webtools that integrate these capabilities into user-friendly workflows accessible to non-specialists. In addition, there are cloud-deployable clinical metagenomic computational workflows such as SURPI44,45 and IDseq46 for pathogen detection and identification.
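As a rough illustration of read-level taxonomic classification, the sketch below assigns a read to the taxon with which it shares the most taxon-specific k-mers. It is a toy model written for this text, not the algorithm of any tool or pipeline named here; the reference sequences, taxon labels, k-mer size and `min_hits` threshold are all hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index

def classify(read, index, k, min_hits=2):
    """Return the taxon with the most taxon-specific k-mer hits, or None
    when there are too few hits or the top two taxa are tied."""
    votes = Counter()
    for km in kmers(read, k):
        taxa = index.get(km, ())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    best, hits = ranked[0]
    if hits < min_hits or (len(ranked) > 1 and ranked[1][1] == hits):
        return None
    return best

# Hypothetical toy references (real genomes are millions of bases long).
references = {"taxon_A": "ACGTACGGTCAAGT", "taxon_B": "TTGACCGATAGGCA"}
index = build_index(references, k=4)

call_a = classify("CGGTCAAG", index, k=4)
call_b = classify("CGATAGGC", index, k=4)
call_none = classify("GGGGGGGG", index, k=4)  # no k-mer in any reference
```

Reads with no match are left unclassified, which mirrors the dependence of reference-based analysis on database completeness.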

For metagenomic surveillance, improved capabilities in generating metagenome-assembled genomes (MAGs) have been a transformative factor as they enable finer-grained transmission analysis and more comprehensive antibiotic resistance predictions. These include improved binning strategies that retrieve bins of short-read contigs that belong to the same genome47 and long-read48 and hybrid assembly49 programs that circumvent higher error rates to produce more contiguous assemblies. Near-complete genomes can now be reconstructed from metagenomic datasets, although sensitivity and accuracy improvements still remain to be made, particularly in cases in which multiple similar strains of a species are present in a community50.
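The idea behind binning contigs by sequence composition and coverage can be sketched as follows. Each contig is summarized by its tetranucleotide frequency profile and its read coverage, and contigs with similar features are candidates for the same genome bin. The distance function, the `coverage_weight` parameter and the toy sequences are assumptions made for illustration; published binning tools use more elaborate statistical models.

```python
import math
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetranucleotide_profile(seq):
    """Relative frequency of each tetranucleotide in seq (ambiguous bases skipped)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        km = seq[i:i + 4]
        if km in counts:
            counts[km] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in TETRAMERS]

def feature_distance(profile_a, cov_a, profile_b, cov_b, coverage_weight=1.0):
    """Euclidean distance between composition profiles plus a weighted
    difference in log coverage. Coverages must be positive."""
    composition = math.sqrt(sum((x - y) ** 2 for x, y in zip(profile_a, profile_b)))
    coverage = abs(math.log(cov_a) - math.log(cov_b))
    return composition + coverage_weight * coverage

# Hypothetical contigs: two AT-rich contigs at similar coverage, one GC-rich contig.
at_profile = tetranucleotide_profile("ATATATATATAT")
gc_profile = tetranucleotide_profile("GCGCGCGCGCGC")

same_genome_like = feature_distance(at_profile, 10.0, at_profile, 12.0)
different_genome_like = feature_distance(at_profile, 10.0, gc_profile, 10.0)
```

Closely related strains tend to have near-identical composition and may differ only in coverage, which is one reason strain mixtures remain hard to resolve.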

After the generation of MAGs, phylogenetic analysis can be done using two main approaches to aid epidemiological tracing and transmission analysis51,52: (1) multilocus (multi-gene) alignment and (2) whole-genome comparisons. Although whole-genome analysis provides higher resolution in principle53, horizontal gene transfer, recombination events and higher error rates in repeat regions can affect results. Frequently, average nucleotide identity (ANI) thresholds are used to infer transmission, but these need to be adjusted to evolutionary rates of specific pathogens54. The analysis of AMR genes is usually done through mapping and alignment to reference databases such as CARD55, ARG-ANNOT56 and MEGARes 2.0 (ref. 57). As these approaches can have false negatives and arbitrary coverage and identity cut-off values, deep-learning approaches have been proposed as an alternative58. Additionally, the analysis of mobile and extrachromosomal AMR genes has been boosted by the use of Hi-C techniques to link them to their host59.
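The sensitivity of alignment-based AMR gene calls to the chosen cut-offs can be illustrated with a short sketch. The `Hit` record, the default thresholds and the example hits are hypothetical and do not correspond to the defaults of any database or tool mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One alignment of a contig or read to a reference AMR gene (hypothetical record)."""
    gene: str
    percent_identity: float
    aligned_length: int
    gene_length: int

    @property
    def coverage(self):
        """Percentage of the reference gene covered by the alignment."""
        return 100.0 * self.aligned_length / self.gene_length

def filter_hits(hits, min_identity=90.0, min_coverage=80.0):
    """Keep hits that pass both the identity and the coverage cut-off."""
    return [h for h in hits
            if h.percent_identity >= min_identity and h.coverage >= min_coverage]

# Hypothetical hits: a near-exact match, a divergent variant and a fragment.
hits = [
    Hit("gene_A", 99.0, 800, 813),
    Hit("gene_B", 85.0, 1500, 1626),
    Hit("gene_C", 97.0, 600, 1968),
]

strict = filter_hits(hits)                      # divergent variant is dropped
relaxed = filter_hits(hits, min_identity=80.0)  # divergent variant is retained
```

In this example the divergent variant is reported or missed depending solely on the identity threshold, which is the kind of arbitrariness that motivates alternative approaches.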

Table 3 Bioinformatics pipelines for end-to-end metagenomic analysis

Learning by example from case studies

In the following paragraphs, we discuss case studies that reveal how metagenomic surveillance has been used, including the specific choices made and lessons learnt and how this can guide future efforts to operationalize metagenomic surveillance.

Urban environments and transportation systems

As the world becomes increasingly urbanized, surveillance of public spaces in cities, particularly mass-transport systems, represents an efficient way to map our microbial exposomes. One of the first applications of metagenomic surveillance at scale was conducted in 2013–2014 in the subways of New York City, United States60. The results highlighted the feasibility and scalability of this strategy to detect pathogens and AMR genes, as well as potential pitfalls of read-level analysis61. In subsequent work, the international MetaSUB consortium expanded this study to more than 3,700 samples from 58 cities, building a global reference map for microbial diversity and AMR genes62. As environmental swabs often have low biomass, standardized sample collection and inclusion of positive and negative controls were used to minimize kitome artefacts (that is, reagent- and laboratory-associated contaminants). In addition, de novo assembly of short-read sequencing data was used to identify novel species and AMR genes that were missing from existing reference databases, thereby reducing the effect of this confounding factor in reference-based analyses.
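One simple way in which negative controls can be used to flag likely reagent contaminants is sketched below: a taxon is flagged when its relative abundance in the negative control is at least as high as in the sample. This rule, the `ratio` parameter and the read counts are assumptions for illustration only and are not the procedure used by the studies cited above.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions of the total."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()} if total else {}

def flag_contaminants(sample_counts, control_counts, ratio=1.0):
    """Flag taxa whose relative abundance in the negative control is at
    least `ratio` times their relative abundance in the sample."""
    sample = relative_abundance(sample_counts)
    control = relative_abundance(control_counts)
    return {t for t in sample if control.get(t, 0.0) >= ratio * sample[t]}

# Hypothetical read counts for one low-biomass swab and one blank control.
sample_counts = {"taxon_X": 900, "taxon_Y": 100}
control_counts = {"taxon_Y": 45, "taxon_Z": 5}

flagged = flag_contaminants(sample_counts, control_counts)
```

Here `taxon_Y` dominates the blank control and is flagged, whereas `taxon_X` is absent from the control and is retained.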

AMR tracking from sewage

Metagenomic surveillance for AMR has been performed based on sewage and livestock fecal slurry in a few large-scale studies63,64. In a global sewage surveillance project64, untreated sewage from 74 cities in 60 countries was shipped frozen and centrally processed to obtain short-read metagenomic data and assemblies. In total, 1,625 different AMR genes belonging to 408 gene classes were identified from these assemblies, including blaNDM, mcr and optrA genes, which are AMR determinants for last-resort antibiotics64. The study found that total AMR abundance varies by region, and values were highly correlated with socioeconomic factors such as sanitation. Large-scale sewage surveillance may therefore provide a convenient, low-cost method of AMR monitoring at the population level. These studies highlight that sampling and transportation logistics can be addressed, and that further automation in collection and processing would make this approach attractive for surveillance.

Healthcare environments

Large-scale metagenomic mapping of healthcare environments was pioneered by Lax et al.65 to determine microbial colonization and transmission patterns in a newly built hospital. Brooks et al.66 focused on neonatal intensive care units, owing to their importance as a high-risk setting, and used shotgun metagenomic sequencing and MAGs to demonstrate the direct interchange of microorganisms between patients and the environment. In addition, their work demonstrated the persistence of some strains in the intensive care unit environment throughout the one-year study period. In recent work, we showed the enhanced persistence and widespread distribution of multidrug-resistant organisms in hospital environments67 based on hybrid metagenomic assemblies. In addition, comparisons with patient isolates up to eight years apart showed high similarity, which highlighted the persistence of AMR reservoirs. This study used quasi-metagenomics (that is, metagenomics of intentionally modified microbiomes, for example, via culture enrichment) to enrich for antibiotic-resistant organisms and to acquire a large number of high-quality MAGs (>2,000) from a few hundred samples. Collection curve analysis indicated that as few as 50 samples may be sufficient for routine strain and AMR surveillance in hospitals. Further improvements in analysis of low-biomass samples and real-time sequencing are likely to make these approaches even more attractive in healthcare settings in which time-to-answer is critical. Sandbox trials would also be needed to provide evidence for the ability of these approaches to rapidly track outbreaks and prevent infections, and thereby to make the case for routine adoption.
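A collection curve of the kind mentioned above can be computed by counting how many distinct strains (or AMR genes) have been observed after each additional sample, averaged over random sample orderings. The sketch below is a generic illustration with hypothetical data and parameter names; it does not reproduce the analysis of the cited study.

```python
import random

def collection_curve(samples, n_permutations=100, seed=0):
    """Mean number of distinct items seen after 1..N samples,
    averaged over random orderings of the samples."""
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, items in enumerate(order):
            seen |= items          # add everything found in this sample
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Hypothetical strain content of four environmental samples.
samples = [{"s1", "s2"}, {"s2"}, {"s3"}, {"s1", "s3", "s4"}]
curve = collection_curve(samples, n_permutations=50, seed=1)
```

The point at which the curve flattens suggests how many samples are needed before additional sampling yields few new strains.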

Zoonotic viral reservoirs

Metagenomic approaches have been used to detect and characterize zoonotic pathogens in various potential reservoirs, such as arthropod vectors68,69,70, bats71 and wild ducks72. Known, emerging and novel viral pathogens have been detected from ticks69,70, mosquitos68, bat gastrointestinal excreta71 and wild duck fecal droppings72, thereby highlighting the utility of metagenomic surveillance of zoonotic viral reservoirs. Correspondingly, the Global Virome Project73 was initiated as a large-scale effort to develop a global atlas of potential zoonotic viral pathogens and to improve our capacity to discover, detect and diagnose viruses that may threaten human health. Viral metagenomics poses significant technical challenges74, including safety concerns, low viral loads in most samples, RNA degradation and quasi-species diversity. Addressing these concerns through the development of new automation, nucleic acid extraction and bioinformatics tools would therefore be key to improving our ability to monitor and understand the vast virosphere (especially coronaviruses, influenza viruses and ebolavirus) through metagenomics.

Taken together, these case studies highlight the unparalleled breadth and depth of survey that is now feasible in various domains of interest using metagenomics. The ability to unify analysis of diverse pathogens, to integrate information across kingdoms in a single high-throughput assay and to carry out surveillance for AMR genes provides analytical opportunities that were previously not feasible. In addition, the generation of high-quality metagenome-assembled genomes in some studies further enabled whole-genome investigations62,67 and high-resolution phylogenetics that were previously only feasible through the lower-throughput process of culturing and isolate genome sequencing. There is therefore strong interest in performing routine metagenomic surveillance because the data generated can be used to answer biological research questions.

Challenges to widespread adoption of metagenomics surveillance

Although there is a robust rationale for implementing metagenomic workflows in surveillance, there are challenges to the broad adoption of this technology. In this section, we highlight some of these issues and discuss ongoing developments that will smooth the path to routine application.

Sensitivity, specificity and standardization

An important consideration for adoption of metagenomics methods is the sensitivity, specificity and reproducibility of an assay, an aspect that was not investigated in detail in the metagenomic case studies discussed here. Other studies have sought to determine the sensitivity and specificity of shotgun metagenomic assays relative to conventional testing approaches44,75. As with other molecular methods such as PCR, the sensitivity and specificity of metagenomic analytics depend on optimizing protocol workflows for several factors, such as the target pathogens and the (sometimes parallel) sample processing workflows needed to maximize assay sensitivity and specificity. Depending on the resources and sample biomass available, this may or may not be feasible to implement, which poses a bottleneck for adoption. Progress in the development of molecular biology techniques and adoption of robotics and laboratory automation (see the section ‘Breakthrough technologies’) would go a long way towards addressing this challenge and would facilitate widespread application.

Furthermore, the wide range of sample matrices and multi-step custom laboratory and analytical workflows are sources of variability that complicate metagenomic assay implementation. Despite these challenges, there are studies suggesting that high intra-laboratory reproducibility can be achieved with rigorous optimization, standardization and validation44,75. Overall, efforts towards standardization of workflows and quality assurance are already gaining momentum75,76,77, and should be aided by automation (ease of use), reduction in analysis cost (sequencing cost and multiplexing) and efforts to benchmark bioinformatics tools78. Crucially, sequencing costs are already close to the threshold (less than US$100) where they are comparable with overall costs for conventional surveillance protocols (incorporating labour costs). End-to-end standardization (including bioinformatics) and validation of metagenomic workflows may enable better uptake.

Despite the inherent complexity of metagenomic data collection and interpretation, the increasing adoption of shotgun metagenomic approaches in clinical diagnostics indicates that such assays can be standardized and validated44,76,79. Nevertheless, as surveillance programmes affect public health decision-making, acceptance of metagenomics is influenced by considerations beyond sensitivity, specificity and economic feasibility. For example, concerns can range from the impact of microbial contamination on the results80,81,82 and the fidelity of bioinformatic analysis for related species62 to the viability and infectivity of detected pathogens and the ability to estimate the public health risk that is posed83. Thus, in addition to stringent protocol standardization, end-to-end guidance for common sample types, inclusion of various controls, quality assurance and pipeline verification with proficiency samples and the use of reference standards84,85, further studies are needed to develop public health risk models based on metagenomic surveillance data. As the main objective of pathogen surveillance is to provide early intelligence on potential public health threats, this intelligence will require follow-up confirmation. Furthermore, the regulatory implications for surveillance will be complex and determined by each jurisdiction based on local resources available, local surveillance needs, epidemiology and local policies.

Technological hurdles

One advance that could lower the barrier to adoption of metagenomic surveillance is the availability of kits and protocols for linked reads, synthetic long reads and Hi-C reads86,87. In parallel, rapid developments in sequencing technology, particularly in generating highly accurate long reads88 and ultralong reads89, as well as the ability to do real-time, field-deployable whole-genome analysis, are making metagenomics surveillance a realistic option90. For routine work, inexpensive and robust sequencing kits with long shelf-life are important areas of ongoing development91.

In tandem, bioinformatic algorithms that enable sensitive classification of long reads, such as MetaMaps92, fast and accurate metagenomic assembly49,93, and composition and coverage-based assembly binning94, are bringing robust metagenomic-surveillance workflows within reach. Significant research investment is needed to improve strain-resolved assemblies of metagenomes. Specifically, the utility and reliability of metagenome assemblies remain limited by a failure to assemble difficult-to-assemble genomic content such as plasmids, 16S rRNA and repetitive DNA elements, and the generation of incomplete and mixed or composite genome bins. The ability to obtain near-complete metagenome-assembled genomes, in particular, is a key development for metagenomic surveillance as it enables more complete mapping of AMR genes and the use of these genomes for transmission analysis. A growing set of research studies have exploited this capability to expand our understanding of microbial diversity and have resulted in new MAG databases such as UHGG, GTDB and IGGdb, which can be useful resources for future microbial surveillance efforts95,96,97. Community-driven initiatives such as the Genomic Standards Consortium98 are driving data and metadata standardization efforts that would be valuable for the use of MAGs in surveillance99,100. Curated, quality-controlled reference genome databases, such as FDA-ARGOS, will also be instrumental for improving the accuracy of metagenomic analyses.

Breakthrough technologies

Several recent advances could help to accelerate surveillance efforts.

Robotics and laboratory automation

The streamlined workflow of metagenomic assays makes them amenable to various forms of automation, thereby improving scalability, safety and reproducibility over traditional surveillance strategies. In particular, open-source automation platforms (for example, Opentrons and potentially the VolTRAX) enable seamless wet-bench standardization through sharing of digital protocols that can be directly deployed to automated systems of participating laboratories.

Genome analysis on mobile applications

The time required for data synthesis and reporting can be a major bottleneck in transdisciplinary biosurveillance efforts. There is therefore a need to couple portable sequencing technologies, such as Oxford Nanopore’s MinION, with applications that enable genomic analysis in the field, for example, the MinKNOW app and the iGenomics app101.

Adaptive sequencing

Nanopore sequencing allows DNA templates to be ejected from the pore during a run, which makes it possible to select which molecules are sequenced in real time. Recent applications of this method have demonstrated its utility for enrichment sequencing of target regions in a metagenome without the need for specialized sample preparation, thereby reducing turnaround time and resource requirements102,103. Further developments of this technology could involve selecting for specific mutations or adaptively choosing a virulence marker gene to search for once a particular pathogen signature has been observed.
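The selection logic behind adaptive sequencing can be sketched as a per-read decision made on the first bases of each molecule: keep sequencing if the read start resembles a target, eject it otherwise. The example below is a simplified illustration only. It assumes that base-called read prefixes are already available as strings and uses exact k-mer matching against a set of target sequences; the function names, thresholds and decision labels are hypothetical, and real implementations rely on the sequencer's own control interface and on proper read mapping, neither of which is modelled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_index(targets, k=15):
    """Collect every k-mer (both strands) found in the target sequences."""
    index = set()
    for seq in targets:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index.add(kmer)
            index.add(reverse_complement(kmer))
    return index


def decide(read_prefix, index, k=15, min_bases=200, min_fraction=0.2):
    """Return 'wait', 'continue' or 'eject' for a partially sequenced read.

    'wait'     : too little sequence so far to make a call
    'continue' : enough k-mers match a target, so keep sequencing (enrichment)
    'eject'    : the read looks off-target, so free the pore for another molecule
    """
    prefix = read_prefix.upper()
    if len(prefix) < max(min_bases, k):
        return "wait"
    kmers = [prefix[i:i + k] for i in range(len(prefix) - k + 1)]
    hits = sum(1 for kmer in kmers if kmer in index)
    return "continue" if hits / len(kmers) >= min_fraction else "eject"


if __name__ == "__main__":
    # Toy target standing in for, say, a resistance gene of interest.
    target = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCAAGTCTGACCGTTAGGCATCGATGCAAT"
    idx = build_kmer_index([target], k=8)
    print(decide(target[10:50], idx, k=8, min_bases=20))  # on-target read
    print(decide("A" * 40, idx, k=8, min_bases=20))       # off-target read
```

Swapping the target set during a run, for example to add a virulence marker once a pathogen signature has been seen, would amount to rebuilding the index and applying the same decision function to subsequent reads.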

Citizen science

With increasingly accessible scientific tools such as the MinION sequencer and the iGenomics app, broader participation of citizen scientists in environmental surveillance is now becoming feasible. Consortia such as MetaSUB and the American Gut Project have taken a first step in this direction by crowd-sourcing samples and funds104. In addition to enabling wider and more representative sampling, these efforts have helped to engage a broader community in the science, thus promoting a better understanding of matters of public concern such as AMR.


The COVID-19 pandemic has shown that infectious diseases do not respect borders and that nobody is safe until everyone is safe. Although the pandemic has spurred a greater appreciation for the need to strengthen global sequencing capacity, to share genomic data and to coordinate surveillance efforts, these have not been high-priority areas until recently. Importantly, we run the risk that future efforts may be limited to surveillance for viruses similar to SARS-CoV-2 or solely for respiratory viruses. By broadening the aperture of surveys, metagenomics goes beyond a technological upgrade. It aspires to unify microbial surveillance and transform public health efforts to proactively screen for threats, involve the very citizens that it seeks to protect and be harmonized on a global scale in step with the globalized world that we live in. We anticipate that a future model of microbial surveillance utilizing metagenomics will have several key elements, including the following:

  1. Standardized processes and analyses that a range of facilities and individuals can apply, from well-resourced reference and sentinel laboratories to interested citizen scientists. This would include pre-set requirements for sampling, significant investments to improve molecular biology workflows, low-cost liquid-handling automation to enable programmable sample processing and sequencing, and provenance-tracked, validated pipelines that submit data to global repositories in a standardized format (Fig. 2).

    Fig. 2: Harmonized pathogen surveillance using metagenomics.

    Schematic illustrating the opportunities for transdisciplinary One Health surveillance, utilizing metagenomic sequencing as a way to unify workflows and harmonize data. Key drivers for this integration are highlighted, including climate change and its impact on zoonotic reservoirs, globalization and international travel, medical tourism and global food trade. Created with

  2. Unification in a global One Health network, whereby genomic data across domains (for example, clinical, food and water safety, and vector control) and countries can be jointly analysed with appropriate privacy, security and data-ownership safeguards (for example, GISAID105; Fig. 2). This would involve automated analyses that integrate epidemiological information (for example, from the TraceTogether app) and environmental information to flag events of concern (for example, AMR or pathogen transmission). We envisage that such efforts would need to be integrated and coordinated at the national and global levels through multinational agencies such as the WHO to effectively meet their goals.

  3. High-resolution transmission mapping, whereby the availability of whole genomes (MAGs or isolates) would facilitate detailed source tracking, identification of reservoirs of concern and targeted mitigation efforts (Fig. 2).

  4. A boost to infectious disease research, whereby, similar to precision medicine efforts worldwide, the availability of massive genetic datasets with rich metadata will facilitate biological and genetic research into virulence genes, microbial evolution, and plasmid and phage diversity, in turn feeding back to support rational therapy for infections.

  5. Data-informed public policy and practices, whereby the integration of clinical, epidemiological and metagenomic data will be routinely used to determine policy choices for desired public health outcomes. The implementation of such evidence-backed policies will need to become more commonplace to improve our chances of avoiding future pandemics106.

The importance of better international surveillance and information sharing is clear in the WHO’s review of the global COVID-19 response, in which the panel recommends key transformational changes “to establish a new global system for surveillance, based on full transparency by all parties, using state-of-the-art digital tools to connect information centres around the world and including animal and environmental health surveillance”107. This would go beyond the current national efforts to update therapeutic guidelines and antimicrobial stewardship programmes, and could foster a new era of international cooperation supported by a multilateral agreement similar to that of the World Trade Organization. The harmonization of pathogen surveillance workflows in a metagenomics-enabled unified One Health strategy108,109 would have the added benefit that other public health priority pathogens would not be sidelined in times of crisis.