Introduction

Historically, fecal indicator bacteria (FIB) are used to regulate microbiological water quality; however, FIB are not reliable indicators of viruses.1,2 For example, previous studies have shown a lack of correlation between FIB and gastrointestinal illness in drinking water due to noroviruses,3,4 and viral outbreaks in recreational waters with FIB at acceptable levels,5 or in the absence of FIB6,7 have been documented. Infectious viral pathogens primarily enter the water environment through the release of untreated or inadequately treated wastewater. It is difficult to assess the complete public health and economic impact of waterborne viral infections; however, exposure to fecal-contaminated recreational water is estimated to result in millions of illnesses annually in the United States8 and hundreds of millions of illnesses globally,9 and inadequate sanitation and drinking water are predicted to account for greater than 700,000 annual deaths globally.10 Viruses are also predicted to account for the most significant fraction of human illnesses in sewage-contaminated water under specific exposure scenarios.11 Improved tools to understand viral pathogens’ presence and health impact in sewage-impacted waters are critical to informing engineering and policy efforts to protect human health and improve microbial water quality.

Potential applications of improved viral water quality indicators are broad, including monitoring recreational water and drinking water, ensuring safe wastewater reuse, and agricultural applications (e.g., irrigation water quality and food safety). The ideal viral indicator would have multiple characteristics. Previously proposed criteria include originating from the gut microbiome of target animal species; environmental co-occurrence with pathogens; greater concentration than target pathogens; equivalent or greater environmental resistance to killing stresses as pathogens; specificity to the target host and no environmental sources or replication; correlation to health risk; and, ease in measurement.12 These criteria should be considered in the selection, development, and evaluation of any potential viral water quality indicator.

Viral water quality indicators have previously been proposed and developed but have not yet been widely adopted for regulatory applications. These previously developed indicators mainly fall into two groups: human pathogens and culturable bacteriophages associated with human feces. Examples of human pathogens previously proposed as viral water quality indicators include adenovirus,13,14,15,16 polyomavirus,13,14,17,18 norovirus,19 and reovirus.20 These methods have the advantage of being based upon human viruses capable of causing disease and being highly specific to humans but are challenged by low and variable concentrations in wastewater. Culturable bacteriophages, e.g., bacteriophages infecting E. coli or Bacteroides fragilis,21 have also been proposed as water quality indicators. These phage-based methods meet many of the above-mentioned criteria for an ideal viral water quality indicator, including higher concentrations than many human viral pathogens in wastewater (somatic and F-specific coliphages previously found to range from 3 to 6 log10 PFU per 100 mL in untreated sewage)12,22 and more rapid culturability than human viral pathogens.12 Potential challenges in the application of coliphages include limited specificity to human fecal waste23 and lower concentrations than other more recently discovered targets, such as those described in this review. Culturable phage-based indicators are currently being evaluated as a regulatory viral water quality indicator by the United States Environmental Protection Agency.12,24 More recently, alternative viral indicators such as pepper mild mottle virus (PMMoV) and cross-assembly phage (CrAssphage) have been proposed, which will be covered more thoroughly in this review. An opportunity remains to develop improved viral water quality indicators that address the shortcomings of current methodologies, namely concentrations in wastewater that are too low to facilitate detection or a lack of specificity to human waste.

There has been substantial research interest over the last decade in the ‘human microbiome’, that is, microbial communities associated with the human body. Technical developments in DNA sequencing have enabled characterization of unculturable microbial communities, with initial discoveries suggesting a relationship between the human microbiome and human health. The microbiome research focus has generated significant insights into the diversity and ecology of the microorganisms that inhabit the human body as well as the discovery of novel human-associated microbiota, including viruses. The discovery of human-associated viruses creates a potential opportunity to develop viral water quality indicators that address current challenges of human specificity and concentration in wastewater.

Here, we review current methodologies to move from metagenomic viral discovery to the development of a molecular viral water quality indicator. We provide examples of viruses discovered by metagenomics and their subsequent detection in the water environment, with a specific focus on their suitability as a viral water quality indicator. We also discuss some existing shortcomings and research needs for molecular viral water quality indicators. Finally, we conclude with an outlook on the potential of developing water quality monitoring tools from newly discovered viruses in the microbiome era.

Metagenomic discovery of human-associated viruses

Metagenomic applications for viral discovery25 and viral pathogen detection26,27,28 have been previously reviewed. Here we provide a brief discussion relevant to viral discovery for water quality indicator development. The development of viral metagenomics for direct pathogen detection has shown promise29,30; however, it is notable that current untargeted metagenomic methods may miss pathogens detected by targeted methods.31 In addition, developments to improve the quantitative nature of metagenomics may improve the application of metagenomics for direct viral pathogen detection.32 The top panel of Fig. 1 shows an overview of metagenomic viral discovery.

Fig. 1 Overview of the pathway from metagenomic viral discovery to viral water quality tool development

Metagenomic sequencing has accelerated viral discovery, including the discovery of uncultured viruses. Currently, only a small fraction of predicted viral genomes exist in available databases;33 however, recent metagenomic approaches have significantly expanded available viral genomes.34

In viral metagenomics, viruses are typically first isolated from a sample, viral DNA or RNA is extracted, and nucleic acids are randomly sequenced (i.e., nucleotide sequence determined) and bioinformatically ‘assembled’ into longer contiguous sequences (contigs) based upon sequence similarity. Given the relatively short genome size of viruses, contigs of abundant viruses may often be near-complete genomes. Bioinformatic tools, such as VirSorter35 and MetaVir,36 have been developed to bioinformatically determine viral genomes from contigs. Following the identification of a potential virus, the genome sequence is often confirmed by re-sequencing using overlapping PCR regions. Potential human-associated pathogens are typically determined based upon their phylogenetic relationship with previously discovered pathogens.37,38 Some recently discovered viruses, e.g., crAssphage, share limited sequence similarity with known viruses, challenging their sequence-based detection and assignment of host and gene function.39

Challenges in developing viral water quality tools from novel human-associated viruses

Traditional virus isolation has been carried out via culturing on a receptive cell line or host. Many viruses discovered by metagenomics are not readily culturable. This is expected; simply put, if these viruses were readily culturable using common existing techniques, these viruses would likely have been previously discovered using culture-based approaches. Novel human-associated viruses are not necessarily ‘unculturable’, but may require specialized development to culture in the laboratory. The inability to culture these novel viruses hinders culture-based detection in water quality monitoring, essentially requiring molecular (DNA-based or RNA-based) methods, as discussed below. In many cases, the inability to culture novel viruses may also complicate the identification of the viral host or host-range. Similarly, the limited ability to culture these viruses hinders the ability to confirm their disease association via Koch’s postulates, leading to the proposal of the ‘metagenomic Koch’s postulates’.25 In ‘metagenomic Koch’s postulates’, multiple lines of evidence, primarily co-occurrence with a disease, are used to define a recently discovered microorganism as a causative agent of disease. Similar approaches, including co-occurrence and genomic profiling, can be used to propose viral hosts.

Viruses discovered using metagenomics typically depend on molecular detection in the environment (that is, DNA-based or RNA-based detection) due to the lack of a cultured isolate. Currently, the majority of molecular water quality indicators are PCR-based assays, with quantitative PCR (i.e., real-time PCR or qPCR) as the most common approach. In addition, digital PCR (dPCR) is an emerging water quality monitoring tool.40 In dPCR, a PCR reaction is divided into thousands of individual sub-reactions, and the number of targets in the original sample is quantified based on the number of positive PCR sub-reactions. This development eliminates the need for the standard curve used in qPCR reactions. dPCR has previously been used to detect viruses in natural waters, including norovirus, hepatitis A virus, and rotavirus.41,42 When compared to qPCR, assessments have shown that dPCR has greater precision42 and is less sensitive to PCR inhibitors.42,43,44 Results have been more mixed for sensitivity assessments, with some studies showing sensitivity similar to that of qPCR41,42 and other studies showing similar or slightly decreased sensitivity.44 The evidence suggests that dPCR offers several technical advantages over qPCR, with the primary drawbacks being capital costs and the necessity for further technical verification, i.e., limited demonstrations of application for viral monitoring in environmental waters.
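The partition-counting idea behind dPCR can be illustrated with a minimal Python sketch. It assumes the Poisson correction commonly applied to dPCR data, which the text above does not spell out; the function name and the partition volume in the usage note are hypothetical and would come from the instrument in practice.

```python
import math


def dpcr_copies_per_microliter(positive, total, partition_volume_ul):
    """Estimate target copies per microliter of reaction from partition counts.

    Assumes targets are distributed randomly (Poisson) across partitions,
    so a positive partition may contain more than one copy.
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("partition counts are inconsistent")
    if positive == total:
        # Every partition is positive: the Poisson estimate is undefined.
        raise ValueError("reaction is saturated; dilute the sample and rerun")
    fraction_positive = positive / total
    # Mean copies per partition implied by the fraction of negative partitions.
    copies_per_partition = -math.log(1.0 - fraction_positive)
    return copies_per_partition / partition_volume_ul
```

For example, `dpcr_copies_per_microliter(2000, 20000, 0.00085)` estimates the concentration for a run with 10% positive partitions and an assumed partition volume of 0.85 nL. No standard curve enters the calculation, which is the point made above.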

Designing PCR assays based on viruses discovered from assembled metagenomic reads comes with specific challenges. The genome assembled from metagenomic sequence data likely represents a consensus genome developed from multiple strains present in the sample. A consensus- or pan-genome results in unknown sequence variation, complicating molecular assay development based on this data. In addition, novel viruses are often assembled from a relatively small number of samples, which may result in inadequate coverage of true viral diversity.

Viral water quality tool development

The bottom panel of Fig. 1 shows an overview of the pathway for viral water quality tool development.

Following metagenomic genome or contig assembly, the sequence can be scanned for potential PCR primers that meet design criteria. A developed PCR assay could then be adapted to qPCR or dPCR chemistry to enable target quantification. In general, PCR primers should be designed from unique, high-quality, and unambiguous DNA regions. Some important rules are: (1) the primer lengths should be from 17 to 25 bases, as longer primers may lead to higher annealing temperature, which reduces DNA polymerase activity; (2) GC-content should be between 40 and 60%, and both primers should have similar GC content; (3) the bases should be distributed randomly, avoiding more than three identical consecutive bases; (4) the 3′ end of primers should have two G or C in the last 5 bases; (5) regions of secondary structure, such as hairpins, should be discarded during design; (6) bioinformatics primer design tools, such as NCBI Primer-BLAST, should be used to check for potential false positives after primer design.45,46 Typically, this process is completed for a single or small number of PCR primer targets, and those assays are subsequently optimized; however, that approach may be challenged by uncertainty in the viral genome composition (for example, unknown sequence diversity) that arises from the metagenomic assembly of the genome. Best practices for PCR assay development and implementation for water quality assessment have previously been detailed.47 Assays targeting RNA also have some additional considerations, namely the addition of a reverse-transcription step and greater possibility of RNA degradation.48
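Rules (1) through (4) above are simple enough to screen automatically. The following Python sketch shows one way this could be done; it is an illustration, not a published tool. The function names are hypothetical, rule (4) is interpreted here as "at least two G or C in the last five bases", and the 10 percentage point tolerance for "similar GC content" is an assumed value. Rules (5) and (6) are not covered and still require dedicated software such as Primer-BLAST.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def check_primer(seq):
    """Return a list of rule violations for one primer (empty list = passes)."""
    seq = seq.upper()
    problems = []
    if re.search(r"[^ACGT]", seq):
        problems.append("ambiguous or non-DNA base")
    if not 17 <= len(seq) <= 25:
        problems.append("length outside 17-25 bases")
    if not 0.40 <= gc_fraction(seq) <= 0.60:
        problems.append("GC content outside 40-60%")
    if re.search(r"(.)\1{3,}", seq):
        # Four or more identical bases in a row.
        problems.append("run of more than three identical bases")
    if sum(base in "GC" for base in seq[-5:]) < 2:
        # Assumed interpretation of rule (4).
        problems.append("fewer than two G/C in the last 5 bases")
    return problems


def check_pair(forward, reverse, max_gc_difference=0.10):
    """Check both primers, plus GC similarity (tolerance is an assumption)."""
    problems = [f"forward: {p}" for p in check_primer(forward)]
    problems += [f"reverse: {p}" for p in check_primer(reverse)]
    if abs(gc_fraction(forward) - gc_fraction(reverse)) > max_gc_difference:
        problems.append("GC content differs between primers")
    return problems
```

Calling `check_primer` on each candidate window of an assembled contig would give a first-pass list of regions suitable for primer design, which could then be passed to secondary-structure and off-target checks.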

Recently, Stachler et al.49 took a unique approach to overcome unknown sequence conservation in genomes assembled from metagenomes. In this approach, 384 primer pairs were designed to cover the regions of the CrAssphage genome determined to be suitable for primer design and screened against sewage DNA and composite animal fecal DNA to identify the best performing primer pairs. In the initial evaluation, 18 primer pairs gave false positives in animal fecal DNA, 12 primer pairs produced spurious bands, 123 primer pairs produced no product, 174 primer pairs were positive with primer dimer (suboptimal amplification), and 57 primer pairs passed initial evaluation. An overview of this primer design approach over the CrAssphage genome is shown in Fig. 2. This approach represents a potentially powerful method to rapidly overcome uncertainty due to unknown sequence variability in genomes determined from metagenomic sequence data.

Fig. 2 Genome map demonstrating the shotgun primer design approach used to design a CrAssphage PCR assay by Stachler et al.49 Genome position is noted by numbers on the external ring. The open reading frames of the circular CrAssphage genome are shown in blue and orange on the external ring. Each tick mark in the internal ring represents a primer pair. The outcome of each primer pair screening is indicated by the legend. Adapted with permission from Stachler et al.49 Copyright 2017 American Chemical Society

Two of the primary performance metrics of PCR assays for water quality monitoring are sensitivity and specificity. Sensitivity refers to the percentage of non-spiked target samples correctly identified as positive normalized by the total number of target samples tested (e.g., the percentage of human sewage samples positive for the assay). Specificity refers to the percentage of presumptive negative samples correctly identified as negative normalized by the total number of presumptive negative samples tested (e.g., the percentage of non-human fecal samples negative for the assay). Typically, assays are screened against human and animal-derived fecal samples to assess performance. While there are no formal performance criteria for sensitivity or specificity, assays are often considered to perform well if they demonstrate at least 80% sensitivity and specificity.50 The specific laboratory, reference samples, and amount of input material may influence reported sensitivity and specificity.50
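As a worked illustration of these two definitions, the short Python sketch below computes both metrics from lists of presence/absence results and compares them against the 80% benchmark mentioned above. The function names are hypothetical, and each result is assumed to be a simple boolean (True = assay positive).

```python
def percent(count, total):
    """Express count as a percentage of total."""
    if total == 0:
        raise ValueError("no samples were tested")
    return 100.0 * count / total


def sensitivity(target_results):
    """Percent of target-source samples (e.g., human sewage) testing positive."""
    return percent(sum(1 for r in target_results if r), len(target_results))


def specificity(non_target_results):
    """Percent of non-target samples (e.g., animal feces) testing negative."""
    return percent(sum(1 for r in non_target_results if not r),
                   len(non_target_results))


def meets_benchmark(target_results, non_target_results, threshold=80.0):
    """True if both metrics reach the informal 80% benchmark."""
    return (sensitivity(target_results) >= threshold
            and specificity(non_target_results) >= threshold)
```

For instance, an assay positive in 9 of 10 sewage samples and in 2 of 10 animal fecal samples has 90% sensitivity and 80% specificity under these definitions.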

Two additional performance metrics often considered are the limit of detection (LOD) and limit of quantification (LOQ). The LOD is the lowest amount of target DNA that can be statistically detected under a given reaction condition. The LOQ is the lowest quantifiable target concentration above the lower limit of the linear detection range and the assay LOD.51 Varying definitions of LOD may lead to differing interpretations.52 These measures must also be updated for varying technologies. For example, the LOD for droplet digital PCR has been previously defined as the [mean of the no-template control measures] + 2.479*[standard deviation of the no-template controls].51 There are not yet agreed-upon metrics for the direct application of semi-quantitative metagenomic approaches.
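The droplet digital PCR LOD definition quoted above translates directly into a short calculation. The Python sketch below is illustrative only: the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which form was used.

```python
import statistics


def ddpcr_lod(ntc_measurements, multiplier=2.479):
    """LOD = mean(NTC) + 2.479 * SD(NTC), in the units of the NTC measurements.

    Uses the sample standard deviation (an assumption; see lead-in).
    """
    if len(ntc_measurements) < 2:
        raise ValueError("at least two no-template controls are required")
    mean_ntc = statistics.mean(ntc_measurements)
    sd_ntc = statistics.stdev(ntc_measurements)
    return mean_ntc + multiplier * sd_ntc
```

A sample would then be called positive only if its measured value exceeds `ddpcr_lod(ntc_measurements)` for the no-template controls run under the same conditions.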

LOD and LOQ have well-recognized challenges as quantitative descriptors of molecular water quality assays, namely that different sources or amounts of input material significantly alter these measures. Improved descriptors have recently been proposed. The first is the ‘equivalent threshold’ (ET), which is the “quantity of fecal reference material per reaction that corresponds to the LOD of an assay”.52 The second is the ‘false positive index’, which is “the difference between the log10 transformed ET values of the target source and a non-target source”.52 Continued development of assay descriptors will improve the quantitative comparison of molecular water quality indicators.
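The false positive index can be sketched in a few lines of Python. The quoted definition does not fix the sign of the difference; the version below assumes non-target minus target, so that a larger positive value means far more non-target material is needed to reach the LOD. The function name and this sign convention are assumptions made for illustration.

```python
import math


def false_positive_index(et_target, et_non_target):
    """Difference of log10-transformed equivalent thresholds (ET).

    Both ET values must be in the same units (quantity of fecal reference
    material per reaction). Sign convention (non-target minus target) is
    an assumption; see lead-in.
    """
    if et_target <= 0 or et_non_target <= 0:
        raise ValueError("equivalent thresholds must be positive")
    return math.log10(et_non_target) - math.log10(et_target)
```

Under this convention, an assay that reaches its LOD with 1000 times less sewage than dog feces per reaction would have an index of about 3.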

Currently, there is no consensus regarding tested reference library size, reaction input concentration, and assay performance parameters for proposed assays. For instance, in the review from Harwood et al.,53 over 26 studies and 55 assays were summarized regarding assay performances such as sensitivity, specificity, sample type, LOD and LOQ. From this summary, the diverse reported units for input, LOD, and LOQ values indicate that a standard way to evaluate molecular fecal source tracking methods is needed. Even using the same assay, but in different geographical areas, the parameters varied greatly in both LOD and LOQ, suggesting geographic variability.52,53 Possible explanations for differential performance between laboratories could be slightly different qPCR reaction conditions, gene copy abundances in different locations, different sampling timing, different DNA extraction techniques, and most importantly, different initial input concentrations as the initial target concentration plays an important role in calculating sensitivity and specificity.52 Likewise, viral concentration approaches are a critical aspect to assay performance as viral targets may be dilute in target environmental waters; method performance of viral concentration methods has been previously reviewed.54,55 It is also notable that performance is dependent on the specific PCR assay; for example, a previous evaluation found order of magnitude concentration differences using two different adenovirus qPCR assays.56 These results demonstrate that assays should be carefully chosen and validated for both different laboratories and geographic locations before use.

Examples of metagenomic viral discovery and detection in the water environment

Recently, there have been several examples of human-associated viruses discovered by metagenomic methods. Below we review some notable examples of virus detection in the water environment following their discovery by metagenomics and the potential to use these viruses as water quality monitoring tools.

Bocavirus

Human bocavirus (HBoV) is a member of the parvovirus family first described in 2005 as an agent of respiratory infection using an early metagenomic approach.38 HBoV is a non-enveloped virus with a 5 Kb single-stranded DNA genome. As HBoV is often detected concurrently with established pathogens, the role of HBoV as a causative agent in disease is unresolved due to the lack of either a simplified cell culture system or an animal model.57 HBoV was detected in fecal samples from children who have diarrhea regardless of the presence of respiratory infection, indicating that HBoV may be both an enteric and respiratory pathogen.58

HBoV has been detected in wastewater and environmental waters. Table 1 shows a summary of HBoV detection in wastewater samples and the specific assays used. HBoV was detected using qPCR in 100, 81, 79, and 51% of untreated wastewater samples from Egypt, the United States, Italy, and Norway, respectively.59,60,61,62 HBoV was also detected by viral metagenomics in raw sewage from Nepal and the United States.63 In a 2009 study in Germany, 40% of river water samples contained HBoV at concentrations between 3 × 10^1 and 2 × 10^3 genome copies/L.64 Previous environmental detections of HBoV have employed multiple detection methods, including qPCR65 and Luminex66 assays.

Table 1 Previous studies detecting HBoV in wastewater and associated PCR primers

The study of 28 children hospitalized during a sewage-contaminated drinking water outbreak in Finland highlighted the potential for environmental spread of HBoV.67 In three of the four cases multiple viruses were detected; but in one case, HBoV was the only virus detected, suggesting that HBoV was the primary cause of symptoms.67

The evidence suggests that the metagenomic discovery of HBoV identified an unrecognized but likely minor source of waterborne viral disease; however, the utility of HBoV as a viral water quality-monitoring tool may be limited because detection rates in wastewater were below 100% in the majority of previous studies.

Cosavirus

Human cosavirus (HCoSV), a genus in the Picornaviridae family, was first identified in 2008 in stool samples of Pakistani children with non-polio acute flaccid paralysis through the use of viral metagenomics.68,69 HCoSV has a 7.3 Kb single-stranded RNA genome and is genetically diverse.70 HCoSV was previously detected in stool samples from multiple regions throughout the world.71,72 The presence of HCoSV strongly correlates with diarrhea;70 however, HCoSV has also been detected in the stool of healthy individuals.71 Its higher concentrations in those without gastroenteritis complicate the understanding of its pathogenesis.72 HCoSV was also previously detected in pig feces, albeit at lower concentrations than found in infected humans.73

Table 2 shows a summary of HCoSV detection in wastewater samples and the specific assays used. HCoSV was detected using qPCR in 71 and 25–38% of untreated sewage samples in Japan and the United States, respectively.59,74,75 HCoSV was also detected by viral metagenomics in raw sewage from Nepal.63 HCoSV has also been detected using qPCR in treated wastewater samples in the US, France, and Japan, with 1–2 log10 removal following treatment.74,75,76 HCoSV was previously detected using qPCR in 29% of river samples in Japan and 12% of tributary samples in France at lower concentrations than in wastewater samples.74,76 Notably, HCoSV was recently detected using qPCR at relatively higher concentrations than observed in other environmental waters in the Bagmati River, Nepal, highlighting the importance of geographic variability in marker evaluation.77

Table 2 Previous studies detecting HCoSV in wastewater and associated PCR primers

HCoSV may be a causative agent of waterborne viral disease; however, its variable detection rate in wastewater (25–71%) across studies and its presence in animal sources (pig) may limit its application for viral water quality monitoring. Notably, the recent detection of elevated HCoSV concentrations in environmental waters in Nepal points to a greater potential for application in specific geographic regions.

Cross-assembly phage

Cross-assembly phage, or “CrAssphage”, is a bacteriophage that was discovered in 2014 by metagenomic data mining of human fecal microbiome sequence data.39 It has a 97 Kb double-stranded DNA genome.39 The discovery study found that the CrAssphage genome was present in 73% of available human fecal metagenomes, and that CrAssphage was more abundant in the human gut than all other known human gut phages combined.39 The phage was predicted to be a Bacteroides bacteriophage by co-occurrence profiling,39 and recent successful cultivation of CrAssphage has confirmed this prediction.78 Recent work has identified CrAssphage as the prototypical virus in a new viral family.79

CrAssphage has been successfully detected in both sewage and environmental waters. Table 3 shows a summary of CrAssphage detection in wastewater samples. CrAssphage was detected using metagenomics in 100% of wastewater samples surveyed and was most abundant in samples from North America and Europe and less abundant in samples from Africa and Asia.80 Subsequently, CrAssphage qPCR assays were developed, and CrAssphage was detected in 100% of untreated wastewater samples from throughout the United States and Spain at average concentrations of greater than 10^9 genome copies/L.49,81,82 CrAssphage was also recently detected with 100% occurrence in untreated wastewater samples from Australia.83 CrAssphage appears to be highly human-associated; however, previous studies have had positive CrAssphage detections in seagull, dog, chicken, cat, and cow fecal samples, albeit at lower concentrations than those found in human sewage.49,82,84 Recent studies have also successfully deployed CrAssphage markers to detect fecal pollution in environmental waters.84,85 Finally, a recent study exploring CrAssphage diversity found CrAssphage in sewage samples globally.86

Table 3 Previous studies detecting CrAssphage in wastewater and associated PCR primers

The high detection rates, high specificity to human wastewater, and high abundance in sewage suggest that CrAssphage has significant potential as a viral water quality indicator and could be a valuable addition to the microbial source tracking toolbox. However, further demonstrations are necessary, including detection in multiple environments, correlation with viral pathogens, characterization of geographic diversity and abundance, and exploration of whether animal-specific variants of CrAssphage exist.

Klassevirus

Human Klassevirus, also known as Salivirus, was first discovered by metagenomic sequencing of a novel picornavirus in pediatric stool samples in the United States.37,87 Klassevirus has a 6–8 Kb single-stranded RNA genome. Serological evidence has identified humans as the host for Klassevirus.88

Table 4 summarizes Klassevirus detection in wastewater samples. Klassevirus was detected using qPCR in 14.6, 57, and 50% of wastewater samples in the United States, South Korea, and Japan, respectively.89,90,91 Notably, using an improved qPCR assay, Klassevirus was detected in 93% of wastewater samples with a maximum concentration of 9.7 × 10^6 genome copies/L in Japan.92 Klassevirus was also detected by viral metagenomics in untreated sewage from Nepal.63 Klassevirus was also detected using qPCR in 44, 29, and 42–50% of treated wastewater samples from France, Japan, and the United States, respectively.76,89,92 Finally, Klassevirus was detected using qPCR in nine out of 56 river water samples in Japan (16% detection rate), in two of six samples (33%) in Barcelona river water, and in one of six samples (17%) in Rio de Janeiro known to be contaminated with sewage.92,93

Table 4 Previous studies detecting klassevirus in wastewater and associated PCR primers

Klassevirus has been observed globally in wastewater, treated wastewater, and in environmental waters, characteristics necessary for a human fecal pollution indicator. The development of an improved assay by Haramoto et al. that increased detection rates in sewage from ~50 to >90%92 increases the potential to utilize Klassevirus in water quality monitoring applications; however, the lower concentrations in sewage (<10^7 genome copies/L) than other proposed viral indicators may limit its further development.

Pepper mild mottle virus

PMMoV is a plant pathogen that infects a wide variety of pepper cultivars. PMMoV has an approximately 6.3 Kb single-stranded RNA genome.94 PMMoV was not discovered by viral metagenomics, but is included in this discussion as PMMoV was first proposed as a viral indicator of fecal pollution95 following the metagenomic observation of the abundance of PMMoV in feces.96 The presence of PMMoV in feces and wastewater is due to consumption of infected pepper products. PMMoV is mainly specific to human feces but has previously been detected in chicken, geese, seagull and cow fecal samples.95 The authors posited that this was due to animal consumption of products containing peppers.95 A recent manuscript reviewed PMMoV in viral water quality monitoring.97

Table 5 summarizes PMMoV detection in wastewater. The original Rosario et al. study found PMMoV using qPCR in 100% of untreated wastewater samples from throughout the United States at concentrations of greater than 10^9 genome copies/L.95 Subsequent studies using qPCR have found 100% detection rates in untreated sewage in the United States, Costa Rica, and Vietnam.98,99 An exception to this high detection rate in sewage is a previous study in South Korea with a PMMoV detection rate of 57%.90 PMMoV was also found in wastewater in Australia100 and Germany.101

Table 5 Previous studies detecting pepper mild mottle virus in wastewater and associated PCR primers

Studies have also investigated the removal of PMMoV in water and wastewater treatment. Observed PMMoV removal using qPCR in wastewater treatment plants in the United States and Germany varied from <1 to 3.7 log10.101,102 Limited to no removal was observed in Bolivian wastewater treatment ponds103 and surface flow wetlands in the United States.104 For drinking water processes, 1–2 log10 removal was achieved by coagulation and filtration in Thailand,105 and reverse osmosis removed PMMoV to below the LOD.106 PMMoV was proposed as a viral indicator in wastewater reuse.107
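Removal values such as those above are conventionally expressed as the log10 ratio of influent to effluent concentration. A minimal Python sketch of that calculation follows; the function name is hypothetical, and samples below the LOD need separate handling because a zero or censored effluent value cannot be entered directly.

```python
import math


def log10_removal(influent_gc_per_l, effluent_gc_per_l):
    """Log10 removal across a treatment step from paired concentrations.

    Both concentrations must be positive and in the same units
    (e.g., genome copies per liter).
    """
    if influent_gc_per_l <= 0 or effluent_gc_per_l <= 0:
        raise ValueError("concentrations must be positive (check non-detects)")
    return math.log10(influent_gc_per_l / effluent_gc_per_l)
```

For example, an influent of 10^6 and an effluent of 10^4 genome copies/L corresponds to a 2 log10 removal.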

PMMoV has also been detected in environmental waters using qPCR, including 97% of river samples in Germany,101 94% of surface water samples and 38% of groundwater samples in Vietnam,99 76% of surface water source samples in Japan,108 and 33% of beach water samples in Australia.100 Conversely, PMMoV was not detected in any Costa Rican surface water samples despite the detection of FIB and other MST markers and 100% PMMoV detection in untreated wastewater.98 Finally, PMMoV was found in 85% of karst groundwater samples in Mexico (all samples positive for total coliforms),109 and PMMoV was the most commonly detected virus in groundwater at an artificial groundwater recharge site.110 PMMoV observed in the environment is genetically diverse.108

The abundance of PMMoV in wastewater and its high specificity suggest high potential as a fecal pollution indicator. One potential challenge may be the detection of PMMoV in chicken, seagull, cow, and goose feces, limiting its human-specificity;97 however, the previously demonstrated specificity is above the widely-accepted 80% threshold. In addition, the high detection rates in environmental waters suggest that PMMoV is persistent, potentially limiting its application to detect recent pollution events, although further research is necessary.

Research needs to enable molecular viral water quality indicators

As discussed above, molecular methods are the most common approach to deploy novel human-associated viruses as water quality indicators. The application of molecular methods to water quality monitoring has specific challenges that must be addressed to allow viral marker application as a regulatory tool. Below, we discuss some of these challenges, including current approaches and potential pathways forward.

Association of viral water quality indicator with disease following exposure

The ultimate ‘gold standard’ for application of a viral water quality monitoring target is an association with disease following exposure. In lieu of epidemiological disease association, modeling approaches may be used to associate molecular water quality indicator exposure with human health outcomes. Previous examples demonstrated a statistical framework based upon quantitative microbial risk assessment to associate a molecular sewage pollution indicator with disease outcomes.11,111 In these models, molecular water quality indicators are used to indicate the volume of wastewater exposure. Typically, a range of concentrations is used with a Monte Carlo simulation. For reference, the observed concentration ranges for CrAssphage and PMMoV in sewage have been 7–9.5 and 6–10 log10 genome copies per liter, respectively.49,95,97,112 These observed concentration ranges tend to be above the ranges for human pathogens such as adenovirus, facilitating their direct detection and use in environmental waters. A statistical approach is then used to correlate the abundance of molecular sewage indicators with select pathogens in sewage to determine pathogen exposure. Finally, a quantitative microbial risk assessment is applied to estimate expected disease outcomes depending on the marker concentration. These models assumed that indicator and pathogen concentrations fall within previously observed ranges and did not consider the transport or decay of either indicators or pathogens. Future model iterations should be expanded to better include the environmental behavior of both model pathogens and viral indicators. Ultimately, an epidemiological association between viral marker exposure and disease outcome will likely be necessary to enable their regulatory application; however, quantitative microbial risk assessment establishes an approach to associate molecular water quality markers with disease and provides a path to bridge viral water quality marker development with disease outcome.
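The modeling chain described above (indicator concentration, implied sewage exposure, pathogen dose, risk) can be illustrated with a deliberately simplified Monte Carlo sketch in Python. It is not the framework used in the cited studies. Only the CrAssphage sewage range (7–9.5 log10 genome copies/L) comes from the text; the pathogen range, ingested volume, and exponential dose-response parameter are hypothetical placeholders, concentrations are assumed uniform on the log10 scale, indicator and pathogen are sampled independently, and transport, decay, and the difference between genome copies and infectious units are ignored.

```python
import math
import random


def simulate_illness_risk(
    indicator_gc_per_l_water,            # indicator measured in the water body
    n_draws=10_000,
    seed=1,
    indicator_log10_range=(7.0, 9.5),    # CrAssphage in sewage (from the text)
    pathogen_log10_range=(3.0, 6.0),     # HYPOTHETICAL pathogen range in sewage
    ingested_volume_l=0.03,              # HYPOTHETICAL volume swallowed per event
    dose_response_r=0.01,                # HYPOTHETICAL exponential model parameter
):
    """Return median and 95th percentile risk per exposure event."""
    rng = random.Random(seed)
    risks = []
    for _ in range(n_draws):
        indicator_in_sewage = 10 ** rng.uniform(*indicator_log10_range)
        pathogen_in_sewage = 10 ** rng.uniform(*pathogen_log10_range)
        # Fraction of the water that is sewage, inferred from the indicator.
        sewage_fraction = min(1.0, indicator_gc_per_l_water / indicator_in_sewage)
        dose = pathogen_in_sewage * sewage_fraction * ingested_volume_l
        # Exponential dose-response model: P = 1 - exp(-r * dose).
        risks.append(1.0 - math.exp(-dose_response_r * dose))
    risks.sort()
    return {"median": risks[n_draws // 2], "p95": risks[int(0.95 * n_draws)]}
```

Calling `simulate_illness_risk(1e4)` and `simulate_illness_risk(1e6)` with the same seed shows how the estimated risk rises with the measured indicator concentration. A realistic application would replace every placeholder with pathogen-specific, locally validated values and add the correlation and decay terms discussed above.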

Temporal and geographic variability

The microbial composition in human gut microbiomes varies both temporally113 and geographically.114 Similarly, human-associated viruses vary by individual,115 age,116 diet,115,117,118 and geography.117,119 Each of these factors could alter the abundance, and thus suitability, of a viral indicator in wastewater regionally and temporally. Indeed, the microbial communities found in sewage vary between cities,120 as do the prevalence and distribution of waterborne pathogens; for example, norovirus abundance in sewage varies both regionally and globally,121 as well as temporally.122

Characterization of the regional and global distribution of viral water quality pollution indicators is incomplete. Figure 3 shows the global detections of the viruses highlighted in this review as a demonstration. While the evidence is suggestive of global distribution of these viruses, HBoV, HCoSV, Klassevirus, and PMMoV have each only been detected in sewage from 3.1, 1.5, 2.5, and 4.1% of countries globally, respectively. An exception is CrAssphage, which, following a recent study exploring CrAssphage diversity,86 has been found in sewage in 39% of countries globally. Because no data are available for the vast majority of countries, we do not yet know the distribution of these recently discovered viruses for the majority of the world’s population or demographics. This limited understanding should be considered before, and may ultimately limit, the global implementation of these viral water quality tools. Regional, rather than global, application of water quality tools may ultimately be most appropriate and should be locally verified prior to application. It may also be most appropriate to apply a ‘toolbox’ of indicators to account for geographic and temporal variability. The selection of appropriate local molecular assays may be informed by metagenomic methods.

Fig. 3

Countries with HBoV, HCoSV, CrAssphage, Klassevirus, or PMMoV identifications in sewage. CrAssphage alone has been documented in orange countries; blue countries contain multiple viruses. Countries not shown are Azerbaijan, El Salvador, Israel, Kazakhstan, Luxembourg, Malta, Singapore, and Sri Lanka.

There is a clear research need to expand our view of viral diversity in sewage, perhaps following the consortium model used by Mayer et al.123 to survey bacterial fecal pollution markers on six continents. At a minimum, novel viral water quality monitoring tools should be evaluated locally before implementation to assess the potential effect of geographic or temporal variation on assay performance.

Viral fate

The motivation to use a viral-based water quality monitoring tool is that it would ostensibly behave more like pathogenic viruses in the environment than currently used FIB; however, this assumption requires validation. Viral fate characteristics, e.g., decay rate, could also be incorporated into quantitative microbial risk assessment models as described above. In this context, viral marker ‘fate’ includes both persistence and transport in the environment. Viral characteristics, including genome and capsid structure, as well as transmission pathway, drive environmental persistence.124 One of the most important viral characteristics to consider is genome composition (i.e., DNA or RNA genome), as genome composition is a significant predictor of viral fate;124 however, even among structurally similar viruses, viral persistence can vary by over an order of magnitude.125

The inherent variability of viral environmental fate demands multiple demonstrations of co-occurrence as well as comparable persistence and transport between potential viral indicators and pathogenic viruses that the indicators are intended to represent. A recent review identified differing predicted risk based upon the age of sewage contamination, and also that risk may be under-predicted using bacterial fecal indicator markers due to the lesser persistence of these markers compared to viral pathogens.126 Most critically, viral transport and persistence must be well-characterized in several different environments to enable comparison between viral marker and pathogen fate and subsequent updates to exposure models. These evaluations must also take into account that molecular viral marker transport may differ from viable pathogen transport, as DNA may be more persistent in environmental systems than viable microorganisms as discussed below. Ultimately, it may be most appropriate to employ multiple viral indicators in a ‘toolbox’ approach to most accurately capture the different behavior of pathogenic viruses of interest. Finally, a move from an observational to a mechanistic understanding of viral persistence will better inform the selection of viral surrogates and indicators127 and provide a framework to integrate the environmental fate of viruses with disease modeling approaches as noted above.
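To illustrate why comparable persistence matters, the following minimal sketch tracks how the marker-to-pathogen ratio drifts as contamination ages. It assumes simple first-order (log-linear) decay, a common simplification that this review does not prescribe. The marker starting concentration lies within the CrAssphage sewage range reported above. The pathogen starting concentration, both decay rates, and the function names are hypothetical values chosen only for illustration.

```python
def log10_concentration(log10_c0, k_log10_per_day, days):
    """First-order decay on a log10 scale: log10 C(t) = log10 C0 - k * t."""
    return log10_c0 - k_log10_per_day * days


def log10_marker_to_pathogen_ratio(
    days,
    marker_log10_c0=8.0,    # within the CrAssphage sewage range given in the text
    marker_k=0.5,           # hypothetical marker decay rate, log10 per day
    pathogen_log10_c0=5.0,  # hypothetical pathogen starting concentration
    pathogen_k=0.1,         # hypothetical pathogen decay rate, log10 per day
):
    """Return log10(marker / pathogen) after the given number of days."""
    marker = log10_concentration(marker_log10_c0, marker_k, days)
    pathogen = log10_concentration(pathogen_log10_c0, pathogen_k, days)
    return marker - pathogen
```

With these placeholder rates, the marker decays faster than the pathogen, so the ratio shrinks as the contamination ages. A fixed marker-to-pathogen ratio taken from fresh sewage would then under-predict pathogen exposure for aged contamination.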

Viral viability

Many viruses discovered via metagenomic methods are not immediately culturable and thus are likely to be developed as molecular detection tools. Molecular methods may detect nucleic acids from both viable and non-viable virus, and nucleic acids may have greater persistence than viable virus.128,129 Detection of non-viable virus by molecular methods, i.e., ‘legacy DNA’, may not be informative of the likelihood of infection or a recent contamination event, potentially limiting the application of molecular assays as a water quality monitoring tool. This is especially important in the environment, where a significant fraction of DNA comes from non-viable cells.130 The question of ‘legacy DNA’ is critical in many molecular applications and has been recently reviewed;131 here, we provide a brief discussion relevant to viral water quality monitoring to summarize the state of the field and potential future research directions.

A common approach to discriminate viable virus is through the removal of nucleic acids not contained within the viral capsid. These approaches typically cannot exclude viruses with damaged capsids that still contain the viral genome but are no longer capable of causing infection. The persistence of RNA in the environment is expected to be much less than the persistence of DNA;132 thus, most approaches are primarily focused on the removal of free DNA. Currently, the most common approach for removal of legacy DNA is sample digestion with DNase. Challenges with DNase include poor removal of particle-bound, non-viable DNA and loss of DNase activity in some environmental matrices. An alternative approach is the use of intercalating dyes, such as propidium monoazide (PMA). In this approach, intercalating dyes are activated by light and integrate into free DNA, making the DNA unavailable for PCR amplification or sequencing. The dependence of intercalating dyes on visible light has raised questions about the efficacy of PMA for different environmental matrices and microorganisms.133

An alternative approach to discriminating free nucleic acids is via targeting virus capable of successful host binding, demonstrating a functionally intact viral capsid. This approach has previously been demonstrated for norovirus and rotavirus using porcine gastric mucin.134 Binding assays must be tailored for a specific viral target and have not been demonstrated for widespread environmental application. In addition, binding assays may not capture damage to the viral genome, which may be the predominant mechanism for viral inactivation for certain viruses and conditions.135

Finally, advances in viral culturing, for example, the recent successful culturing of norovirus in human intestinal enteroids to determine viral inactivation131 and the recently reported successful cultivation of CrAssphage,78 may provide the opportunity to develop culture-based detection approaches for previously unculturable viral targets for viral water quality monitoring. At a minimum, emerging cultivation approaches will be useful to validate methods designed to target viable virus as described above.

Additional approaches and validation of methods to discriminate molecular detections from intact virus will be necessary to propel the widespread adoption of molecular tools for water quality monitoring. The path forward will likely require a multi-tiered approach, integrating approaches to exclude free viral nucleic acids and target intact viral capsids, as well as emerging cultivation approaches to confirm the suitability of other methodologies.

Summary and outlook

A critical need remains for viral water quality indicators with potentially diverse applications, including recreational and agricultural water quality monitoring, fecal source tracking, and wastewater reuse. The recent rise in human microbiome data and discovery of human-associated viruses provides a rich opportunity to develop molecular viral water quality monitoring tools. In general, novel human-associated viruses suspected of causing disease, such as HBoV, HCoSV, and Klassevirus in this discussion, are less abundant in wastewater but more human-specific, while bacteriophages such as CrAssphage and food-associated viruses such as PMMoV are more abundant in wastewater but less human-specific. Given that the primary challenge in viral detection is low target abundance and that the potential exists to improve human-specificity of assays through modification or a toolbox approach (that is, using the developed assays in conjunction with others), the richest immediate potential for viral water quality tool development appears to be with bacteriophages or food-associated viruses. Several research areas are necessary to enable the widespread application of these assays, including investigating the association of viral indicators with disease, geographic distribution, and fate and persistence in the environment. In addition, improved tools are necessary to exclude from molecular assessments non-viable virus that would not be indicative of a recent pollution event. There remains a critical need to provide improved viral water quality monitoring tools that would have the potential to reduce the hundreds of millions of illnesses that occur globally each year. The improved resolution of human-associated viral diversity enabled by microbiome research provides a significant opportunity for improved viral water quality management tools to address this need.