Wastewater sequencing reveals community and variant dynamics of the collective human virome

Wastewater is a discarded human by-product, but its analysis may help us understand the health of populations. Epidemiologists first analyzed wastewater to track outbreaks of poliovirus decades ago, but so-called wastewater-based epidemiology was reinvigorated to monitor SARS-CoV-2 levels while bypassing the difficulties and pitfalls of individual testing. Current approaches overlook the activity of most human viruses and preclude a deeper understanding of human virome community dynamics. Here, we conduct a comprehensive sequencing-based analysis of 363 longitudinal wastewater samples from ten distinct sites in two major cities. Critical to detection is the use of a viral probe capture set targeting thousands of viral species or variants. Over 450 distinct pathogenic viruses from 28 viral families are observed, most of which have never been detected in such samples. Sequencing reads of established pathogens and emerging viruses correlate with clinical data sets of SARS-CoV-2, influenza virus, and monkeypox viruses, outlining the public health utility of this approach. Viral communities are tightly organized by space and time. Finally, the most abundant human viruses yield sequence variant information consistent with regional spread and evolution. We reveal the viral landscape of human wastewater and its potential to improve our understanding of outbreaks, transmission, and their effects on overall population health.

Ecological, evolutionary & environmental sciences study design

Study description
Influent wastewater was collected at wastewater treatment plants in Houston and El Paso, TX, USA. Samples were processed and nucleic acid was extracted. Oligonucleotide probes were used to enrich human and animal virus DNA/RNA, and the enriched nucleic acid was then sequenced using massively parallel technology. Viral sequences were analyzed for clinical correlation and virome community attributes.

Research sample
Wastewater sample: between 100 and 500 mL of raw wastewater collected over a 24-hour period. These samples were chosen to represent the viruses of a wastewater treatment plant catchment area population at the time of collection.

Sampling strategy
Sampling was conducted weekly for ~9 months at 10 separate wastewater treatment plants. Sample size calculations were not performed; sample size was determined by (a) budgetary allotment and (b) cooperation of wastewater treatment plant professionals in Houston and El Paso.
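
The clinical correlation mentioned in the study description can be illustrated with a minimal sketch. The text does not state which correlation statistic was used, so Spearman rank correlation is assumed here purely for illustration. The weekly values, variable names, and helper functions below are hypothetical and are not study data or the authors' code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly values for one site (illustrative only, not study data).
weekly_viral_reads = [120, 340, 560, 900, 700, 410, 150]
weekly_clinical_cases = [15, 30, 62, 88, 91, 40, 12]

rho = spearman(weekly_viral_reads, weekly_clinical_cases)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is a reasonable assumption for this kind of comparison because read counts and case counts are on different scales and need not be linearly related. A real analysis would also need to account for reporting lags and for multiple comparisons across viruses and sites.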