The impact of dietary behaviour on chronic disease prevalence is undisputed, yet gaining a real-time understanding of this diet–disease relationship remains a challenge1. Validated survey-based methods (for example, 24 h recall and food frequency questionnaires) for population-level nutritional assessments, such as part of the National Health and Nutrition Examination Survey, are widely implemented, but are often susceptible to recall bias2,3,4. Objectives have been set forth as part of the Healthy People 2030 Initiative by the United States Department of Health and Human Services to increase consumption of unprocessed whole foods through evidence-based tactics, such as combining nutritional counselling with workplace or school nutrition policies; however, tools supportive of targeted educational interventions in public health nutrition are still lacking5.

A plant-based diet generally consists of consuming vegetables, fruits, nuts and seeds, with minimal to no consumption of animal-based foods. This type of diet has been shown to prevent, minimize or even reverse symptoms of certain chronic diseases, including type 2 diabetes and cardiovascular disease, which has motivated recent campaigns to promote plant-based dietary behaviour in the United States6. Naturally occurring, plant-derived chemicals known as phytoestrogens can serve as biomarkers indicative of a plant-based diet as they are consumed through a wide variety of foods, including cruciferous vegetables, berries, nuts, flaxseed, soybeans, oat and wheat6,7,8,9. Two major classes of phytoestrogens commonly consumed in the United States include isoflavones (soy-based foods and food products), as well as lignans, which are much more ubiquitous. These two classes have been investigated in individualized human studies utilizing targeted metabolomic techniques to measure the phytoestrogen chemicals genistein, daidzein, enterolactone and equol, coupled with genomics to assess human gut microbial interactions in response to consumption10,11. Differences in geographic region or demographics may contribute to varied rates of phytoestrogen consumption, such as in Asian populations where intake is considerably higher (25–50 mg per day) than the United States (1–3 mg per day); however, prevalence of diseases in Asia, such as breast cancer, which is in part linked to phytoestrogen consumption, remains low12,13. These findings could be attributed to differences in human gut microbial communities that produce phytoestrogen-derived metabolites, including lignan- and isoflavone-transformed gut microbial products enterolactone and equol, which have been linked to offering cardiovascular- and cancer-protective benefits for the human host14,15. 
Human gut bacterial genera reported to participate in these enterolactone- and equol-producing pathways include Ruminococcus, Slackia and Bacteroides, rendering these and related commensals prime investigative targets when studying plant-based diets16,17,18. These unique differences in consumption patterns and relationship to disease warrant the need for alternative methods that have the ability to determine population-level dietary information to assess the need for and effectiveness of dietary interventions19.

Wastewater-based epidemiology (WBE) is a rapidly evolving scientific discipline that leverages the analysis of biochemical signatures detectable in community wastewater conveying composited human excreta (that is, urine and stool) to offer insights into population-level health, behaviour, exposure and activity20. Historically employed primarily to monitor substance use across cities21,22, WBE has gained wide-scale attention and acceptance as a companion tool to clinical testing for monitoring the coronavirus disease 2019 (COVID-19) pandemic at multiple spatial scales (that is, country, city, neighbourhood and/or building level)23,24. The success of implementing WBE for viral monitoring with rapid dissemination of actionable data has provided a powerful impetus to leverage this methodology for other public health fields. Use of WBE for elucidating dietary behaviour has been repeatedly proposed as a logical next step; however, only a few experimental reports exist so far25,26. The complex relationship between dietary intake and subsequent human metabolism warrants multi-pronged analyses for the successful deployment of WBE for advanced dietary population health assessment.

Here, we report on a hypothesized proof-of-concept investigation of plant-based dietary behaviour demonstrating the use of WBE in public health nutrition assessment through targeted metabolomic and genomic analyses. Study goals were to: (1) investigate longitudinal trends of chemical indicators of isoflavone (genistein and daidzein) and lignan (enterolactone) consumption at the subsewershed level, (2) evaluate the relationship between daidzein consumption and associated metabolite equol to enhance insights into isoflavone intake, (3) assess human gut microbial composition of biologically relevant taxa known to play a role in phytoestrogen metabolism, (4) explore relationships between simultaneously consumed foods for a judicious data interpretation and (5) propose inclusion of WBE into public health dietary assessment.


Using a multiomic approach to WBE, multiple chemical and biological biomarkers of phytoestrogen consumption were successfully detected and analysed in community wastewater to indicate plant-based dietary behaviour. Resultant near-real-time population-level data were then used to propose integrating WBE into the public health nutrition framework.

Study location and samples analysed

This study took place in a small (<10,000 population) subsewer catchment within a large urban city located immediately south-east of a major public university. This wastewater catchment is predominantly residential, comprising a mix of single-family homes, condominiums, apartment complexes and off-campus student housing. Demographic estimates based on self-reported data for this catchment area are as follows: white non-Hispanic (40%), Asian (25%), Hispanic or Latino (21%), African American (7%) and American Indian/Alaska Natives (4%). Daily 24 h composited wastewater samples were collected for seven consecutive days per month from August 2017 to July 2019, resulting in a total of 156 samples collected, processed and analysed for chemical and biological indicators of phytoestrogen consumption.

Per capita chemical measurements in community wastewater

In the first year of this neighbourhood-level monitoring study (August 2017 to July 2018), total phytoestrogen consumption patterns (defined as the sum of genistein, daidzein and enterolactone) displayed a distinct increase at the start of each new year (January), with a gradual subsequent decline through March. In the following year (August 2018 to July 2019), this pattern was repeated, with average overall per capita consumption in year 2 (5.0 ± 2.3 mg per day) greater than year 1 (4.1 ± 2.2 mg per day) (Fig. 1a and Supplementary Fig. 1). While both yearly measurements for total phytoestrogen consumption were higher than the estimated average for the United States from traditional methods (1–3 mg per day per capita), individual isoflavone and lignan measurements of consumption fell within this expected range (mg per day per capita): isoflavones 2.7 ± 1.5 (year 1) and 3.5 ± 1.6 (year 2); lignan 1.6 ± 0.9 (year 1) and 2.1 ± 1.0 (year 2) (Fig. 1e,f). Measured daily concentrations in raw wastewater, calculated mass loads and estimated average per capita excretion are shown for genistein, daidzein and enterolactone in Supplementary Figs. 2 and 3a. Statistical comparisons revealed significant differences (P ≤ 0.01) between years 1 and 2 (Fig. 1b), as well as seasonal changes between fall and winter, and spring and summer (Fig. 1d). Phytoestrogen consumption was slightly lower overall on weekends versus weekdays; however, these differences were subtle (P > 0.05) (Fig. 1c). Results of all statistical comparisons for changes in dietary behaviour are listed in Supplementary Table 4. An assessment of the potential for temperature-dependent in-sewer degradation of analytical targets suggested weak susceptibility for genistein (ρ = 0.22), daidzein (ρ = 0.16) and enterolactone (ρ = 0.23) (Supplementary Fig. 4), implying that the reported wastewater-derived, phytoestrogen-metabolite loads are a reflection of human excretion activity and, by extension, human dietary behaviour.

Fig. 1: Total per capita phytoestrogen consumption (mg per day) from August 2017 to July 2019.

a, Total phytoestrogen consumption (sum of genistein, daidzein and enterolactone) shown per month from August 2017 to July 2018 (top) and August 2018 to July 2019 (bottom). b–d, Statistical comparisons by year (Y) (b), weekday (Wk) versus weekend (Wkd) (no statistical significance) (c) and season: fall (F) and winter (W); spring (Sp) and summer (S) (d). e, Isoflavone consumption shown as the sum of genistein and daidzein per month from August 2017 to July 2018 (left) and August 2018 to July 2019 (right). f, Lignan consumption indicated by the production of enterolactone per month from August 2017 to July 2018 (left) and August 2018 to July 2019 (right). All box plots: centre lines indicate the median; whiskers indicate the range (minimum and maximum); box limits are the 25th and 75th percentiles. Black diamonds represent means, with individual dots representing daily measurements. Statistical significance was tested using the Mann–Whitney U non-parametric test of variability with BH correction for false discovery (0.05) (*P ≤ 0.01).
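The BH (Benjamini–Hochberg) correction named in the caption can be illustrated with a minimal sketch. The function name and the example P values below are hypothetical and are not taken from the study; the sketch only shows the step-up procedure at a false discovery rate of 0.05, not the authors' actual analysis pipeline.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR alpha."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical raw P values for three comparisons (illustrative only).
example_p = [0.004, 0.30, 0.009]
print(benjamini_hochberg(example_p))
```

Because the procedure is step-up, a P value above its own threshold is still rejected when a larger P value passes its threshold further down the ranking.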

Assessing precursor transformation interactions

The isoflavones investigated in this study, daidzein and genistein, are both precursor (that is, human ingested) compounds, and thus it was necessary to test the efficacy of using these as indicators of phytoestrogen consumption by incorporating an associated human excreted metabolite. Equol, a human gut microbial metabolite of the isoflavone daidzein, was chosen for this purpose and integrated into the study starting from January to July 2019 (n = 49 observations). Measured daily concentrations, calculated mass loads and per-capita excretion are listed in Supplementary Fig. 3b. Trends observed here for equol aligned with those of the other phytoestrogens investigated in this study, with elevated consumption at the beginning of the year and a subsequent decline after March. Statistical analysis revealed a strong positive correlation between daidzein and equol (ρ = 0.84), with a precursor-to-metabolite ratio of 0.2 (Fig. 2), relatively in line with conventional estimates for equol production (30% of people in Westernized societies)27. Equol production reported by WBE can therefore be reported in two different ways: as an average per capita result and as the output per equol producer depending on the study population (Fig. 2). Once adjusted for the population-specific producer to non-producer ratio, computed normalized values are directly comparable and transferable between studies and different demographics.
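The adjustment from a per capita value to an output per equol producer can be sketched as a simple division by the producer fraction. The function name and the input value are hypothetical; the 30% producer fraction is the conventional estimate quoted in the text, and a population-specific value would replace it in practice.

```python
import math


def per_producer_output(per_capita_equol, producer_fraction):
    """Convert a population-average equol value to a per-producer value.

    per_capita_equol: equol load averaged over the whole population
                      (any mass-per-day-per-person unit).
    producer_fraction: assumed share of the population that produces equol.
    """
    if not 0 < producer_fraction <= 1:
        raise ValueError("producer_fraction must be in (0, 1]")
    return per_capita_equol / producer_fraction


# Hypothetical per capita value of 150 ug per day, 30% producers.
print(per_producer_output(150.0, 0.30))
```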

Fig. 2: Evaluating the relationship between daidzein and equol.

a, Population-scale daidzein consumption (light-blue diamonds) and metabolite equol per active equol shedder (30% of population; shown as white squares) observed in an urban population of <10,000 people using wastewater analysis. Data are shown as mean values between duplicates (n = 2); error bars represent standard deviation. b, Paired observations (n = 49) of concentrations of daidzein and equol in community wastewater (μg l⁻¹) showed a strong linear relationship (R² = 0.80) and strong positive association (ρ = 0.84) using Spearman’s rank-order non-parametric correlation test (n = 93).

Population-level dietary human gut microbial interactions

As a proof of concept for investigating human gut microbial interactions at the population scale, a subset of samples (n = 84; January to December 2018) was tested to determine if the bacterial genera identified to be involved in phytoestrogen metabolism and production of enterolactone and equol from individualized studies (Supplementary Table 6) are also observable at the community level via analysis of composited human waste extant in municipal wastewater. Genus-level compositional analysis revealed that specific bacterial taxa known to be involved in phytoestrogen metabolism within the human gut, such as Bifidobacterium, Blautia and Romboutsia, were detectable in community sewage (Fig. 3a). A detection frequency of 100% in all samples was found for Ruminococcus, Blautia, Clostridium, Bifidobacterium, Romboutsia, Dorea and Subdoligranulum, followed by Intestinibacter (66%), Eubacterium (58%), Bacteroides (50%), Prevotella (25%), Senegalimassilia and Roseburia (16%) and Slackia (8%). Measurements of 16S ribosomal RNA gene copies per litre provided an indication of bacterial concentration in each wastewater sample, ranging from 4.3 × 10¹¹ to 2.4 × 10¹² (Fig. 3b). Average semi-quantitative abundances of the number of 16S rRNA genes belonging to select enterolactone- and equol-producing genera were calculated, with resultant values ranging from 9.0 × 10⁶ (Eubacterium; June) to 2.6 × 10¹¹ (Bifidobacterium; October). Estimated average monthly phytoestrogen consumption, determined from chemical biomarker measurements, is shown alongside average semi-quantitative abundances of population-level phytoestrogen-metabolizing taxa to enable a delineation of trends. Elevated trends in both sets of data were observed at the beginning of the year and again towards the end of the summer heading into the fall (autumn), although the association between them was weak and not statistically significant (ρ = 0.24 and P > 0.05) (Fig. 3c).
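The text does not spell out how the semi-quantitative abundances were computed. A common approach, assumed here purely for illustration, is to scale each genus's relative abundance from sequencing by the total 16S rRNA gene copies per litre measured by qPCR. The function name and the example values are hypothetical.

```python
import math


def semi_quantitative_abundance(relative_abundance, total_16s_copies_per_litre):
    """Estimate genus-level 16S gene copies per litre.

    Assumption (not confirmed by the source): semi-quantitative abundance is
    the product of the sequencing-derived relative abundance (0 to 1) and the
    qPCR-derived total 16S rRNA gene copies per litre.
    """
    if not 0 <= relative_abundance <= 1:
        raise ValueError("relative_abundance must be a fraction between 0 and 1")
    return relative_abundance * total_16s_copies_per_litre


# Hypothetical genus at 10% relative abundance in a sample with 2.4e12 copies per litre.
print(semi_quantitative_abundance(0.10, 2.4e12))
```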

Fig. 3: Microbial composition of wastewater collected in this study between January and December 2018.

a, Relative abundance of select bacterial genera involved in phytoestrogen metabolism within the human gut. b, Copy number of bacterial 16S rRNA genes per litre determined using qPCR. Error bars represent standard deviation between duplicates (n = 2). c, Average semi-quantitative (SQ) abundances (gene copies per litre) of the same phytoestrogen-metabolizing genera shown with total phytoestrogen consumption (mg per day per person) per month.

Evaluating influences from diet variety

In a typical human diet, multiple types of foods and food products are consumed simultaneously, including alcoholic beverages during certain mealtimes or celebrations. A sharp increase in equol production occurred in March 2019, from 207 ± 7.1 μg per day per person on Saturday 16 March to 594 ± 65.1 μg per day per person on Sunday 17 March (the St Patrick’s Day holiday). This prompted further investigation into potential interactions with other human consumption behaviours during this particular holiday (alcohol) and their influence on production of equol at the population scale (Fig. 4). Calculated population-normalized mass loads (mg per day per person) of equol and ethyl sulfate, a metabolite of alcohol, were compared from January to April 2019 to assess trends, capture holiday behavioural anomalies and seek out concordance in trends of the various analytes. The highest values for ethyl sulfate within this timeframe occurred over the St Patrick’s Day holiday: Saturday 16 March (13.6 ± 1.5 mg per day per person) and Sunday 17 March (13.5 ± 1.5 mg per day per person), indicating a pointed increase in alcohol consumption that strongly correlates with the increase observed in equol production (ρ = 0.89) (Fig. 4).
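Spearman's rank-order correlation, used for the equol and ethyl sulfate comparison, can be sketched in plain Python as the Pearson correlation of average ranks. The helper names are hypothetical and the inputs in the usage line are illustrative, not study data; the study's own software is not specified in this section.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Illustrative monotone series: any strictly increasing pairing gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 8, 16, 32]))
```

Working on ranks rather than raw values makes the statistic insensitive to the scale difference between analytes reported in μg and mg per day per person.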

Fig. 4: Relationship between phytoestrogen and alcohol consumption from January to April 2019.

a, Equol production from isoflavone (top) and ethyl sulfate consumption (bottom) from January to April 2019. Box plots: centre lines indicate the median; whiskers indicate the range (minimum and maximum); box limits are the 25th and 75th percentiles. Black diamonds represent means, with individual dots representing daily measurements. b, March 2019 displayed Monday to Sunday showing increases over the St Patrick’s Day holiday weekend. Statistical comparisons and significance (P ≤ 0.01) suggest a strong positive relationship between equol and ethyl sulfate (ρ = 0.89 and P = 0.001), confirmed by Spearman’s rank-order non-parametric test and the Mann–Whitney U non-parametric test of variability with BH correction for false discovery (0.05). Error bars represent standard deviation between duplicates (n = 2).


This study yielded corresponding wastewater-derived datasets on phytoestrogen intake as a proxy for a plant-based diet and the gut microbiome of local human populations. For all compounds examined, there was a considerable increase in consumption starting in January, suggestive of temporary lifestyle changes associated with individual New Year’s resolutions of the study population, such as to increase intake of plant-based foods to achieve optimal health28 (Fig. 1). Additionally, while the notable spike in equol production during the St Patrick’s Day holiday period in March 2019 could be from collective participation in consumption of traditional Irish foods, such as colcannon, fish and chips or corned beef and cabbage, which all contain varied amounts of phytoestrogens, it may more likely be due to an increase in acute alcohol consumption29. Interestingly, in a study comparing equol producers and non-equol producers, those who reported greater alcohol consumption were more likely to be stronger equol producers; however, the mechanism behind this is still unclear30. This unique relationship was tested here at the population scale (Fig. 4), exhibiting strong positive correlations between the alcohol metabolite, ethyl sulfate31, and equol, suggesting that alcohol consumption may serve as a driving factor behind equol production when consumed in high amounts and within a condensed timeframe; however, more studies are needed to confirm this phenomenon.

As mentioned, it is estimated that the average consumption rate of phytoestrogens in a Western society ranges between 1 and 3 mg per day per person; however, consumption can vary depending on life stage, dietary preferences, demographics and/or regular access to healthy fruits and vegetables32,33. Thus, this reported estimate of the United States average may not be entirely applicable to this study sewershed with a distinct demographic profile. For example, demographics across the nation consist of 60% white non-Hispanic, 5% Asian, 18% Hispanic or Latino, 13% Black and 1% American Indian/Alaska Native, whereas in this study, white non-Hispanic is 40%, with 25% Asian, 21% Hispanic or Latino, 7% Black and 4% American Indian/Alaska Native34. This difference in demographic distribution could result in the disparity between the estimated United States consumption rate and what was measured herein using WBE at the subsewershed level. Measured wastewater-derived isoflavone consumption was closer to the 3 mg per day ceiling than lignans (Fig. 1e), which could be attributed to the greater percentage of Asian residents within this catchment who reportedly consume a greater amount of isoflavones than in Westernized cultures35. These differences further demonstrate the importance of performing more contextually relevant and culturally competent measurements that reflect the population sample under study.

It is evident that human gut microbial interactions that occur in response to plant-based food consumption may hold substantial benefits for the human host. Evidence suggests an increase in regular consumption of plant-based foods can promote gut microbial growth amenable for production of enterolactone and equol, which offer protection against certain nutrition-related chronic diseases16. While it is currently understood that approximately 30% of people in Westernized regions are equol producers, a contextualized investigation could provide deeper insight into this statistic, such as the daidzein-to-equol (precursor-to-metabolite) ratio of 0.2 (20% equol producers) observed in this study. Specifically for enterolactone, studies relying on urinary analysis of individuals reported that increased production was associated with protective properties against cancer and cardiovascular disease, with use of antibiotics shown to decrease production, indicating the measured values are driven by gut microbes36. Studies have also found positive associations between intake of isoflavone-rich foods and subsequent urinary elevations in equol production11,37, affirming the role of diet in promoting growth of health-promoting commensals38. Thus, following a plant-based diet has been shown to strongly correlate with increased abundance of particular bacterial genera responsible for producing enterolactone and equol in individuals, including Ruminococcus, Eubacterium, Slackia, Bacteroides and Bifidobacterium11,39. These were detected and measured here in community wastewater, with similar trends when compared with phytoestrogen consumption (Fig. 3c). While microbial communities have been assessed in community sewage to understand overall human health40, this approach to investigating dietary behaviour using WBE has yet to be fully explored, and thus serves as a foundational framework for future analysis of population nutritional assessment (Fig. 5).

Fig. 5: Case study-derived conceptual framework for integrating WBE into public health nutrition.

The chemical measurements reported herein are slightly higher than those found in a prior WBE study confirming the viability of measuring genistein, daidzein and enterolactone in monthly composites of raw wastewater entering a local wastewater treatment plant to assess diet25. However, this previous study collected one single 24 h composite sample per month from a wastewater treatment plant, followed by shipment before sample processing and analysis. Shorter hydraulic residence times, contextual ascertainment of data from subsewershed monitoring, multiple analytical methods and targets employed, and a rapid analysis of samples without delay probably contributed to the comparatively stronger and more detailed dataset observed in the present work. Finally, a cost analysis was conducted from this work, which yielded per-person costs of <US$1 with existing startup equipment, or ~US$2 if capital expenses are needed (Supplementary Table 5). This is considerably less than the estimated average cost for traditional nutritional assessment surveys41.

This study had limitations that should be noted. Nationwide average ranges from traditional population-level survey methods for phytoestrogen consumption were used for benchmark validation of WBE results; however, these datasets may not be a suitable comparison to the specific catchment area investigated as mentioned above. Additionally, samples in this study were collected according to compliance monitoring pre-determined by the partnering municipality as previously explained. Data obtained were utilized to assess overall yearly trends based on these daily measurements and aggregated per month for ease of visualization (approximately 84 samples per year), which has been indicated in prior work to be suitable for WBE42. The minimum number of days needed to represent the year is approximately 57 according to Cochran’s formula (10% precision, 90% confidence interval); thus, the number of sample collection days here is considered acceptable for this purpose (supplementary equation (1)) (ref. 43). While an alternative approach to sampling (for example, weekly) may have offered different insights, it is important to consider the resources available and weigh them against the potential benefits when conducting a WBE study. Further, while trends could be garnered between microbial taxa and measured phytoestrogens, genus-level reporting may be limiting for population-level analysis; therefore, more detailed investigation incorporating metagenomics is warranted in future work. Additionally, method detection limits and analyte recovery experiments were conducted at the start of this 2 year study. While instrument performance was continuously assessed by analysing standard curves made monthly, and use of isotopically labelled internal standards can assist in evaluating variability over time, it is acknowledged that matrix changes may have occurred over the study duration.
Finally, use of subsewershed monitoring served to reduce potential in-sewer loss of analytes from the point of human excretion to the sampling location, and while temperature-influenced degradation investigated herein exhibited weak susceptibility (Supplementary Fig. 4), some analyte loss may still have occurred.
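The sampling-day estimate quoted in the limitations can be approximated with Cochran's formula. The supplementary equation itself is not reproduced in this section, so the sketch below rests on stated assumptions: a z-score of 1.645 for 90% confidence, maximum variability (p = 0.5), a 10% margin of error and a finite population correction for 365 days. The function name is hypothetical.

```python
import math


def cochran_sample_size(z, margin, p=0.5, population=None):
    """Cochran's sample size, optionally with finite population correction.

    z: z-score for the chosen confidence level (1.645 assumed for 90%).
    margin: desired precision as a fraction (0.10 for 10%).
    p: assumed proportion; 0.5 gives the most conservative estimate.
    population: finite population size (for example, 365 days), or None.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is None:
        return n0
    # Finite population correction.
    return n0 / (1 + (n0 - 1) / population)


n_days = cochran_sample_size(z=1.645, margin=0.10, population=365)
print(round(n_days))
```

Under these assumptions the result rounds to 57 days, in line with the approximate value quoted in the text.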

Overall, this study highlights the utility, importance and cost advantages of integrating multiomic WBE into population nutritional assessments. Here we report a successful demonstration of longitudinal investigation showcasing continuous signal detectability, considerable dynamic range and joint trending of wastewater-borne biomarkers that are modulated as a function of human diet and behaviour. Major findings of this work include: (1) multi-parameter community-scale datasets showing the interplay between human behaviour (for example, alcohol consumption on holidays and short-term dietary changes) and dietary indicators detectable in composited municipal wastewater; (2) association between wastewater-borne levels of phytoestrogens and related gut microbial composition; (3) the introduction of daidzein metabolite, equol, to offer a deeper interpretation of isoflavone consumption measured in wastewater and (4) potential relationships between simultaneously consumed foods that may serve as driving factors for subsequent production of associated metabolites. The data reported here suggest that the integration of WBE could offer a holistic model for public health nutritional assessment by providing contextual, non-invasive and actionable data for community stakeholders to inform on nutritional status, the need for and appropriate design of targeted strategies, and the success of such interventions chosen at differing scales (for example, school lunches, university campuses and urban populations).


Chemicals and reagents

Native analytical standards of genistein, daidzein and enterolactone were purchased from Sigma-Aldrich. Isotopically labelled analytical standard genistein-d4 was purchased from Cayman Chemical. Native analytical standard of equol was purchased from Santa Cruz Biotechnology. Native analytical standard of ethyl sulfate (EtS) as well as its isotopically labelled standard, sodium ethyl sulfate-d5 (EtS-d5) were purchased from Toronto Research Chemicals. Liquid chromatography-grade (LC-grade) water, methanol and acetone were obtained from Fisher Scientific, and LC-grade formic acid was purchased from Fluka Chemical Corp. Stock standard solutions were prepared in LC-grade methanol and stored at −20 °C, with the exception of EtS and EtS-d5, which were stored in water at 4 °C.

Sample collection and transport

The wastewater collection system in this sewer catchment is separate from the stormwater collection system, and drains south-west, with an average sewage retention time of approximately 20 min. Recorded measurements of wastewater temperatures unique to this catchment ranged between 20 °C and 30 °C depending on the time of year. Time-weighted, 24 h composite wastewater samples were collected daily from within the sewer collection system (that is, neighbourhood level) over seven consecutive days per month between August 2017 and July 2019 (n = 156), allowing for a robust indication of consumption trends over time given the number of observations per year42. An Avalanche automated refrigerated sampler (Teledyne, ISCO) was set to collect 60–100 ml of raw wastewater from within the sewer collection system every 15 min. Each 24 h composited sample was adequately mixed and transferred to two-litre high-density polyethylene bottles and immediately placed on ice to prepare for transport and same-day processing. Wastewater flow was measured by ISCO LaserFlow flow meters (Teledyne, ISCO), which recorded flow on-site at 2 min intervals. Flow measurements were obtained through FlowLink online software (v. 5.1) (Teledyne, ISCO).
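The time-weighted compositing scheme implies a fixed number of aliquots per day. The short sketch below works through that arithmetic using the 15 min interval and 60–100 ml aliquot volumes stated above; the function names are hypothetical.

```python
import math


def aliquots_per_day(interval_min):
    """Number of aliquots drawn in 24 h at a fixed sampling interval."""
    return (24 * 60) // interval_min


def composite_volume_litres(aliquot_ml, interval_min):
    """Total 24 h composite volume in litres for a given aliquot size."""
    return aliquots_per_day(interval_min) * aliquot_ml / 1000


print(aliquots_per_day(15))
print(composite_volume_litres(60, 15), composite_volume_litres(100, 15))
```

At these settings each daily composite pools 96 aliquots, so a two-litre bottle holds only a mixed subsample of the full composite.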

Sample processing and analysis

Methods employed for chemical processing and analysis have been previously reported25,31. Briefly, duplicated 200 ml aliquots of each raw influent wastewater sample were spiked with 20 ng isotopically labelled internal standard and subsequently arranged on a Dionex Autotrace 280 Solid-Phase Extraction Instrument (Thermo Scientific) using Oasis Hydrophilic-Lipophilic Balance cartridges (Waters). Method blanks (deionized water) were extracted and analysed alongside each set of samples to determine potential contamination and assess method performance over time. Analytes of interest were then extracted using a 1:1 (vol/vol) methanol and acetone solution with 0.5% formic acid until a final volume of 4 ml was achieved, then stored at −20 °C until further analysis. For EtS, 10 ml of raw wastewater sample was aliquoted in 15 ml conical tubes and centrifuged at 4,000g for 10 min. Next, 500 μl of the supernatant were added to 490 μl of water along with 1 ng labelled standard (EtS-d5) to achieve a final extract volume of 1 ml ready for analysis by liquid chromatography–tandem mass spectrometry (LC–MS/MS).

For LC–MS/MS sample extract preparation (except EtS), 200 μl of each organic extract was aliquoted into glass amber vials and dried down entirely under nitrogen. Extracts were reconstituted first with 100 μl of LC–MS-grade methanol, followed by 100 μl of LC–MS-grade water, then lightly vortexed. Finalized extracts were analysed for targeted analytes using a Shimadzu Prominence 2100 high performance liquid chromatographer (Marlborough) paired to an AB Sciex API 4000 triple quadrupole mass spectrometer (Applied Biosystems) with electrospray ionization operating in negative mode. Analyte identification was achieved using compound-specific retention times and ion transitions from multiple reaction monitoring. Chromatographic separation was attained using a Symmetry C8 column (4.6 × 150 mm and 3.5 μm particle size), followed by a Symmetry VanGuard Cartridge (3.9 × 5 mm and 3.5 μm particle size) (Waters) (Supplementary Table 1).

Data analysis

LC–MS/MS data were acquired with Analyst 1.5 software (Applied Biosystems), where concentrations were calculated using isotope dilution. Calculated concentrations (μg l⁻¹) were converted to mass loads (g day⁻¹) using flow data accessed with FlowLink software (approximately 2,165,000 litres per day from August to May and 1,637,000 litres per day from June to July).
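The concentration-to-mass-load conversion is a unit calculation and can be sketched as follows. The flow values are the approximate seasonal figures given above; the function name and the 5 μg per litre concentration are hypothetical and serve only as an example.

```python
import math

UG_PER_G = 1_000_000  # micrograms per gram

# Approximate daily flows from the text (litres per day).
FLOW_AUG_TO_MAY = 2_165_000
FLOW_JUN_TO_JUL = 1_637_000


def mass_load_g_per_day(conc_ug_per_l, flow_l_per_day):
    """Convert a concentration (ug per litre) and daily flow to grams per day."""
    return conc_ug_per_l * flow_l_per_day / UG_PER_G


# Hypothetical concentration of 5 ug per litre during the academic year.
print(mass_load_g_per_day(5.0, FLOW_AUG_TO_MAY))
```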

Population-normalized mass loads and subsequent consumption rates were produced using population estimates (approximately 9,800 from August to May and 6,900 from June to July) and estimated excretion values listed in Supplementary Table 3. Per capita daily genistein consumption (GC) was calculated using the following equation:

$$\mathrm{GC} = \frac{C_{\mathrm{G}} \times Q_{\mathrm{Tot}}}{\mathrm{CF}_{\mathrm{G}} \times \mathrm{Pop}}$$

where $C_{\mathrm{G}}$ is the measured genistein concentration, $Q_{\mathrm{Tot}}$ is the total daily volumetric flow rate, $\mathrm{CF}_{\mathrm{G}}$ is the correction factor (5; 20% excretion) and Pop is the estimated population. Per capita daily daidzein consumption (DC) was calculated using the following equation:

$$\mathrm{DC} = \frac{C_{\mathrm{D}} \times Q_{\mathrm{Tot}}}{\mathrm{CF}_{\mathrm{D}} \times \mathrm{Pop}}$$

where $C_{\mathrm{D}}$ is the measured daidzein concentration, $Q_{\mathrm{Tot}}$ is the total daily volumetric flow rate, $\mathrm{CF}_{\mathrm{D}}$ is the correction factor (2.2; 45% excretion) and Pop is the estimated population. Per capita daily lignan consumption (LC) was calculated using the following equation:

$${\mathrm{LC}} = \frac{{C_{{\mathrm{ENT}}} \times \left( {{\textstyle{{{\mathrm{MW}}_{{\mathrm{LIG}}}} \over {{\mathrm{MW}}_{{\mathrm{ENT}}}}}}} \right)Q_{{\mathrm{Tot}}}}}{{{\mathrm{EF}}_{{\mathrm{ENT}}} \times {\mathrm{Pop}}}}$$

where CENT is the measured enterolactone concentration; MWLIG/MWENT is the ratio of the average molecular weight of the four main parent lignans common in the human diet (LIG: pinoresinol, lariciresinol, matairesinol and secoisolariciresinol) to the molecular weight of the enterolignan metabolite enterolactone (ENT); QTot is the total daily volumetric flow rate; EFENT is the urinary excretion of enterolactone (1.1 mg per day); and Pop is the estimated population. Per capita daily equol production (EC) was calculated using the following equation:

$${\mathrm{EC}} = \frac{{C_{{\mathrm{EQ}}} \times \left( {{\textstyle{{{\mathrm{MW}}_{{\mathrm{DAD}}}} \over {{\mathrm{MW}}_{{\mathrm{EQ}}}}}}} \right)Q_{{\mathrm{Tot}}}}}{{{\mathrm{EF}}_{{\mathrm{EQ}}} \times {\mathrm{Pop}}}}$$

where CEQ is the measured equol concentration; MWDAD/MWEQ is the ratio of the molecular weights of the parent daidzein (DAD) and the metabolite equol (EQ); QTot is the total daily volumetric flow rate; EFEQ is the urinary excretion of equol (2.7 mg per day); and Pop is the estimated population. Population mass loads of ethyl sulfate, used to represent alcohol use, were calculated following the same equations as above.
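As an illustration of the isoflavone back-calculation, the following Python sketch estimates per capita genistein consumption. It assumes that the excreted mass load is divided by the fractional excretion (0.20 for genistein), which is equivalent to multiplying by the stated correction factor of 5; this reading of the correction factor is an assumption of the sketch. The function name and the concentration of 1.0 μg l−1 are hypothetical, whereas the flow and population values are those reported for August to May.

```python
def per_capita_consumption_mg(
    conc_ug_per_l: float,
    flow_l_per_day: float,
    population: float,
    excretion_fraction: float,
) -> float:
    """Back-calculate per capita daily consumption (mg per person per day).

    Assumption: consumed mass = excreted mass / fractional excretion.
    """
    if not 0 < excretion_fraction <= 1:
        raise ValueError("excretion_fraction must be in (0, 1]")
    if population <= 0:
        raise ValueError("population must be positive")
    excreted_mg_per_day = conc_ug_per_l * flow_l_per_day / 1000.0  # ug -> mg
    consumed_mg_per_day = excreted_mg_per_day / excretion_fraction
    return consumed_mg_per_day / population


# Hypothetical genistein concentration of 1.0 ug/L; flow and population
# are the approximate August to May values given in the text.
genistein_mg = per_capita_consumption_mg(
    conc_ug_per_l=1.0,
    flow_l_per_day=2_165_000,
    population=9_800,
    excretion_fraction=0.20,
)
```

The same function applies to daidzein with an excretion fraction of 0.45. The lignan and equol calculations additionally scale the concentration by a molecular weight ratio and are not covered by this sketch.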

The population of contributing individuals within this subsewer catchment was determined using previously reported methods44. Briefly, this catchment predominantly comprises permanent residents, as estimated by 2010 census block group data. To factor in transient populations, resident and non-resident employment data from the Maricopa Association of Governments were examined. The following classifications were used: employees living outside of the city (non-resident and employed) and residents of the city (resident and employed). A portion of residents in this catchment are students attending the nearby university with housing contracts typically for the entire academic year (August to May). These student population estimates were obtained from publicly available campus resident data and were adjusted at the beginning and end of the academic year using changes in wastewater flow volume.

Statistics and reproducibility

Reported concentrations were determined on the basis of a minimum seven-point standard curve ranging from 0.05 to 2,000 μg l−1, and a minimum coefficient of determination of r2 = 0.99. Precision was expressed as relative percentage difference (RPD) (equation (5)). Instrument blanks were included every six to eight samples to assess analyte carryover, of which none was observed. Method detection limits were determined using methods outlined by the United States Environmental Protection Agency following recommendations by EPA Method 1694 (ref. 45) with daidzein, genistein and enterolactone previously reported and validated25. Average absolute recoveries (before normalization with stable isotope internal standards) for target analytes were determined on the basis of spike-recovery experiments in a peat moss matrix according to previously reported methods25,46 (Supplementary Table 2).

RPD was calculated using the following equation:

$${\mathrm{RPD}}{{{\mathrm{\% }}}} = {\mathrm{ABS}}\left( {\frac{{C_{{\mathrm{S}}1} - C_{{\mathrm{S}}2}}}{{\left( {C_{{\mathrm{S}}1} + C_{{\mathrm{S}}2}} \right)/2}}} \right) \times 100$$

where CS1 and CS2 are the measured concentrations in the sample and its associated duplicate.
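A minimal Python sketch of the RPD calculation follows, assuming the standard definition in which the absolute difference between a sample and its duplicate is divided by the mean of the pair. The function name and the example concentrations are hypothetical.

```python
def relative_percent_difference(c_s1: float, c_s2: float) -> float:
    """Relative percentage difference between a sample and its duplicate."""
    pair_mean = (c_s1 + c_s2) / 2
    if pair_mean == 0:
        raise ValueError("RPD is undefined when the mean of the pair is zero")
    return abs(c_s1 - c_s2) / pair_mean * 100


# Hypothetical duplicate measurements (ug/L)
rpd = relative_percent_difference(10.0, 12.0)
```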

Statistical analyses were performed using Microsoft Excel 2019, where Mann–Whitney U and Spearman rank-order non-parametric tests were conducted. To control for type I errors and correct for multiple tests, a Benjamini–Hochberg (BH) correction factor was applied (false discovery rate 0.05) using the following equation:

$${\mathrm{BH}} = \frac{i}{m}Q$$

where i is the rank assigned to the P value in the array, m is the number of comparisons and Q is the false discovery rate. Seasons were defined according to the National Geographic Society47: fall (September, October and November), winter (December, January and February), spring (March, April and May) and summer (June, July and August). Formal randomization is not relevant for this type of study, as WBE inherently collects a random set of contributions from the individuals within any particular sewer catchment at any given time. Further, the weekly sample collection campaigns (seven consecutive days per month) were pre-determined by the city, and the researchers needed to comply with this schedule, which could contribute to further randomizing the collected sample. Thus, no statistical method was used to pre-determine sample size, the experiments were not intentionally randomized and no data were excluded from the analyses. Additionally, formal blinding is not relevant for this type of study, as the nature of WBE inherently and effectively blinds the researchers to the contributing population, regardless of geospatial scale. A major benefit of performing this type of work is the anonymous nature of the collected and analysed sample; individuals cannot be identified. Thus, the resultant data are representative of the served sewer catchment as a whole.
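The Benjamini–Hochberg critical value (i/m)Q defined above can be applied as in the following Python sketch. The analyses in this study were performed in Microsoft Excel, so this code is an illustrative equivalent rather than the procedure actually used; the function name and the example P values are hypothetical. The sketch uses the usual step-up rule: all P values ranked at or below the largest rank i that satisfies P(i) ≤ (i/m)Q are considered significant.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which P values pass the BH procedure."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda k: p_values[k])

    # Largest rank i whose P value is at or below its critical value (i/m)Q
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Every P value ranked at or below max_rank is significant (step-up rule)
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical P values from eight comparisons
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
```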

The effect of temperature on in-sewer analyte degradation was assessed through statistical analysis based on real measurements. Reported ambient temperatures (average, minimum and maximum) for each sample collection day throughout this study were recorded. The relationship between recorded ambient temperatures and measured analyte signal in wastewater (μg l−1) for each analyte undergoing long-term monitoring (genistein, daidzein and enterolactone) was assessed using Spearman’s rank-order non-parametric tests (ρ < 0.50 weak; 0.50 < ρ < 0.70 moderately strong; ρ > 0.70 strong) (Supplementary Fig. 4).
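As an illustration of this correlation analysis, the Python sketch below computes Spearman's ρ (Pearson correlation of average ranks, which handles ties) and labels its strength with the thresholds given above. Two points are assumptions of the sketch rather than statements from the text: the thresholds are applied to the absolute value of ρ, and the boundary values 0.50 and 0.70 are assigned to the "moderately strong" class. All function names and example data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation; undefined if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(rho):
    """Label rho using the thresholds in the text (applied to |rho| by assumption)."""
    magnitude = abs(rho)
    if magnitude > 0.70:
        return "strong"
    if magnitude >= 0.50:
        return "moderately strong"
    return "weak"


# Hypothetical daily temperatures (deg C) and analyte concentrations (ug/L)
rho = spearman_rho([18.0, 25.0, 31.0, 40.0], [2.1, 1.9, 1.4, 1.2])
label = correlation_strength(rho)
```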

Microbiome analysis

As a proof of concept for understanding human gut microbial interactions at the population level, a subset of samples (n = 12 months) was allocated for microbiome analyses. Approximately 84 previously frozen raw wastewater samples from 2018 (January–December; seven samples per month) were thawed, mixed and aliquoted into composites representing each month. Next, approximately 50 ml of each sample was loaded onto sterile 0.22 µm polycarbonate membrane filters (47 mm) (EMD Millipore) using a vacuum pump apparatus with in-line filter, discarding the filtrate. Each filter was then aseptically transferred into an individual bead tube. DNA extractions were performed using a QIAGEN DNeasy Power Soil Pro Kit, following the manufacturer’s instructions. The resultant DNA (50 μl) was immediately stored at −80 °C until further analysis. A whole-process negative extraction control (deionized water) was incorporated to account for potential contamination.

Bacterial community composition analysis was performed with next-generation sequencing. The 16S rRNA V4 region was polymerase chain reaction (PCR) amplified using the barcoded primer set 515f/806r (ref. 48), following the Earth Microbiome Project protocol for library preparation49. PCR amplifications for each sample were performed in duplicate, then pooled and quantified using the AccuBlue High Sensitivity dsDNA Quantitation Kit (Biotium). A no-template control was included during library preparation to check for extraneous nucleic acid contamination. After PCR, 200 ng of DNA per sample was pooled and cleaned using a QIAquick PCR purification kit (Qiagen). The pool was quantified using the Illumina Library Quantification Kit ABI Prism (Kapa Biosystems). The DNA pool was then diluted to a final concentration of 4 nM, then denatured and diluted to a final concentration of 4 pM with 25% of PhiX for quality control. Finally, the DNA library was loaded onto the Illumina MiSeq platform and run using the version 2 module (2 × 250 paired end) following the directions of the manufacturer.

Sequence data were analysed using QIIME 2 (version 2021.2) (ref. 50) for sequence quality control and feature table construction. The DADA2 plugin51 was used to filter and merge the forward and reverse reads. Sequences were then mapped to the Silva 138 SSURef NR99 Database52 (updated 27 August 2020) for microbial community composition analysis. A classifier was trained with the forward and reverse primers used in this study. The generated QIIME 2 files were imported into R (v. 4.1) using the R phyloseq package (version 1.36.0) (ref. 53), and contaminating amplicon sequence variants (ASVs) were identified and removed with the R decontam package (version 1.12.0) (ref. 54). ASVs were classified as contaminants if identified by either the frequency or prevalence methods (method = ‘either’). The abundance of each genus relative to the total abundance of all genera in each sample was visualized using the R ggplot2 package (version 3.3.5) (ref. 55).

Quantitative PCR (qPCR) was performed using a QuantStudio 3 (Applied Biosystems) instrument to quantify the total 16S rRNA gene as a proxy of bacterial concentration (16S rRNA gene copies per litre wastewater) in each sample. A previously published TaqMan-based assay was employed targeting the 16S rRNA gene using universal primers (BAC1055F 5′-ATGGYTGTCGTCAGCT-3′; BAC1392R 5′-ACGGGCGGTGTGTAC-3′) and probe (BAC1115Probe 5′-FAM-CAACGAGCGCAACCC-3′-TAMRA) (ref. 56). All samples and controls were run in triplicate; the standard curve (16S rRNA plasmids) ranged from 10^7 to 10^1 copies μl−1. Molecular no-template controls using RNase/DNase-free Ultrapure PCR-grade water (Invitrogen) were incorporated into each run.

Semi-quantitative calculations were achieved by multiplying the relative abundance of select genera (%) by the copy number of the 16S rRNA gene per litre of wastewater, as determined by qPCR in each corresponding sample, resulting in the number of 16S rRNA gene copies per litre belonging to each selected genus under investigation.
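The semi-quantitative calculation can be sketched in Python as follows. The function name, the placeholder genus labels and all numerical values are hypothetical and serve only to show the arithmetic.

```python
def genus_copies_per_litre(relative_abundance_percent: float,
                           total_16s_copies_per_litre: float) -> float:
    """Estimate 16S rRNA gene copies per litre attributable to one genus."""
    if not 0 <= relative_abundance_percent <= 100:
        raise ValueError("relative abundance must be between 0 and 100%")
    return relative_abundance_percent / 100.0 * total_16s_copies_per_litre


# Hypothetical monthly composite: relative abundances (%) from sequencing
# and total 16S rRNA gene copies per litre from qPCR
relative_abundance = {"genus_A": 2.5, "genus_B": 0.4}
total_copies = 1.0e10

copies_by_genus = {
    genus: genus_copies_per_litre(percent, total_copies)
    for genus, percent in relative_abundance.items()
}
```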

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.