Introduction

Methods to characterize exposure to fecal contamination from drinking water in low- and middle-income settings have typically been limited to estimating fecal loading in the environment using indicators of fecal contamination1. These are proxy measures: rather than measuring the actual ingestion of enteric pathogens associated with fecal contamination, they infer exposure from fecal indicator bacteria (FIB) concentrations in the environment and rely heavily on assumptions about how individuals interact with that environment. In addition to the well-documented shortcomings of using FIB as a proxy for enteric pathogens2, the difference between the observed exposure (i.e., the exposure assigned from these proxy measures) and the true exposure represents a potential form of exposure measurement error. Other areas of environmental health have detailed how exposure measurement error may introduce bias and uncertainty in estimated exposure–outcome relationships, thereby obscuring true associations3. Correspondingly, fields such as air pollution epidemiology have placed increasing emphasis on understanding errors associated with differences between individual or personal exposures and the proxy exposure measures used in health effects modeling4.

We recently compiled individual participant data (IPD) for a systematic review and meta-analysis. Across 19 studies in low- and middle-income countries, we matched household-level drinking water FIB concentrations, as a proxy measure for individual-level enteric pathogen exposure, to >43,000 diarrhea reports and >10,000 growth measures for children under 5 years of age (Goddard et al., under review). This analysis updated and expanded previous meta-analyses, which found mixed results on the relationship between FIB concentrations in drinking water and diarrhea and did not consider a more chronic health outcome such as child growth5,6,7. Our findings suggest that FIB concentrations in household drinking water are associated with both reported diarrhea (odds ratio (OR) 1.09; 95% confidence interval (CI) 1.04, 1.13) and lower height-for-age Z (HAZ) scores (HAZ −0.04; 95% CI −0.06, −0.01). Notably, we also observed moderate heterogeneity among studies in the strength of association for both the diarrhea (I² = 34%; 95% CI 0–62%) and linear growth analyses (I² = 19%; 95% CI 0–63%).

A primary limitation of this analysis was potential error in outcome measurement. Most of the data come from non-blinded intervention studies, in which caregiver-reported diarrhea and linear growth measures are subject to participant and enumerator bias8. Errors in the assigned exposure are another possible source of bias and uncertainty. Prior research on the effects of exposure measurement error from proxy measures in air pollution epidemiology suggests that these measures can introduce uncertainty and bias risk estimates toward no observed effect9,10,11. Similar effects of exposure measurement error have been shown in other areas, such as chemical exposures12 and diastolic blood pressure measurements13. Based on these findings, the results of our IPD analysis may exhibit similar uncertainty and bias due to exposure measurement error from using FIB concentrations in household drinking water as a proxy for personal exposure to enteric pathogens.

In theory, there are many potential sources of error in enteric pathogen exposure assessments, for example (1) temporal or spatial variability in water quality, (2) assigning household- or community-level water quality to individuals, (3) the use of FIB as proxies for enteric pathogens, and (4) processing errors (e.g., during sample collection or transport, or from laboratory instrumentation). This list is not exhaustive, and each source could be further broken down into underlying sources of error. Here we introduce a conceptual model of potential sources of measurement error using a formal measurement error framework14. Using empirical drinking water data from several low- and middle-income settings worldwide, we demonstrate how such a framework might be evaluated and how it can quantify the relative contributions of different components of measurement error.

Although quite limited, prior research on the effects of exposure measurement error in waterborne disease epidemiology has found preliminary evidence that using FIB introduces regression dilution bias of between 14% and 57% in the relationship between fecal contamination in recreational water and swimming-associated illness15. Another study found that spatiotemporal variability in rainfall data attenuated associations between heavy rainfall and diarrhea by 35% to 45%16. These studies focused on single components of error, and we did not find any formal discussion of multiple sources of measurement error for exposure to fecal contamination in the peer-reviewed literature. We therefore adapt the approach presented by Zeger et al. for distinguishing sources of exposure measurement error14 to fecal contamination exposure in drinking water, using it to conceptualize multiple possible components of error. While our analysis borrows from extensive work examining air pollution health effects and is limited to fecal contamination as a proxy for enteric pathogens, we intend this paper to serve as an initial step toward incorporating estimates of exposure measurement error into assessments of enteric pathogen exposure.

Zeger et al. consider sources of error in assigning ambient air quality measured at central monitoring sites to a population, a more distal measure of exposure than one that attempts to quantify personal exposures. Although this framework was developed for time series studies of air pollution health effects, we see parallels between the proxy measures commonly used in fecal contamination exposure assessments and central-site air pollution exposure assessments; we do not, however, claim that the framework includes all possible sources of error. In the fecal contamination setting considered here, as in the air pollution settings considered by Zeger et al., technical and logistical constraints and limited resources make it infeasible to obtain measures of true personal exposure \(x_{it}\) for individual i at time t. Instead, exposure may be estimated by measuring household FIB concentrations \(z_t\) at time t. In Fig. 1, we summarize the differences between measured FIB concentrations \(z_t\), the only component in this framework that is actively measured, and true personal exposure to fecal contamination \(x_{it}\).

Fig. 1: Fecal contamination in drinking water exposure measurement error framework (adapted from Zeger et al.14).

This framework outlines three components of error to describe the difference between true personal fecal contamination exposure \(x_{it}\) and measured fecal indicator bacteria concentrations \(z_t\).

The difference between measured FIB concentrations in household drinking water \(z_t\) and true personal exposure to fecal contamination \(x_{it}\) is the exposure measurement error, which we split into three components in accordance with the Zeger et al. framework (Eq. 1).

$$x_{it} = z_t + \left( x_{it} - {\bar x_t} \right) + \left( {\bar x_t} - {z_t^ \ast } \right) + \left( {z_t^ \ast - z_t} \right)$$
(1)

where \(x_{it} - \bar x_t\) is the error from the difference between personal exposure \(x_{it}\) and the aggregate fecal contamination exposure \(\bar x_t\) across a population (i.e., the members of a household); \(\bar x_t - z_t^\ast\) is the measurement error from assigning household water fecal contamination \(z_t^\ast\) as the exposure without considering other exposures that may contribute to the aggregate exposure \(\bar x_t\), such as fecal contamination encountered in the community \(w_t^\ast\); and \(z_t^\ast - z_t\) is the measurement error from the difference between the measured household water FIB concentrations \(z_t\), used as an indicator of fecal contamination, and the true level of fecal contamination in household drinking water \(z_t^\ast\).
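Because Eq. 1 is an identity, each error component can be computed separately, and the three components always sum to the total gap between \(x_{it}\) and \(z_t\). The short Python sketch below makes this concrete; the function name and the input values are hypothetical log10-scale illustrations, not data from the included studies.

```python
def decompose_exposure_error(x_it, xbar_t, zstar_t, z_t):
    """Split x_it - z_t into the three components of Eq. 1.

    Inputs are hypothetical log10-scale values:
    x_it    - true personal exposure of individual i at time t
    xbar_t  - aggregate exposure across the population (e.g., the household)
    zstar_t - true fecal contamination of household drinking water
    z_t     - measured household water FIB concentration
    """
    return {
        "component_1": x_it - xbar_t,     # personal vs. aggregate exposure
        "component_2": xbar_t - zstar_t,  # aggregate vs. household water only
        "component_3": zstar_t - z_t,     # true vs. measured household water
    }


# Hypothetical example values (log10 scale)
x_it, xbar_t, zstar_t, z_t = 2.4, 2.2, 1.7, 1.5
components = decompose_exposure_error(x_it, xbar_t, zstar_t, z_t)

# Eq. 1: x_it = z_t + component 1 + component 2 + component 3
reconstructed = z_t + sum(components.values())
print({name: round(value, 2) for name, value in components.items()})
print(round(reconstructed, 2))
```

In practice only \(z_t\) is measured, so the components cannot be computed directly from field data; this analysis instead examines components 2 and 3 through simulation and comparison with empirical outcomes.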

Our current analysis sought to address the second and third components of this framework, with the goal of examining how they may affect exposure–outcome relationships for fecal contamination in drinking water. This analysis does not seek to validate any single exposure characterization method but rather to describe potential sources of error in current methods to help inform future method development. Specifically, we sought to assess evidence and magnitude of exposure measurement error (1) from assigning household-level FIB concentrations \(z_t^\ast\) as the exposure without considering community-level FIB concentrations \(w_t^\ast\) (component 2), and its effect on estimated associations between FIB concentrations in drinking water and child diarrhea; and (2) from using a single FIB measurement \(z_t\) instead of repeated longitudinal measurements to represent \(z_t^\ast\) (component 3), and its effect on estimated associations between FIB concentrations in drinking water and linear growth.

Results

The dataset we compiled included studies of varying sizes, with drinking water FIB data available for 98 to 2137 households per study (Table 1). Four studies collected only cross-sectional water samples, but most collected repeated water samples over time, typically monthly, quarterly, or annually. For evaluating the simulation findings, matched diarrhea data were available from all included studies and matched growth data from seven studies.

Table 1 Summary characteristics of the included studies.

Household versus community exposure

The simulations comparing household to community exposure included 37,119 observations (82% rural, 18% urban) from 16 of the included studies, with between 1 and 12 observations per child. Studies by Arnold et al.26, Brown et al.36, and Pickering et al.46 were excluded from these simulations because their data were not compatible with estimating median community water FIB concentrations (i.e., specific sample collection dates were not available or only one sample was collected in each community). The simulations showed that if children were exposed to fecal contamination in drinking water outside of their household, then using household FIB concentrations \(z_t^\ast\) exclusively as the error-prone exposure variable may attenuate the observed FIB–diarrhea relationship, even at low levels of community exposure (Fig. 2). If the aggregate exposure \(\bar x_t\) consisted of 90% household and 10% community exposure, the estimated odds of diarrhea for 1-log10 higher FIB concentrations in drinking water, using household FIB concentrations \(z_t^\ast\) as the error-prone exposure variable, were OR = 1.32 compared with the assigned OR = 1.50 (α = 0.69). The attenuation grew as community exposure \(w_t^\ast\) made up an increasing share of the aggregate exposure \(\bar x_t\). If the aggregate exposure consisted entirely of community exposure, the estimated odds of diarrhea using household water as the error-prone exposure variable were OR = 1.06 (α = 0.15). Findings were consistent between urban and rural areas. Sensitivity analyses for these findings are provided in Supplementary Tables S1 and S2.
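The attenuation factors for the diarrhea models are ratios of log odds ratios, \(\alpha = \beta_1^\ast / \beta_1\). As a quick check, the Python sketch below recomputes α from the rounded ORs quoted above. Because the published ORs are rounded to two decimals, the recomputed values (about 0.68 and 0.14) differ slightly from the reported α = 0.69 and 0.15, which were presumably calculated from unrounded estimates. The function name is illustrative.

```python
import math


def attenuation_factor(observed_or, assigned_or):
    """alpha = beta1* / beta1, i.e., the ratio of log odds ratios."""
    return math.log(observed_or) / math.log(assigned_or)


ASSIGNED_OR = 1.50  # odds ratio used to generate the simulated diarrhea cases

# Rounded ORs estimated with household FIB as the error-prone exposure
alpha_90_household = attenuation_factor(1.32, ASSIGNED_OR)   # 90% household / 10% community
alpha_all_community = attenuation_factor(1.06, ASSIGNED_OR)  # 100% community

print(round(alpha_90_household, 2))   # 0.68
print(round(alpha_all_community, 2))  # 0.14
```

For the linear HAZ models no logarithm is needed: α is simply the ratio of the two regression coefficients.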

Fig. 2: Simulated odds of diarrhea for 1-log10 higher FIB concentrations in drinking water when only household water FIB concentrations \(z_t^\ast\) were used as the exposure variable under different household–community exposure scenarios, and the corresponding attenuation factors.

Exposure scenarios start with 100% of exposure assumed to occur within the household and then replace household exposure with community exposure \(w_t^\ast\) in 10% increments.

For the evaluation with empirical diarrhea data, we used the same dataset as we did for the simulations, with the exception of using field-reported diarrhea instead of assigning diarrhea cases. In the combined analysis, we found that household water fecal contamination \(z_t^ \ast\) assigned exclusively as the error-prone exposure variable attenuated the association between FIB concentrations in drinking water and diarrhea in comparison to a mixture of household and community water fecal contamination up to assigning 20% household and 80% community water FIB concentrations (Fig. 3). However, this attenuation was not as pronounced as suggested in the simulations, with the greatest attenuation factor found to be α = 0.79 (60% household, 40% community exposure). The stratified results suggested that in urban areas the attenuation from using household water fecal contamination \(z_t^ \ast\) was limited to scenarios up to 70% household and 30% community water quality, and in rural areas, it extended to 10% household and 90% community water fecal contamination.

Fig. 3: Estimated odds of diarrhea for 1-log10 higher FIB concentrations in drinking water for different household–community exposure assignments and the corresponding attenuation factors.

Attenuation factors are derived from comparing the log odds of diarrhea for a 100% household exposure assignment to the log odds of diarrhea from community assignments.

After stratifying by age, the odds of diarrhea for 1-log10 higher FIB concentrations in drinking water for children aged 0–23 months were similar to the odds of diarrhea for children aged 24–72 months when household water fecal contamination \(z_t^\ast\) was assigned as the exposure variable. However, when household water fecal contamination was replaced with community water fecal contamination \(w_t^\ast\) in 10% increments, there was a trend of higher odds of diarrhea for children aged 24–72 months, but not for children aged 0–23 months, for up to 20% household and 80% community water quality (Fig. 4).

Fig. 4: Estimated odds of diarrhea for 1-log10 higher FIB concentrations in drinking water for different household–community exposure assignments, stratified by non-ambulatory and ambulatory pre-school-aged children, and the corresponding attenuation factors.

Attenuation factors are derived from comparing the log odds of diarrhea for a 100% household exposure assignment to the log odds of diarrhea from community assignments.

Single versus multiple samples

The simulations comparing the effects of defining exposure with a single water sample versus multiple samples included 24,806 unique children from the 19 included studies who had ≥1 matched FIB concentration estimate in drinking water. These simulations indicated that using FIB concentrations from a single water sample \(z_t\) instead of the median of multiple samples attenuated the estimated differences in HAZ scores associated with fecal contamination in drinking water, with similar findings for wet and dry season months (Fig. 5). If household water fecal contamination \(z_t^\ast\) was represented by the median of two samples, then randomly selecting one of the two samples almost halved the observed difference in HAZ scores associated with FIB concentrations in drinking water, from the assigned HAZ = −0.20 to HAZ = −0.11 (α = 0.56). The attenuation was more pronounced when household water fecal contamination \(z_t^\ast\) was represented by three (HAZ = −0.10; α = 0.52) or four samples (HAZ = −0.09; α = 0.43). Sensitivity analyses for these findings are detailed in Supplementary Tables S3 and S4.
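A stripped-down version of this simulation shows why a single sample dilutes the slope. The Python sketch below rests on hypothetical assumptions, not values from the included studies: each household has a long-term log10 FIB level drawn from N(2, 1), each sample adds independent N(0, 1) noise, and HAZ equals −0.20 times the median of the k samples plus N(0, 1) residual noise. It then regresses HAZ on one randomly selected sample and reports α as the ratio of the estimated to the assigned slope. Clustering, season, and covariates are ignored, and the function names are illustrative.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_alpha(k, n_households=20000, beta=-0.20, sample_sd=1.0, seed=1):
    """Attenuation from using one random sample instead of the median of k samples.

    Hypothetical assumptions: each household's long-term log10 FIB level is
    N(2, 1); each sample adds independent N(0, sample_sd) noise; HAZ equals
    beta times the median of the k samples plus N(0, 1) residual noise.
    """
    rng = random.Random(seed)
    single_sample, haz = [], []
    for _ in range(n_households):
        level = rng.gauss(2.0, 1.0)
        samples = [level + rng.gauss(0.0, sample_sd) for _ in range(k)]
        z_star = statistics.median(samples)          # assigned household exposure
        haz.append(beta * z_star + rng.gauss(0.0, 1.0))
        single_sample.append(rng.choice(samples))    # error-prone exposure z_t
    return ols_slope(single_sample, haz) / beta


for k in (1, 2, 3, 4):
    print(k, round(simulate_alpha(k), 2))
```

Under these normal-noise assumptions the expected attenuation is roughly (1 + 1/k)/2, i.e., about 1.0, 0.75, 0.67, and 0.625 for k = 1 to 4. The sketch reproduces the direction of the published pattern (stronger attenuation as more samples define \(z_t^\ast\)), not its magnitude, which depends on how noisy individual samples are relative to between-household differences.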

Fig. 5: Simulated difference in HAZ scores for 1-log10 higher FIB concentrations in drinking water and the corresponding attenuation factors.

Simulations compare the use of single measures of household water FIB concentrations zt as the exposure variable for four scenarios where 1, 2, 3, or 4 samples represent household water fecal contamination \({\boldsymbol{z}}_{\boldsymbol{t}}^ \ast \).

For the evaluation of these simulations, we were limited by the availability of empirical linear growth data: growth data were available for 3311, 743, and 233 children with 2, 3, and ≥4 matched water samples, respectively. As a result, we could not stratify this analysis by season as we did in the simulations, and the baseline effect estimates and their uncertainties vary by group because they represent different samples (Fig. 6). The difference in HAZ scores associated with higher FIB concentrations in drinking water was consistently closer to zero (i.e., no effect) when using a single sample \(z_t\) than when using the median of multiple samples. Consistent with the simulations, using a single sample instead of the median of two samples approximately halved the estimated difference in HAZ scores associated with FIB concentrations in drinking water (α = 0.56), and the attenuation was more pronounced relative to the median of three or four samples (α = 0.54 and α = 0.38, respectively).

Fig. 6: Estimated difference in HAZ scores with 95% confidence intervals for 1-log10 higher FIB concentrations in drinking water and the corresponding attenuation factors.

Estimates obtained by using a randomly selected single measure of household water FIB concentrations \(z_t\) as the exposure variable are compared with estimates based on 1, 2, 3, or 4 available samples.

Discussion

We adapted and introduced a framework to assess measurement error when characterizing child exposure to fecal contamination in drinking water, i.e., the difference between exposure assigned by proxy measures and the true exposure experienced by an individual. Such frameworks can help prioritize research gaps by identifying areas within fecal exposure assessments that are limited or missing and by quantifying the components of error most critical to bias in waterborne disease epidemiology. Ideally, improved exposure data could lead to a better understanding of the true associations between fecal contamination along different pathways and child health. This analysis is an initial effort to apply an exposure measurement error framework in waterborne disease epidemiology and to understand the presence and magnitude of several sources of measurement error. Using empirical data from an extensive set of studies conducted in low- and middle-income settings, our analyses show how components of error may attenuate estimated exposure–outcome relationships. Our findings suggest that the previously reported odds of diarrhea and reductions in HAZ scores associated with fecal contamination in drinking water (Goddard et al., under review) may be prone to regression dilution bias and may therefore underestimate the true exposure–outcome relationships.

We introduced three components of exposure measurement error. The first may arise from assigning household water fecal contamination data to individual household members who interact with their environment differently. Substantial heterogeneity in children's interactions with their domestic environment has been shown in both urban and rural settings for different age groups within the 0–5-year range17,18. In addition, differences in drinking water ingestion rates by age can lead to heterogeneity in ingested doses of fecal contamination19, and infants may have limited exposure to household water through ingestion before weaning20. This may lead to differences in dose–response between members of the same household. To test how the first component of this framework can be applied, small controlled panel studies are needed to generate estimates of personal exposure and compare them with household-level estimates9.

The second component of error may arise when exposure to fecal contamination in drinking water outside of the household is not incorporated into exposure assessments. A recent study characterizing fecal exposure in Accra, Ghana, as part of the SaniPath research program reported widespread fecal contamination in both domestic and public domains21. Measurement error from assigning household water fecal contamination as the exposure depends not only on the presence of fecal contamination in the public domain but also on the study population’s interaction with water in that domain. To our knowledge, no published studies have quantified child exposure to contaminated water in different microenvironments in the domestic and public domains, but time–activity analyses have long been used in air pollution exposure assessments22 and have shown that children spend extensive amounts of time outside of their domestic environment23.

Findings from our diarrhea simulations suggest that, if children are exposed to fecal contamination in drinking water outside of their households, then using household water FIB concentrations as a proxy for their overall exposure may attenuate health effect estimates for FIB concentrations in drinking water and diarrhea. When evaluating these results with empirical diarrhea data, we found that this attenuation may be more pronounced in children older than 2 years. This suggests that children under 2 years of age may experience most of their exposure within their homes, so household-level exposure assessments may be appropriate for this age group, whereas for older, ambulatory children, exposure outside the home may warrant greater consideration.

The third component of error may arise not only from the limited precision of methods to characterize FIB concentrations24,25, i.e., from variability in water quality measurements due to sampling and laboratory processing methods, but also from temporal differences in water quality. FIB levels in household water can vary on a weekly, daily, and even hourly basis26. For a typically acute outcome such as diarrhea, the biologically relevant household water fecal contamination may be that present during the incubation period of waterborne enteric pathogens, which, depending on the pathogen, can range from hours to up to a month27. If water samples are collected on the same day as diarrheal disease data, the measured FIB concentrations on that day may not represent the biologically relevant fecal contamination in the lead-up to a diarrhea episode. These discrepancies could be due to environmental factors, such as short-term weather changes like extreme rainfall events28, or human factors, such as changes in water treatment behavior in response to a diarrhea episode29,30. For chronic outcomes such as child growth, the biologically relevant exposure likely reflects longer-term fecal contamination, which may not be adequately represented by a single or a few repeated measurements of household water fecal contamination because of short-term and seasonal variability in fecal contamination in drinking water31.

Our simulations suggest that long-term household water fecal contamination may not be adequately represented by a single sample, which can attenuate the estimated health effects of FIB concentrations in drinking water on child linear growth. These results were consistent with our evaluations using empirical growth data. While our previously reported IPD analysis found a significant association between fecal contamination and linear growth, 70% of the sample population had only a single matched water quality measure available to characterize exposure. The findings from this analysis imply that the reported effect sizes may be attenuated, and that fecal exposure assessments may benefit from characterizing fecal contamination using multiple longitudinal samples to estimate more biologically relevant exposure.

The results from our analyses need to be interpreted with caution. First, owing to data availability, we were limited to applying this framework to two sources of error. There are many more possible sources of error that we could not consider here, such as assigning household-level exposures to individuals and the use of FIB as proxies for enteric pathogens. Second, this analysis was limited to quantifying the effects of measurement error on the magnitude of health effect estimates and not on their precision. Uncertainty in health effect estimates introduced by exposure measurement error may obscure associations where they exist, increasing the likelihood of false-negative findings. Third, the current framework is limited to drinking water, but there are several other important fecal–oral transmission pathways, such as hands, food, soil, fomites, and flies. Findings from our IPD analysis suggest that fecal contamination along select pathways is associated with child diarrhea and growth, so a similar framework could be applied to other pathways to test whether those findings may have been affected by regression dilution bias. Fourth, we did not have access to repeated water samples within the shorter timeframe of pathogen incubation periods for acute gastroenteritis, so we could not quantify how using single samples to estimate household water fecal contamination affects the estimated health effects of FIB concentrations in drinking water on diarrhea.

Our results suggest that exposure measurement error can attenuate fecal exposure–outcome relationships both for typically acute outcomes, such as diarrhea, and for more chronic outcomes, such as linear growth. Fecal exposure assessments in drinking water may consider exposure outside of the household and may characterize fecal contamination with repeated samples to account for variability in water quality. Such assessments may leverage measurement error frameworks to design exposure assessments that are more proximal to the true exposure experienced by individuals, which in turn may inform the design of more effective interventions to reduce the burden of waterborne disease.

Methods

Data

We used data from 19 studies conducted in South America, Sub-Saharan Africa, and South and South-East Asia29,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49. We requested permission from data owners to use these data for this study. Eligible datasets included variables describing FIB concentrations in household drinking water, child age, and intervention status. We included children aged 0–72 months. Datasets also included unique anonymized identifiers for each community, household, and child. For the diarrhea analysis, we defined community water fecal contamination for a given household on a specific day as the median household water FIB concentration of all other households in its community on the same day. For each city, or collection of communities in rural areas, we generated a variable describing whether water quality data were collected in a wet or dry season month, using the 30-year average monthly precipitation from the WorldClim dataset50 and designating months with average precipitation >60 mm as wet season months and months with <60 mm as dry season months51. For the diarrhea analysis, we used household water fecal contamination data in long format, matching single household water FIB concentration observations \(z_t\) to child survey data if they were collected on the same day as, or up to 7 days before, the survey, and then generated different scenarios for the aggregate fecal contamination \(\bar x_t\) by incorporating the median community water fecal contamination. For the growth analysis, we used household water FIB concentration data in wide format, matching all available water samples collected over the course of a child’s life up to the day anthropometric measurements were taken.
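The data preparation rules described above can be sketched in Python as follows. The record layout, function names, and example values are hypothetical; the rules encoded are the leave-one-out community median on the same day, the 60 mm average monthly precipitation threshold for wet versus dry season months, and the 0 to 7 day window for matching water samples to diarrhea surveys. The text does not say how a month with exactly 60 mm is classified; this sketch treats it as dry, which is an assumption.

```python
from datetime import date
from statistics import median

# Hypothetical water sample records: (community, household, sample date, log10 FIB)
SAMPLES = [
    ("C1", "H1", date(2019, 3, 4), 1.2),
    ("C1", "H2", date(2019, 3, 4), 2.0),
    ("C1", "H3", date(2019, 3, 4), 0.5),
    ("C1", "H4", date(2019, 3, 5), 1.8),
]


def community_median(records, community, household, day):
    """Median log10 FIB of all other households in the same community on the same day."""
    others = [fib for comm, hh, d, fib in records
              if comm == community and d == day and hh != household]
    return median(others) if others else None  # None: no other households sampled that day


def season(avg_monthly_precip_mm, threshold_mm=60.0):
    """Wet if average monthly precipitation exceeds 60 mm (exactly 60 mm assumed dry)."""
    return "wet" if avg_monthly_precip_mm > threshold_mm else "dry"


def matches_survey(sample_date, survey_date, window_days=7):
    """True if the water sample was collected on the survey day or up to 7 days before it."""
    lag = (survey_date - sample_date).days
    return 0 <= lag <= window_days


print(community_median(SAMPLES, "C1", "H1", date(2019, 3, 4)))  # 1.25
print(community_median(SAMPLES, "C1", "H4", date(2019, 3, 5)))  # None
print(season(85.0), season(20.0))  # wet dry
print(matches_survey(date(2019, 3, 4), date(2019, 3, 10)))  # True
```

Excluding the household itself from its community median keeps the community exposure \(w_t^\ast\) from being mechanically correlated with the household's own measurement.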

Analytical approach

We used a two-tiered analytical approach to examine evidence and magnitude of random exposure measurement error. First, in a simulated analysis, we randomly assigned health outcomes (diarrhea cases and HAZ scores) to each observation with an estimated exposure and then regressed those outcomes on the error-prone exposure variables, represented by the measured proxies of exposure. Because exposure was assigned based on household water FIB concentration measurements, the simulations retained existing correlations between communities, households, and individuals. Evidence and magnitude of exposure measurement error were assessed by estimating the attenuation factor associated with the error-prone exposure variable52. Second, we evaluated the findings from our simulations using empirical health outcome data from the same datasets, regressing these outcomes on both the estimated exposure and the error-prone exposure variables. All analyses were conducted in R (version 3.6)53.

Household versus community exposure

We simulated the effect of assigning household water fecal contamination \(z_t^\ast\) exclusively as individual exposure when the aggregate exposure is in fact a combination of household and community water fecal contamination \(w_t^\ast\), by:

  1.

    Randomly generating diarrhea cases for each included child, with a combination of household- and community-level drinking water fecal contamination as the aggregate drinking water fecal contamination \(\bar x_t\) experienced by the child. Diarrhea \(d_{ijkl}\) for child i in household j in community k and study l was drawn from a Bernoulli distribution with log odds \(\eta_{ijkl}\) given by:

    $$\eta_{ijkl} = \mathrm{logit}\left( p_{ijkl} \right) = \beta_0 + \beta_1 \mathrm{FIB}_{ij} + \beta_2 \mathrm{Treat}_{ij} + \beta_3 \mathrm{Age}_{i} + \beta_4 \mathrm{Season}_{ijk} + \mu_{ijkl} + \mu_{ijk}$$
    (2)
    $$\mu_{ijkl} \sim N\left( 0, 0.6 \right);\quad \mu_{ijk} \sim N\left( 0, 0.3 \right)$$
    $$p_{ijkl} = \frac{e^{\eta_{ijkl}}}{1 + e^{\eta_{ijkl}}}$$
    $$d_{ijkl} \sim \mathrm{Bernoulli}\left( p_{ijkl} \right)$$
  2.

    Assuming that (1) community-level drinking water fecal contamination is represented by the median household water FIB concentration of all other households in the community; (2) the baseline odds of diarrhea in this population, \(e^{\beta_0}\), are 0.15; (3) the odds ratio for diarrhea per 1-log10 higher FIB concentration in drinking water, \(e^{\beta_1}\), is 1.5; and (4) the odds ratios for receiving an intervention \(e^{\beta_2}\), child age in years \(e^{\beta_3}\), and data collected in the wet compared with the dry season \(e^{\beta_4}\) are 0.9, 0.8, and 1.2, respectively. Effect estimates were broadly based on model outputs from our IPD analyses, although we assumed higher odds of diarrhea for FIB concentrations in drinking water because we hypothesize that the exposure–outcome effect estimate in our IPD analysis may have been attenuated by regression dilution bias54. The model accounted for clustering at the study level (\(\mu_{ijkl}\)) and community level (\(\mu_{ijk}\)).

  3.

    Fitting multilevel generalized mixed effects models with the assigned diarrheal cases and replacing the combined household and community drinking water FIB concentrations \(\bar x_t\) with household-level FIB concentrations \(z_t^ \ast\) exclusively as the error-prone exposure variable.

  4.

    Calculating the attenuation factor α for the estimated log odds of diarrhea (\(\beta_1^\ast\)) obtained by assigning household-level FIB concentrations \(z_t^\ast\) exclusively as the error-prone exposure variable, relative to the assigned log odds of diarrhea (\(\beta_1 = \log(1.5)\)) when combined household and community drinking water FIB concentrations \(\bar x_t\) represent the exposure: \(\alpha = \frac{\beta_1^\ast}{\beta_1} = \frac{\beta_1^\ast}{\log(1.5)}\).

  5. Repeating the simulations for a range of exposure scenarios by increasing the contribution of community water fecal contamination \(w_t^\ast\) in 10% increments, from 100% household water fecal contamination to 100% community water fecal contamination as the estimated exposure.

  6. Stratifying the combined analysis: as reported previously, the estimated odds ratio of diarrhea for a 1-log10 increase in FIB concentrations in drinking water was higher in urban than in rural settings (Goddard et al., under review), so we stratified the simulation by urban versus rural area to assess whether exclusively assigning household drinking water fecal contamination \(z_t^\ast\) may introduce more error in one setting than in the other.

  7. Conducting sensitivity analyses: we assessed how our assumptions affected the simulation findings by repeating the simulations (a) with higher and lower assumed odds ratios of diarrhea for higher FIB concentrations in drinking water and (b) with the highest and lowest community water FIB concentrations instead of the median.
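
The simulation steps above can be sketched in Python as follows. This is a deliberately simplified illustration, not the authors' code: it keeps only the exposure term of Eq. (2), drops the treatment, age, and season covariates and the study- and community-level random intercepts, and fits an ordinary logistic regression by Newton-Raphson in place of a multilevel mixed-effects model. It also assumes a hypothetical exposure structure (normally distributed community means and household deviations on the log10 scale) and a hypothetical mixing rule for the combined exposure, \(\bar x_t = (1-\lambda) z_t^\ast + \lambda w_t^\ast\), where λ is the community share stepped in 10% increments. The function names `simulate_exposures`, `fit_logistic`, and `attenuation_curve` are illustrative.

```python
import math
import random
import statistics

# Values taken from the text: baseline odds 0.15 and odds ratio 1.5 per 1-log10 FIB.
BETA0 = math.log(0.15)
BETA1 = math.log(1.5)


def expit(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate_exposures(rng, n_comm=500, hh_per_comm=20):
    """Hypothetical log10 FIB exposures (distributions are illustrative only).

    z: household drinking water log10 FIB.
    w: median log10 FIB of all *other* households in the same community.
    """
    z, w = [], []
    for _ in range(n_comm):
        comm_mean = rng.gauss(1.5, 0.8)  # hypothetical community-level mean
        hh = [rng.gauss(comm_mean, 0.7) for _ in range(hh_per_comm)]
        for j, z_j in enumerate(hh):
            z.append(z_j)
            w.append(statistics.median(hh[:j] + hh[j + 1:]))
    return z, w


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Intercept + slope logistic regression by Newton-Raphson (no random effects)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            r, v = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def attenuation_curve(seed=1):
    """alpha = beta1* / log(1.5) as the community share rises from 0% to 100%."""
    rng = random.Random(seed)
    z, w = simulate_exposures(rng)
    u = [rng.random() for _ in z]  # common random numbers shared by all scenarios
    curve = {}
    for step in range(11):
        lam = step / 10  # community share of the "true" combined exposure
        x_bar = [(1 - lam) * zi + lam * wi for zi, wi in zip(z, w)]
        # Bernoulli draw: diarrhea if u < p, with p from the true exposure.
        d = [1 if ui < expit(BETA0 + BETA1 * xi) else 0 for ui, xi in zip(u, x_bar)]
        _, b1_star = fit_logistic(z, d)  # error-prone model: household FIB only
        curve[lam] = b1_star / BETA1
    return curve
```

`attenuation_curve()` returns a dict mapping each community share (0.0, 0.1, ..., 1.0) to α. Reusing the same uniform draws across scenarios keeps differences between scenarios from being dominated by Monte Carlo noise.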

To evaluate findings from the simulations, we applied empirical diarrhea data by:

  1. Beginning with household water fecal contamination \(z_t^\ast\) as the estimated exposure and fitting a multilevel generalized linear mixed-effects model to estimate the odds ratio of diarrhea for 1-log10 higher FIB concentrations in household drinking water.

  2. Replacing household water fecal contamination \(z_t^\ast\) with community water fecal contamination \(w_t^\ast\) in 10% increments and fitting the same regression model with each new exposure assignment.

  3. Calculating the attenuation of the estimated log odds ratio of diarrhea (β1*) obtained by assigning household-level FIB concentrations as the sole, error-prone exposure variable, relative to effect estimates that combine household and community water fecal contamination (β1).

  4. Stratifying the analysis: in addition to stratifying by rural versus urban area, we stratified by child age (0–23 versus 24–72 months) to consider how child mobility may modify the effect of assigning community water quality as the exposure. We hypothesized that children aged 0–23 months are mostly non-ambulatory and spend most of their time within the confines of their home, whereas preschool children aged 24–72 months are ambulatory and spend their time both in their home and within the wider community.
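
For the empirical analysis, the exposure-replacement and stratification bookkeeping might look like the sketch below. The record fields (`z`, `w`, `diarrhea`, `age_months`, `urban`) are a hypothetical schema, the same assumed mixing rule \(\bar x_t = (1-\lambda) z_t^\ast + \lambda w_t^\ast\) is used for the 10% increments, and the default slope estimator is an ordinary least-squares slope (a linear probability model) standing in for the multilevel logistic model. Any function that returns a log odds ratio from exposure and outcome lists could be passed as `fit_slope` instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Report:
    """One diarrhea report matched to water quality (hypothetical schema)."""
    z: float         # household drinking water log10 FIB
    w: float         # community drinking water log10 FIB
    diarrhea: int    # 1 = diarrhea reported, 0 = not reported
    age_months: float
    urban: bool


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in estimator: least-squares slope (NOT a mixed-effects model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def age_band(r: Report) -> Optional[str]:
    """Mobility strata from the text; children outside 0-72 months are dropped."""
    if 0 <= r.age_months < 24:
        return "0-23 mo"
    if 24 <= r.age_months <= 72:
        return "24-72 mo"
    return None


def attenuation_by_stratum(
    reports: List[Report],
    stratum: Callable[[Report], Optional[str]],
    fit_slope: Callable[[Sequence[float], Sequence[float]], float] = ols_slope,
) -> Dict[str, Dict[float, float]]:
    """alpha(lam) = beta1* (household only) / beta1 (combined exposure at lam)."""
    groups: Dict[str, List[Report]] = {}
    for r in reports:
        key = stratum(r)
        if key is not None:
            groups.setdefault(key, []).append(r)
    result = {}
    for key, grp in groups.items():
        y = [r.diarrhea for r in grp]
        b_star = fit_slope([r.z for r in grp], y)  # household exposure only
        curve = {}
        for step in range(1, 11):
            lam = step / 10  # community share of the combined exposure
            x_bar = [(1 - lam) * r.z + lam * r.w for r in grp]
            curve[lam] = b_star / fit_slope(x_bar, y)
        result[key] = curve
    return result
```

The urban/rural stratification uses the same function with a different key, for example `attenuation_by_stratum(reports, lambda r: "urban" if r.urban else "rural")`.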

Single versus multiple samples

We simulated the effect of assigning a single measure of FIB concentrations in drinking water \(z_t\) as the error-prone exposure variable by:

  1. Randomly generating simulated HAZ scores, with the estimated household water fecal contamination \(z_t^\ast\) represented by the median household water FIB concentration from repeat samples, using the following model, where the HAZ score \(\mathrm{HAZ}_{ijkl}\) for child i in household j in community k and study l is given by:

    $$\mathrm{HAZ}_{ijkl} = \beta_0 + \beta_1\mathrm{FIB}_{ij} + \beta_2\mathrm{Treat}_{ij} + \beta_3\mathrm{Age}_i + \mu_{ijkl} + \varepsilon_{ijkl} \tag{3}$$
    $$\mu_{ijkl} \sim N\left(0, 0.5\right);\quad \varepsilon_{ijkl} \sim N\left(0, 1\right)$$
  2. Assuming that (1) the mean baseline HAZ score (β0) in this population is −1.6; (2) the difference in HAZ score for 1-log10 higher median FIB concentrations (β1) is −0.2; (3) the differences in HAZ score for children receiving an intervention (β2) and per year of child age (β3) are 0.1 and −0.05, respectively; and (4) HAZ scores follow a normal distribution. The model accounted for clustering at the study level (\(\mu_{ijkl}\)).

  3. Fitting multilevel generalized linear mixed-effects models to the assigned HAZ scores, replacing the estimated household water fecal contamination \(z_t^\ast\) (the median household water FIB concentration from repeat samples) with a randomly chosen single measure of household water FIB concentration \(z_t\) as the error-prone exposure variable.

  4. Calculating the attenuation of the estimated difference in HAZ score (β1*) obtained by randomly choosing a single measure of water quality \(z_t\) as the error-prone exposure variable, relative to the assigned difference in HAZ score (β1 = −0.2) when the estimated exposure is represented by repeat samples of household water fecal contamination \(z_t^\ast\): \(\alpha = \frac{\beta_1^\ast}{\beta_1} = \frac{\beta_1^\ast}{-0.2}.\)

  5. Repeating the simulations for children with at least two, three, or four matched household water FIB concentration measures making up the median household water fecal contamination \(z_t^\ast\). We did not have sufficient data to conduct these simulations with more than four matched water samples.

  6. Stratifying the analysis: previous research has found that fecal contamination of drinking water sources in low-income countries is higher in the wet season than in the dry season55, so we stratified these simulations by season to examine whether the error introduced by variability in water quality is greater in one season than in the other.

  7. Conducting sensitivity analyses: we assessed how our assumptions affected the simulation findings by repeating the simulations (a) with higher and lower assumed differences in HAZ score for higher FIB concentrations in drinking water and (b) with the highest and lowest drinking water FIB concentrations from the repeat samples instead of the median.
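
A comparable sketch for the single-versus-multiple-sample simulation generates HAZ scores from Eq. (3) with the stated coefficients (β0 = −1.6, β1 = −0.2, β2 = 0.1, β3 = −0.05) and compares one randomly chosen sample with the summary of k repeat samples. Several choices are assumptions not found in the text and are marked in the comments: the 0.5 and 1 in Eq. (3) are treated as variances, the household exposure model (a long-run log10 mean plus independent sample-to-sample noise) and its parameters are invented for illustration, half of the children receive the intervention, and age is uniform over 0 to 5 years. Because treatment, age, and the study intercepts are generated independently of exposure here, a simple least-squares slope stands in for the mixed-effects fit; omitting those terms adds noise but not bias in this toy setting.

```python
import math
import random
import statistics

# Fixed effects of Eq. (3), as stated in the text.
B0, B1, B2, B3 = -1.6, -0.2, 0.1, -0.05


def ols_slope(x, y):
    """Least-squares slope; a stand-in for the mixed-effects fit in this toy setting."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_alpha(k, n_children=40000, n_studies=20, summary=statistics.median, seed=7):
    """alpha = beta1* / beta1 when one random sample replaces the summary of k repeats."""
    rng = random.Random(seed)
    # Study-level random intercepts; 0.5 treated as a variance (assumption).
    study = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(n_studies)]
    single, haz = [], []
    for i in range(n_children):
        long_run = rng.gauss(1.5, 0.8)  # hypothetical household mean log10 FIB
        samples = [rng.gauss(long_run, 0.6) for _ in range(k)]  # hypothetical sample noise
        z_star = summary(samples)  # exposure used to generate HAZ (median by default)
        z_t = rng.choice(samples)  # error-prone exposure: one randomly chosen sample
        treat = 1 if rng.random() < 0.5 else 0  # hypothetical 50% intervention
        age = rng.uniform(0.0, 5.0)  # hypothetical age in years
        haz.append(B0 + B1 * z_star + B2 * treat + B3 * age
                   + study[i % n_studies] + rng.gauss(0.0, 1.0))
        single.append(z_t)
    return ols_slope(single, haz) / B1
```

Passing `summary=max` or `summary=min` mirrors sensitivity analysis (b). Under this particular noise model the expected attenuation for the median is about 0.64 + 0.36/k (roughly 0.82, 0.76, and 0.73 for k = 2, 3, and 4): a single sample drifts further from the reference exposure as the reference is built from more repeats.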

To evaluate findings from the simulations, we applied empirical linear growth data for the subset of children in our dataset with available HAZ scores by:

  1. Fitting multilevel generalized linear mixed-effects models with the median household water fecal contamination \(z_t^\ast\) from repeated measures of household FIB concentrations as the exposure variable, and repeating this for two, three, and four repeat measures.

  2. Fitting the same models after randomly selecting a single measure of household water FIB concentration \(z_t\) from the repeat measures as the error-prone exposure variable.

  3. Calculating the attenuation of the estimated difference in HAZ score (β1*) obtained by randomly selecting a single measure of household water FIB concentration \(z_t\) as the error-prone exposure variable, relative to the effect estimate (β1) obtained with the median of repeat samples of household water fecal contamination \(z_t^\ast\) as the exposure variable.

  4. Stratifying the analysis: We conducted the same stratification by season as we did for the simulations.
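
For the empirical growth data, the comparison could be organized as in the sketch below. The `Child` record, its `season` label, and the choice to use each child's first k measures when more than k are available are hypothetical; the least-squares slope again stands in for the multilevel model. Within each season, attenuation is computed against the median-based estimate, α = β1*/β1.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Child:
    """Hypothetical record: repeat household log10 FIB measures matched to one HAZ score."""
    fib_log10: List[float]
    haz: float
    season: str  # e.g. "wet" or "dry"


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope (stand-in for the mixed-effects model)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def empirical_alpha(children: List[Child], k: int, seed: int = 0) -> Dict[str, float]:
    """Per-season alpha = beta1*(single random measure) / beta1(median of k measures)."""
    rng = random.Random(seed)
    by_season: Dict[str, List[Child]] = {}
    for c in children:
        if len(c.fib_log10) >= k:  # children with at least k matched samples
            by_season.setdefault(c.season, []).append(c)
    result = {}
    for season, grp in by_season.items():
        haz = [c.haz for c in grp]
        repeats = [c.fib_log10[:k] for c in grp]  # first k measures (assumption)
        beta1 = ols_slope([statistics.median(r) for r in repeats], haz)
        beta1_star = ols_slope([rng.choice(r) for r in repeats], haz)
        result[season] = beta1_star / beta1
    return result
```

Calling `empirical_alpha(children, k)` for k = 2, 3, and 4 corresponds to the two, three, and four repeat-measure comparisons. Because the single measure is drawn at random, averaging α over several seeds would reduce its Monte Carlo noise.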