## Introduction

Since the 1980s, quantitative microbial risk assessment (QMRA) has been increasingly used for quantifying the microbial safety of drinking water.1,2 The microbial exposure (i.e., the dose) is calculated from the pathogen concentration in drinking water and the consumption of unboiled drinking water. The risk of infection is calculated from the exposure to pathogens and the relationship between dose and response. In many studies, the variability of each parameter, such as the pathogen concentration in the source water and the removal and inactivation efficacy of the water treatment, is described by a probability density function (PDF). The yearly risk of infection is quantitatively estimated by a Monte Carlo simulation. QMRA methodology has been developed and improved by many studies,2 and has demonstrated its usefulness in designing water treatment processes.3,4,5

After conducting a QMRA, sensitivity analysis is performed for examining the relative influence and importance of components (i.e., variables) within the simulation model on the outcome (i.e., risk estimates). In general, the purposes of sensitivity analysis are: prioritization of potential control points in the system; identification of key sources of uncertainty and variability; refinement and verification of the QMRA model; and conditional analysis of the QMRA model (“what if” scenario analysis and identification of factors contributing to high exposure or risk).5

In QMRA, two pivotal values—the pathogen concentration in drinking water and the volume of unboiled water consumed per day—are used for calculating the pathogen dose that the consumers are exposed to. Mathematical models such as the Poisson distribution and the lognormal distribution have been proposed to account for the variability in water consumption within a population. Among the Poisson, the exponential, the gamma and the lognormal distribution, the Poisson distribution has often been recommended for use in QMRA.6 Based on datasets from different countries, the Poisson distribution was found to be a better fit than the lognormal distribution suggested by Roseberry and Burmaster.7 The Poisson distribution also has the advantage of having only one parameter, and is more suited for discrete datasets.

The variability in water consumption clearly affects risk estimates. Bastos et al.8 evaluated the impact of water consumption on risk estimates by comparing six different models applied to water consumption data. Their sensitivity analyses demonstrated that the volume of water consumed per day had a significant impact on risk estimates.

In general, several statistical probability distribution functions are fitted to the consumption data, and the distribution that fits the data best is chosen. However, researchers have so far not discussed the effects of the statistical methods used for describing the distribution of water consumption. In particular, it should be noted that there are some non-consumers (zero mL/day of consumption) within a population, as questionnaire surveys have always shown. In addition, in some analyses, consumption of less than, e.g., 20 mL/day is considered to be zero.6 Some statistical distributions treat the fraction of non-consumers as a separate class (a zero value), whereas others do not.

This study investigated the effects of the statistical methods used to analyze the water consumption data on the results of sensitivity analysis. Even if the statistical methods do not affect the yearly risk of infection, a statistical method that does not distort the outcome of sensitivity analysis needs to be used. Our study demonstrated that the zero value of water consumption resulting from a fraction of non-consumers had a great impact on sensitivity analysis. This means that analyzing the water consumption data with an inappropriate statistical method could adversely influence sensitivity analysis. Our study suggests an appropriate model to describe the variability in water consumption for conducting a QMRA.

## Results

### Statistical distribution of unboiled water consumption

To account for the variability in water consumption within a population, a Poisson distribution has often been recommended for conducting a QMRA.6 The continuous data in milliliters per day obtained in Osaka City were translated into discrete values of glasses per day, assuming a glass to be 250 mL. Figure 1 shows the constructed Poisson distribution with a mean value (rate in the Poisson model) of 1.31 glass/day (equivalent to 327 mL/day). This Poisson model shows that 27% of people do not drink tap water, although the fraction of non-consumers according to the questionnaire survey was 8.2% as described in 'Methods'. The other PDF—an exponential distribution, which was applied instead of the Poisson distribution—will be discussed in the next section.
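The 27% figure follows from the zero class of the fitted Poisson model, P(0) = exp(−λ). The short Python check below is only an illustrative sketch: the variable names are ours, while the rate, the 250 mL glass size, and the 8.2% survey fraction are the values quoted above.

```python
import math

MEAN_GLASSES_PER_DAY = 1.31    # Poisson rate reported in the text
GLASS_VOLUME_ML = 250          # glass size assumed in the text
SURVEY_NON_CONSUMERS = 0.082   # fraction reported by the questionnaire survey

# Probability mass at zero glasses/day for a Poisson distribution: exp(-rate)
poisson_zero_fraction = math.exp(-MEAN_GLASSES_PER_DAY)

# Mean consumption implied by the Poisson rate
mean_volume_ml = MEAN_GLASSES_PER_DAY * GLASS_VOLUME_ML

print(f"P(0 glasses/day) under the Poisson model: {poisson_zero_fraction:.3f}")
print(f"Mean consumption: {mean_volume_ml:.1f} mL/day")
print(f"Model / survey non-consumer ratio: {poisson_zero_fraction / SURVEY_NON_CONSUMERS:.1f}")
```

The check makes the mismatch explicit: a one-parameter Poisson model fixes the share of non-consumers once the mean is chosen, so it cannot match both the mean consumption and the surveyed non-consumer fraction.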

### Sensitivity analysis

Figure 2a shows the result of the sensitivity analysis performed on the concentration of Escherichia coli (E. coli) in treated water. It shows that the variable that has the highest impact on the concentration of E. coli in treated water is the inactivation efficacy of advanced oxidation process (AOP) with ozone and ultraviolet light, while the variable that has the second highest impact is the concentration of E. coli in source water. Among the six treatment steps of coagulation and sedimentation, rapid sand filtration (RSF), AOP, cation exchange, anion exchange, and chlorination, AOP exhibited the highest impact, implying that the rank correlation coefficient between the inactivation efficacy of AOP and the concentration of E. coli was the highest because the inactivation efficacy by AOP significantly varies from 2.49 to 8.59 log10 as described in 'Removal efficacies of the treatment steps'. Thus, the result of the sensitivity analysis indicated that reliable AOP inactivation is imperative for the reliable production of water with low concentrations of E. coli.

Figure 2b shows the result of sensitivity analysis performed on the E. coli dose. It is obvious that water consumption has the highest impact on the E. coli dose. Compared to the effect of water consumption, the contributions of the other variables are low. The E. coli dose is calculated just by multiplying the concentration of E. coli in treated water by the water consumption; however, the result of the sensitivity analysis was very different from Fig. 2a. Another variable for calculating the yearly risk of infection is the ratio of Campylobacter jejuni (C. jejuni) to E. coli (C/E ratio) that is described by the lognormal distribution as explained in 'Ratio of C. jejuni to E. coli (C/E ratio)'. It was found that water consumption also affects the yearly risk of infection significantly, since the impact of the C/E ratio was not so large (see Supplementary Figure 1).

A continuous distribution, instead of the Poisson distribution, was applied to the unboiled water consumption data collected in Osaka City. As a result, the exponential distribution showed an adequate fit as shown in Fig. 3. The rate of the applied exponential distribution was determined to be 3.06 × 10⁻³. The yearly risk of infection was calculated to be 3.24 × 10⁻¹⁰ infection/person/year as shown in Table 1, and this value was comparable to the value obtained using the Poisson model (3.16 × 10⁻¹⁰ infection/person/year in Table 1). It was found that the mean yearly risk of infection did not change even if the PDF changed. Since the Poisson model suggests that 27% of the people do not drink cold tap water, the probability of infection for these non-consumers is zero. On the other hand, the exponential model gives non-zero lower limits for the E. coli dose and the yearly risk of infection. As a result, the 2.5 percentile of the yearly risk of infection was estimated to be 4.14 × 10⁻¹⁹ infection/person/year.
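As a rough consistency check, the mean implied by the exponential rate can be compared with the Poisson mean, and the probability mass that the exponential model places near zero can be evaluated. This is a sketch only. The text gives no unit for the rate, so we assume it is per mL of daily consumption, and the 1 mL and 20 mL thresholds are chosen here purely for illustration.

```python
import math

RATE_PER_ML = 3.06e-3  # fitted exponential rate; the unit (1/mL) is our assumption

mean_exponential_ml = 1.0 / RATE_PER_ML  # mean of an exponential distribution
mean_poisson_ml = 1.31 * 250             # Poisson mean (glasses/day) times 250 mL


def exponential_cdf(volume_ml, rate=RATE_PER_ML):
    """P(consumption <= volume_ml) under the exponential model."""
    return 1.0 - math.exp(-rate * volume_ml)


below_1_ml = exponential_cdf(1.0)
below_20_ml = exponential_cdf(20.0)

print(f"Exponential mean: {mean_exponential_ml:.1f} mL/day")
print(f"Poisson mean:     {mean_poisson_ml:.1f} mL/day")
print(f"P(consumption <= 1 mL/day):  {below_1_ml:.4f}")
print(f"P(consumption <= 20 mL/day): {below_20_ml:.4f}")
```

Under this assumption the two models share nearly the same mean, which is consistent with the comparable mean risks, while the exponential model assigns no probability mass to exactly zero consumption.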

Figure 2c shows the result of the sensitivity analysis for the yearly risk of infection. It is obvious that this result is very different from Fig. 2b (and Supplementary Figure 1). AOP has the highest contribution, and water consumption has the fourth highest impact with just 6.0% of the contribution to variance. The same result was obtained for the sensitivity analysis on E. coli dose, which is the variable before multiplying by the C/E ratio (see Supplementary Figure 2). Although the yearly risk of infection does not change, it should be noted that the statistical methods applied to the water consumption data significantly affect the results of the sensitivity analysis.

The impact of water consumption data was further examined. In real life it is difficult to consume absolutely no tap water; it could happen incidentally, when taking a shower, brushing teeth, and so on. In our study, we assumed that there was incidental water intake during tooth brushing. The volume of incidental water intake has been set for various activities, such as swimming in the river or sea, diving, and playing golf (by touching a golf ball on a lawn after it was irrigated with treated wastewater).5,9 Based on the aforementioned study, the incidental water intake during tooth brushing was assumed to be 1 mL and 10 mL. Therefore, it was assumed that even non-consumers ingest 1 mL/day or 10 mL/day of cold tap water, and these volumes were added to the water consumption value in the Poisson model.

The results of calculating the yearly risk of infection with these assumptions are summarized in Table 1. When the volume of incidental intake was set to 1 mL, there was virtually no difference in the mean value (3.17 × 10⁻¹⁰ infection/person/year) and the 97.5 percentile (1.38 × 10⁻¹⁰ infection/person/year). In contrast, because there are lower limits for the E. coli dose and the yearly risk of infection, the value for the 2.5 percentile was 5.02 × 10⁻²⁰ infection/person/year and not zero. When the volume of incidental intake was set to 10 mL, both the mean value and the 97.5 percentile increased slightly to 3.27 × 10⁻¹⁰ infection/person/year and 1.56 × 10⁻¹⁰ infection/person/year, respectively. The 2.5 percentile also increased to 2.65 × 10⁻¹⁹ infection/person/year.

The result of the sensitivity analysis for the yearly risk of infection in the case of 1 mL (Supplementary Figure 3) showed that the contribution of water consumption to the variance was reduced to 20.2%, and it had the third highest impact, while water consumption had the highest impact in Fig. 2b (and Supplementary Figure 1). Figure 2d shows the result of the sensitivity analysis in the case of 10 mL. It indicates that the contribution of water consumption had the fourth highest impact and was drastically reduced to 9.8%. The same result was obtained for the sensitivity analysis on E. coli dose, which is the variable before multiplying by the C/E ratio (see Supplementary Figure 4).

## Discussion

A large difference was found between Fig. 2a and b. The reason for this result is the statistical method applied to the water consumption data. The distribution of water consumption is described not by a continuous distribution but by the Poisson distribution, which is a discrete distribution. According to this Poisson model with a mean value of 1.31 glass/day, 27% of the people do not drink tap water. For this reason, the 2.5 percentile (lower 95% confidence interval (CI) boundary) of the yearly risk of infection is zero as shown in Table 2. Because 27% of people do not drink tap water at all, their E. coli dose is zero and the probability of infection is also zero for this fraction of non-consumers. This zero value of water consumption drastically affects the variabilities in the E. coli dose. This result is mathematically quite reasonable. As described above, a difference in the inactivation efficacy by AOP is as large as 6.1 log10 (8.59–2.49 log10). However, a difference between zero and non-zero is much larger than 6.1 log10. This is the reason that led to the difference found in Fig. 2a and b.
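The effect of the zero class can be illustrated with a deliberately reduced toy model. The sketch below is not the Crystal Ball model used in this study: it keeps only two inputs (the AOP log reduction and water consumption), holds everything else fixed, and assumes that the contribution to variance is obtained by normalizing squared Spearman rank correlation coefficients. The helper names, sample size, and seed are ours.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson variate (Knuth's method; adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of midranks)."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def consumption_share(consumption_sampler, n=20000, seed=1):
    """Share of the normalized squared rank correlation due to consumption."""
    rng = random.Random(seed)
    # AOP log10 reduction: triangular(min 2.49, mode 3.43, max 8.59)
    aop = [rng.triangular(2.49, 8.59, 3.43) for _ in range(n)]
    volume = [consumption_sampler(rng) for _ in range(n)]
    # Relative dose: consumption times the fraction surviving AOP
    dose = [v * 10 ** (-r) for v, r in zip(volume, aop)]
    rho_aop = spearman(aop, dose)
    rho_volume = spearman(volume, dose)
    return rho_volume ** 2 / (rho_volume ** 2 + rho_aop ** 2)


share_poisson = consumption_share(lambda rng: 250 * poisson_sample(rng, 1.31))
share_exponential = consumption_share(lambda rng: rng.expovariate(3.06e-3))

print(f"Consumption share, Poisson model:     {share_poisson:.2f}")
print(f"Consumption share, exponential model: {share_exponential:.2f}")
```

In this toy setting, the tied zero doses produced by the Poisson zero class inflate the rank correlation between consumption and dose, so consumption takes a much larger share than under the exponential model. The absolute shares are not comparable with Fig. 2 because the other inputs are omitted.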

If the fraction of non-consumers is large, so is the impact of water consumption. The tap water consumption data obtained in the Netherlands can be described by a Poisson distribution with a mean value of 0.706 glass/day.6 According to this Poisson model, 48% of people do not drink tap water. In this case, it was found that the water consumption has a higher contribution to the variance of the E. coli dose and the yearly risk of infection when compared to Fig. 2b and Supplementary Figure 1 (data are not shown).

Sensitivity analyses therefore indicate that water consumption has the highest contribution to the E. coli dose. The effects of the statistical models applied to the water consumption data on the E. coli dose and the yearly risk of infection were further examined using a different distribution.

The properties of water consumption data and the choice of a fitting statistical distribution have been discussed before by Mons et al.6 For QMRA, they recommended that a Poisson distribution be used rather than an exponential distribution to describe the variability of consumption in the Netherlands or that the data themselves be used. However, researchers have so far focused only on selecting a distribution that shows an adequate fit to consumption data. On the other hand, the above sensitivity analysis indicated that water consumption including zero values has the highest contribution to the E. coli dose. It is important to recognize that the difference between zero and any non-zero value is larger than any finite difference on a logarithmic scale.

Figure 2c, obtained by applying the exponential distribution, was very different from Supplementary Figure 1, obtained by applying the Poisson distribution. In addition, Supplementary Figure 1, which includes non-consumers, was very different from Supplementary Figure 3 and Fig. 2d, which were obtained by adding 1 and 10 mL, respectively, to the water consumption value. These results demonstrate that the zero value of water consumption greatly affects the result of the sensitivity analysis for the yearly risk of infection.

Consequently, the sensitivity analysis for the yearly risk of infection shown in Fig. 2c identified AOP as the process that most affects the yearly risk of infection. On the other hand, if the Poisson model were applied uncritically, it would lead to the misunderstanding that the AOP treatment step is less important than unboiled water consumption (as shown in Supplementary Figure 1).

This study demonstrated that the statistical methods used to analyze water consumption data greatly impact sensitivity analysis, although they do not have large effects on the probability of infection. The purpose of the sensitivity analysis is to identify critical control points within the system and to prioritize data collection and research in the future. It should be noted that the reliability of sensitivity analysis can be compromised if inappropriate statistical methods are used for analyzing water consumption data. To avoid this problem, it is preferable to apply a continuous model, such as the exponential model, rather than a discrete model, such as the Poisson model, to describe the variability in water consumption.

In this study, sensitivity analysis performed on the concentration of E. coli in treated water identified AOP (among the six treatment steps) as the process that most affects the concentration of E. coli. In contrast, sensitivity analysis performed on the E. coli dose showed that water consumption has the highest impact. This result is due to the use of a Poisson model for describing the distribution of water consumption, which suggests that 27% of people do not drink tap water at all.

It was found that the statistical methods used to analyze the water consumption data have a great impact on sensitivity analysis. It should be noted that the statistical methods used may devalue the results of the sensitivity analysis. To avoid this problem, it is preferable to apply a continuous model, such as the exponential model, rather than a discrete model, such as the Poisson model, to describe the variability in water consumption.

This study confirmed that reliable AOP inactivation is imperative for the reliable production of safe drinking water by the water treatment process developed for reducing chlorinous odor.

## Methods

### Developed water treatment process

Because of customers’ complaints about chlorinous odor in drinking water, water utilities and researchers have started seeking new water treatment processes to improve water quality. Echigo et al.10 have proposed a new process for reducing chlorinous odor even after chlorination. The treatment steps are as follows: coagulation and sedimentation, RSF, AOP with ozone and ultraviolet light, ion exchange (cation and anion exchange treatments), and chlorination. This process is a hypothetical one that has been developed using a pilot-scale plant installed at the K Water Purification Plant in the Kansai Region of Japan. The treated water can meet the target threshold odor number of 4, which is acceptable to consumers.11

### Procedure of QMRA

Zhou et al.12 conducted a QMRA to estimate the infection risk associated with drinking water treated by the above process. The following is a brief description of the QMRA procedure employed.

C. jejuni was selected as the target pathogen for estimating consumers’ infection risk. E. coli was used as a surrogate for C. jejuni in the treatment process. The validity of E. coli as a surrogate is discussed below in detail. The procedures for estimating the removal and inactivation efficacies of the six treatment steps are also described below. After estimating the overall removal efficacy of the six-step process, the concentration of E. coli in treated water was calculated by multiplying the concentration of E. coli in source water by the overall removal efficacy. The daily exposure (or dose, expressed as E. coli/day) was calculated by multiplying the estimated concentration of E. coli in treated water by the amount of unboiled drinking water consumed per day. Unboiled water consumption data were provided by Osaka City Waterworks Bureau, Japan.13 Details of the data are described in the next section. The calculated E. coli dose (E. coli/day) was then converted to an estimated dose of the target pathogen C. jejuni (Campylobacter/day) using the C/E ratio in surface water. The daily probability of infection (Pd) was calculated from the C. jejuni dose using a dose–response model. The dose–response relationship of C. jejuni used in this study is explained below in detail. The individual health risk was expressed by the average yearly risk of infection (Py). Assuming a binomial distribution, the yearly risk of one or more infections was calculated as $P_y = 1 - (1 - P_d)^{365}$.
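Because the daily probabilities in this study are extremely small, evaluating the yearly risk formula naively in double precision can return exactly zero. The following sketch, with function names of our own choosing, shows one numerically stable way to evaluate Py = 1 − (1 − Pd)^365. It is an illustration and not a description of how the calculation was implemented in Crystal Ball.

```python
import math


def yearly_risk(daily_risk, days=365):
    """Py = 1 - (1 - Pd)**days, evaluated stably for very small Pd."""
    # log1p and expm1 avoid the rounding of (1 - Pd) to exactly 1.0
    return -math.expm1(days * math.log1p(-daily_risk))


def yearly_risk_naive(daily_risk, days=365):
    """Direct evaluation of the same formula, shown only for comparison."""
    return 1.0 - (1.0 - daily_risk) ** days


for pd in (1e-6, 1e-12, 1e-18):
    print(f"Pd = {pd:.0e}: stable = {yearly_risk(pd):.3e}, naive = {yearly_risk_naive(pd):.3e}")
```

For such small values of Pd the yearly risk is close to 365 × Pd, which the stable form preserves.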

#### Target pathogen and its indicator

A total of 86 enteric disease outbreaks associated with EU public drinking water supplies were detected for the years 1990 to 2004.14 The most predominant pathogen isolated in the outbreaks was Cryptosporidium (46 outbreaks), and the second most predominant pathogen was Campylobacter (9 outbreaks). Although the greatest number of outbreaks implicated Cryptosporidium, Campylobacter outbreaks had the highest mean number of cases per outbreak (1802 cases per outbreak). Thus, Campylobacter is considered to be one of the most important bacteria causing waterborne diseases in several European countries. In Japan, data on health-related incidents caused by microorganisms associated with drinking water over three decades (1983–2012) were collected.15 The results show that diarrheagenic E. coli caused the largest number of health-related incidents (58 cases), followed by Cryptosporidium (26 cases) and C. jejuni (25 cases). Thus, C. jejuni is the second most important pathogenic bacterium after diarrheagenic E. coli in Japan. Based on the above information, C. jejuni was selected as the target pathogen in this study.

In QMRA, stochastic methods are proposed for calculating the concentrations of pathogen in treated water using monitored pathogen concentrations in raw water and estimated treatment efficacy. A large dataset of pathogen concentrations monitored before and after water treatment would be ideal for assessing treatment efficacy. However, it is not easy to measure the concentrations of C. jejuni, and often, the concentrations of C. jejuni in source water are below detection limits. For pathogenic bacteria such as C. jejuni, indicator bacteria such as E. coli and enterococci have been proposed as process indicators for assessing the elimination capacity of water treatment processes.16 In this study, E. coli is present in source water at concentrations greater than that of C. jejuni. It can be detected further down the treatment train. In addition, the fact that E. coli is more frequently measured for legislative purposes makes the data valuable for assessing treatment efficacy.

E. coli was also chosen because water treatment processes remove and inactivate E. coli and C. jejuni to a similar extent. As their sizes are similar, the same removal efficacy of coagulation–sedimentation can be assumed for these bacteria. Hijnen et al.17 have reported that RSF removes E. coli slightly more effectively than Campylobacter from water. Moreover, E. coli and C. jejuni are inactivated to a similar extent by ozonation, while with UV disinfection, the inactivation rate of E. coli is smaller than that of C. jejuni.18,19 Hence, it is sufficiently safe to use E. coli as a surrogate for C. jejuni when assessing the inactivation efficacy of AOP (O₃ and UV). For ion-exchange treatment, bacteria were considered to be removed by ion exchange and adsorption on resin. As E. coli and C. jejuni have similar cell sizes and negative surface charges,20,21 it may be reasonable to assume a comparable removal efficacy of ion exchange for these two bacterial species. Furthermore, Vidar et al.22 have reported that E. coli and C. jejuni are inactivated to a similar extent by chlorination. Hence, E. coli was selected as a surrogate for C. jejuni.

#### Application of the type of distribution

PDFs were selected for describing distributions of the concentrations of E. coli in the source water; removal and inactivation efficacies by coagulation–sedimentation, RSF, AOP, cation exchange, anion exchange, and chlorination; the C/E ratio; and consumption of unboiled drinking water. Typically, extreme events can dominate the average health risk. Hence, the PDF should fit the extremes (tail) of the observed variations. To emphasize the fit to rare events, the results of the Anderson–Darling test were given more weight than those of the chi-square test and the Kolmogorov–Smirnov test when selecting a distribution type. Crystal Ball 7® (Oracle Corporation) was used for selecting parametric PDFs fitted to variables. The selected PDFs and estimated parameters are summarized in Table 3.

#### Removal efficacies of the treatment steps

Removal and inactivation efficacies were estimated for the above six treatment steps. Table 3 summarizes the estimated treatment efficacies together with the methods used to quantify them: a survey at the actual full-scale treatment plant, pilot-scale experiments, and laboratory-scale experiments.

For the first step of coagulation–sedimentation in the treatment train, direct measurements of an indicator before and after treatment provided a direct estimate of treatment efficacy. Hence, the concentrations of E. coli in the source water, as well as the removal efficacy of coagulation–sedimentation, were determined by a survey at an actual water treatment plant (K Water Purification Plant). The source of this water treatment plant is the Yodo River. From November 2009 to January 2014, the concentrations of E. coli in the source water and in the water treated by coagulation–sedimentation at the K Purification Plant were simultaneously measured 35 times. The concentrations of E. coli in the source water (E. coli/100 mL) were described by the lognormal distribution with the parameters μ = 1526 and σ = 26650 as shown in Table 3. The removal efficacy of coagulation–sedimentation was calculated from the concentrations of E. coli in the influent and effluent. As a result, the removal efficacy (log reduction) of coagulation–sedimentation was described by the gamma distribution with the parameters α = 49.67, β = 0.06 and L = −0.37 (see Table 3).

The treatment process of the K Purification Plant consists of the following steps: coagulation–sedimentation, primary ozonation, RSF, secondary ozonation, granular activated carbon adsorption, and chlorination. Since almost all bacteria are inactivated by the primary ozonation, E. coli is not expected to be detected in the water treated with RSF. In addition, as reported in the literature, the removal efficacy of RSF is thought to be lower than those of other treatment processes. For example, 12 experimental studies have indicated that the mean elimination capacity (MEC) of bacteria such as E. coli, coliforms, and fecal streptococci is only 0.6 log10, with a range of 0.1 log10 to 1.5 log10.2 Hence, in this study, the removal efficacy of RSF was tentatively set according to a literature value. Hijnen and Medema16 summarized removal efficacies of bacteria by full-scale RSF. RSF can be applied under three different conditions, possibly influencing the MEC: a filter bed without preceding coagulation, or a filter bed in combination with preceding coagulation/flocculation, either as a secondary floc-removal process or with in-line coagulation. The decimal elimination capacity (DEC) of RSF without preceding coagulation was 0.5 log10, with a range of 0.2 log10 to 1.0 log10. With a preceding coagulation/sedimentation process, the DEC of RSF increased to 0.9 log10, with a range of 0.4 log10 to 1.5 log10. With in-line coagulation, the removal efficacy increased by a further 0.5 log10 to 1.4 log10. The DEC values of RSF with preceding coagulation/sedimentation were collected from five studies. The variation in DEC among the studies is caused not only by differences in the conditions of the separate studies but also by the accumulated variations in the conditions of the processes, microorganisms, and analytical methods.
Based on the above background, in this study, the triangular distribution was tentatively constructed with a maximum of 1.5 log10, an MEC of 0.9 log10, and a minimum of 0.4 log10 as shown in Table 3. It would be preferable to determine the removal efficacy under actual conditions in future work.

After one or more treatment processes, the concentrations of the indicator in the treated water are typically too low to be determined by routine microbiological measurement methods. In addition, dosing microorganisms to full-scale water treatment processes is typically neither allowed nor feasible. An alternative involves conducting dosing experiments at the pilot or laboratory scale under controlled conditions that mimic full-scale conditions. In this study, dosing tests were conducted for the treatment steps of cation exchange, anion exchange, and chlorination. For AOP, a series of pilot-scale experiments and numerical simulations were performed.

With respect to AOP, E. coli dosing experiments were conducted 14 times in total using a pilot-scale AOP bubble-diffuser contactor.23 The inactivation efficacies of AOP under full-scale conditions with an ozone injection dose of 0.25 mg/L were estimated by an axial dispersion reactor model (ADR model). A simplified full-scale O₃/UV contactor was assumed based on an actual ozone bubble-diffuser contactor installed at the K Purification Plant. The cylindrical reactor, where water with dissolved ozone flows, was assumed to be 5.64 m in diameter and 5.9 m in length. A 5.9 m long UV lamp was placed along the center axis of the reactor. The operating conditions are as follows: water flow rate, 3090 m³/h; mean hydraulic residence time, 2.86 min; gas flow rate, 464 m³/h; and gas to liquid ratio, 0.15. The reactor was designed to operate under continuous-flow and counter-current conditions. The UV fluence was set to 220 mJ/cm², the value of the UV lamp installed in the pilot-scale O₃/UV contactor.
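The stated residence time and gas to liquid ratio can be cross-checked from the reactor geometry and flow rates given above. The sketch below assumes an ideal cylinder and neglects the volume occupied by the UV lamp; the variable names are ours.

```python
import math

DIAMETER_M = 5.64            # reactor diameter from the text
LENGTH_M = 5.9               # reactor length from the text
WATER_FLOW_M3_PER_H = 3090   # water flow rate from the text
GAS_FLOW_M3_PER_H = 464      # gas flow rate from the text

# Volume of an ideal cylinder (UV lamp volume neglected, our assumption)
reactor_volume_m3 = math.pi * (DIAMETER_M / 2) ** 2 * LENGTH_M

# Mean hydraulic residence time = volume / flow rate, converted to minutes
residence_time_min = reactor_volume_m3 / (WATER_FLOW_M3_PER_H / 60)

gas_to_liquid_ratio = GAS_FLOW_M3_PER_H / WATER_FLOW_M3_PER_H

print(f"Reactor volume: {reactor_volume_m3:.1f} m^3")
print(f"Mean hydraulic residence time: {residence_time_min:.2f} min")
print(f"Gas to liquid ratio: {gas_to_liquid_ratio:.2f}")
```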

As a result, a maximum value of 8.59 log10, an MEC of 3.43 log10, and a minimum value of 2.49 log10 were obtained as the inactivation efficacies. Using these parameters, the triangular distribution was constructed as shown in Table 3. The range of the inactivation efficacy of AOP was 6.1 log10 (8.59–2.49 log10), which was much larger than those of the other five treatment steps, as shown in Table 3.

The removal efficacies of ion exchange treatment were estimated by experiments using laboratory columns. For cation exchange, the Na⁺ form of a cation exchange resin (Mitsubishi Chemical, Tokyo, Japan; DIAION UBK16) was used. For anion exchange, the Cl⁻ form of an anion exchange resin (Mitsubishi Chemical, Tokyo, Japan; DIAION PA308) was used. Although the K Purification Plant has not installed ion exchange treatment yet, the ion exchange resins used in our study have been widely employed for water treatment. Each ion exchanger was packed in two glass columns (φ40 × 500 mm) in series with a total length of 1 m, corresponding to the length of a full-scale contactor. The water flow rate was 106 mL/min, and the contact time was 4.74 min. The linear velocity was 5.04 m/h, which mimicked that of ion exchange treatment under normal full-scale conditions.

Cultured E. coli suspended in 5 L of RSF water (with a target concentration of 10³ CFU/mL) was continuously fed to the glass columns packed with an ion exchanger. E. coli dosing tests were repeated 14 times each for cation exchange and anion exchange. As a result, the removal efficacies of cation exchange and anion exchange were described by the triangular distributions with a maximum of 0.96 log10, an MEC of 0.13 log10, and a minimum of −0.39 log10, and with a maximum of 2.21 log10, an MEC of 1.62 log10, and a minimum of 1.06 log10, respectively (see Table 3). As the above dosing test procedure suggests, the obtained removal efficacies represent maximum values, i.e., those of freshly regenerated ion exchange resins.

The inactivation efficacy of chlorination was determined by conducting E. coli dosing tests in a pilot-scale chlorination reactor installed in the K Purification Plant. The reactor has four contact chambers with a total volume of 0.53 m3. At a flow rate of 0.035 m3/min, the average hydraulic residence time is 15 min. The influent for this pilot plant was the water after RSF treatment at the K Purification Plant. The E. coli spiking solution was continuously injected into the influent water. As it is desirable to decrease the concentrations of residual chlorine in the supplied water in the future, the inactivation efficacy of chlorination was estimated for a case where the residual chlorine level was minimized to approximately 0.1 mg/L. Spiking experiments were conducted nine times in total. As a result, the triangular distribution with a maximum of 5.83 log10, an MEC of 4.03 log10, and a minimum of 3.44 log10 was constructed as shown in Table 3.

#### Ratio of C. jejuni to E. coli (C/E ratio)

E. coli and C. jejuni concentrations in the river water were measured from December 2011 to January 2014 and the ratio of C. jejuni to E. coli (C/E ratio) was calculated. A total of 24 C/E ratios were obtained and no significant seasonal trend was observed. The C/E ratio was described by the lognormal distribution with μ = 4.81 × 10⁻³ and σ = 0.394 as shown in Table 3.

#### Dose–response relationship of C. jejuni

The dose–response relationship of C. jejuni has been proposed and discussed.24,25 Teunis et al.26 have reported that a dose–response relationship of C. jejuni can be expressed by the beta–Poisson model, where α = 0.024 and β = 0.011. The beta–Poisson approximation is valid only when β ≥ 1 and α ≤ β, and the above α and β values do not satisfy these criteria. Indeed, with these parameters the approximate beta–Poisson model can exceed the maximum risk curve at low doses,2 which means that the dose–response model predicts a theoretically impossible probability of infection. Hence, the approximate beta–Poisson model is not appropriate for the calculation. Instead, the exact beta–Poisson model can be approximated for low doses (<0.1 organisms/L) by setting the γ value of the exponential model equal to the expected value of the beta distribution, α/(α + β), thus avoiding this complication. The exponential model with γ = 0.686 (Pd = 1 − exp(−0.686 × D), where D is the dose) was therefore used in this study. Itoh27 examined the impact of using the maximum risk curve or the beta–Poisson model through uncertainty analysis.

#### Monte Carlo simulation and calculated risk

The parametric distributions of the concentrations of E. coli in the source water, removal and inactivation efficacies of each process, consumption of unboiled drinking water, and C/E ratios in surface water were expressed by PDFs as summarized in Table 3. A Monte Carlo simulation was performed by drawing random values from each PDF for calculating the yearly infection risk (Py). Correlations between the variables were not assumed in the simulation. Crystal Ball 7® (Oracle Corporation) was used for performing Monte Carlo simulation.
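The sampling procedure can be illustrated with a short sketch in plain Python rather than Crystal Ball. It is deliberately incomplete and makes several assumptions that are not taken from Table 3: only the three treatment steps whose triangular distributions are quoted in the text are included, the MEC is treated as the mode, the source water concentration is a hypothetical placeholder, the C/E ratio and consumption are held fixed at values quoted in the text, the same daily dose is applied on every day of the year, and the yearly risk is assumed to follow Py = 1 − (1 − Pd)^365. The resulting numbers are therefore not comparable with those reported in this study.

```python
import math
import random

random.seed(1)
N_TRIALS = 20_000

# (min, MEC, max) log10 reductions quoted in the text; MEC is treated as the mode.
TREATMENT_STEPS = {
    "cation_exchange": (-0.39, 0.13, 0.96),
    "anion_exchange": (1.06, 1.62, 2.21),
    "chlorination": (3.44, 4.03, 5.83),
}

# Hypothetical placeholders, held constant for illustration only.
SOURCE_ECOLI_PER_L = 1.0e3      # assumed source water concentration (CFU/L)
CE_RATIO = 4.81e-3              # quoted mu, assumed here to be an arithmetic mean
CONSUMPTION_L_PER_DAY = 0.327   # mean unboiled consumption quoted in the text
GAMMA = 0.686                   # exponential dose-response parameter

def sample_total_log_reduction():
    """Draw one log10 reduction per step and sum them (steps assumed independent)."""
    # random.triangular takes its arguments in the order (low, high, mode).
    return sum(random.triangular(low, high, mode)
               for low, mode, high in TREATMENT_STEPS.values())

def yearly_risk(daily_dose, days=365):
    """Assumed form Py = 1 - (1 - Pd)^days, written to stay accurate for tiny Pd."""
    p_daily = -math.expm1(-GAMMA * daily_dose)
    return -math.expm1(days * math.log1p(-p_daily))

log_reductions = [sample_total_log_reduction() for _ in range(N_TRIALS)]
risks = [
    yearly_risk(SOURCE_ECOLI_PER_L * 10.0 ** (-lr) * CE_RATIO * CONSUMPTION_L_PER_DAY)
    for lr in log_reductions
]

mean_log_reduction = sum(log_reductions) / N_TRIALS
mean_risk = sum(risks) / N_TRIALS
print(f"mean log10 reduction (three steps only): {mean_log_reduction:.2f}")
print(f"mean yearly risk (illustrative inputs):  {mean_risk:.2e}")
```

The `expm1`/`log1p` formulation is a design choice: with daily probabilities many orders of magnitude below 1, evaluating 1 − (1 − Pd)^365 directly would lose precision to floating-point rounding.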

By performing the simulation, the mean overall log reduction by the water treatment and the mean yearly infection risk were estimated to be 13.0 log10 and 3.16 × 10⁻¹⁰ infection/person/year, respectively, as shown in Table 2. This infection risk is far below the acceptable yearly risk of infection of 10⁻⁴ infection/person/year proposed by the United States Environmental Protection Agency.28 Thus, the newly developed water treatment process for reducing chlorinous odor was demonstrated to produce safe drinking water with respect to the elimination of C. jejuni, even with a minimized dose of chlorine. The aim of this study, however, is to investigate the theoretical impact of the statistical methods used to analyze the water consumption data.

### Unboiled water consumption data

A water consumption of 2 L/day has been widely accepted when setting standard and guideline values for toxic chemicals.3 It should be noted, however, that only the consumption of cold tap water without heat treatment should be considered for estimating microbial risk.

Early QMRAs assumed a water consumption of 2 L/day.29,30 Others have used a constant provisional value of 1 L/day, as in the WHO Guidelines for Drinking-water Quality.3 In Japan too, 1 L/day has been recommended as a conservative value.31

It has since been understood that statistical distributions describing the variability of water consumption within a population are preferable for performing a QMRA.2,32 Mons et al.,6 after reviewing studies on tap water consumption conducted mainly in western countries, demonstrated that the reported mean consumption of cold tap water varies greatly, from 0.1 L/day to 1.55 L/day. Therefore, they recommended that country-specific data and statistical distributions be used in assessing water consumption for conducting QMRA.

Only a few studies in Japan have reported unboiled water consumption.13,33 Our study used water consumption data obtained from the Osaka City Waterworks Bureau in 2009.13 The Bureau conducted an internet questionnaire from Saturday, 19 December 2009 to Monday, 21 December 2009 that drew 600 respondents (297 males and 303 females) between the ages of 15 and 74 living in Osaka City. The respondents were asked to report their water consumption in each of four categories: (1) cold tap water, Japanese tea prepared using cold tap water, and powdered juice dissolved using cold tap water; (2) alcoholic drinks diluted using cold tap water and cold tap water taken with medicines, etc.; (3) ice cubes made using cold tap water; and (4) cold tap water served with meals at restaurants.

The participants recorded the volume of water (in milliliters) they consumed each day, referring to several pictures of typical drinking vessels to estimate the volume properly. The consumption data for the three days, which included Saturday and Sunday, were converted to the average daily consumption (in milliliters per day) over a week.

The total average consumption was calculated to be 327 mL/day, with (1) 119 mL/day, (2) 38 mL/day, (3) 14 mL/day, and (4) 156 mL/day reported for the above four categories. The maximum consumption was 2400 mL/day. Non-consumers (0 mL/day of consumption over the three days) accounted for 8.2% of the respondents (49 persons).

According to a nationwide survey conducted in Japan from June to August in 2000,33 the average consumption was 321 mL/day, which is comparable to the 327 mL/day obtained from the above study. Cold tap water consumption is expected to be higher in summer than in winter. However, factors influencing water consumption, such as the seasons, have not been analyzed in detail.

### Sensitivity analysis

For a simulation model that contains a series of steps, sensitivity analysis is performed to identify the components or variables within the model that are most important to the outcome. Hill34 and Frey and Patil35 have reviewed and summarized the methodologies of sensitivity analysis. Mathematical, statistical, and graphical methods are available. In this study, since the mean yearly infection risk (3.16 × 10⁻¹⁰ infection/person/year) was sufficiently low, it is not necessary to find a critical control point for producing safe drinking water. Therefore, the purpose of the sensitivity analysis is to examine the importance of variations in the model variables. Spearman’s rank correlation coefficients between the assumed and predicted variables were computed. The contribution to variance was calculated by squaring the rank correlation coefficients and normalizing the values so that they sum to 100%. The contribution to variance corresponds to sensitivity, with values ranging from zero to 100%; it indicates the relative importance of each model variable as the percentage of the variance of the predicted variable that the variable contributes.
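The contribution-to-variance calculation described above can be sketched as follows. This is an illustrative reimplementation in plain Python, not the Crystal Ball routine: the helper names are hypothetical, and the demonstration uses only the three triangular log reduction distributions quoted earlier (with the MEC treated as the mode) as inputs and their sum as the predicted variable.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def contribution_to_variance(inputs, output):
    """Square each rank correlation and normalize so the shares sum to 100%."""
    squared = {name: spearman(values, output) ** 2 for name, values in inputs.items()}
    total = sum(squared.values())
    return {name: 100.0 * value / total for name, value in squared.items()}

# Demonstration with the (min, MEC, max) log10 reductions quoted in the text.
random.seed(2)
N_TRIALS = 5_000
steps = {
    "cation_exchange": (-0.39, 0.13, 0.96),
    "anion_exchange": (1.06, 1.62, 2.21),
    "chlorination": (3.44, 4.03, 5.83),
}
# random.triangular takes its arguments in the order (low, high, mode).
inputs = {
    name: [random.triangular(low, high, mode) for _ in range(N_TRIALS)]
    for name, (low, mode, high) in steps.items()
}
total_log_reduction = [sum(draws) for draws in zip(*inputs.values())]

shares = contribution_to_variance(inputs, total_log_reduction)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

In this toy example chlorination, which has the widest distribution, receives the largest share. Because the shares are normalized, they express relative importance among the tested variables only, not the absolute fraction of variance explained.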

The variables tested in the sensitivity analysis were the reduction efficacies of the six treatment steps, the concentration of E. coli in the source water, the C/E ratio, and unboiled water consumption. The different purposes of sensitivity analysis were described in 'Introduction'. If the impact of a certain step among the six treatment steps is great, this indicates that the reliability of that treatment step should be improved. If the impact of the concentration of E. coli in the source water or of the C/E ratio is great, priority should be given to collecting data on E. coli or C. jejuni in the source water. From a different point of view, source water protection might be stressed. In general, there are few studies on unboiled water consumption. If no country-specific data are available, Mons et al.6 recommend using the Australian distribution data from the Melbourne study as a conservative estimate. If the impact of unboiled water consumption is great, the result would highlight the importance of obtaining country-specific consumption data and statistical distributions in order to develop sound local QMRA models. In addition, a water consumption study should be properly designed to estimate the consumption volume accurately and to account for the variability in consumption within a population.

Sensitivity analysis in a QMRA is normally performed only on the yearly risk of infection, the final estimate of the simulation. In this study, however, the impact of the variables on each estimate in the simulation model has to be discussed. Therefore, the sensitivities of the overall removal efficacy of the six treatment steps, the concentration of E. coli in treated water, the E. coli dose, and the yearly risk of infection were analyzed. Crystal Ball 7® (Oracle Corporation) was also used for performing the sensitivity analysis.

### Data availability

Any raw data used in this manuscript can be freely obtained by contacting the corresponding author.