Insights to estimate exposure to regulated and non-regulated disinfection by-products in drinking water

Background. Knowledge about human exposure and health effects associated with non-routinely monitored disinfection by-products (DBPs) in drinking water is sparse.
Objective. To provide insights to estimate exposure to regulated and non-regulated DBPs in drinking water.
Methods. We collected tap water from homes (N = 42), bottled water (N = 10), tap water filtered with domestic activated carbon jars (N = 6) and reverse osmosis (N = 5), and urine (N = 39) samples of participants from Barcelona, Spain. We analyzed 11 haloacetic acids (HAAs), 4 trihalomethanes (THMs), 4 haloacetonitriles (HANs), 2 haloketones, chlorate, chlorite, and trichloronitromethane in water, and HAAs in urine samples. Personal information on water intake and socio-demographics was ascertained in the study population (N = 39) through questionnaires. Statistical models to predict non-regulated DBPs were developed with THMs as explanatory variables, using multivariate linear regression and machine learning techniques.
Results. Chlorate, THMs, HAAs, and HANs were quantified in 98–100% of tap water samples, with median concentrations of 214, 42, 18, and 3.2 μg/L, respectively. Multivariate linear regression models had similar or higher goodness of fit (R²) compared to machine learning models. Multivariate linear models for dichloro-, trichloro-, and bromodichloroacetic acid, dichloroacetonitrile, bromochloroacetonitrile, dibromoacetonitrile, trichloropropanone, and chlorite showed good predictive ability (R² = 0.8–0.9), as 80–90% of total variance could be explained by THM concentrations. Activated carbon filters reduced DBP concentrations to a variable extent (27–80%), and reverse osmosis reduced DBP concentrations by ≥98%. Only chlorate was detected in bottled water samples (N = 3), with median = 13.0 µg/L. Creatinine-adjusted trichloroacetic acid was the most frequently detected HAA in urine samples (69.2%), and was moderately correlated with estimated drinking water intake (r = 0.48).
Significance. Findings provide valuable insights for DBP exposure assessment in epidemiological studies. Validation of predictive models in a larger number of samples and replication in different settings are warranted.
Impact statement. Our study focused on assessing and describing the occurrence of several classes of DBPs in drinking water and developing exposure models of good predictive ability for non-regulated DBPs.


INTRODUCTION
Water disinfection is a necessary public health intervention to prevent waterborne infections. However, unintended disinfection by-products (DBPs) are formed during chemical disinfection processes [1]. DBPs occur in complex mixtures, and their relative concentrations depend on the characteristics of organic matter in the raw water, the treatment and disinfectant used, and the length and condition of the distribution system [2][3][4]. More than 600 DBPs have been identified to date, constituting a widespread exposure in the population worldwide through drinking water consumption, inhalation, and dermal contact [4]. Long-term exposure to DBPs has been consistently associated with increased bladder cancer risk [5]. DBP exposure has also been associated with a number of reproductive and pregnancy outcomes, although evidence is less consistent [4].
The current state of knowledge about the health effects linked to DBP exposure mostly relies on regulated DBPs. The EU currently regulates total trihalomethanes (THMs) and bromate in finished drinking water, although new regulations will be enforced from 2023 to incorporate haloacetic acids (HAAs), chlorite, and chlorate [6]. Epidemiological research on emerging or non-regulated DBPs is limited to a large extent by the lack of adequate routine monitoring data necessary to evaluate exposure in human studies. However, regulated DBPs are a minor fraction of total halogenated DBPs [7], and may not be the primary drivers of toxicity [8]. Epidemiological studies have mainly evaluated THMs and, to a lesser extent, HAAs [4]. THMs have typically been used as DBP markers for association analyses of human health effects, although one can argue that they might not necessarily be the causal agents [9]. Among the nonvolatile HAAs, trichloroacetic acid (TCAA) has received increased attention as a proxy DBP biomarker due to significant correlations reported between TCAA concentrations in urine and ingested TCAA from drinking water [10][11][12][13]. However, there is limited knowledge about other urinary HAAs.
A better understanding of the health effects associated with DBP exposure requires the evaluation of a range of DBP classes in addition to THMs [14]. The lack of adequate biomarkers reflecting long-term exposure forces epidemiologists to use water concentrations as the main component of exposure assessment, together with modeling approaches to estimate historical THM concentrations [15]. A number of studies have developed predictive models of THMs based on water parameters [16]. However, to our knowledge, the use of models to predict non-regulated DBPs in finished drinking water for exposure assessment purposes has not been explored.
We aimed to provide insights to estimate exposure to a wide range of DBPs in drinking water in Barcelona (Spain), by (1) describing occurrence in tap and bottled water; (2) developing statistical models to predict non-regulated DBPs based on routinely monitored parameters in the public water supply; (3) evaluating the effect of domestic filters on tap water concentrations; and (4) exploring the use of DBPs in urine as biomarkers of exposure through drinking water. Findings are potentially applicable for exposure assessment in epidemiological studies to evaluate health effects associated with non-regulated DBPs.

MATERIAL AND METHODS
Study area
Barcelona city and its metropolitan area (North-East Spain) are located in a coastal area on the Mediterranean Sea characterized by dry weather, and their main drinking water supply relies on surface sources (Llobregat and Ter rivers). The Llobregat river is severely impacted by anthropogenic activities, and contains a higher bromide concentration (range = 2.5-10 mg/L) compared to the Ter river (range = 0.5-5 mg/L) [17], which leads to the predominance of brominated THMs in drinking water [18]. Although historically high concentrations of total THMs [19] have been dramatically reduced after incorporating membrane-based technology in the drinking water plants, there is still a relative predominance of brominated species [18].

Study participants and data
We aimed to enroll volunteers living in 42 locations (one per postal code) to represent the geography of Barcelona. Participants were reached through advertisements in social media and were contacted via email. A brief online screening questionnaire including the postal code of residence and type of water consumed was used to create a roster of potential volunteers. We recruited 39 volunteers and conducted home visits to collect urine and tap water samples between August 31st and October 16th of 2020. For 3 postal codes we failed to identify volunteers, so we collected drinking water samples from public fountains during the same period. Among the 39 volunteers, N = 11 used domestic filters. Gender balance was also used as a secondary selection criterion, in order to enroll both men and women. Participants provided written consent prior to voluntary participation. Personal information (sociodemographic, anthropometrics, lifestyle) and drinking water consumption habits (source, amount) were collected through a self-administered online questionnaire. We semi-quantitatively ascertained the amount of bottled water, unfiltered tap water, and filtered tap water consumed at home and outside (≤1, 1, 2, 3-4, 5-6, >6 glasses/day, where 1 glass = 250 mL). The study was approved by the Parc de Salut Mar Ethics committee.

Sample collection
Tap water samples. We collected unfiltered tap water samples at 42 locations, plus filtered tap water samples in a subset of 11 homes: N = 6 activated carbon (pitcher type), N = 5 reverse osmosis filters. Tap water samples (both unfiltered and filtered) were collected in 4 containers: (1) 2.5 L glass bottle for HAAs analysis; (2) 500 mL glass bottle for chlorate and chlorite analysis; (3) 250 mL glass bottle for THMs, haloacetonitriles (HANs), haloketones (HKs), and trichloronitromethane (TCNM) analysis; and (4) 1 L glass bottle for physicochemical parameters analysis. Ascorbic acid was added as quenching agent prior to the collection of the water samples in bottles aimed at quantifying HAAs, THMs, HANs, HKs, and TCNM. Tap water samples were collected after leaving cold water running for approximately 2 min. Bottles without quencher were rinsed twice with tap water on site. Bottles with quencher were slowly filled to the top to avoid air bubbles, an air chamber and quencher loss, and were finally gently shaken for at least 30 s. Samples were transported in a portable cooler with ice packs to the research center, where samples were stored in the refrigerator (≈4 °C) until shipment to the laboratories within 1-4 days.
Bottled water samples. We included samples from 10 brands of natural mineral water selected among the most popular in the area. We purchased 1.5 L polyethylene terephthalate (PET) bottles at local supermarkets, which were transported at room temperature to the laboratory.
Urine samples. First morning-void spot urine samples were collected from 39 volunteers, on the same day that the tap water samples were collected. Participants received the container in advance together with written instructions to self-collect urine samples on the day of the home visit. Urine samples were collected in a 70-mL sterile plastic container and were placed in the fridge until the visit of study personnel. Urine samples were transported at ≈4 °C to the research center and stored at −20 °C until the analysis at the end of enrollment.

Laboratory analyses
Details about analytical methods are in the Supplementary Information (SI). Analytical methodologies and limits of quantification (LOQ) and detection (LOD) are summarized in Table S1 for the different analytes in drinking water and urine. LOQs of DBPs in water ranged between 0.1 µg/L (THMs, HANs, HKs, trichloronitromethane) and 10 µg/L (chlorate, chlorite), and LODs of HAAs in urine were in the range between 0.02 µg/L (TCAA) and 3.98 µg/L (iodoacetic acid) (Table S1). Drinking water samples were analyzed for 11 HAAs, 4 THMs, 4 HANs, 2 HKs, TCNM, chlorate and chlorite. Chlorate and chlorite were measured directly and HAAs were preconcentrated by online solid phase extraction (SPE). HAAs, chlorite and chlorate were analyzed by tandem mass spectrometry coupled to liquid chromatography (LC-MS/MS). Specifically, HAAs were analyzed according to the method developed by Planas et al. with some modifications [20]. Analysis of THMs, HANs, HKs, and TCNM was performed by liquid-liquid salted microextraction and gas chromatography (GC Trace 1300, Thermo Fisher Scientific) coupled to a mass spectrometer (GC-MS/MS, Thermo Fisher Scientific).
Urine samples were only analyzed for 11 HAAs with the aim to examine their biomarker potential for exposure assessment in epidemiological studies. HAAs were analyzed using off-line SPE and LC-MS/MS based on previously developed methods [21,22]. Urinary creatinine was determined using an automated alkaline picrate method [23]. The limit of detection was 2.9 mg/dL. We divided the concentrations of HAAs in urine samples by the creatinine concentrations to adjust for the urinary concentration (reported as μg/g creatinine).
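The creatinine adjustment described above is a unit conversion, and a minimal sketch may make it concrete. It assumes the HAA concentration is available in µg/L and creatinine in mg/dL (the unit in which the creatinine limit of detection is reported); the function name and the sample values are hypothetical and are not taken from the study.

```python
def creatinine_adjust(haa_ug_per_l, creatinine_mg_per_dl):
    """Return a urinary HAA concentration in µg per g creatinine.

    Assumes the HAA is given in µg/L and creatinine in mg/dL.
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine concentration must be positive")
    # Unit conversion: 1 mg/dL = 10 mg/L = 0.01 g/L
    creatinine_g_per_l = creatinine_mg_per_dl * 0.01
    # (µg/L) / (g/L) = µg/g creatinine
    return haa_ug_per_l / creatinine_g_per_l


# Hypothetical sample: 2.0 µg/L TCAA in urine with 100 mg/dL creatinine
adjusted = creatinine_adjust(2.0, 100.0)
```

With creatinine at 100 mg/dL (1 g/L), the adjusted value equals the unadjusted one; a more dilute urine sample with lower creatinine yields a proportionally higher adjusted concentration.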
For all LC-MS/MS analyses, a TSQ quantum triple quadrupole mass spectrometer equipped with an electrospray ionization (ESI) source (Thermo Fisher Scientific, San Jose, CA, USA), a Finnigan Surveyor MS plus pump and a HTC PAL autosampler were used. The analyses were carried out in negative ion electrospray and multiple reaction monitoring (MRM) acquisition mode. The spray voltage was set at 3.0 kV, and the tube lens voltage and collision energy were optimized for each m/z and for each transition, respectively. The ion transfer tube temperature was set at 250 °C. Nitrogen was used as a sheath and auxiliary gas at flow rates of 65 psi and 15 arbitrary units (a.u.), respectively. Argon collision-induced dissociation was used at a pressure of 1.5 millitorr (mTorr). Data acquisition was performed with Xcalibur 2.0.7 software (Thermo Fisher Scientific).
Quantification and quality control measures to comply with the 2002/657/EC Commission Decision [24] are described in detail in the SI. All chemicals were measured in all drinking water types, except for THMs, HANs, HKs and TCNM, which were not analyzed in bottled water because of the low THM levels detected in bottled water in a previous study [25].
More information about the analytical procedure, including physicochemical parameters and reagents, is detailed in the SI.

Statistical analysis
Descriptive analyses. Maximum, percentiles, mean, and standard deviation (SD) were calculated for measurements >LOQ. The bromine incorporation factor (BIF) was calculated for THMs (equation 1) and HAAs (dihalogenated species (DXAAs), equation 2, and trihalogenated species (TXAAs), equation 3) to assess the molar contribution of the brominated species (details provided in the SI). Normalized BIF was calculated by dividing BIF by the number of halogen substituents. Spearman rank correlation coefficients were calculated to evaluate the degree of correlation between individual DBPs as well as between ingested TCAA and urine levels. A principal component analysis (PCA) was performed to describe and reduce the dimensionality of the different DBP classes. Samples (water, urine) with concentrations <LOQ were assigned LOQ/2 to estimate correlations and the PCA.
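The BIF equations themselves are not reproduced in this text, so the following sketch assumes the commonly used definition for THMs: the molar sum of bromine atoms across the four species divided by the total molar THM concentration, which ranges from 0 (only chloroform) to 3 (only bromoform). The function names are hypothetical, and the definition should be checked against the SI before reuse.

```python
# Molar masses (g/mol) of the four regulated THMs
MOLAR_MASS = {"TCM": 119.38, "BDCM": 163.83, "DBCM": 208.28, "TBM": 252.73}
# Number of bromine atoms in each species
BROMINE_ATOMS = {"TCM": 0, "BDCM": 1, "DBCM": 2, "TBM": 3}


def thm_bif(conc_ug_per_l):
    """Bromine incorporation factor for THMs (assumed standard definition).

    conc_ug_per_l maps species name to mass concentration in µg/L;
    missing species are treated as 0.
    """
    # Convert mass concentrations (µg/L) to molar concentrations (µmol/L)
    molar = {s: conc_ug_per_l.get(s, 0.0) / MOLAR_MASS[s] for s in MOLAR_MASS}
    total = sum(molar.values())
    if total == 0:
        raise ValueError("total THM concentration is zero")
    return sum(BROMINE_ATOMS[s] * molar[s] for s in molar) / total


def normalized_bif(bif, n_halogens=3):
    """Divide BIF by the number of halogen substituents (3 for THMs)."""
    return bif / n_halogens
```

A sample containing only bromoform gives a BIF of 3 and a normalized BIF of 1, while a sample containing only chloroform gives 0 for both.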
Multivariate predictive models. We used linear regression and machine learning to develop models predicting non-regulated DBPs based on routine monitoring parameters. Linear regression models were based on 4 THM species (trichloromethane: TCM; bromodichloromethane: BDCM; dibromochloromethane: DBCM; and bromoform: TBM) as independent variables. Conductivity was not considered due to its high correlation with THMs. For each DBP and each transformation of the independent variables (no transformation, log, square root, squared) we fitted 15 linear regression models covering all possible combinations of independent variables (4 simple models, 11 multiple models). We selected the best model for each DBP and each transformation based on the highest R-squared (R²) and a variance inflation factor (VIF) lower than 10 to avoid multicollinearity. As a next step, we used 5-fold cross-validation to estimate the prediction accuracy of these models and selected the final linear models based on the highest coefficient of determination (R²), the narrowest 95% confidence interval (95% CI) and the lowest root mean squared error (RMSE) for each DBP.
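The size of the candidate model space described above can be illustrated with a short sketch that enumerates the 15 predictor subsets and the 4 transformations, and computes the VIF for the special case of a two-predictor model, where it reduces to 1 / (1 − r²). The helper names are hypothetical, and the sketch does not fit the regressions or reproduce the authors' software.

```python
from itertools import combinations
import math

THMS = ("TCM", "BDCM", "DBCM", "TBM")
TRANSFORMS = {
    "none": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "squared": lambda x: x ** 2,
}


def predictor_subsets(names=THMS):
    """All non-empty subsets of the predictors: 4 single + 11 multiple = 15."""
    return [s for k in range(1, len(names) + 1) for s in combinations(names, k)]


def candidate_models(names=THMS, transforms=TRANSFORMS):
    """Every (transformation, predictor subset) pair considered per DBP."""
    return [(t, s) for t in transforms for s in predictor_subsets(names)]


def vif_two_predictors(x1, x2):
    """VIF in a model with exactly two predictors: 1 / (1 - r^2)."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    r_squared = cov ** 2 / (ss1 * ss2)
    # Perfectly collinear predictors have an unbounded VIF
    return math.inf if r_squared >= 1 else 1 / (1 - r_squared)
```

Under this enumeration each DBP has 15 × 4 = 60 candidate models before the VIF < 10 filter and cross-validation are applied. For models with more than two predictors, the VIF of each predictor requires regressing it on the remaining ones.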
Super learner (SL) modeling is a machine learning prediction technique that combines several individual predictive algorithms (a library of algorithms) into a single new model: a weighted combination (ensemble). Separate models were built to predict DBP concentrations using 5-fold cross-validated SL based on the 4 THMs, conductivity, pH, and geocodes as explanatory variables. SL modeling was developed with 3 different cross-validated models using different individual algorithms: Model 1 = algorithm library including generalized linear model, Bayesian GLM, random forest (from the 'randomForest' and 'ranger' packages), multivariate adaptive regression splines, local polynomial regression, neural network, adaptive polynomial splines; Model 2 = same as Model 1 plus Random Forest algorithm modification; Model 3 = same as Model 2 plus additional screening algorithms for the input variables. For each DBP, the models with the highest R², narrowest 95% CI and lowest RMSE were selected for comparison with linear regression models.
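The idea of a weighted ensemble can be sketched for the simplest case of two base learners. Given out-of-fold (cross-validated) predictions from each learner, the weight that minimizes squared error for a convex combination has a closed form. This is only an illustration of the principle under that two-learner assumption: the study's SL models used larger libraries and existing software, and the function names here are hypothetical.

```python
def convex_weight(y, pred_a, pred_b):
    """Weight w in [0, 1] minimizing sum((y - (w*a + (1 - w)*b))**2).

    pred_a and pred_b are assumed to be out-of-fold predictions of the
    two base learners for the observed values y.
    """
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        # Identical predictions: any weight gives the same ensemble
        return 0.5
    # The objective is quadratic in w, so clipping the unconstrained
    # minimizer to [0, 1] gives the constrained optimum
    return min(1.0, max(0.0, num / den))


def ensemble_predict(pred_a, pred_b, w):
    """Weighted combination of the two learners' predictions."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
```

With more than two learners the weights are usually estimated jointly under non-negativity constraints, which has no comparably simple closed form.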
Effect of domestic filters on DBP concentrations in tap water. Average concentrations before and after filtration were compared using paired t-tests, after checking the normality of the resulting differences with the Shapiro-Wilk test. Log or square root transformation was necessary for some of the variables to meet the assumption of normality. The homogeneity of the variances was evaluated for each variable and considered in the paired t-test. The average percentage change was calculated as the after-before difference in the concentration relative to the average concentration before filtration.
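The two quantities described above, the paired t statistic and the average percentage change, can be sketched as follows. The sketch assumes untransformed concentrations and omits the normality check, the transformations, and the p-value, which requires the t distribution with n − 1 degrees of freedom. The function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t statistic of the paired differences (after minus before)."""
    diffs = [a - b for a, b in zip(after, before)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


def average_percent_change(before, after):
    """Mean after-before difference relative to the mean concentration
    before filtration, expressed as a percentage."""
    diffs = [a - b for a, b in zip(after, before)]
    return 100.0 * mean(diffs) / mean(before)


# Hypothetical concentrations (µg/L) in three homes
before = [10.0, 20.0, 30.0]
after = [5.0, 10.0, 15.0]
change = average_percent_change(before, after)
```

A negative percentage change indicates that the filter reduced the concentration; in this hypothetical example the filter halves the concentration in every home.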
Estimated DBP ingestion. We identified the primary source of drinking water at home and estimated residential DBP exposure by multiplying the volume (in liters) by the concentration of DBPs in the specific type of water consumed.

RESULTS
Characteristics of the study population are shown in Table 1. Twenty-four participants (60.5%) were female, 14 (36.8%) were male and 1 (2.6%) was non-binary. Mean age and body mass index in the study population were 41 years and 22.7 kg/m², respectively. Unfiltered tap water was the drinking water type with the highest mean volume consumed at home (0.6 L/day), followed by bottled water (0.5 L/day) and filtered tap water (0.4 L/day). On average, participants spent 9.2 min/day showering, and 4 participants reported regularly swimming in chlorinated pools.

DISCUSSION
DBP occurrence in tap and bottled water
In the present study, a wide range of DBPs were analyzed in drinking water (tap and bottled). Unfiltered tap water is the primary source of human exposure to these chemicals. The patterns of occurrence indicate that although both brominated and chlorinated DBPs were present, brominated species were found in a larger number of samples. Results are in line with previous studies in the study area, which reported higher levels of brominated compared to chlorinated THMs and HAAs in the tap water of Barcelona [3,29]. Moreover, the high concentrations of brominated DBPs and THMs that we observed are consistent with previous studies, which found that higher bromide concentrations in water lead mainly to the formation of brominated THMs and to reduced formation of HAAs [30]. These results are of high importance because brominated DBPs are reportedly more cytotoxic and genotoxic than chlorinated species, and therefore there is a need to minimize the formation of brominated DBPs [1]. Compared with a study conducted in 2010 (median THM = 85 µg/L, median HAA ≈ 35 µg/L), the median THM (42 µg/L) and HAA (18 µg/L) levels in this study suggest that concentrations of these two DBP classes have halved in Barcelona [29]. This can be explained by the technological improvement of the Llobregat drinking water treatment plants, which provide ~50% of the drinking water supply for Barcelona [31,32]. Our study shows that current levels of total THMs and HAAs in the tap water of Barcelona are below the new parametric values set by the EU Drinking Water Directive (DWD) (2020/2184) for total THMs (100 µg/L) and 5 HAAs (60 µg/L). These parametric values will be implemented by 2023 into national legislation of EU member states and will be legally binding [6]. Similar regulatory limits were set by the U.S. EPA for drinking water: <60 µg/L for the sum of 5 HAAs (MCAA, DCAA, TCAA, MBAA, DBAA) and <80 µg/L for total THMs [33].
Chlorite and chlorate will also be regulated under the new EU directive with a maximum contaminant level of 250 µg/L (or 700 µg/L where a disinfection method that generates chlorite or chlorate is used). Approximately 25% of the tap water samples in our study contained chlorate levels exceeding 250 µg/L (Table 2). Given that the treatment plants use chlorine dioxide, the 700 µg/L legal threshold applies in this case, and concentrations were below it. Chlorate has been found to cause in vitro mutagenic effects and to induce thyroid tumors in male rats [1,34]. Although adverse human health effects of chlorate have been scarcely investigated, chlorate levels in drinking water have been associated with a higher risk of obstructive urinary defects, cleft palate and spina bifida in newborns [35]. Chlorate is very persistent, and previous studies highlight that only reverse osmosis has been recognized to effectively remove it from drinking water [36]. On the other hand, chlorate was detected in three out of ten analyzed samples (mean = 18.9 µg/L) of popular Spanish bottled water. Our results showed that chlorate levels in bottled water were approximately one order of magnitude lower than in tap water samples. Other studies reported higher detection rates but lower concentrations of chlorate in bottled water, for instance, in 71.4% (15/21) of samples from the U.S. (min = 0.2 µg/L, max = 5.8 µg/L) [37] and in 90% (9/10) of samples from Japan (mean = 14 µg/L) [38].
Finally, we assessed correlations between DBPs, which were the building blocks of the multivariate analysis. Although general patterns were not identified, correlations tended to be stronger and positive between compounds with a similar proportion of equivalent halogenated (chlorine/bromine) substituents, which is consistent with correlations observed in a previous study by Villanueva et al. [3]. Chlorate was the DBP most weakly correlated with other DBPs, except chlorite; its behavior was independent of THMs, HAAs, HANs, and TCP, which makes it difficult to predict. Individual THMs were moderately to strongly correlated with other individual DBPs. Specifically, at least one individual THM showed significant positive correlations with individual DBPs of other classes, except for chlorate. These results are in line with previous studies that reported strong correlations between THMs and HAAs [39,40] as well as between THMs and HANs [41,42]. Our correlation analyses went beyond previous studies by showing high correlations between specific THMs and other DBPs (TCP, chlorite). Results suggested that THM levels can be a good indicator of the levels of other DBPs, provided that the right combination of compounds is used. This finding was the basis for our multivariate models, which aimed to predict unregulated DBPs using the levels of individual or multiple THMs. Moreover, statistically significant strong correlations between DBPs and physicochemical parameters may suggest that conductivity, hardness, TOC and pH are important determinants in the formation of specific DBPs, and we can only speculate that these correlations might also explain differences in the formation of DBPs among waters of different regions.

Multivariate predictive models
We developed linear regression and super learner models to predict 14 individual unregulated DBPs based on the routinely monitored THMs. Models for dichloro-, trichloro-, and bromodichloroacetic acid, dichloroacetonitrile, bromochloroacetonitrile, dibromoacetonitrile, trichloropropanone, and chlorite showed good predictive ability (R² = 0.8-0.9), as 80-90% of total variance could be explained by THM concentrations. In contrast, models had R² < 0.7 and LCI < 0.5 for the remaining DBPs, suggesting that these compounds cannot be reasonably predicted based on routine monitoring data.
When comparing models (LM vs. SL), half of the target compounds (9/18) had a better fit with linear models and 2/18 with super learner models, while 7/18 showed low goodness of fit (R² < 0.7; LCI < 0.5). Our results suggest that SL models perform better when predicting TCAA and DBAN. Notably, our study is restricted to data of low dimensionality; for high-dimensional data, it has been shown theoretically that SL will asymptotically outperform LM, since the LM is included in the library of SL algorithms [27,43].
For HAAs, models for 3/8 individual compounds (DCAA, TCAA, BDCAA) were based on TCM as the main explanatory variable, similarly to total chlorinated HAAs, while total brominated HAAs and total HAAs were better explained by multiple THMs. HANs were better predicted by BDCM and DBCM, and other non-regulated DBPs were predicted by various combinations of THMs. Previous studies aimed to predict THMs and HAAs [44,45]; however, less emphasis was placed on individual compounds [16]. Our results go beyond these studies by demonstrating the potential to predict a number of individual as well as group-wise concentrations of DBPs based on THMs. Predictive models of DBPs based on routinely monitored parameters are highly applicable in epidemiological research in order to evaluate exposure to non-monitored DBPs using existing records of THMs and other routinely monitored parameters. Although we considered some of these compounds unregulated, they will be routinely monitored from 2023 onwards under the new EU directive that has been recently adopted [6]. Predictive models can be useful in the future with regard to newly emerging DBPs. This study was limited by the small sample size for statistical modeling. Therefore, our approach needs to be validated in a larger set of samples to see whether the experimental data fit the predicted data well. Finally, further research is needed in other settings to evaluate the site-specificity of the predictive models.

Effect of domestic filters on DBPs concentrations in tap water
Our findings showed that domestic activated carbon and reverse osmosis filters, in real operating conditions in the general population, removed DBPs from tap water to a variable extent. Activated carbon filters reduced DBP concentrations in the range of 27-80% depending on the class. Previous studies showed that activated carbon filters were able to remove ~97% of DBPs [46][47][48]. Our study was conducted in real operating conditions, and the carbon filters were not likely in an optimal state of maintenance. Activated carbon filters have a limited useful life: as they filter the water, they accumulate compounds until they become saturated. It is very important that the manufacturer's instructions are followed and that filters are changed frequently. Reverse osmosis filters reduced DBP concentrations in the range of 98-100%, which is consistent with previous studies showing reverse osmosis to be the most efficient method for removing all types of contaminants, including DBPs, from water sources (up to 99%) [36,47,48]. However, it is important to note that reverse osmosis also removes minerals from drinking water, which may counteract the health benefits of DBP removal in certain populations or geographical regions [48]. Confirmation of our findings in a larger set of samples is warranted.
Although the use of biomarkers to estimate exposure for etiologically relevant periods is hampered by the short half-life of DBPs, urinary TCAA has been used as a proxy DBP biomarker [12,13,51,52], given that its half-life (2.1-6.3 days) is longer than the interval between consecutive exposure events. Due to its nonvolatile nature, urinary TCAA can potentially inform about the ingested DBP exposure. In this study, we evaluated the relationship between urinary TCAA and ingested TCAA calculated from self-reported at-home drinking water consumption, which yielded a statistically significant moderate correlation (r = 0.48). This finding is in line with Smith et al. [13], who showed a significant moderate correlation (r = 0.50, p value = 0.002) between ingested TCAA from home tap water and TCAA in urine, and with Zhang et al. [12], who showed a significant strong correlation (r = 0.66, p value < 0.001).

Table 1. Characteristics of the study population (N = 39) a.
a 1 missing value in body mass index, 2 missing showering time values. b At least 1 cigarette/day or 1 cigar/week in the last 6 months.

Table 3 summarizes the 5-fold cross-validated model parameters of linear regression and super learner models for 14 individual ...

Table 2. Occurrence and concentrations (µg/L) of disinfection by-products (DBPs) in tap water samples (N = 42) above the limit of quantification (LOQ).

Table 3. Cross-validated (5-fold) linear regression and super learner models for non-regulated disinfection by-products (DBPs) based on routinely monitored parameters as explanatory variables.
a Transformation of independent variables. b Model 1 = algorithm library including generalized linear model, Bayesian GLM, random forest, multivariate adaptive regression splines, local polynomial regression, neural network, adaptive polynomial splines; Model 2 = same as Model 1 plus Random Forest algorithm modification; Model 3 = same as Model 2 plus additional screening algorithms for the input variables. c Brominated HAAs include MBAA, DBAA, and TBAA. d Chlorinated HAAs include DCAA, TCAA, BCAA, BDCAA, and DBCAA.

Table 4. Effect of domestic filters on disinfection by-product (DBP) concentrations in tap water.