## Abstract

Providing reliable environmental quality standards (EQSs) is a challenging issue in environmental risk assessment (ERA). These EQSs are derived from toxicity endpoints estimated from dose-response models to identify and characterize the environmental hazard of chemical compounds released by human activities. These toxicity endpoints include the classical *x*% effect/lethal concentrations at a specific time *t* (*EC*/*LC*(*x*, *t*)) and the new multiplication factors applied to environmental exposure profiles leading to *x*% effect reduction at a specific time *t* (*MF*(*x*, *t*), or denoted *LP*(*x*, *t*) by the EFSA). However, classical dose-response models used to estimate toxicity endpoints have some weaknesses, such as their dependency on observation time points, which are likely to differ between species (e.g., experiment duration). Furthermore, real-world exposure profiles are rarely constant over time, which makes the use of classical dose-response models difficult and may prevent the derivation of *MF*(*x*, *t*). When dealing with survival or immobility toxicity test data, these issues can be overcome with the use of the general unified threshold model of survival (GUTS), a toxicokinetic-toxicodynamic (TKTD) model that provides an explicit framework to analyse both time- and concentration-dependent data sets as well as obtain a mechanistic derivation of *EC*/*LC*(*x*, *t*) and *MF*(*x*, *t*) regardless of *x* and at any time *t* of interest. In ERA, the assessment of a risk is inherently built upon probability distributions, such that the next critical step is to characterize the uncertainties of toxicity endpoints and, consequently, those of EQSs. With this perspective, we investigated the use of a Bayesian framework to obtain the uncertainties from the calibration process and to propagate them to model predictions, including *LC*(*x*, *t*) and *MF*(*x*, *t*) derivations. 
We also explored the mathematical properties of *LC*(*x*, *t*) and *MF*(*x*, *t*) as well as the impact of different experimental designs to provide some recommendations for a robust derivation of toxicity endpoints leading to reliable EQSs: avoid computing *LC*(*x*, *t*) and *MF*(*x*, *t*) for extreme *x* values (0 or 100%), where uncertainty is maximal; compute *MF*(*x*, *t*) after a long enough period of time to take the depuration time into account; and test survival under pulses separated by different periods of time.

## Introduction

Assessing the environmental risk of chemical compounds requires the definition of environmental quality standards (EQSs). EQSs are based on several calculations that depend on the context and the institution, such as predicted-no-effect concentrations (PNECs)^{1} and specific concentration limits (SCLs)^{2}. Specifically, the derivation of EQSs results from a combination of assessment factors with toxicity endpoints mainly estimated from measured exposure responses of a set of target species to a certain chemical compound^{1,2,3,4}. Estimating reliable toxicity endpoints is challenging and very controversial^{5,6}. Currently, the first step of environmental risk assessment (ERA) is the identification of acute effects, which consists of fitting classical dose-response models to quantitative toxicity test data. For acute effect assessment, such data are collected from standard toxicity tests, from which the 50% lethal or effective concentration (*LC*_{50} or *EC*_{50}, respectively) is generally estimated at the end of the exposure period, meaning that not all observations over time are used. In addition, classical dose-response models implicitly assume that the exposure concentration remains constant throughout the experiment, which makes it difficult to extrapolate the results to more realistic scenarios with time-variable exposure profiles combining different heights, widths and frequencies of contaminant pulses^{6,7,8,9}.

To overcome this limitation at the organism level, the use of mechanistic models, such as toxicokinetic-toxicodynamic (TKTD) models, is now promoted to describe the effects of a substance of interest by integrating the dynamics of the exposure^{1,10,11}. Indeed, TKTD models appear highly advantageous in terms of gaining a mechanistic understanding of the chemical mode of action, deriving time-independent parameters, interpreting time-varying exposure and making predictions under untested conditions^{9,10}. Another advantage of TKTD models for ERA is the possible calculation of lethal concentrations for any *x*% of the population at any given exposure duration *t*, denoted *LC*(*x*, *t*). Furthermore, from time-variable concentration profiles observed in the environment, it is possible to estimate a margin of safety such as the exposure multiplication factor *MF*(*x*, *t*), leading to any *x*% effect reduction due to the contaminant at any time *t*^{9,12} (also called the lethal profile and denoted *LP*(*x*, *t*) by^{12}).

When focusing on the survival rate of individuals, the general unified threshold model of survival (GUTS) has been proposed to unify the majority of TKTD survival models^{10}. In the present paper, we consider the two most used derivations, namely, the stochastic death (GUTS-RED-SD) and individual tolerance (GUTS-RED-IT) models. The GUTS-RED-SD model assumes that all individuals are identically sensitive to the chemical substance by sharing a common internal threshold concentration and that mortality is a stochastic process once this threshold is reached. In contrast, the GUTS-RED-IT model is based on the critical body residue (CBR) approach, which assumes that individuals differ in their thresholds, following a probability distribution, and die as soon as the internal concentration reaches the individual-specific threshold^{10}. The robustness of GUTS models in calibration and prediction has been widely demonstrated, with little difference between GUTS-RED-SD and GUTS-RED-IT models^{9,13,14}. Sensitivity analysis of toxicity endpoints derived from GUTS models, such as *LC*(*x*, *t*) and *MF*(*x*, *t*), has also been investigated^{9,13}, but the question of how uncertainties are propagated is still under-studied.

Quantifying uncertainties or levels of confidence associated with toxicity endpoints is undoubtedly a way to improve trust in risk predictors and to avoid decisions that could increase rather than decrease the risk^{15,16,17}. The Bayesian framework has many advantages for dealing with uncertainties since the distribution of parameters and thus their uncertainties is embedded in the inference process^{18}. While the construction of priors on model parameters can be seen as subjective^{19}, it provides added value by taking advantage of information from the experimental design^{13,20}. Consequently, coupling TKTD models with Bayesian inference allows one to estimate the probability distribution of toxicity endpoints and any other predictions coming from the mechanistic (TKTD) model by taking into account all the constraints resulting from the experimental design. Moreover, Bayesian inference, which is particularly efficient with GUTS models^{13,20}, can also be used to optimize the experimental design by quantifying the gain in knowledge from priors to posteriors^{21}. Finally, Bayesian inference is tailored for decision making as it provides assessors with a range of values rather than a single point, which is particularly valuable in risk assessment^{16,19}.

In the present study, we explore how scrutinizing uncertainties helps provide recommendations for experimental design and the characteristics of toxicity endpoints used in EQSs while maximizing their reliability. We first give an overview of TKTD models, with a focus on the GUTS^{10}, to derive explicit equations for EQS-related toxicity endpoints. We then illustrate how to handle GUTS models within the R package *morse*^{22} with five example data sets. Next, we explore how a variety of experimental designs influence the uncertainties in derived *LC*(*x*, *t*) and *MF*(*x*, *t*). Finally, we provide a set of recommendations on the use of TKTD models for ERA based on their added value and the way the uncertainty may be handled under a Bayesian framework.

## Material and Methods

### Data from experimental toxicity tests

We used experimental toxicity data sets described in^{23} and^{24} testing the effect of five chemical compounds (carbendazim, cypermethrin, dimethoate, malathion and propiconazole) on the survival rate of the amphipod crustacean *Gammarus pulex*. Two experiments were performed for each compound, one exposing *G*. *pulex* to constant concentrations and the other exposing *G*. *pulex* to time-variable concentrations (see Table 1). In the constant exposure experiments, *G*. *pulex* was exposed to eight concentrations for four days. In the time-variable exposure experiments, *G*. *pulex* was exposed to two different pulse profiles consisting of two one-day exposure pulses with either a short or long interval between them.

### GUTS modelling

In this section, we detail the mathematical equations of GUTS models describing the survival rate over time of organisms exposed to a profile of concentrations of a single chemical product. All other possible derivations of GUTS models are fully described in^{10,14}. Here, we provide a summary of GUTS-RED-SD and GUTS-RED-IT reduced models to introduce notations and equations relevant for mathematical derivation of explicit formulations of the *x*% lethal concentration at time *t*, denoted *LC*(*x*, *t*), and of the multiplication factor leading to *x*% mortality at time *t*, denoted *MF*(*x*, *t*).

#### Toxicokinetic

We define *C*_{w}(*t*) as the external concentration of a chemical product, which can be variable over time. As there is no measure of internal concentration, we use the scaled internal concentration, denoted *D*_{w}(*t*), which is therefore a latent variable described by the toxicokinetic part of the model as follows:

$$\frac{d{D}_{w}(t)}{dt}={k}_{d}\,({C}_{w}(t)-{D}_{w}(t)) \quad (1)$$

where *k*_{d} [*time*^{−1}] is the dominant rate constant, corresponding to the slowest compensating process dominating the overall dynamics of toxicity.

As we assume that the internal concentration equals 0 at *t* = 0, the explicit formulation for constant concentration profiles is given by

$${D}_{w}(t)={C}_{w}\,(1-{e}^{-{k}_{d}t}) \quad (2)$$

An explicit expression for time-variable exposure profiles is provided in the Supplementary Material as it can be useful for implementation but not for the mathematical calculus presented below. The GUTS-RED-SD and GUTS-RED-IT models are based on the same model for the scaled internal concentration. These models do not differ in the TK part but do differ in the TD part describing the death mechanism.
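The analyses in this paper were run with the R package *morse*; purely as an illustration, the toxicokinetic part can be sketched in a few lines of Python, using the closed form of Eq. (2) for constant exposure and a simple Euler scheme for a time-variable profile (the pulse profile and the *k*_{d} value below are hypothetical):

```python
import math

def damage_constant(Cw, kd, t):
    """Scaled internal concentration under constant exposure (Eq. 2)."""
    return Cw * (1.0 - math.exp(-kd * t))

def damage_variable(profile, kd, t_end=4.0, dt=1e-3):
    """Euler integration of Eq. (1), dD/dt = kd * (Cw(t) - D(t)), for a
    time-variable exposure profile given as a function of time (days)."""
    t, D = 0.0, 0.0
    series = [(0.0, 0.0)]
    while t < t_end - 1e-12:
        D += dt * kd * (profile(t) - D)
        t += dt
        series.append((t, D))
    return series

# Hypothetical one-day exposure pulse between days 1 and 2
pulse = lambda t: 10.0 if 1.0 <= t <= 2.0 else 0.0
D_end = damage_variable(pulse, kd=0.8)[-1][1]
```

For a constant profile, the Euler scheme recovers Eq. (2) up to the integration step, which is a convenient sanity check of any implementation.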

From the toxicokinetic Eq. (2), we can easily compute the *x*% depuration time *DRT*_{x}, that is, the period of time after a pulse leading to an *x*% reduction in the scaled internal concentration:

$$DR{T}_{x}=\frac{1}{{k}_{d}}\,\mathrm{ln}\left(\frac{100}{100-x}\right) \quad (3)$$

While GUTS-RED-SD and GUTS-RED-IT models have the same toxicokinetic Eq. (1), the *DRT*_{x} likely differs between them since the meaning of damage depends on the toxicodynamic equations, which are different.
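Since the scaled internal concentration decays in proportion to \({e}^{-{k}_{d}t}\) once exposure stops, the depuration time of Eq. (3) is a one-liner; a minimal Python sketch (hypothetical *k*_{d} value):

```python
import math

def depuration_time(x, kd):
    """x% depuration time (Eq. 3): the period after a pulse needed for the
    scaled internal concentration to drop by x%, i.e. the solution of
    exp(-kd * DRT) = 1 - x / 100."""
    return math.log(100.0 / (100.0 - x)) / kd

# For instance, the commonly reported 95% depuration time is ln(20) / kd
drt95 = depuration_time(95.0, kd=0.8)
```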

#### Toxicodynamic

The GUTS-RED-SD model supposes that all the organisms have the same internal threshold concentration, denoted *z* [*mol*.*L*^{−1}], and that once this concentration threshold is exceeded, the instantaneous probability of death, denoted *h*(*t*), increases linearly with the internal concentration. The mathematical equation is

$$h(t)={b}_{w}\,\max ({D}_{w}(t)-z,0)+{h}_{b} \quad (4)$$

where *b*_{w} [*L*.*mol*.*time*^{−1}] is the killing rate and *h*_{b} [*time*^{−1}] is the background mortality rate.

Then, the survival probability over time under the GUTS-RED-SD model is given by

$${S}_{SD}(t)=\exp \left(-{h}_{b}t-{b}_{w}{\int }_{0}^{t}\max ({D}_{w}(u)-z,0)\,du\right) \quad (5)$$

The GUTS-RED-IT model supposes that the threshold concentration is distributed among organisms and that death is immediate as soon as this threshold is reached. With *F* the cumulative distribution function of the threshold and *h*_{b} the background mortality rate, the survival probability given the maximal internal concentration reached so far is

$${S}_{IT}(t)=\left(1-F\left(\mathop{\max }\limits_{0 < \tau < t}{D}_{w}(\tau )\right)\right){e}^{-{h}_{b}t} \quad (6)$$

Assuming a log-logistic function, we get \(F(x)=\frac{1}{1+{(x/{m}_{w})}^{-\beta }}\), with the median *m*_{w} [*mol*.*L*^{−1}] and shape *β* of the threshold distribution, which gives

$${S}_{IT}(t)=\frac{{e}^{-{h}_{b}t}}{1+{\left(\mathop{\max }\limits_{0 < \tau < t}{D}_{w}(\tau )/{m}_{w}\right)}^{\beta }} \quad (7)$$
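For a constant exposure concentration, both toxicodynamic variants admit closed-form survival rates (for GUTS-RED-SD, the cumulative hazard integral can be solved analytically using Eq. (2)). A Python sketch, not the R package *morse* used in the paper, with all parameter values hypothetical:

```python
import math

def survival_sd(C, t, kd, hb, z, bw):
    """GUTS-RED-SD survival rate under constant exposure C."""
    if C <= z:                            # threshold never exceeded
        return math.exp(-hb * t)
    tz = -math.log(1.0 - z / C) / kd      # time at which D_w(t) reaches z
    if t <= tz:
        return math.exp(-hb * t)
    # integral of max(D_w(u) - z, 0) between tz and t, with D_w from Eq. (2)
    cum = (C - z) * (t - tz) + (C / kd) * (math.exp(-kd * t) - math.exp(-kd * tz))
    return math.exp(-hb * t - bw * cum)

def survival_it(C, t, kd, hb, mw, beta):
    """GUTS-RED-IT survival rate under constant exposure C
    (log-logistic threshold distribution; damage is maximal at time t)."""
    Dmax = C * (1.0 - math.exp(-kd * t))
    return math.exp(-hb * t) / (1.0 + (Dmax / mw) ** beta)
```

Both functions reduce to the background survival \({e}^{-{h}_{b}t}\) when *C* = 0, which is the property used below to define *LC*(*x*, *t*).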

### Implementation and Bayesian inference

GUTS models were implemented within a Bayesian framework with *JAGS*^{25} by using the R package *morse*^{22}. The Bayesian inference methods, choice of priors and parameterisation of the MCMC process have previously been fully explained^{13,20,22}. The joint posterior distribution of parameters was used to predict survival curves under tested and untested exposure profiles, to calculate *LC*(*x*, *t*) and *MF*(*x*, *t*), and to compute goodness-of-fit measures (see hereinafter). The use of the joint posterior distribution allowed us to quantify the uncertainty around all these predictions; therefore, their medians and 95% credible intervals were computed as follows: under a specific exposure profile, we simulated the survival rate over time for every joint posterior parameter set; then, at each time point of the time series, we computed 0.5, 0.025 and 0.975 quantiles, thus providing medians and 95% limits.
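The quantile machinery described above can be sketched as follows; here a Gaussian-shaped sample stands in for an actual MCMC output from *morse*, and all parameter values are hypothetical:

```python
import math
import random

random.seed(42)

# Stand-in for a joint posterior sample (kd, hb, mw, beta) of a GUTS-RED-IT fit
posterior = [(random.gauss(0.8, 0.05), random.gauss(0.02, 0.002),
              random.gauss(4.0, 0.3), random.gauss(2.0, 0.1))
             for _ in range(2000)]

def quantile(values, q):
    """Empirical quantile of a sample (nearest-rank, sufficient here)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]

# Predicted survival rate at C = 5 and t = 2 for every joint posterior draw,
# using the GUTS-RED-IT log-logistic formula under constant exposure
C, t = 5.0, 2.0
preds = [math.exp(-hb * t) / (1.0 + (C * (1.0 - math.exp(-kd * t)) / mw) ** beta)
         for (kd, hb, mw, beta) in posterior]
median, lower, upper = (quantile(preds, q) for q in (0.5, 0.025, 0.975))
```

Repeating this at each time point of a simulated time series yields the median curve and the 95% credible band around the prediction.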

### Measures of model robustness

Modelling is always associated with testing robustness: not only the robustness of fitting the data used for calibration but also the robustness of generating predictions for new data^{26}. To evaluate the robustness of estimations and predictions with the two GUTS models, we calculated their statistical properties by means of the normalized root mean square error (NRMSE), the posterior predictive check (PPC), the Watanabe-Akaike information criterion (WAIC) and leave-one-out cross-validation (LOO-CV)^{27}. These global measures summarize the entire fit rather than a specific part of it, such as the final time point of the experiment^{12}.

#### Normalized root mean square error

The root mean square error (RMSE) characterizes the difference between observations and predictions from the posterior distribution. With *N* observations *y*_{i,obs} (*i* ∈ {1, …, *N*}) and the corresponding estimations *y*_{i,j} for each iteration *j* of the Markov chain of size *M* (*j* ∈ {1, …, *M*}) resulting from the Bayesian inference, we can define the *RMSE*_{j} as

$$RMS{E}_{j}=\sqrt{\frac{1}{N}\sum _{i=1}^{N}{({y}_{i,obs}-{y}_{i,j})}^{2}} \quad (8)$$

where the normalized RMSE (NRMSE) is given by dividing RMSE by the mean of the observations, denoted \(\overline{{y}_{obs}}\). We then have the distribution of the NRMSE, from which we can obtain the median and the 95% credible interval, as presented in Table 2.
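Computing one NRMSE value per joint posterior draw yields the NRMSE distribution; the calculation for a single draw can be sketched as (the observed and predicted series below are hypothetical):

```python
import math

def nrmse(observed, predicted):
    """Normalized RMSE: the RMSE of one prediction series (e.g. one draw j
    of the Markov chain) divided by the mean of the observations."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)

# Hypothetical observed and predicted numbers of survivors over four days
value = nrmse([20, 18, 15, 11], [19.5, 17.0, 15.5, 12.0])
```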

#### Posterior predictive check (PPC)

The posterior predictive check consists of comparing replicated data drawn from the joint posterior predictive distribution to observed data. A measure of goodness-of-fit is the percentage of observed data falling within the 95% predicted credible intervals^{27}. The closer this percentage (denoted %*PPC*) is to 95%, the better the fit.
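The coverage computation itself is straightforward; a minimal sketch with hypothetical observations and hypothetical 95% prediction bounds:

```python
def ppc_percentage(observed, lower, upper):
    """Percentage of observations falling within their 95% predicted
    credible interval; the closer to 95%, the better the fit."""
    inside = sum(lo <= y <= up for y, lo, up in zip(observed, lower, upper))
    return 100.0 * inside / len(observed)

# Fourth observation (11) falls outside its interval [12, 14]
pct = ppc_percentage([20, 18, 15, 11], [18, 16, 13, 12], [22, 20, 17, 14])
```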

#### WAIC and LOO-CV

Information criteria such as the WAIC and LOO-CV are common measures of predictive precision also used to compare models (i.e., the lower the value, the better the fit). The WAIC is the sum of the log predictive density computed for every point, to which a penalty is added to account for the number of parameters. The LOO-CV method uses the log predictive density estimated from a training subset and applies it to another one^{27}. Both the WAIC and LOO-CV criteria were computed with the R package *bayesplot*^{28}.

### Mathematical definition and properties of *LC*(*x*, *t*)

The *LC*(*x*, *t*) makes sense only under constant exposure profiles (i.e., *C*_{w}(*t*) is constant for any time *t*). In such situations, we can provide an explicit formulation of the survival rate over time for both the GUTS-RED-SD and GUTS-RED-IT models. Many software packages implementing GUTS models make it possible to compute the *LC*(*x*, *t*) at any time and for any *x*%^{14}. Our Bayesian implementation of GUTS models in the R environment is one example^{22}.

Let *LC*(*x*, *t*) be the lethal concentration for *x*% of organisms at any time *t* and *S*(*C*, *t*) be the survival rate at the constant concentration *C* and time *t*. Then, the *LC*(*x*, *t*) is defined as

$$S(LC(x,t),t)=\left(1-\frac{x}{100}\right)S(0,t) \quad (9)$$

where *S*(0, *t*) is the survival rate at time *t* when there is no contaminant, which reflects the background mortality.

#### GUTS-RED-SD model

The lethal concentration *LC*_{SD}(*x*, *t*) is implicitly given by

$$1-\frac{x}{100}=\exp \left(-{b}_{w}\left[(L{C}_{SD}(x,t)-z)(t-{t}_{z})+\frac{L{C}_{SD}(x,t)}{{k}_{d}}\left({e}^{-{k}_{d}t}-{e}^{-{k}_{d}{t}_{z}}\right)\right]\right) \quad (10)$$

As mentioned in the Supplementary Material, under time-variable exposure, *t*_{z} also varies over time, while in the case of constant exposure, *t*_{z} is exactly −1/*k*_{d} ln(1 − *z*/*C*_{w}). This expression of *t*_{z} prevents an explicit formulation of *LC*_{SD}(*x*, *t*). For increasing time, the *LC*_{SD}(*x*, *t*) curve becomes a vertical line at concentration *z*. We assume that the threshold concentration *z* is reached in a finite amount of time, which means that \(\mathop{\mathrm{lim}}\limits_{t\to +\infty }t-{t}_{z}=+\infty \). Therefore, when time tends to infinity, the convergence is

$$\mathop{\mathrm{lim}}\limits_{t\to +\infty }L{C}_{SD}(x,t)=z \quad (11)$$

#### GUTS-RED-IT model

The lethal concentration *LC*_{IT}(*x*, *t*) is given by

$$L{C}_{IT}(x,t)=\frac{{m}_{w}}{1-{e}^{-{k}_{d}t}}{\left(\frac{x}{100-x}\right)}^{1/\beta } \quad (12)$$

It is then clear that as *t* increases, the *LC*_{IT}(*x*, *t*) converges to

$$\mathop{\mathrm{lim}}\limits_{t\to +\infty }L{C}_{IT}(x,t)={m}_{w}{\left(\frac{x}{100-x}\right)}^{1/\beta } \quad (13)$$

In the specific case of *x* = 50%, we get \(\mathop{\mathrm{lim}}\limits_{t\to +\infty }LC(50,t)={m}_{w}\).

#### Calculation of the density distribution of *LC*(*x*, *t*)

The calculation of *LC*(*x*, *t*) is based on Eq. (9). Using the GUTS models and the estimates of parameters from the calibration processes, we compute the survival rate without contamination (i.e., the background mortality, denoted *S*(0, *t*)) and a set of predictions of the survival rate over a range of concentrations (i.e., *S*(*C*, *t*)).
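In practice, this amounts to solving Eq. (9) numerically for each joint posterior parameter set, which yields the full density distribution of *LC*(*x*, *t*). A minimal Python sketch for a single hypothetical GUTS-RED-IT parameter set (the same bisection works for GUTS-RED-SD, where no closed form exists):

```python
import math

def lc(x, t, surv, c_max=1e3, tol=1e-8):
    """Solve S(C, t) = (1 - x/100) * S(0, t) for C by bisection (Eq. 9),
    exploiting that survival decreases with concentration."""
    target = (1.0 - x / 100.0) * surv(0.0, t)
    lo, hi = 0.0, c_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surv(mid, t) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical GUTS-RED-IT survival under constant exposure
# (kd = 0.8, hb = 0.02, mw = 4.0, beta = 2.0)
surv = lambda C, t: math.exp(-0.02 * t) / (1.0 + (C * (1.0 - math.exp(-0.8 * t)) / 4.0) ** 2.0)
lc50 = lc(50.0, 2.0, surv)  # for x = 50%, this equals m_w / (1 - exp(-kd * t))
```

Applying `lc` to every posterior draw instead of a single parameter set directly gives the quantiles (median, 95% credible interval) of *LC*(*x*, *t*).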

### Mathematical definition and properties of the multiplication factor *MF*(*x*, *t*)

Contrary to the lethal concentration *LC*(*x*, *t*) used under conditions of constant exposure profiles, the multiplication factor *MF*(*x*, *t*) can be computed for both constant and time-variable exposure profiles.

With the exposure profile *C*_{w}(*τ*), with *τ* ranging from 0 to *t*, the *MF*(*x*, *t*) is defined as

$$S(MF(x,t)\times {C}_{w}(\cdot ),t)=\left(1-\frac{x}{100}\right)S(0,t) \quad (14)$$

In the Supplementary Material, we show that the internal damage *D*_{w}(*t*) is linearly related to the multiplication factor since regardless of the exposure profile (constant or time-variable), we get the following relationship:

$${D}_{w}^{MF}(t)=MF(x,t)\times {D}_{w}(t) \quad (15)$$

where \({D}_{w}^{MF}(t)\) is the internal damage when the exposure profile is multiplied by *MF*(*x*, *t*).

#### GUTS-RED-SD model

The multiplication factor *MF*_{SD}(*x*, *t*) is implicitly given by

$$1-\frac{x}{100}=\exp \left(-{b}_{w}{\int }_{0}^{t}\max \left(M{F}_{SD}(x,t)\,{D}_{w}(u)-z,0\right)du\right) \quad (16)$$

#### GUTS-RED-IT model

The multiplication factor *MF*_{IT}(*x*, *t*) is given by

$$M{F}_{IT}(x,t)=\frac{{m}_{w}}{\mathop{\max }\limits_{0 < \tau < t}{D}_{w}(\tau )}{\left(\frac{x}{100-x}\right)}^{1/\beta } \quad (17)$$

Therefore, from a GUTS-RED-IT model, solving the toxicokinetic part, which gives \(\mathop{{\rm{\max }}}\limits_{0 < \tau < t}({D}_{w}(\tau ))\), is enough to find any multiplication factor for any *x* at any *t*. When the external concentration is constant, this maximum is \({C}_{w}(1-{e}^{-{k}_{d}t})\).
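A minimal sketch of this GUTS-RED-IT calculation (hypothetical parameter values; for GUTS-RED-SD the multiplication factor must instead be found numerically, e.g. by bisection, because no closed form exists):

```python
import math

def mf_it(x, Dmax, mw, beta):
    """GUTS-RED-IT multiplication factor: because the scaled internal
    concentration scales linearly with the exposure profile, only the
    maximal damage Dmax over (0, t) is needed."""
    return (mw / Dmax) * (x / (100.0 - x)) ** (1.0 / beta)

# Under a constant exposure Cw, the maximal damage is Cw * (1 - exp(-kd * t))
Dmax = 10.0 * (1.0 - math.exp(-0.8 * 2.0))
mf50 = mf_it(50.0, Dmax, mw=4.0, beta=2.0)
```

For *x* = 50%, the factor reduces to *m*_{w}/*D*_{max}, so the whole computation over a posterior sample only requires one maximization of the damage per exposure profile.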

## Results

### Goodness-of-fit of GUTS-RED-SD and GUTS-RED-IT models

For all compounds, fitting observed survival with test data obtained under constant exposure profiles provides better fits than using data from testing under time-variable exposure profiles (Table 2, see also posterior predictive check graphics in Supplementary Material), regardless of the measure of goodness-of-fit (except for the NRMSE measure used on the GUTS-RED-IT model of dimethoate). This result is not surprising since, as shown in Table 1, there are always more time series in data sets with constant exposure profiles. In addition, since there are explicit solutions of differential equations with constant exposure profiles for both the GUTS-RED-SD and GUTS-RED-IT models, the computational process for constant exposure profiles is easier than that for time-variable exposure profiles, which requires the use of a numerical integrator.

For validation, we calibrated the models on one data set (A) and then used them to predict another data set (B). Regardless of the measure of goodness-of-fit, the predictions were always better when calibration was carried out on data from time-variable exposure profiles to predict data from constant exposure profiles than the other way around, that is, when calibration on data from constant exposure profiles was used to predict data from time-variable exposure profiles.

Table 2 shows that the GUTS-RED-SD and GUTS-RED-IT models are similar in the quality of their fits, although the GUTS-RED-IT model underperforms for carbendazim and dimethoate under time-variable exposure profiles. Moreover, under time-variable exposure profiles for the malathion and propiconazole data sets, the 95% credible interval for the GUTS-RED-IT model is large (see figures in the Supplementary Material). When uncertainties are that large, the 95% credible interval around predictions used for the PPC tends to cover all the observations regardless of the fitting accuracy; the Bayesian measures WAIC and LOO-CV are better at penalizing excessively large uncertainties.

### Comparison of *LC*(*x*, *t*) between GUTS-RED-SD and GUTS-RED-IT models

There is no obvious difference between the GUTS-RED-SD and GUTS-RED-IT models in their goodness-of-fit or in the calculation of *LC*(*x*, *t*), whether over time *t* or across different percentages of the population affected (*x*).

#### LC(x, t) as a function of time t

As expected, Fig. 1(A,B) and the Supplementary Material show that *LC*(*x*, *t*) decreases with time. The shape of this decrease, which is exponential and converges towards the model-specific threshold values, is rarely analyzed. This asymptotic behavior is known as the incipient *LC*(*x*, *t*)^{29}. A direct consequence for risk assessors is that evaluating *LC*(*x*, *t*) at an early time induces a higher sensitivity to time *t* than evaluating it at a later time (with the specific time being relative to the species and the compound). In other words, the sensitivity of *LC*(*x*, *t*) to time *t* decreases as *t* increases. For instance, Fig. 1(A,B) reveal that a small change in time around day 2 leads to a greater change in the estimation of *LC*(*x*, *t*) than does a small change around day 4. However, note that the uncertainty of *LC*(*x*, *t*) does not always decrease when time increases. For instance, as shown in Fig. 1(B), the uncertainty at day 6 and afterward is greater than that around day 3.

When *t* increases to infinity, *LC*(*x*, *t*) converges towards the distribution of parameter *z* for the GUTS-RED-SD model (see Eq. (11)) and towards that of \({m}_{w}\sqrt[\beta ]{\frac{x}{100-x}}\) for the GUTS-RED-IT model (see Eq. (13)). In particular, *LC*(50, *t*) tends to *z* for the GUTS-RED-SD model and to *m*_{w} for the GUTS-RED-IT model (see Eqs (11) and (13)).

#### LC(x, t) as a function of percentage of the population affected, x

As shown in Fig. 1(C,D), the uncertainty of *LC*(*x*, *t*) is greater at low values of *x*, that is, when the effect of the contaminant is weak. Although *LC*(*x*, *t*) at *x* > 50% is never used in ERA, its uncertainty also increases when *x* tends to 100%. As a consequence, while the uncertainty is not always minimal at the standard value of *x* = 50%, it always seems to be smaller around this value than around *x* = 10%, another classical value used in ERA.

### Comparison of *MF*(*x*, *t*) between GUTS-RED-SD and GUTS-RED-IT models

#### MF(x, t) as a function of time t

As expected, Fig. 2(D–F) show that the multiplication factor decreases, or stays constant, as the time at which the survival rate is checked increases. In other words, the later the survival rate is assessed, the lower the multiplication factor. In addition, these graphs reveal that there is no typical pattern in the curves of multiplication factors over exposure time *t*. Under a constant exposure profile, the curve shows an exponentially decreasing pattern, while under pulsed exposure, it shows a constant phase followed by a sudden decrease in the multiplication factor at the time of an exposure peak. The multiplication factor is thus highly variable around a concentration pulse of the chemical product.

#### MF(x, t) as a function of percent survival reduction x

Unsurprisingly, Fig. 2(G–I) show that the multiplication factor increases with an increase in the percent reduction in the survival rate. An interesting result is the non-linearity of this increase. As observed for the *LC*(*x*, *t*), the uncertainty is greater at low and high percentages than for intermediate values near a 50% survival reduction. As a consequence, it would be relevant to set 50% as a standard for ERA.

### Effect of the depuration time on the predicted survival rate

#### Patterns of internal scaled concentrations

The dominant rate constant *k*_{d}, which regulates the kinetics of the toxicant, is always greater for the GUTS-RED-SD model than for the GUTS-RED-IT model, such that the depuration time for the GUTS-RED-SD model is always smaller than that for the GUTS-RED-IT model (see Fig. 3 and Supplementary Material). As a consequence, under a time-variable exposure concentration, the internal scaled concentration with the GUTS-RED-SD model has a greater amplitude than that with the GUTS-RED-IT model (Figs 4 and 5 and Supplementary Material). In other words, the toxicokinetics with the GUTS-RED-IT model are smoother than those with the GUTS-RED-SD model. Compensation for differences in *k*_{d}, and therefore in the scaled internal concentrations, comes from the other parameters: the threshold *z* and the killing rate *b*_{w} for the GUTS-RED-SD model and the median threshold *m*_{w} and shape *β* for the GUTS-RED-IT model. However, when the calibration of the models is based on the same observed number of survivors, the threshold parameter *z* for the GUTS-RED-SD model and the median threshold *m*_{w} for the GUTS-RED-IT model are shifted.

#### Variation in the number of pulses in exposure profiles

The first step was to explore the effect of the number of pulses (9, 6 or 3 pulses of one day each) over a period of 20 days while keeping the same total dose, that is, the same area under the curve of the external concentration over the 20 days (Fig. 4 and Supplementary Material). From a conservative ERA perspective, regardless of whether the GUTS-RED-SD or GUTS-RED-IT model is used, it seems better to consider few pulses of high amplitude rather than many pulses of low amplitude. Indeed, the survival rate over time with only 3 high pulses is lower than the survival rate under more frequent but lower exposure. This difference is confirmed in the Supplementary Material for the malathion and propiconazole data sets. Since the cumulative amount of contaminant is unchanged, we do not see any effect of contaminant depuration (Eq. (3) and Fig. 3), which could help individuals recover under a lower frequency of peaks. The comparison between constant and time-variable exposure profiles (Fig. 4 and Supplementary Material) suggests that uncertainty is smaller when calibration is performed with data collected under a time-variable exposure profile. This result is counter-intuitive, especially since the number of time series was higher for the constant exposure profiles, which should reduce the uncertainties of parameter estimates. If this result is confirmed, then it would be better to predict variable exposure profiles with parameters calibrated from time-variable exposure data sets.

#### Variation in the period between two pulses

To explore the effect of depuration time, we simulated exposure profiles under two pulses with different periods of time between them (i.e., 1/2, 2 or 7 days). The cumulative amount of contaminant remained the same for the three simulations. Figure 5 shows that increasing the period between two pulses may increase the survival rate of individuals, regardless of whether the GUTS-RED-SD or GUTS-RED-IT model is used. This is a typical result of extending the depuration period, which reduces the level of scaled internal concentration and therefore reduces the damage. We can easily see that the highest scaled internal concentration is reached when the pulse interval is the smallest. In this scenario, the addition of damages from the two pulses is clear. Again, because of the different depuration times of the two GUTS models, the results are different.

## Discussion

### Tracking uncertainties for environmental quality standards

Regardless of the scientific field, risk assessment is by definition linked to the notion of probability, characterized by different uncertainties such as the variability among organisms and noise in observations. In this sense, tracking how uncertainty propagates into models, from collected data to model calculations of the toxicity endpoints that are finally used for EQS derivation, is of fundamental interest for ERA^{15}. For ERA, achieving good fits of experimental data is not enough. Instead, the key objective is the application of these fits to predict adverse effects under real environmental exposure profiles and to derive robust EQSs^{1,5,6,12,16}. In this context, as we have shown in this paper, calibrated TKTD models allow predictions of regulatory toxicity endpoints under any type of exposure profile^{30}. Moreover, the Bayesian approach provides the joint posterior distribution on parameters, from which marginal distributions of each parameter can be extracted, and in this way allows one to easily track the uncertainty of any prediction of interest. The cost of using a Bayesian approach is the need to provide a clear probability structure of the parameter space. Notice that such an uncertainty propagation from the estimation process of the model parameters to outputs of interest could also be performed based on a frequentist inference method^{30,31}.

Previous studies investigating goodness-of-fit did not find notable differences between GUTS-RED-SD and GUTS-RED-IT models^{9,13}. Our study confirms that under the specific consideration of uncertainties in regulatory toxicity endpoints, there is no evidence to support choosing either the GUTS-RED-SD or GUTS-RED-IT model over the other. A simple recommendation is therefore to use both and then, if they are successfully validated, take the most conservative scenario in terms of the ERA. With the 10 data sets we used and the 20 fittings we performed, the four measures of goodness-of-fit showed similar outputs for the GUTS-RED-SD and GUTS-RED-IT models under both constant and time-variable exposure profiles. The percentage of observed data falling within the 95% predicted credible interval, %*PPC*, has the advantage of being linked to visual graphics, i.e., PPC plots, and is therefore easier for risk assessors and stakeholders to interpret than the Bayesian WAIC and LOO-CV measures^{17}. However, when the uncertainty is very large, predictions with their 95% credible intervals are likely to cover all of the observations, even in cases of low model accuracy. We showed that the WAIC and LOO-CV criteria are more robust probability measures for penalizing fits with large uncertainties^{27}. Since the NRMSE is easy to calculate for any inference method (e.g., maximum likelihood estimation), it is also a relevant measure for checking the goodness-of-fit of models, as recently recommended by^{12}.

### What about the use and abuse of the lethal concentration?

After checking the quality of model parameter calibration, the next question is about the uncertainty of toxicity endpoints used to derive EQSs. Lethal concentrations are currently a standard for hazard characterization at the levels of a 10, 20 and 50% effect on the individuals. We show that the uncertainty of lethal concentrations differs according to the percentage *x* under consideration (Fig. 1). It appears that this uncertainty is maximal at the extremes (toward 0 and 100%) and limited around 50%. Since the point of minimal uncertainty may drastically change depending on the experimental design, it could be relevant to extrapolate the lethal concentration for a continuous range of *x* (e.g., 10 to 50%), as we did for Fig. 1(C,D).

Many criticisms have targeted the lethal and effective concentrations for *x*% of the population and other related measures^{6}. For instance, the classical way to compute the lethal concentration, at the final time point, ignores information provided by the observations made throughout the experiment and thus hides the time dependency. For the lethal effect, a classical approach to limit the variability due to the choice of time point is to consider a long enough exposure duration to obtain the incipient lethal concentration (i.e., *LC*(*x*, *t* → +∞))^{29}, that is, the value reached when the lethal concentration attains its asymptote and no longer changes with increasing exposure duration, as observed in Fig. 1. We provide a mathematical expression for the convergence of the lethal concentration and explicit results when *x* = 50% for both GUTS models. We can therefore use the joint posterior parameter distribution provided by Bayesian inference to compute the distribution of the incipient lethal concentration.
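To illustrate how such a posterior can be propagated, here is a sketch for the GUTS-RED-IT case only, assuming the usual log-logistic threshold distribution (median `m`, shape `beta`) and dominant rate constant `kd`: under constant exposure *c*, the scaled damage at time *t* is *c*(1 − e^{−k_d t}), which yields the closed form *LC*(*x*, *t*) = *m*(*x*/(100 − *x*))^{1/β}/(1 − e^{−k_d t}) and its incipient limit *m*(*x*/(100 − *x*))^{1/β}. The parameter values below are illustrative, not the paper's estimates, and the posterior sample is faked as independent draws for simplicity:

```python
import numpy as np

rng = np.random.default_rng(42)

def lc_it(x, t, kd, m, beta):
    """LC(x, t) for GUTS-RED-IT under constant exposure: damage at time t is
    c * (1 - exp(-kd * t)), and the log-logistic threshold CDF reaches x%
    at m * (x / (100 - x))**(1 / beta)."""
    return m * (x / (100.0 - x)) ** (1.0 / beta) / (1.0 - np.exp(-kd * t))

# Illustrative joint posterior sample (independent draws here; a real MCMC
# sample would preserve parameter correlations)
kd = rng.lognormal(np.log(0.3), 0.2, 5000)     # dominant rate constant (1/day)
m = rng.lognormal(np.log(10.0), 0.15, 5000)    # median threshold (arbitrary units)
beta = rng.lognormal(np.log(2.0), 0.1, 5000)   # shape of the threshold distribution

lc50_4d = lc_it(50, 4.0, kd, m, beta)          # LC50 at day 4
lc50_inf = lc_it(50, np.inf, kd, m, beta)      # incipient LC50, which equals m
print(np.percentile(lc50_4d, [2.5, 50, 97.5]))
```

Applying `lc_it` to the whole sample directly produces the posterior distribution of the endpoint, from which a median and a 95% credible interval can be read off.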

A consequence of the exponential decrease in the lethal concentration with increasing time is that the sensitivity to time is greatest early on, when a small change in time induces a large change in the lethal concentration regardless of *x*. Our analysis thus confirms that the classical evaluation of the lethal concentration at the last time point of an experiment is supported by theoretical considerations. Hence, when comparing the lethal concentrations of different compounds or species that may require different experiment durations, using TKTD models to extrapolate to other time points is highly advantageous.

### What does it mean to use a margin of safety?

Among the criticisms of the lethal concentration, one is that it is meaningful only under a set of constant environmental conditions, including a constant exposure profile^{6,29}. When the concentration of chemical compounds in the environment is highly variable over time, the use of toxicity endpoints based on toxicity data for constant exposure profiles may hide some processes, such as the response to pulses of exposure. This inadequacy is the reason underlying the interest in multiplication factors for ERA^{9,12}.

A margin of safety deduced from a multiplication factor quantifies how far the exposure profile is below toxic concentrations^{9}. Then, a key objective for risk assessors is to target the safest exposure duration and percentage effect on survival, *x*. Our study reveals a lower uncertainty around an *x* value of 50%. Thus, to reduce the uncertainty of the multiplication factor estimation, we recommend that 50% be selected, at least for comparisons between studies. We also show that under constant exposure profiles, the multiplication factor exhibits an asymptotic shape similar to that of the lethal concentration. There is an incipient value of the multiplication factor for any *x* as time goes to infinity. Therefore, under constant profiles, we recommend that the latest time point in the exposure profile be used to determine toxicity endpoints to reduce the sensitivity of the multiplication factor estimation to time.

The multiplication factor is also meaningful when applied to realistic exposure profiles, which are rarely constant, and our study shows that there is no asymptotic shape under such conditions. In addition, we observed great sensitivity of the multiplication factor to time around peaks in the exposure profiles, that is, high variation in the multiplication factor with small changes in time. Therefore, we recommend that multiplication factors be computed only some time (e.g., several days) after a peak. More generally, the multiplication factor is designed to be compared to the assessment factor (AF) classically used with the effect/lethal concentration value to derive EQSs based on real-world exposure profiles. As a consequence, assessors must be very careful in examining the characteristics of pulses in the exposure profiles (e.g., frequencies and amplitudes) to understand how they drive changes in the multiplication factor. For such exploration, taking advantage of the capability of TKTD models to generate predictions at any time is valuable.
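The time course of the multiplication factor under a pulsed profile can be sketched for the GUTS-RED-IT case: since the scaled damage obeys the linear ODE dD/dt = k_d(C(t) − D), multiplying the profile by a factor multiplies the damage by the same factor, so the factor bringing the running damage maximum to the *x*% threshold has a closed form. Everything below (profile, parameter values) is hypothetical and for illustration only:

```python
import numpy as np

def damage(conc, dt, kd):
    """Euler integration of the scaled damage ODE dD/dt = kd * (C(t) - D)."""
    D = np.zeros_like(conc)
    for i in range(1, len(conc)):
        D[i] = D[i - 1] + dt * kd * (conc[i - 1] - D[i - 1])
    return D

def mf_it(x, conc, dt, kd, m, beta):
    """MF(x, t) for GUTS-RED-IT: damage scales linearly with the profile, so
    MF(x, t) = m * (x / (100 - x))**(1/beta) / max_{tau <= t} D(tau)."""
    running_max = np.maximum.accumulate(damage(conc, dt, kd))
    with np.errstate(divide="ignore"):
        return m * (x / (100.0 - x)) ** (1.0 / beta) / running_max

# Hypothetical profile: 1-day pulses of amplitude 5 every 7 days (days 0, 7, 14)
dt = 0.01
t = np.arange(0.0, 20.0, dt)
conc = np.where((t % 7.0) < 1.0, 5.0, 0.0)

mf = mf_it(50, conc, dt, kd=0.3, m=10.0, beta=2.0)
print(mf[int(2 / dt)], mf[int(10 / dt)])  # MF(50%) shortly after the first and second pulses
```

The printout shows the behaviour discussed above: each new peak raises the running damage maximum, so the multiplication factor drops stepwise after each pulse instead of settling on an asymptote.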

### Effect of depuration in time-variable exposure profiles

Depuration time, and thus the toxicokinetic part of the TKTD model, influences the survival response to pulses. The kinetics of assimilation and elimination of compounds, integrated within the toxicokinetic module, are a fundamental part of ecotoxicological models^{32}. In reduced GUTS models, namely, the GUTS-RED-SD and GUTS-RED-IT models, no measurement of the absolute internal concentration is assumed, so the toxicokinetic parameter is calibrated at the same time as the parameters of the toxicodynamic part. The resulting scaled damage is defined by the toxicodynamic part, for which the GUTS-RED-SD and GUTS-RED-IT models make two different hypotheses about the mechanism of mortality. As a consequence, our results illustrate that the scaled damage does not have the same meaning in the GUTS-RED-SD and GUTS-RED-IT models and therefore cannot be directly compared between them.

In both models, the underlying mechanism implies that damage is positively correlated with pulse amplitude: the lower the amplitude, the lower the damage, as shown in Fig. 4. As a result, for the same cumulative amount of contaminant in an experiment, using fewer pulses (and thus higher amplitudes) reduces the final survival rate. Therefore, the most conservative experimental design is one with few pulses of relatively high amplitude.

Furthermore, in Fig. 5, we bring to light the effect of depuration time. When pulses are close together, the organisms do not have time to depurate; therefore, the damage accumulates and has a cumulative effect on survival. As a consequence, in a long enough experiment, when pulses become less correlated in terms of cumulative damage (i.e., with a longer period of time between them), the final survival rate increases. Because of this phenomenon, we recommend an experimental design with two close pulses, as it is the most conservative in terms of ERA. However, to achieve better calibration of the toxicokinetic parameter, which could potentially differentiate the GUTS-RED-SD model from the GUTS-RED-IT one, it is important to also include uncorrelated pulses in the experimental design.
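The effect of pulse spacing can be made concrete by simulating the scaled damage for two identical pulses separated by a varying gap; with hypothetical parameter values (chosen only for illustration), the peak damage, and hence the toxic pressure, decreases as the gap widens and the organisms get time to depurate:

```python
import numpy as np

def damage(conc, dt, kd):
    """Euler integration of the scaled damage ODE dD/dt = kd * (C(t) - D)."""
    D = np.zeros_like(conc)
    for i in range(1, len(conc)):
        D[i] = D[i - 1] + dt * kd * (conc[i - 1] - D[i - 1])
    return D

def peak_damage(gap_days, kd=0.3, amplitude=5.0, pulse_len=1.0, dt=0.01):
    """Maximum scaled damage for two identical pulses separated by gap_days."""
    horizon = 2 * pulse_len + gap_days + 10.0   # leave time to depurate afterwards
    t = np.arange(0.0, horizon, dt)
    conc = np.zeros_like(t)
    conc[t < pulse_len] = amplitude
    start2 = pulse_len + gap_days
    conc[(t >= start2) & (t < start2 + pulse_len)] = amplitude
    return damage(conc, dt, kd).max()

for gap in (1.0, 3.0, 7.0):
    print(gap, round(peak_damage(gap), 3))
```

Since in both GUTS models higher damage translates into a lower survival rate, the run confirms that closely spaced pulses constitute the more conservative design.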

Finally, our study reveals that the uncertainty of predictions under time-variable exposure profiles seems to be smaller when calibration is performed with data sets under time-variable rather than constant exposure profiles. While this observation makes theoretical sense, since predictions are made with the same type of profile as that used for calibration of the parameters, further empirical studies must be performed to confirm this point.

The environmental dynamics of chemical compounds can be highly variable depending not only on the whole environmental context (e.g., anthropogenic activities, geochemical kinetics, and ecosystem processes) but also on the chemical and biological transformation of the compound under study. Therefore, as a general recommendation, we would like to point out the relevance of experimenting with several types of exposure profiles. A control plus both constant and time-variable exposure profiles, the latter including toxicologically dependent and independent pulses, seem to be the minimum requirement.

### Practical use of GUTS models

#### Optimization and exploration of experimental designs

The complexity of environmental systems combined with thousands of compounds produced by human activities implies the need to assess environmental risk for a very large set of species-compound combinations^{33}. As a direct consequence, optimizing experimental design to maximize the gain in high-quality information from experiments is a challenging requisite for which mechanism-based models combined with a Bayesian approach offer several tools^{21}. An extension of the present study would be to use the joint posterior distribution of parameters and the distribution of toxicity endpoints to quantify the gain in knowledge from several potential experiments. The next objective is thus to develop a framework that could help in the construction of new experimental designs to minimize their complexity and number while maximizing the robustness of toxicity endpoint estimates.

Despite their many advantages, TKTD models, and therefore GUTS models, remain little used. This lack of use is due to the mathematical complexity of such models, which are based on differential equations that need to be numerically integrated when fitted to data^{34}. Promoting GUTS models within regulatory documents associated with ERA would help their adoption, provided they are available within software environments allowing their use without the need to engage with such technicalities. Currently, several software tools circumvent these difficulties^{14,22,35}, and a web platform has been proposed^{36}.

#### Limitations

Survival is the response most often measured in environmental toxicology, but managing sub-lethal effects may be more relevant in ERA to prevent community collapse^{37}. While the lethal concentration decreases as time increases, sub-lethal endpoints (e.g., reproduction and growth) do not always follow this pattern^{6,38}. The concentration levels in acute toxicity tests are higher than those classically observed in the environment; therefore, under real environmental conditions, sub-lethal effects may have more direct impacts on population dynamics than survival does. For these reasons, while our study is based on a species with a relatively simple life cycle (*Gammarus pulex*), sub-lethal effects in species with more complex life cycles are likely to be of critical interest. Finally, it would be of real interest to encompass different effects in a global TKTD approach to generate better predictions scaling up to the population and community levels^{6} and across multi-generational scales^{15}.

Another well-known limitation is the derivation of EQSs from specific species-compound combinations. To extrapolate ecotoxicological information from a set of single-species tests to a community, ERA uses a species sensitivity (weighted) distribution (SS(W)D), which can be used to derive EQSs covering a set of taxonomically different species^{39}. This calculation is classically applied to *LC*(*x*, *t*) values and could easily be performed with *MF*(*x*, *t*) values, with the benefit of being applicable to time-variable exposure profiles^{12}.
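As a minimal sketch of this extrapolation step, assuming a log-normal SSD (one common choice among several), the hazardous concentration for 5% of species (HC5) follows from the mean and standard deviation of the log-transformed endpoints; the species endpoints below are hypothetical, and *MF*(*x*, *t*) values could be substituted for the *LC*(*x*, *t*) values unchanged:

```python
import math
import statistics

# Hypothetical LC50 values for several species exposed to one compound;
# MF(x, t) values could be used in exactly the same way.
lc50 = [3.2, 8.5, 12.0, 27.0, 41.0, 95.0]

logs = [math.log10(v) for v in lc50]
mu = statistics.fmean(logs)
sigma = statistics.stdev(logs)

# 5th percentile of the fitted log-normal SSD: 10**(mu + z_0.05 * sigma),
# with z_0.05 the 5% quantile of the standard normal (about -1.645)
z05 = statistics.NormalDist().inv_cdf(0.05)
hc5 = 10 ** (mu + z05 * sigma)
print(round(hc5, 2))
```

With a posterior distribution for each species endpoint, the same calculation can be repeated per posterior draw to propagate the endpoint uncertainty into the HC5 itself.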

## Conclusion

As recently written by EFSA experts, “uncertainty analysis is the process of identifying limitations in scientific knowledge and evaluating their implications for scientific conclusions”^{40}. Inspired by the recent EFSA scientific opinion on TKTD models^{12}, we evaluated a combination of mechanism-based models with a Bayesian inference framework to track uncertainties of toxicity endpoints used in regulatory risk assessment with one compound-one species survival bioassays. We showed that the degree of uncertainty can change dramatically with time and with the exposure profile, revealing that single values such as the mean or median may be totally irrelevant for decision making. Describing uncertainties also increases transparency and trust in scientific outputs and is therefore key in applied sciences such as ecotoxicology. Many other kinds of uncertainties emerge along the decision chain, from hazard identification to risk characterization. Focusing on uncertainty, such as through a Bayesian approach, should be a concern at every step and, above all, for any information returned by mathematical-computational models.

## References

1. EFSA Panel on Plant Protection Products and their Residues (PPR). Guidance on tiered risk assessment for plant protection products for aquatic organisms in edge-of-field surface waters. *EFSA Journal* **11**, 3290 (2013).
2. ECHA. Guidance on information requirements and chemical safety assessment, https://echa.europa.eu/guidance-documents/guidance-on-information-requirements-and-chemical-safety-assessment (2017).
3. Isigonis, P. *et al.* A multi-criteria decision analysis based methodology for quantitatively scoring the reliability and relevance of ecotoxicological data. *Science of the Total Environment* **538**, 102–116 (2015).
4. Syberg, K. & Hansen, S. F. Environmental risk assessment of chemicals and nanomaterials - the best foundation for regulatory decision-making? *Science of the Total Environment* **541**, 784–794 (2016).
5. Laskowski, R. Some good reasons to ban the use of NOEC, LOEC and related concepts in ecotoxicology. *Oikos* 140–144 (1995).
6. Jager, T. Some good reasons to ban ECx and related concepts in ecotoxicology (2011).
7. Reinert, K. H., Giddings, J. M. & Judd, L. Effects analysis of time-varying or repeated exposures in aquatic ecological risk assessment of agrochemicals. *Environmental Toxicology and Chemistry* **21**, 1977–1992 (2002).
8. Brock, T. C. *Linking Aquatic Exposure and Effects: Risk Assessment of Pesticides* (CRC Press, 2009).
9. Ashauer, R., Thorbek, P., Warinton, J. S., Wheeler, J. R. & Maund, S. A method to predict and understand fish survival under dynamic chemical stress using standard ecotoxicity data. *Environmental Toxicology and Chemistry* **32**, 954–965 (2013).
10. Jager, T., Albert, C., Preuss, T. G. & Ashauer, R. General unified threshold model of survival - a toxicokinetic-toxicodynamic framework for ecotoxicology. *Environmental Science & Technology* **45**, 2529–2540 (2011).
11. Hommen, U. *et al.* How to use mechanistic effect models in environmental risk assessment of pesticides: case studies and recommendations from the SETAC workshop MODELINK. *Integrated Environmental Assessment and Management* **12**, 21–31 (2016).
12. EFSA PPR Scientific Opinion. Scientific Opinion on the state of the art of Toxicokinetic/Toxicodynamic (TKTD) effect models for regulatory risk assessment of pesticides for aquatic organisms. *EFSA Journal* **16**, e05377 (2018).
13. Baudrot, V., Preux, S., Ducrot, V., Pavé, A. & Charles, S. New insights to compare and choose TKTD models for survival based on an inter-laboratory study for *Lymnaea stagnalis* exposed to Cd. *Environmental Science & Technology* **52**, 1582–1590 (2018).
14. Jager, T. & Ashauer, R. *Modelling Survival under Chemical Stress. A Comprehensive Guide to the GUTS Framework. Version 1.0*, https://leanpub.com/guts_book (Leanpub, 2018).
15. Dale, V. H. *et al.* Enhancing the ecological risk assessment process. *Integrated Environmental Assessment and Management* **4**, 306–313 (2008).
16. Gray, G. M. & Cohen, J. T. Policy: rethink chemical risk assessments. *Nature* **489**, 27 (2012).
17. Beck, N. B. *et al.* Approaches for describing and communicating overall uncertainty in toxicity characterizations: US Environmental Protection Agency’s Integrated Risk Information System (IRIS) as a case study. *Environment International* **89**, 110–128 (2016).
18. Siu, N. O. & Kelly, D. L. Bayesian parameter estimation in probabilistic risk assessment. *Reliability Engineering & System Safety* **62**, 89–116 (1998).
19. Ferson, S. Bayesian methods in risk assessment. *Unpublished Report Prepared for the Bureau de Recherches Geologiques et Minieres (BRGM), New York* (2005).
20. Delignette-Muller, M. L., Ruiz, P. & Veber, P. Robust fit of toxicokinetic–toxicodynamic models using prior knowledge contained in the design of survival toxicity tests. *Environmental Science & Technology* **51**, 4038–4045 (2017).
21. Albert, C., Ashauer, R., Künsch, H. & Reichert, P. Bayesian experimental design for a toxicokinetic–toxicodynamic model. *Journal of Statistical Planning and Inference* **142**, 263–275 (2012).
22. Baudrot, V. *et al.* *morse: MOdelling Tools for Reproduction and Survival Data in Ecotoxicology*, https://cran.r-project.org/web/packages/morse/index.html. R package version 3.2.4 (2018).
23. Ashauer, R., Hintermeister, A., Potthoff, E. & Escher, B. I. Acute toxicity of organic chemicals to *Gammarus pulex* correlates with sensitivity of *Daphnia magna* across most modes of action. *Aquatic Toxicology* **103**, 38–45 (2011).
24. Nyman, A.-M., Schirmer, K. & Ashauer, R. Toxicokinetic-toxicodynamic modelling of survival of *Gammarus pulex* in multiple pulse exposures to propiconazole: model assumptions, calibration data requirements and predictive power. *Ecotoxicology* **21**, 1828–1840 (2012).
25. Plummer, M. *rjags: Bayesian Graphical Models using MCMC*, https://CRAN.R-project.org/package=rjags. R package version 4-6 (2016).
26. Grimm, V. & Berger, U. Robustness analysis: deconstructing computational models for ecological theory and applications. *Ecological Modelling* **326**, 162–167 (2016).
27. Gelman, A. *et al.* *Bayesian Data Analysis* (Chapman and Hall/CRC, 2013).
28. Gabry, J. & Mahr, T. *bayesplot: Plotting for Bayesian Models*, https://CRAN.R-project.org/package=bayesplot. R package version 1.4.0 (2017).
29. Jager, T., Heugens, E. H. & Kooijman, S. A. Making sense of ecotoxicological test results: towards application of process-based models. *Ecotoxicology* **15**, 305–314 (2006).
30. Ashauer, R. *et al.* Modelling survival: exposure pattern, species sensitivity and uncertainty. *Scientific Reports* **6** (2016).
31. Focks, A. *et al.* Calibration and validation of toxicokinetic-toxicodynamic models for three neonicotinoids and some aquatic macroinvertebrates. *Ecotoxicology* **27**, 992–1007 (2018).
32. Wang, W.-X. & Fisher, N. S. Assimilation efficiencies of chemical contaminants in aquatic invertebrates: a synthesis. *Environmental Toxicology and Chemistry* **18**, 2034–2045 (1999).
33. Ashauer, R. & Jager, T. Physiological modes of action across species and toxicants: the key to predictive ecotoxicology. *Environmental Science: Processes & Impacts* (2018).
34. Albert, C., Vogel, S. & Ashauer, R. Computationally efficient implementation of a novel algorithm for the General Unified Threshold Model of Survival (GUTS). *PLoS Computational Biology* **12**, e1004978 (2016).
35. Albert, C. & Vogel, S. *GUTS: Fast Calculation of the Likelihood of a Stochastic Survival Model*, https://CRAN.R-project.org/package=GUTS. R package version 1.0.4 (2017).
36. Baudrot, V., Veber, P., Gence, G. & Charles, S. Fit GUTS reduced models online: from theory to practice. *Integrated Environmental Assessment and Management* **14**, 625–630 (2018).
37. Baudrot, V., Fritsch, C., Perasso, A., Banerjee, M. & Raoul, F. Effects of contaminants and trophic cascade regulation on food chain stability: application to cadmium soil pollution on small mammals–raptor systems. *Ecological Modelling* **382**, 33–42 (2018).
38. Álvarez, O. A., Jager, T., Redondo, E. M. & Kammenga, J. E. Physiological modes of action of toxic chemicals in the nematode *Acrobeloides nanus*. *Environmental Toxicology and Chemistry* **25**, 3230–3237 (2006).
39. Duboudin, C., Ciffroy, P. & Magaud, H. Effects of data manipulation and statistical methods on species sensitivity distributions. *Environmental Toxicology and Chemistry* **23**, 489–499 (2004).
40. EFSA Scientific Opinion. Guidance on uncertainty analysis in scientific assessments. *EFSA Journal* **16** (2018).

## Acknowledgements

The authors are very grateful for inputs from Theo Brock on an earlier version of the manuscript. We thank Andreas Focks and two anonymous reviewers for their valuable suggestions. The authors also thank the French National Agency for Water and Aquatic Environments (ONEMA, now the French Agency for Biodiversity) for its financial support. This manuscript has not been submitted for publication in another journal, but a pre-print version is available and has already been peer-reviewed and recommended by Peer Community In Ecology: Andreas Focks and two anonymous reviewers evaluated the manuscript, and Luís César Schiesari recommended it based on these reviews. The reviewers and the recommender have no conflict of interest with us or with the content of the manuscript. The reviews and the recommendation text are publicly available at the following address: https://doi.org/10.24072/pci.ecology.100007.

## Author information

### Contributions

V.B. and S.C. designed the model and the computational framework. V.B. carried out the implementation and performed the calculations. V.B. and S.C. analysed the data. V.B. and S.C. discussed the result and wrote the manuscript.

### Corresponding author

Correspondence to Virgile Baudrot.

## Ethics declarations

### Competing Interests

The French National Agency for Water and Aquatic Environments (ONEMA, now the French Agency for Biodiversity) provided financial support.

## Additional information

**Publisher’s note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Baudrot, V., Charles, S. Recommendations to address uncertainties in environmental risk assessment using toxicokinetic-toxicodynamic models.
*Sci Rep* **9**, 11432 (2019). doi:10.1038/s41598-019-47698-0
