Modelling survival: exposure pattern, species sensitivity and uncertainty

The General Unified Threshold model for Survival (GUTS) integrates previously published toxicokinetic-toxicodynamic models and estimates survival with explicitly defined assumptions. Importantly, GUTS accounts for time-variable exposure to the stressor. We performed three studies to test the ability of GUTS to predict survival of aquatic organisms across different pesticide exposure patterns, time scales and species. Firstly, using synthetic data, we identified the experimental data requirements that allow for the estimation of all parameters of the GUTS proper model. Secondly, we assessed how well GUTS, calibrated with short-term survival data of Gammarus pulex exposed to four pesticides, can forecast effects of longer-term pulsed exposures. Thirdly, we tested the ability of GUTS to estimate 14-day median effect concentrations of malathion for a range of species and used these estimates to build species sensitivity distributions for different exposure patterns. We find that GUTS adequately predicts survival across exposure patterns that vary over time. When toxicity is assessed for time-variable concentrations, species may differ in their responses depending on the exposure profile. This can result in different species sensitivity rankings and safe levels. The interplay of exposure pattern and species sensitivity deserves systematic investigation in order to better understand how organisms, including humans, respond to stress.


Example code and software
All example code and software can be downloaded free of charge as supporting information to this publication.

Mathematica
Model calibration, predictions and estimation of uncertainty bands to forecast survival of G. pulex across different exposure patterns were performed in Mathematica 10.0 (Wolfram Research). This is an example implementation of GUTS-SIC-IT and GUTS-SIC-SD, where the IT version assumes a log-logistic distribution of the threshold. The Mathematica code is provided as separate supporting information.

Matlab
Additional calculations on the synthetic data (see sections below) were made with the BYOM package for Matlab and the GUTS2 package (which can be used for all GUTS flavours). Both can be downloaded free of charge from http://www.debtox.info/byom.html. The BYOM package applies a Simplex optimisation routine (which can easily be changed to other Matlab-based optimisation functions) to the likelihood function (following from the multinomial distribution 1). To produce confidence intervals, the BYOM package includes likelihood profiling 2 and MCMC sampling (yielding a Bayesian posterior that is used to derive marginal distributions for individual parameters, as well as the joint distribution for forward prediction). Example fits are shown in the sections below.

GUTS R package
Inference with GUTS proper on the synthetic data was performed with the R package "GUTS" 3,4. A log-normal distribution of thresholds was used. The priors were chosen to be flat. The posterior samples were generated by means of adaptive Monte Carlo chains (R package "adaptMCMC"). To get a starting value for the chains, the optimizer "hjkb" from the R package "dfoptim" was employed. Since the posterior marginals of some of the parameters seem to have non-vanishing support at infinity, all parameters were transformed with exp(−ϑ) before running the chains. Chains of length 1 M (resp. 2 M) were used and the adaptation phase (10% of the chain) was discarded. The maximum of the posterior (best fit) as well as the 2.5% and 97.5% quantiles of the posterior marginals are plotted in Figure 3.
The R package can be downloaded free of charge from http://cran.r-project.org/web/packages/GUTS/index.html. To demonstrate the fits on the synthetic data sets, two examples are shown in Figure S9.

GUTS implementation with graphical user interface
An example implementation of GUTS-SIC-IT and GUTS-SIC-SD in Delphi (http://www.embarcadero.com/products/delphi) is also provided. For the IT version a log-logistic, log-normal or Weibull distribution of the threshold can be selected, although the log-logistic distribution is recommended. (Note: we do not see any strong reason to prefer one threshold distribution over another, e.g. log-normal vs. log-logistic; however, the log-logistic distribution is easier to implement computationally in some software environments. In this study we used the log-logistic distribution in the Mathematica and ModelMaker implementations and the log-normal in R and Matlab.) The optimal parameter set is found by optimising the likelihood function for GUTS (see Eq. 8) and parameter confidence intervals are calculated using likelihood profiling. The best optimisation results are usually obtained with Simulated Annealing, but the fastest optimisation is typically achieved using Downhill Simplex. The code for both these algorithms originates from the TPmath library (version 0.81, 2011; http://sourceforge.net/projects/tpmath/). The implementation comes with a graphical user interface, needs no installation and has been used previously [5][6][7][8].

Implementation in ModelMaker
An example implementation of GUTS-SIC-IT and GUTS-SIC-SD in ModelMaker is also provided and can be downloaded free of charge from http://www.ecotoxmodels.org/guts/. The IT version assumes a log-logistic distribution of the threshold. ModelMaker (http://www.modelkinetix.com/) is user-friendly software that is well suited to model prototyping, differential equations and model calibration. The ModelMaker implementation includes the likelihood function for GUTS (Eq. 8) and uses the Downhill Simplex algorithm for parameter estimation. GUTS-SIC-IT, GUTS-SIC-SD, GUTS-SID-IT and GUTS-SID-SD versions have been used previously [9][10][11].

GUTS-SIC-SD and GUTS-SIC-IT
All the models we used were previously published 1,10 . We recall them here for clarity.

Time course of dose metric
We use the scaled internal concentration C_i*(t) as the dose metric; it is the true internal concentration divided by the bioconcentration factor 1. This TK model can be used in the absence of information on body residues; it should be noted that C_i*(t) has the dimension of an external concentration. The time course of the scaled internal concentration is determined by the dominant rate constant k_d (time −1) and given by the differential equation:

dC_i*(t)/dt = k_d (C_w(t) − C_i*(t))    Eq. 1

The driving variable or model input is the external concentration in water, C_w(t). The scaled internal concentration is linked to the stochastic death (SD) or individual tolerance (IT) model to describe the survival over time 1,10.
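As an illustration of the one-compartment TK model above, the following minimal Python sketch integrates the scaled internal concentration for an arbitrary exposure profile. This is not part of the published supporting code; all function and variable names are our own, and a simple Euler scheme stands in for a proper ODE solver.

```python
import math

def scaled_internal_conc(times, conc_of_t, kd, n_sub=100):
    """Integrate dCi*/dt = kd * (Cw(t) - Ci*(t)) with simple Euler sub-steps.
    `conc_of_t` is the external concentration as a function of time.
    Names are illustrative, not from the paper's supporting code."""
    ci = 0.0  # organisms assumed unexposed at t = 0
    out = [ci]
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = (t1 - t0) / n_sub
        for k in range(n_sub):
            t = t0 + k * dt
            ci += dt * kd * (conc_of_t(t) - ci)
        out.append(ci)
    return out

# constant exposure: Ci* should approach Cw following 1 - exp(-kd * t)
kd = 1.0  # d^-1, illustrative value
ts = [0, 1, 2, 3, 4]
ci = scaled_internal_conc(ts, lambda t: 10.0, kd)
```

For a constant exposure the numerical solution should track the analytic saturation curve, which provides a quick sanity check of any implementation.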

Link to survival in GUTS-SIC-SD
In the SD model, the hazard rate, or the instantaneous probability to die, is calculated from the scaled internal concentration:

h(t) = k_k max(0, C_i*(t) − z) + h_b    Eq. 2

which describes the hazard increasing in proportion to how much the dose metric exceeds the threshold z (concentration). The killing rate constant k_k (concentration −1 time −1) is the proportionality constant and h_b (time −1) the background hazard rate.
For the parameterisation of the SD model, the killing rate constant k_k and the internal threshold z have to be estimated from the survival data. As the dose metric is the scaled internal concentration, the concentration units of k_k and z refer to external concentrations.
The survival probability S(t) is the probability of an individual to survive until time t. It is calculated as follows:

S(t) = exp(−∫_0^t h(τ) dτ)    Eq. 3

with h(τ) given by Eq. 2. The background hazard rate h_b should be fitted on the entire dataset, although sometimes it can be defensible to fit it only on the control survival data.
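The hazard formulation of Eq. 2 and Eq. 3 can be sketched in a few lines of Python. This is an illustrative re-implementation, not the paper's code; a trapezoid rule stands in for the integral, and all names are our own.

```python
import math

def survival_sd(times, ci_series, kk, z, hb):
    """GUTS-SIC-SD survival: S(t) = exp(-integral of the hazard), with
    h(t) = kk * max(0, Ci*(t) - z) + hb (trapezoid-rule sketch)."""
    surv = [1.0]
    cum = 0.0
    for (t0, c0), (t1, c1) in zip(zip(times, ci_series),
                                  zip(times[1:], ci_series[1:])):
        h0 = kk * max(0.0, c0 - z) + hb
        h1 = kk * max(0.0, c1 - z) + hb
        cum += 0.5 * (h0 + h1) * (t1 - t0)  # accumulate the hazard integral
        surv.append(math.exp(-cum))
    return surv

# below the threshold z only the background hazard applies
s = survival_sd([0, 1, 2], [0.0, 0.5, 0.5], kk=0.2, z=1.0, hb=0.05)
```

With the dose metric below the threshold throughout, survival decays purely with the background hazard, which is a convenient unit check.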

Link to survival in GUTS-SIC-IT
In the IT model we assume that the threshold follows a probability distribution, such as the log-logistic (or log-normal or another suitable distribution), and that death is immediate when the scaled internal concentration exceeds the individual's threshold. For a cumulative log-logistic distribution of the threshold, the survival probability of an individual is given by 10:

S(t) = (1 − F(t)) e^(−h_b t)    Eq. 4

F(t) = 1 / (1 + (max_{0≤τ≤t} C_i*(τ) / a)^(−b))    Eq. 5

where a is the median of the threshold distribution (concentration), b (−) is the shape parameter of the distribution and F(t) is the cumulative log-logistic distribution of the threshold.
Instead of the shape parameter, an easy-to-interpret alternative can be used in the form of a 'factor of spread' F_s. This is the factor that needs to be applied to the median value of the threshold to cover 95% of the probability distribution (i.e., from the 0.025 to the 0.975 quantile). This parameter can be used for all distributions that are symmetrical on a log scale (including the log-normal distribution). For the log-logistic distribution, b equals ln 39 / ln F_s, and for the log-normal distribution, the standard deviation on ln scale equals ln F_s / 1.96.
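The spread-factor conversions can be checked numerically. The short Python sketch below assumes the 0.025-0.975 quantile definition of F_s given above; it is illustrative code, not part of the supporting information.

```python
import math

def loglogistic_shape(fs):
    # shape b such that the 2.5th-97.5th percentile range spans median/fs .. median*fs
    # (the 97.5th quantile of a unit-median log-logistic is 39**(1/b))
    return math.log(39.0) / math.log(fs)

def lognormal_sigma(fs):
    # standard deviation on ln-scale for the same spread factor
    return math.log(fs) / 1.959964  # 97.5th percentile of the standard normal

fs = 5.0
b = loglogistic_shape(fs)
```

A round trip recovers the spread factor: the 97.5th quantile of the unit-median log-logistic with this shape is exactly F_s, and exp(1.96 sigma) gives F_s for the log-normal.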
In the IT model, the survival is related to the maximum dose metric that individuals experienced until time t rather than to the actual value of the dose metric at time t, because organisms that died previously remain dead (this is particularly important when dealing with time-varying exposure); the probability to survive until time t is therefore calculated with max_{0≤τ≤t} C_i*(τ) in the threshold distribution F(t). (Note: for the synthetic data to optimise the experimental design for calibration of the GUTS proper, a log-normal distribution was used.) Survival probabilities for both the SD and the IT model are calculated in explicit dependence of the parameter vector θ and the external concentration over time, so formally

S_SD = S_SD(t | θ, C_w(t))    Eq. 6

S_IT = S_IT(t | θ, C_w(t))    Eq. 7
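A compact Python sketch of the IT survival calculation with a log-logistic threshold distribution and the running maximum of the dose metric (illustrative names, not the paper's code):

```python
import math

def survival_it(times, ci_series, a, b, hb):
    """GUTS-SIC-IT survival: S(t) = (1 - F(max Ci* up to t)) * exp(-hb * t),
    with F the cumulative log-logistic distribution (median a, shape b)."""
    surv = []
    running_max = 0.0
    for t, ci in zip(times, ci_series):
        running_max = max(running_max, ci)  # dead individuals stay dead
        f = 0.0
        if running_max > 0.0:
            f = 1.0 / (1.0 + (running_max / a) ** (-b))
        surv.append((1.0 - f) * math.exp(-hb * t))
    return surv

# survival does not rebound after the peak even though Ci* declines again
s = survival_it([0, 1, 2, 3], [0.0, 2.0, 1.0, 0.5], a=2.0, b=4.0, hb=0.0)
```

The example exposure peaks at the median threshold and then declines; because of the running maximum, survival stays at its post-peak level instead of recovering.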

Parameter optimisation used for forecasting survival across different exposure patterns
Survival data follow a multinomial distribution, hence the following log-likelihood function applies 1:

ln L(θ | y) = Σ_{i=1}^{n+1} y_i ln(S(t_{i−1} | θ) − S(t_i | θ))    Eq. 8

where y_i are the observed numbers of deaths in the interval (t_{i−1}, t_i] (with the last term counting the individuals still alive at the end of the experiment, taking S(t_{n+1}) = 0) and S(t_i | θ) are the simulated survival probabilities at sampling time t_i given parameter vector θ. The maximum of the log-likelihood function indicates an optimal fit between observed and simulated survival. In practice, the negative of the log-likelihood function is often used because parameter estimation algorithms usually minimise the objective function. Here we used:

Objective function = −ln L(θ | y)    Eq. 9

The smaller the objective function, the better the fit.
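The multinomial log-likelihood of Eq. 8 is straightforward to implement. The Python sketch below is illustrative (not the paper's code) and takes observed death counts per interval plus the number of survivors at the end, with a small guard against log(0):

```python
import math

def guts_loglik(deaths, survivors_end, surv_probs):
    """Multinomial log-likelihood for survival data. `surv_probs` are model
    survival probabilities S(t_0)=1, S(t_1), ..., S(t_n); `deaths[i]` is the
    observed number dying in interval (t_i, t_i+1]; `survivors_end` is the
    count still alive at t_n."""
    ll = 0.0
    for i, d in enumerate(deaths):
        p = surv_probs[i] - surv_probs[i + 1]  # probability of dying in interval i
        ll += d * math.log(max(p, 1e-300))     # guard against log(0)
    ll += survivors_end * math.log(max(surv_probs[-1], 1e-300))
    return ll

ll = guts_loglik([2, 3], 5, [1.0, 0.8, 0.5])
```

The example reproduces the likelihood of two deaths in the first interval, three in the second, and five survivors at the end.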
To find the optimal vector of parameter values for each of the SD and the IT models, we optimised the parameter values with respect to the experimentally observed survival data. We used the built-in optimisation routine Simulated Annealing of the method NMinimize of Mathematica (Wolfram Research, version 10.0) to obtain the best fit between data and model simulations by minimising −ln L(θ | y), i.e. maximising the log-likelihood function itself.
The optimisation routine yielded an optimal parameter vector for which the log-likelihood function ln L(θ | y) shows the highest value. Optimisation was repeated twice after the first optimisation run, each time using the optimum values from the previous optimisation step as starting values, finally resulting in an optimal parameter set for each species and model version.
All parameter vectors θ tested during the optimisation routine were stored together with the resulting likelihood values ln L(θ | y) for further analyses.

Calculation of model performance statistics
For the quantification of the accuracy of the model predictions, we used the chi-square statistic as a measure of model performance:

χ² = Σ_{i=1}^{n} (O_i − P_i)² / (P_i + 1)    Eq. 10

where O_i indicates the observed and P_i the predicted number of surviving individuals (not to be confused with the survival rates) at sampling time t_i, and n is the number of observation points. The addition of 1 in the denominator corresponds to dividing the surviving individuals into classes of width one (from P_i to P_i + 1). Since the test statistic only asymptotically follows a χ² distribution, and the expected counts in some bins will be low, the result should be considered an approximate measure of goodness of fit.
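A direct Python transcription of this performance statistic (illustrative code, not the paper's implementation):

```python
def chi_square_perf(observed, predicted):
    """Model performance statistic: squared deviations between observed and
    predicted survivor counts, scaled by predicted count + 1 (the class-width
    correction described above)."""
    return sum((o - p) ** 2 / (p + 1.0) for o, p in zip(observed, predicted))

x2 = chi_square_perf([10, 7, 4], [10, 8, 3])
```

Smaller values indicate better agreement between predicted and observed survivor counts across the sampling times.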

Numerical estimation of parameter confidence intervals
The likelihood ratio method was used to estimate confidence intervals for the optimal parameters 2,12. Confidence intervals for the single parameters for a given confidence level α were calculated as the profile likelihood. The confidence set includes those parameter values for which the likelihood ratio fulfils the condition:

−2 (ln L(θ | y) − ln L(θ_opt | y)) ≤ χ²_{df, 1−α}    Eq. 11

with χ²_{df, 1−α} being the value of the chi-square distribution for the confidence level α, and df the degrees of freedom of the likelihood ratio. For single-parameter confidence intervals, the likelihood ratio has df = 1. To find these parameter values, one value in the parameter vector, say the p-th parameter, was set to successively decreasing values (the vector θ_(p) then denotes all parameters except the p-th), starting at the maximum-likelihood estimate, i.e. the best parameter value. All parameter values with the exception of the p-th parameter were again optimised (using the NMinimize method and the SimulatedAnnealing algorithm in Mathematica) to give the best fit to the experimental data. This procedure was repeated while the value of ln L(θ | y) satisfied Eq. 11. Likewise, the procedure was repeated for successively increasing parameter values, again starting with the optimal maximum-likelihood estimate. The first parameter values that did not fulfil condition Eq. 11 were taken as the lower and upper confidence interval limits for the confidence level α. This procedure was performed for all parameters. All parameter vectors tested during the confidence interval estimation routine were stored together with the resulting likelihood values for further analyses.
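The profiling logic can be sketched as follows. For brevity, this toy Python version scans a single parameter of a one-dimensional likelihood and omits the re-optimisation of the remaining parameters that the full procedure performs at every step; names and the quadratic test likelihood are our own.

```python
import math

CHI2_1_95 = 3.841  # chi-square critical value, df = 1, 95% confidence

def profile_interval(loglik, best, lo, hi, step=1e-3):
    """Scan one parameter away from its best value in both directions while
    the likelihood-ratio condition holds; the full method would re-optimise
    all remaining parameters at every step (omitted in this sketch)."""
    ll_max = loglik(best)
    ok = lambda x: 2.0 * (ll_max - loglik(x)) <= CHI2_1_95
    lower = best
    while lower - step > lo and ok(lower - step):
        lower -= step
    upper = best
    while upper + step < hi and ok(upper + step):
        upper += step
    return lower, upper

# toy quadratic log-likelihood: the interval should be ~ best +/- 1.96 * sigma
sigma = 0.5
loglik = lambda x: -0.5 * ((x - 2.0) / sigma) ** 2
lo_ci, hi_ci = profile_interval(loglik, 2.0, 0.0, 4.0)
```

For a quadratic log-likelihood the profile interval matches the familiar normal-theory interval, which makes this a useful self-test for any profiling implementation.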

Stochasticity of survival
Survival is a binary process, i.e. an individual is either alive or dead. The number of individuals in a group influences the uncertainty of model predictions about the survival of the group, especially for small numbers. To consider the specific nature of survival, it is modelled as a binomial process. Multinomial distributions have previously been used to model survival 1 and they are equivalent to the conditional binomial distribution, but only when there are no animals lost to follow-up 13. The number of survivors at time t_j is then drawn as:

N(t_j) ~ Binomial(N(t_{j−1}), p_j)    Eq. 12

p_j = S(t_j | θ, C_w(t)) / S(t_{j−1} | θ, C_w(t))    Eq. 13

where S is the survival probability calculated by the SD or the IT model for parameter vector θ, external concentration time course C_w(t), and time t.
S(t_j) is the probability to survive until time t_j, which is different from the conditional probability of Eq. 13. Note that the time step is not fixed to 1 here, as the survival probabilities can be evaluated for arbitrarily small or large time steps; hence conditional probabilities can be modelled as a random process using a flexible time step.
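Simulating survival as a conditional binomial process can be sketched in Python as follows (illustrative code; a fixed seed is used so that the trajectory is reproducible):

```python
import random

def simulate_survivors(surv_probs, n0, rng=None):
    """Simulate one survival trajectory as a conditional binomial process:
    at each step the survivors of the previous step survive with the
    conditional probability S(t_j) / S(t_j-1)."""
    rng = rng or random.Random(1)  # fixed seed for reproducibility
    n = n0
    traj = [n]
    for s_prev, s_now in zip(surv_probs, surv_probs[1:]):
        p_cond = s_now / s_prev if s_prev > 0 else 0.0
        n = sum(1 for _ in range(n) if rng.random() < p_cond)  # one binomial draw
        traj.append(n)
    return traj

traj = simulate_survivors([1.0, 0.8, 0.5], n0=10)
```

Because each step conditions on the previous survivor count, trajectories are non-increasing and integer-valued, matching the observed data type.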

Including parametric uncertainty in forecasts
Parametric uncertainty in model simulations is in addition to the stochasticity of survival itself. To consider parametric uncertainties, the following approach is used.
For all monitored parameter vectors θ that were generated and tested within the parameter optimisation and confidence interval estimation routines, the corresponding likelihood values ln L(θ | y) were stored. To approximate the joint confidence region for the parameters, we selected those parameter sets from the optimisation procedure (simulated annealing) that were not rejected in a likelihood ratio test (critical value 7.81 from the chi-square distribution, df = 3, α = 0.05):

−2 (ln L(θ | y) − ln L(θ_opt | y)) ≤ χ²_{df=3, 0.95}    Eq. 14

with χ² being the value of the chi-square distribution for the confidence level α, and df the degrees of freedom of the likelihood ratio. The degrees of freedom used for the construction of the joint confidence region are defined by the difference in free model parameters, in this case df = 3, because for optimisation all three model parameters of the GUTS-SIC-SD or GUTS-SIC-IT models were free, whereas for the construction of the three-dimensional joint confidence region all model parameters were fixed. The background mortality rate constant h_b was not taken into account here, because the purpose of the forecasts is to inform about the effect of chemical stress on survival.
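Selecting the joint confidence sample from the stored optimisation output reduces to a filter on the likelihood values (Python sketch with illustrative names, not the Mathematica implementation used in the study):

```python
CHI2_3_95 = 7.81  # chi-square critical value, df = 3, 95% confidence

def joint_confidence_sample(param_sets, logliks):
    """Keep the stored parameter vectors that pass the likelihood-ratio
    test against the best fit (Eq. 14)."""
    ll_max = max(logliks)
    return [p for p, ll in zip(param_sets, logliks)
            if 2.0 * (ll_max - ll) <= CHI2_3_95]

kept = joint_confidence_sample(["set_a", "set_b", "set_c"],
                               [-10.0, -12.0, -20.0])
```

In the toy example the third parameter set lies 10 log-likelihood units below the best fit and is rejected, while the first two are retained.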

Probabilistic modelling of survival over time
Information about parameter uncertainty and about the stochasticity of survival was combined to characterise the uncertainty of model simulations. For a given exposure C_w(t), a number n_p of parameter vectors was drawn from the confidence region, and for every parameter vector the survival over time was simulated as an explicit conditional binomial process in n_s iterations. The total number of simulations hence amounts to n_p × n_s. These realisations of the simulated numbers of surviving individuals within a population of initial size N_0 can be described statistically, e.g. by the median, minimum and maximum of the distribution of the numbers of survivors, quantifying uncertainty in the model predictions. For this study, n_p and n_s were chosen per compound; the resulting numbers of Monte Carlo runs are given with the concentration-response simulations below.
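The nested Monte Carlo scheme (parameter draws × binomial iterations) can be sketched as follows. This is illustrative Python, not the Mathematica implementation used in the study; the toy survival model and all names are our own.

```python
import random

def probabilistic_survival(param_sample, surv_model, n0, n_iter, rng=None):
    """Nested Monte Carlo: for each parameter vector drawn from the joint
    confidence region, simulate the conditional-binomial survival process
    n_iter times; summarise the final survivor counts by median, min, max."""
    rng = rng or random.Random(42)
    finals = []
    for theta in param_sample:
        probs = surv_model(theta)  # survival probabilities S(t_0), ..., S(t_n)
        for _ in range(n_iter):
            n = n0
            for s_prev, s_now in zip(probs, probs[1:]):
                p_cond = s_now / s_prev if s_prev > 0 else 0.0
                n = sum(1 for _ in range(n) if rng.random() < p_cond)
            finals.append(n)
    finals.sort()
    return finals[len(finals) // 2], finals[0], finals[-1]

# toy model: survival probability at the single time point equals theta
med, low, high = probabilistic_survival([0.9, 0.8], lambda th: [1.0, th], 10, 50)
```

The median, minimum and maximum of the simulated survivor counts are exactly the summary statistics used for the uncertainty bands in the figures.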

Experiments with G. pulex
The toxicity tests with G. pulex and carbendazim, dimethoate and cypermethrin followed previously established methods and protocols 10,11,14,15 . Adult test organisms were collected in the field (Itziker Ried, Greifensee catchment, near Zurich, Switzerland), acclimatised to laboratory conditions (e.g. 13°C) and fed with pre-conditioned horse-chestnut leaves. Toxicity experiments were conducted in 600 mL pyrex beakers filled with 500 mL of pre-aerated artificial pond water. At the start of the experiments each beaker contained 10 individual G. pulex, which were fed with pre-conditioned horse-chestnut leaf discs ad libitum. Survival was observed daily. Conductivity, pH and dissolved oxygen content were monitored.
We used 14

Concentration-response curves for constant exposures
Simulated concentration-response relationships for constant exposure (Figures 1, S1-S3, panels B, H) were constructed by predicting survival over time for a series of exposure levels (see Table S13 for details). Predictions were done both in a deterministic (using the optimal parameter vector) and a probabilistic manner (with 10000 Monte Carlo runs for CYP, CBZ and MAL, and 5000 MC runs for DIM, respectively). Simulated survival at day 4 from the optimal solutions, and the median, minimum and maximum from the probabilistic simulations, together with the exposure concentration, generated a series of data points which were interpolated to obtain the concentration-response curves.

Concentration-response curves for pulsed exposures
Simulated concentration-response relationships for pulsed exposure scenarios A and B (Figure 3, Figures S1-S4 panels D, F, J, L) were constructed by predicting survival over time for a series of manipulated pulsed exposure profiles. The concentrations used in the original pulsed exposure experiments A and B were multiplied by a series of multiplication factors (see Table S13 for details). Predictions were done both in a deterministic (using the optimal parameter vector) and a probabilistic manner (with 10000 Monte Carlo runs for cypermethrin, carbendazim and malathion, and 5000 MC runs for dimethoate, respectively). Simulated survival at the end of the pulsed experiments from the optimal parameter sets, and the median, minimum and maximum from the probabilistic simulations, together with the exposure concentrations, generated a series of data points that were interpolated to create the concentration-response curves for the last day of the experiment.

Detailed assessment of survival predictions across exposure patterns

Background
Exposure concentrations, e.g. in water bodies adjacent to agricultural fields or downstream of sewage outlets and wastewater treatment plants, show highly variable dynamics 16, which mainly depend on chemical application schemes and entry pathways (e.g. spray drift, surface runoff and erosion, or drainage 17) as well as hydrology and weather 18. Standard toxicity tests are designed to ensure constant exposure to the chemical over a short test duration, and nominal, initially measured, or mean measured concentrations are traditionally used to derive effect thresholds. However, field exposure patterns often differ from the laboratory situation and require assessing the consequences of time-variable exposure patterns for the effect. Toxicokinetic-toxicodynamic models 19 such as GUTS provide a tool to overcome these limitations, i.e. the different temporal patterns and overall durations of exposure in toxicity tests and in the field. First, the toxicokinetic-toxicodynamic model is calibrated to the toxicity test data, where the nominal, initially measured or mean measured concentrations generally serve as the model driving variable (input) and the observed effects are the simulated variable (output). The actual exposure pattern should be represented as accurately as possible, preferably by measured concentration time series 20. Then the toxicokinetic-toxicodynamic model is run with the environmentally relevant exposure profile as the model driving variable (input) to predict effects in the field (output) 9,10.

Calibration
Estimated optimal parameter values, likelihood values and confidence limits are provided in the SI (Table S14), together with values of the model performance indicator Chi2 (Table S15). Calibration data (Figures S1-S4, upper panels) showed different concentration-response patterns for the various chemicals, all of which were captured by the SD and IT model fits. For cypermethrin, the concentration-response curve was relatively steep (Figure S2), and the observed mortality at day 4 was adequately captured by both the IT and SD models with optimal parameter values. Parameter uncertainty, as expressed by the uncertainty bands of the simulated concentration-response curves, was similar for the SD and the IT model. Negative log-likelihood values were similar for the SD and the IT model, and were the smallest of all four compounds for cypermethrin. The AChE inhibitors malathion (Figure S1) and dimethoate (Figure S4) showed shallower concentration-response curves compared to that of cypermethrin (Figure S2). Both models tended to underestimate mortalities at intermediate concentrations, albeit the SD model gave a better fit to the dimethoate data. Uncertainty bands were similar in size to those for cypermethrin, and similar between SD and IT. Model error values indicate a better fit for malathion than for dimethoate. For both compounds, the SD model fitted the data better than the IT model. For the fungicide carbendazim (Figure S3), survival data displayed an inconsistent response: survival at one of the intermediate tested concentrations was similar to that of the control. Moreover, the concentration-response curve was shallower than for the other compounds. These properties of the calibration data set resulted in increased parameter uncertainty, as reflected in the large uncertainty bands obtained for the probabilistic simulations of both the IT and SD models. The negative log-likelihood values for carbendazim were also larger than for the other compounds.

Prediction
To assess the accuracy of our model predictions against the experimentally observed values, we used Chi2 values (Table S15).
Differences in the accuracy and precision of model predictions of survival for the studied compounds were obvious. Survival curves for cypermethrin differed substantially between exposure regimes A and B (Figure 2, Figure S2) in the magnitude of effects following the second pulse. In exposure regime A, the second pulse reduced survival from approximately 90% to 20%, whereas in exposure regime B the effect of the second pulse was smaller, decreasing survival from approximately 80% to 40%. The difference between A and B can be explained by the time interval between pulses, which may not have been sufficient in regime A to allow for detoxification or recovery from internal damage. In regime B, the time interval between the pulses seemed sufficiently long to ensure recovery; indeed, the two pulses appeared to have "independent" effects, given that they induced a similar mortality (approximately 20%). Looking at survival over time, both model predictions based on optimal parameter sets underestimated experimental mortalities but captured the temporal pattern of the response (Figure 2, Figure S2). The medians of the probabilistic simulations were similar to the predictions based on optimal parameters. At the end of the pulsed-exposure experiments, observed survival always fell within the uncertainty ranges of the model (except for the IT model under regime A), partly because the uncertainty bands for the SD model were rather large. Looking at the modelled concentration-response curves, predictions of the 10-day survival with a given model (IT or SD) were similar for both exposure regimes. This can be explained by the fact that the models did not fully reproduce the differences between the pulse regimes (i.e. recovery between pulses).
For both organophosphate insecticides, malathion and dimethoate, the survival data were quite similar for exposure regimes A and B. For dimethoate, a regular decrease in survival was observed over time, which can be interpreted as no effect of the timing of the second pulse. Survival at day 10 was rather similar, i.e. 12% and 5% for exposure regimes A and B, respectively. For malathion, however, each exposure pulse induced an immediate drop in survival, occurring at days 1 and 6 in regime A and at days 1 and 10 in regime B. The decrease in survival was more pronounced after the second peak. Both exposure regimes resulted in survival rates of approximately 30% at day 10. For both compounds, the SD and the IT model differed in the quality of model predictions. The SD model considerably overestimated mortality for both exposure regimes of malathion, but matched the experimental survival over time well for dimethoate (see also chi-square values in Table S15). The IT model also overestimated mortality for malathion, but to a lesser extent than the SD model. On the other hand, the IT model underestimated mortality for dimethoate. In all cases, the median of the probabilistic simulations and the simulations based on optimal parameters were relatively similar. These results are confirmed in the concentration-response view after 10 days.
For the fungicide carbendazim, model predictions based on optimal parameters overestimated mortality over time for both the SD and the IT model and both exposure regimes A and B. The difficulty in calibrating the models on data with a shallow concentration-response curve and an inconsistent mortality response at lower concentration levels explains why probabilistic predictions of the concentration-response relationship exhibited large uncertainty bands. The predicted survival over time showed smaller uncertainty bands, presumably because the high mortality pushed the maximum predicted survival to zero relatively soon.

Calculations on synthetic data with Matlab
Calculations were made with the BYOM framework for Matlab and the GUTS2 package. Both can be downloaded free of charge from http://www.debtox.info/byom.html. The GUTS2 package was modified to automate the process of loading datasets from file, fitting them, and writing the results files. To illustrate the fits on the synthetic data sets, two examples are shown in Figure S9. The Matlab package allows profiling of the likelihood function, and in that way creates confidence intervals on the model parameters. The overall results for each case and each parameter are shown in Figure S6. The Matlab results are very similar to the R results. Differences in the confidence intervals relate to the differences in approach taken in each package (profile likelihood in Matlab, and Bayesian quantiles on the marginal distributions in R).

Figure S7: Two example fits with Matlab (best fitting parameter set). Top panels: fit for one replicate data set in Case 1. Bottom panels: fit for one replicate data set in Case 4 (same cases as in Fig. S9). Right panels show the predicted curves for the scaled internal concentration.

Figure S8: Parameter estimates; best value with 95% confidence intervals. Dotted horizontal line indicates the true parameter value that was used to produce the data sets. Vertical broken line separates the data sets with 8 exposure concentrations (left) from those with only 5 (right). Arrows on confidence intervals indicate that the error bar extends much further (truncated to improve readability).

What is a 'slow' dominant rate constant?
In fitting the different GUTS flavours to the malathion data, it is clear that the dominant rate constant k_d cannot be identified from this data set: the survival pattern is best explained by an (almost) linear build-up of the dose metric. This situation of 'slow kinetics' is quite common and leads to a range of problems, as outlined below. For our synthetic data, starting from a situation of slow kinetics is not useful, as we know a priori that we would never be able to identify the parameters used to generate the data. For that reason, we derived the initial parameter set using a slightly different approach: we started by fixing k_d such that 98% of steady state is reached after four days.
If the dominant rate constant k_d is slow, the dose metric (e.g., the scaled internal concentration) increases almost linearly in time. This implies that k_d cannot be properly identified from the survival data (it will get a confidence interval that extends to zero), and that the other parameters (k_k and z) can also not be identified (they become heavily correlated to k_d). What is 'slow' depends on the test duration: long tests can identify lower values of k_d than short tests. Fig. 1 shows the relationship between test duration and identifiability of k_d. When 90% of steady state is achieved, k_d can be confidently estimated from the data, and even at only 50% of steady state this is feasible. This implies that in a 4-day test we cannot expect to be able to identify dominant rate constants that are below 0.1 d −1.
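The steady-state fractions above follow from first-order kinetics, where the fraction of steady state reached after time t is 1 − exp(−k_d t). A small Python helper (illustrative, not from the supporting code) makes the relationship between k_d and required test duration explicit:

```python
import math

def time_to_steady_state(kd, fraction=0.9):
    """Exposure duration needed to reach a given fraction of steady state
    under first-order kinetics: solve 1 - exp(-kd * t) = fraction for t."""
    return -math.log(1.0 - fraction) / kd

# illustrative check: reaching 90% of steady state within a 4-day test
# requires kd of about ln(10)/4 ~ 0.58 d^-1
t90 = time_to_steady_state(math.log(10.0) / 4.0)
```

Inverting the same relation for a fixed test duration gives the smallest k_d that still brings the dose metric close to steady state within the test.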
This situation of 'slow kinetics' is quite common and leads to a range of problems. Firstly, optimisation routines will not converge on a unique 'best' set of model parameters, as all values of k_d below a certain value can produce the same fit (see Figure S8 for the link between this critical value of k_d and test duration). This results in wide, half-open confidence intervals. Secondly, the other TD model parameters (k_k and z) will become strongly correlated to k_d, and hence will also show wide (half-open) confidence intervals. Several pragmatic strategies are available to deal with this situation: 1) Accept it as is. It is still possible to get a sample from the posterior distribution in this situation, which includes the correlations between the parameters. The model predictions using this sample do not necessarily suffer from the large uncertainties in each model parameter, because of these strong correlations. 2) Set an arbitrary minimum value for the elimination rate, based on practical considerations (e.g., the duration of the experiment or the life expectancy of the species). Depending on the purpose of the analysis, such arbitrariness may be acceptable. 3) Use an informed prior to constrain the optimisation, for example using information from related compounds or from body residues. This latter strategy should be used with care, as k_d does not necessarily relate to whole-body internal concentrations.
4) Reformulate the model with new parameters that can be estimated in this situation. This strategy was used in the original DEBtox software 28. The number of parameters was reduced by one by taking a quotient and a product of the original parameters as the new parameters. The disadvantage, however, is that these new parameters are impossible to interpret ecotoxicologically.

Figure S9: Relationship between the dominant rate constant k_d and the exposure duration needed to obtain a certain percentage of steady state.

Figure S10: Two example fits with R (best fitting parameter set). Top panels: fit for one replicate data set in Case 1. Bottom panels: fit for one replicate data set in Case 4 (same cases as in Fig. S6). Right panels show the predicted curves for the scaled internal concentration.