INTRODUCTION

In 2013 the European Chemicals Agency (ECHA) proposed threshold limit values for the exposure concentration of hexavalent chromium, Cr(VI), at workplaces in Europe, which are not legally binding.1 The values given are based on an assumed dose–response relationship and an accepted excess lifetime lung cancer mortality risk. Assuming 4 additional lung cancer cases among 1000 workers as lifetime risk (40 years of exposure), the exposure limit value is fixed at a concentration of 1 μg/m3, which would translate into 52 lung cancer cases compared to 48 for a non-exposed population. Though it is stated in the ECHA document “that the excess risk in the low exposure range might be an overestimate”, the statistical uncertainty is not quantified. The present paper focuses on this aspect and aims to demonstrate how to quantify the statistical uncertainty of data-based specifications of exposure limit values.

The ECHA guidelines are derived from a larger meta-analysis carried out by Seidler et al.,2 see also Pesch et al.3 These authors reviewed several epidemiological studies investigating the exposure–risk relationship for occupational Cr(VI) exposure and lung cancer. Two cohort studies play a central role in the analyses as they account for smoking status among the workers. The first, the so-called Baltimore cohort study, was first analysed by Gibb et al.4 with follow-ups by Park et al.,1, 5 and recently by Haney et al.6 The second, the so-called Painesville study, was presented by Crump et al.7 and Luippold et al.8 We here refer to both studies and analyse the published summarised data with respect to quantifying the uncertainty of derived exposure limit values.

The specification of exposure limit values relies on a model for the dose–response relation. Clearly, the choice of the dose–response relationship may have crucial effects on any data-based specification of exposure limit values. The exposure limit values issued in the ECHA guidelines were derived by linear interpolation of standardised mortality ratios (SMR), as proposed in Seidler et al.,2 that is, by fitting the SMRs against observed exposure using weighted linear regression.

We give details below. An alternative modelling approach is based on Poisson regression models, which in statistical terms are more natural for modelling count data.9 This approach has been pursued, for example, by Park et al.1; see Becher and Wahrendorf10 for an overview of the different modelling approaches to study and analyse dose–response data.

Clearly, the linear model for the SMR and the log-linear dose–response Poisson model might give different dose–response relations and consequently lead to different specifications of the exposure limit values. Often, as in occupational Cr(VI) exposure, data are obtained from cohort studies where methods for exposure measurement may vary considerably. Only in rare cases, for example for dioxin, are measurements possible in biological samples. In radiation epidemiology, dosimeter data allow a precise estimation of the individual radiation dose. Usually, however, exposure estimates are a combination of air concentration measurements and individual workplace descriptions. In such cases it seems nearly impossible to apply statistical model selection routines to choose the best model for the unknown dose–response curve. In other words, there remains a large amount of uncertainty which is not immediately quantifiable in statistical error bands. This uncertainty is, however, often ignored when fixing exposure limit values. In this paper we aim to explore the effect of different model specifications on the estimation of threshold limit values. We will show that a linear dose–response relation for the SMRs is not implausible (for occupational Cr(VI) exposure) and resembles the dose–response model fitted using a Poisson approach.

Besides the chosen model itself, the fitted parameters of the model are subject to statistical estimation uncertainty, which induces estimation variability in the exposure limit value. Following traditional statistical reasoning, this implies that any data-based specification of a limit value requires the indication of confidence bands. Such confidence quantification is not given in the ECHA document but will be explored in our paper. Theoretical derivations are cumbersome here, so we propose to quantify the confidence level of the exposure limit value by bootstrapping.11 We therefore simulate (synthetic) dose–response data based on the fitted model and refit the model to the simulated data. Based on the simulated and refitted model we derive a (bootstrapped) exposure limit value. This simulation step is carried out multiple times, which provides random variation of the fitted exposure limit values that mimics the original estimation variability.

Uncertainty in fixing threshold limit values is driven by the model choice as well as by statistical estimation uncertainty, as motivated above. A third source of uncertainty lies in the data themselves. As indicated above, the measurement and quantification of the exposure level is difficult and subject to measurement errors. Often, exposure information is only available from grouped data. In such cases, measures are usually given in intervals, and an appropriate mean exposure within the interval is used for fitting the dose–response curve. That is, the measurement error, which is reflected by grouping the exposures into intervals, is ignored and for simplification only a single exposure level is used for each interval group. To assess the effect of such grouping on the threshold limit value we again pursue a simulation-based approach. We simulate (synthetic) grouped dose–response data by varying the dose level within each interval. To do so we assume that the exposure follows a log-normal distribution and let the exposure vary randomly within each interval according to this assumption. From the random variation of the exposure we simulate dose–response data and refit the model, leading to simulated exposure limit values. The induced increase in variation mirrors the variability occurring exclusively through uncertainty of dose level measurements. We emphasise that uncertainty in exposure is very difficult to quantify and hence our investigation remains at a superficial level here. To explore uncertainty of exposure in a more realistic manner, data on the individual and temporal level are required. Such data are rarely available, so that deeper investigation remains an open research task.

The three identified sources of uncertainty, that is, (i) the model choice, (ii) estimation variability and (iii) measurement error of the exposure level, lead to the critical conclusion that any threshold limit value given without reflecting these sources of uncertainty is open for discussion. In this report we therefore shed some light on this and explore the impact of these three sources of uncertainty. For the modelling exercise we will promote Poisson-based models. Clearly, the dose–response relation is the central element in each model. There are different methods to derive a dose–response relation. The most flexible one is to remain unspecific and just demand that the dose–response relation is smooth, meaning it is continuous and differentiable. This leads to non-parametric models where the dose–response relation is a smooth but otherwise unspecified function which is fitted from the data using, for example, a spline-based approach, as discussed in Hastie and Tibshirani12 and Fahrmeir et al.13 Another approach is to find an appropriate parametric dose–response function using established algorithms such as fractional polynomials.14 We fit such a smooth model to both the published (and grouped) Baltimore and the Painesville cohort data, and explore competing parametric models with respect to goodness-of-fit. It appears that both a log-linear dose–response Poisson model and a linear SMR model perform well and mirror the structure of the dose–response relation. As the second step in our investigation we take statistical estimation variability into account in order to assess the uncertainty of exposure limit values derived from the model. As the last step we explore how the uncertainty of exposure measurement induces excess variability in the derived exposure limit values.

The paper is structured as follows. Section 2 discusses modelling issues for dose–response curves. Section 3 explores the statistical uncertainty in the estimation of threshold limit values. Section 4 aims to explore how variation in the exposure influences the results. Finally, Section 5 concludes the paper.

DATA ANALYSIS OF PAINESVILLE AND BALTIMORE STUDY

Painesville Cohort

Statistical model

We first consider the Painesville cohort study and (re-)analyse the data provided in Crump et al.7 (p. 1157), see also Table 1. We make use of Poisson-based regression and compare this with linear models for the SMR. The primary focus is to assess the implications for the resulting threshold limit value, that is, the cut point of the fitted dose–response curve with a specified excess risk. Let Yi denote the observed cases of lung cancer in exposure intervals i=1,..., I. We assume that Yi is Poisson distributed with intensity depending on the mean exposure level denoted by xi. Clearly, the lung cancer rate depends on confounding quantities, such as smoking status, age, gender, and so on. These are incorporated in the data by providing expected numbers of lung cancer cases for a reference population, that is, for a non-exposed cohort. Let ei denote these expected numbers of cases, which are also provided by Crump et al.7 and which are based on the mortality in the underlying population. These quantities now allow us to specify the Poisson model as

$$Y_i \sim \text{Poisson}(\lambda_i), \qquad \lambda_i = \exp\{\log(e_i) + \beta_0 + s(x_i)\} \tag{1}$$

Table 1 Data from Painesville and Baltimore cohort study.1, 2, 7

where s(xi) gives the dose–response relation with s(0)=0 to guarantee identifiability of the effects. Here β0 denotes the cohort effect, that is, the workforce in the data cohort may have an increased (or decreased, also called healthy worker effect) overall risk.

The term log(ei) is called an offset in statistical terminology. It guarantees that, setting β0≡0, the expected value of the Poisson variable in (1) equals exp(log(ei))=ei at zero exposure, x=0. To account for potential overdispersion we use a quasi-Poisson fit, that is, we allow for excess variability in the data such that Var(Yi)=ϕ · λi where ϕ is the overdispersion parameter fitted from the data.15

The function s(x) can now be estimated by replacing s(x) by some parametric form. Before doing so we pursue, however, a more general approach and fit s(x) non-parametrically with splines, where the amount of smoothness and non-linearity is chosen in a data-driven manner. We use standard software (R, see Wood16) to fit the models, leading to the fitted dose–response curve shown in Figure 1 (solid line). The fit was carried out with the mgcv package described in detail in Wood16 using a cubic regression spline basis with six quantile-based knots. The smoothing parameter is chosen by cross-validation. The shaded area gives the confidence interval based on a quasi-Poisson assumption. The fitted function is adjusted in the plot to refer to the dose–response relationship on the population level, that is, we shift the fitted function such that s(xi) describes the exposure effect. The fitted smooth dose–response curve has a non-linear convex shape. This shape suggests replacing the non-parametric dose–response model (1) by a log-linear dose–response relation. This means we replace the smooth curve s(x) in (1) by the log-linear shape

$$s(x) = \beta x \tag{2}$$

Figure 1 Painesville study: fitted dose–response curves with different models. Grey area indicates confidence bands based on the non-parametric model.

where β is the slope parameter. The fitted curve is also shown in Figure 1 as a dotted purple line with estimated slope β̂=0.78. We label model (2) subsequently as the log-linear dose–response model. The fitted log-linear dose–response curve clearly has a similar shape to the spline-based fit. This is also seen from Table 2, where we show the deviance and the Akaike information criterion (AIC) for model (1), model (2) and for two commonly used alternatives obtained by replacing the dose–response relation s(x) in (1) by other parametric forms. Taking the AIC as criterion for model selection we select the log-linear dose–response model. However, there is no clear evidence for superiority of one of the models, which is not surprising given the limitations of the data. We will subsequently work with the log-linear dose–response relation.
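To make the fitting step concrete, the following is a minimal sketch of how a log-linear Poisson model with offset log(ei) and a quasi-Poisson dispersion estimate could be fitted. It is not the authors' code (the paper uses R and mgcv); the function name `fit_loglinear_poisson`, the Newton-Raphson iteration, the Pearson-based estimate of ϕ and the grouped data are all assumptions introduced for illustration only.

```python
import math

def fit_loglinear_poisson(y, e, x, n_iter=100):
    """Fit E[Y_i] = e_i * exp(b0 + b * x_i) by Newton-Raphson.

    Returns (b0, b, phi), where phi is the Pearson estimate of the
    overdispersion parameter (quasi-Poisson).
    """
    # Start from the overall ratio of observed to expected cases and a zero slope.
    b0, b = math.log(sum(y) / sum(e)), 0.0
    for _ in range(n_iter):
        mu = [ei * math.exp(b0 + b * xi) for ei, xi in zip(e, x)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (2 x 2), inverted explicitly below.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b = b0 + d0, b + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    mu = [ei * math.exp(b0 + b * xi) for ei, xi in zip(e, x)]
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    phi = pearson / (len(y) - 2)  # two fitted parameters
    return b0, b, phi

# Hypothetical grouped data, NOT the published cohort data.
x = [0.1, 0.5, 1.0, 2.0, 4.0]      # mean cumulative exposure per group
e = [10.0, 8.0, 6.0, 5.0, 3.0]     # expected cases per group
y = [11, 13, 9, 12, 12]            # observed cases per group
b0_hat, b_hat, phi_hat = fit_loglinear_poisson(y, e, x)
print(b0_hat, b_hat, phi_hat)
```

The intercept plays the role of the cohort effect β0 and the second coefficient that of the slope β in model (2). A spline-based fit of model (1) would need a smoothing library and is not attempted here.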

Table 2 Painesville study: different Poisson models and their goodness-of-fit measures.

Besides the Poisson model, it is possible to model the SMR directly. As in Crump et al.7 we look at the weighted linear model

$$\mathrm{SMR}_i = \frac{Y_i}{e_i} = \beta_0 + \beta_x x_i + \varepsilon_i \tag{3}$$

where the variance of the residuals ɛi is proportional to the reciprocal of the person-years in each group given in the data. Clearly, on a population level it is reasonable to set β0 equal to 1, but for the data we again allow for a cohort effect and omit the constraint for fitting, comparable to Seidler et al.2 We label model (3) subsequently as the linear SMR model, which, with slope βx, can be rewritten as

$$\log E(Y_i) = \log(e_i) + \log(\beta_0 + \beta_x x_i)$$

and therewith resembles the log-linear Poisson model (2). We include the fit of the model on the log-scale in Figure 1 as a dashed blue line and provide parameter estimates in Table 3. We see that both parametric fits are comparable and within the confidence region of the general non-parametric model. A comparison based on the AIC is avoided, as we compare two different stochastic models and any comparison of relative differences in the AIC values does not make sense. We conclude therefore that the data support the use of both the log-linear dose–response model in combination with a Poisson model and the linear model applied to the SMR. We will come back to the question of model evaluation in section Combined Analysis of Painesville and Baltimore Cohort.
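The weighted linear SMR fit can be sketched in a few lines. The code below is an illustration only: it assumes weights proportional to the person-years in each group (the reciprocal of the stated residual variance), and the function name `fit_linear_smr` and all numbers are hypothetical.

```python
def fit_linear_smr(y, e, x, person_years):
    """Weighted least squares for SMR_i = b0 + bx * x_i + eps_i.

    Weights are the person-years, because Var(eps_i) is taken to be
    proportional to 1 / person-years.
    """
    smr = [yi / ei for yi, ei in zip(y, e)]
    sw = sum(person_years)
    x_bar = sum(w * xi for w, xi in zip(person_years, x)) / sw
    s_bar = sum(w * si for w, si in zip(person_years, smr)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(person_years, x))
    sxy = sum(w * (xi - x_bar) * (si - s_bar)
              for w, xi, si in zip(person_years, x, smr))
    bx = sxy / sxx
    b0 = s_bar - bx * x_bar
    return b0, bx

# Hypothetical grouped data, NOT the published cohort data.
x = [0.1, 0.5, 1.0, 2.0, 4.0]
e = [10.0, 8.0, 6.0, 5.0, 3.0]
y = [11, 13, 9, 12, 12]
py = [9000.0, 7000.0, 5000.0, 4000.0, 2500.0]
print(fit_linear_smr(y, e, x, py))
```

Because the SMR model places a normal-type error on the ratio Yi/ei while the Poisson model works with the counts themselves, the two likelihoods are not comparable, which is the reason given in the text for not comparing AIC values across them.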

Table 3 Painesville study: parameter estimates in different models.

Exposure limit value

The exposure limit value is the dose under which an acceptable excess risk (ER) occurs. To calculate this dose we need some basic assumptions. Since the above dose–response models provide an estimate on the relative scale under a specific dose metric, we need (i) baseline absolute risk estimates, (ii) the acceptable excess risk value and (iii) a justification for the dose metric. Regarding (i), the baseline lifetime probability to die from the target disease (lung cancer) is taken as 48/1000. For (ii) an excess risk of 4/1000 was defined as target. Regarding (iii), the cumulative dose has been used in previous risk assessments. We therefore assume that this metric is biologically relevant, yielding a unit of exposure of concentration × years. From (i) and (ii) we obtain the corresponding relative risk (RR) as (48+4)/48=1.083, which is shown on the log-scale in Figure 1. The threshold limit value is therefore the cumulative dose x × years which yields an RR of 1.083. The cumulative doses resulting from the models are given in the right column in Table 4, for example 0.22 mg/m3 × years (that is, 220 μg/m3 × years) with model (1). This dose accumulates after 40 years of workplace exposure with a concentration of 5.50 μg/m3, which is the threshold concentration for this model. For the other models we obtain different threshold values as given in the left column in Table 4. Clearly, the different models lead to quite different limit values, which mirrors the uncertainty of the model when fitting dose–response data. Nonetheless, the log-linear Poisson model (2) and the linear SMR model (3) yield similar threshold limit values. Note that no estimation variability is given in Table 4; this will be addressed later in the paper.
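The arithmetic behind the limit value can be written down in a short sketch. The helper names are hypothetical, and two assumptions are made that the text does not state explicitly: the slopes refer to cumulative exposure in mg/m3 × years (which is what reconciles a cumulative dose of 0.22 with an annual concentration of 5.50 μg/m3 over 40 years), and for the linear SMR model the relative risk is taken as (β0 + βx·x)/β0.

```python
import math

YEARS = 40                   # working lifetime assumed in the text
RR_TARGET = (48 + 4) / 48    # baseline 48/1000 plus accepted excess 4/1000

def annual_ug(cum_dose_mg_years, years=YEARS):
    """Convert a cumulative dose in mg/m3 x years to an annual concentration in ug/m3."""
    return 1000.0 * cum_dose_mg_years / years

def cum_dose_loglinear(beta, rr=RR_TARGET):
    """Solve exp(beta * x) = rr for the cumulative dose x."""
    return math.log(rr) / beta

def cum_dose_linear(beta_x, beta0=1.0, rr=RR_TARGET):
    """Solve (beta0 + beta_x * x) / beta0 = rr for the cumulative dose x."""
    return beta0 * (rr - 1.0) / beta_x

print(round(RR_TARGET, 3))                   # relative risk target, 1.083
print(annual_ug(0.22))                       # cumulative dose quoted for model (1)
print(annual_ug(cum_dose_loglinear(0.78)))   # slope reported for model (2)
print(annual_ug(cum_dose_linear(1.75)))      # averaged slope of Seidler et al.
```

Under these assumptions the last line gives roughly 1.2 μg/m3, the same order of magnitude as the ECHA value of 1 μg/m3. The numbers are plain arithmetic on quantities quoted in the text and are not meant to reproduce Table 4.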

Table 4 Painesville study: annual exposure limit values in μg/m3.

Combined Analysis of Painesville and Baltimore Cohort

We extend the previous analysis by looking at the Baltimore and the Painesville cohort data jointly. We merge the two data sources and fit the above models to the combined data set. We also include an additive cohort variable as covariate, which however shows no statistically significant effect. We fit the same models as above, i.e. the spline based model (1), the log-linear dose–response Poisson model (2) and the weighted linear SMR model (3). The combined data lead to the fits shown in Figure 2. The fitted coefficients are shown in Table 5 and in Table 6 we list the AIC values for the models using the Poisson approach.

Figure 2 Painesville and Baltimore study: fitted dose–response curves with different models. Grey area indicates confidence bands based on the non-parametric model.

Table 5 Painesville and Baltimore study: parameter estimates in different models for combined data.
Table 6 Painesville and Baltimore study: different Poisson models and their goodness-of-fit measures.

A combination of the data has been analysed before in Seidler et al.,2 who conducted a meta-analysis of studies reporting on occupational Cr(VI) exposure. They included the Painesville and Baltimore cohort study data using a linear SMR model. They proposed a combined estimate by taking the simple average of the fitted coefficients βx for the Painesville data (=0.68, see Crump et al.7) and the Baltimore data (=2.82, see also Park et al.1), leading to βx=(0.68+2.82)/2=1.75. We include this value in Figure 2. The value 1.75 has had an impact since it was incorporated in the ECHA document as the underlying dose–response effect. We immediately see from Figure 2 that the setting βx=1.75 does not look appropriate. Note that the value 1.75 is a merge of two separate estimates but not an estimate from merged data. It is therefore not surprising that the dose–response curve for βx=1.75 lies away from the three other curves, which are estimates using the merged data to fit the models, while the log-linear dose–response model (2) and the weighted SMR model (3) again show a comparable fit. This is seen even better from the data themselves. We plot the residuals of the models, taking Pearson residuals for the Poisson model. These are shown in Figure 3. While the smooth model, the log-linear dose–response Poisson model and the linear SMR model show unstructured residuals, the dose–response relation with βx set to 1.75 does not perform appropriately. The corresponding limit values, i.e. the intersection of the model with the excess risk level referring to 4 additional lung cancer cases per 1000 individuals, are given in Table 7. Again no estimation variability is considered, as this will be discussed in the next section.
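For reference, the Pearson residuals used in the residual plot take the standard form below. Here λ̂i denotes the fitted mean for exposure group i, and the rescaling by the fitted overdispersion parameter is an assumption, since the text does not say whether the plotted residuals are scaled.

```latex
r_i = \frac{y_i - \hat{\lambda}_i}{\sqrt{\hat{\lambda}_i}},
\qquad
r_i^{\text{quasi}} = \frac{y_i - \hat{\lambda}_i}{\sqrt{\hat{\phi}\,\hat{\lambda}_i}} .
```

For a model that describes the data well these residuals scatter around zero without visible structure along the exposure axis, which is the criterion applied in Figure 3.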

Figure 3 Painesville and Baltimore study: Pearson residual plot for combined data.

Table 7 Painesville and Baltimore study: annual exposure limit values in μg/m3 yielding an excess risk of 4 additional lung cancer cases among 1000 individuals assuming 40 years of exposure for combined data.

Finally, note that the ECHA released 1 μg/m3 as the annual limit value. We observe that all fitted models lead to an exposure limit estimate which lies above the ECHA value. To draw reliable inference we need to take estimation variability into account, which we do in the next section.

We conclude that visualising the data with non-parametric, smooth models gives more insight into the shape of the dose–response relationship. This shows that the proposal of Seidler et al.2 with a slope parameter set to 1.75, the simple average of the separately fitted slopes in the two cohort studies, has some weakness and does not appropriately mirror the dose–response relation in the combined data. Moreover, we recognise that the log-linear Poisson model and the linear SMR model yield comparable results. We have therewith discussed one of the three questions posed in the introduction. We have shown how different models can fit the data and investigated the amount of information in the data with respect to the dose–response relationship. It remains to investigate and quantify the amount of estimation variability for the fitted threshold limit value, which is carried out in the next section.

ESTIMATION VARIABILITY OF THRESHOLD LIMIT VALUE

Painesville Cohort

Our intention is now to assess the estimation variability of the exposure limit value. In principle this could be done by calculating the exposure limit not only for the fitted curves shown in Figure 1 but also taking the estimation variability of the estimates into account using asymptotic normality arguments. We propose a different, though comparable, idea here by assessing the variability following the bootstrap principle. The idea of bootstrapping11 is to refit the model to simulated data and use the random variation of the refitted parameter estimates to derive bootstrap-based confidence intervals. We pick up this idea using a so-called parametric bootstrap, that is, we simulate from the fitted models. Conceptually this would be possible for all parametric models, but simulating from the weighted linear SMR model is problematic, since a simple normal error distribution is assumed and monotonicity is not guaranteed at all. We therefore only use the log-linear dose–response Poisson model to simulate from. We first restrict the investigation to the Painesville study and simulate from the fitted log-linear Poisson model (2). To be specific, we take both the categorised exposure levels and the expected numbers of cases published in Crump et al.7 We insert these values in model (2) and simulate (bootstrap) observed cases using the estimated parameter values. To account for overdispersion we do not draw from a Poisson model directly but draw the observed cases from a negative binomial model, such that the mean structure follows the fitted model (2) but the variance is inflated by the fitted overdispersion parameter. The simulated cohort data are then refitted with the log-linear Poisson model (2) as well as the linear SMR model (3). Each simulation is repeated 1000 times. For the sake of simplicity we leave the smooth model aside for the remainder of this paper.
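The simulation and refitting loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the negative binomial draw is realised as a gamma-Poisson mixture with mean μ and variance ϕ·μ, the Poisson sampler is Knuth's simple method, only the log-linear model is refitted, and all function names and input values are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rnegbin(mu, phi, rng):
    """Count with mean mu and variance phi * mu (gamma-Poisson mixture)."""
    if phi <= 1.0:
        return rpois(mu, rng)  # no overdispersion: plain Poisson
    lam = rng.gammavariate(mu / (phi - 1.0), phi - 1.0)
    return rpois(lam, rng)

def fit_loglinear(y, e, x, n_iter=100):
    """Newton-Raphson for E[Y_i] = e_i * exp(b0 + b * x_i); returns (b0, b)."""
    b0, b = math.log(max(sum(y), 0.5) / sum(e)), 0.0
    for _ in range(n_iter):
        mu = [ei * math.exp(b0 + b * xi) for ei, xi in zip(e, x)]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Cap the step length to keep the iteration stable on noisy replicates.
        scale = min(1.0, 1.0 / max(abs(d0), abs(d1), 1e-12))
        b0, b = b0 + scale * d0, b + scale * d1
        if abs(d0) + abs(d1) < 1e-9:
            break
    return b0, b

def bootstrap_slopes(e, x, b0, b, phi, n_boot=1000, seed=1):
    """Parametric bootstrap: simulate cases from the fitted model, refit, keep the slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_boot):
        y_star = [rnegbin(ei * math.exp(b0 + b * xi), phi, rng)
                  for ei, xi in zip(e, x)]
        slopes.append(fit_loglinear(y_star, e, x)[1])
    return slopes

# Hypothetical inputs, NOT the published data or estimates.
x = [0.1, 0.5, 1.0, 2.0, 4.0]
e = [10.0, 8.0, 6.0, 5.0, 3.0]
slopes = bootstrap_slopes(e, x, b0=0.2, b=0.3, phi=1.5)
print(min(slopes), max(slopes))
```

Each bootstrap slope can then be converted into an exposure limit value by intersecting the refitted curve with the target relative risk. Replicates with a non-positive slope never reach the target and would correspond to an unbounded limit.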

Figure 4 shows, by way of example, the simulated relative risk curves refitted with the log-linear Poisson model (upper plot) as well as with the linear SMR model. The black solid lines refer to the estimated dose–response model fitted in the previous section. With the main interest in the absolute risk, we include the absolute excess risk of 4 additional lung cancer cases per 1000 individuals as a black horizontal line. Obviously, both models suffer from rather strong estimation variation, leading to quite different exposure limit values. To take a closer look, we consider the exposure limit for each bootstrap simulation, that is, we calculate the cut point of the dose–response curve with the excess risk of 4 additional cases among 1000 workers. This value is again given as the average annual exposure concentration assuming 40 years of exposure, as above.

Figure 4 Painesville study: simulated relative risk curves on log-scale (grey) and risk curve of original data (black).

The bootstrap principle allows us to quantify the variation seen in Figure 4 as estimation variability and thereby to derive confidence intervals for the exposure limit value. For the log-linear Poisson model and the linear SMR model this leads to the 95% confidence intervals given in Table 8. We recognise that there is a substantial amount of uncertainty which should be taken into account. In fact, following the log-linear Poisson model, we can argue with a confidence of 97.5% that the exposure limit for which the absolute excess risk is below 4 deaths among 1000 workers is at least 1.42 μg/m3. This is based upon the Painesville cohort data only.
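A percentile interval is one simple way to turn the bootstrap replicates into a confidence interval. The sketch below is an assumption about how this step might be done (the text does not specify the interval type), and `percentile_interval` is a hypothetical helper.

```python
import math

def percentile_interval(values, level=0.95):
    """Two-sided percentile interval from bootstrap replicates."""
    s = sorted(values)
    n = len(s)
    alpha = (1.0 - level) / 2.0
    lower = s[math.floor(alpha * (n - 1))]
    upper = s[math.ceil((1.0 - alpha) * (n - 1))]
    return lower, upper

# Hypothetical replicates standing in for bootstrapped limit values.
replicates = [float(v) for v in range(1, 1001)]
print(percentile_interval(replicates))
```

The lower end of a two-sided 95% interval is at the same time a one-sided lower bound at level 97.5%, which is how the statement about the value 1.42 μg/m3 is to be read.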

Table 8 Painesville study: 95% confidence intervals for annual exposure limit values (in μg/m3, assuming 40 years of exposure).

Combined Analysis of Painesville and Baltimore Cohort

As in the previous section we extend the analysis to the combined data set. Like before, we simulate data from the fitted log-linear Poisson model and refit these with the log-linear model (2) and the linear SMR model (3). The fitted dose–response curve is used to calculate the exposure limit value which leads to the bootstrapped values with confidence intervals provided in Table 9. We conclude that the left hand side of the confidence interval is slightly increased while the right hand side is decreased in the combined data. Again, a large amount of estimation uncertainty is visible.

Table 9 Painesville and Baltimore study: 95% confidence intervals for annual exposure limit values (in μg/m3 assuming 40 years of exposure) for combined data.

VARIATION OF EXPOSURE LEVELS

The final step in our investigation is to look at the exposure measurements themselves. We therefore vary the exposure level within each interval in the simulation, starting with the Painesville study. To do so we first need to investigate the exposure distribution itself. Figure 5 (left plot) shows the density of exposure in the exposure intervals given in Crump et al.,7 calculated by dividing the person-years in the data by the width of the exposure intervals. A skewed distribution is apparent which strikingly mirrors a log-normal distribution, included in the figure as a dashed line. We also calculate the mean exposure based on the log-normal model for each of the exposure intervals in the data and compare this with the mean exposure given in the data. This is shown in the right-hand plot in Figure 5. A convincing resemblance is obvious, which clearly speaks for log-normally distributed exposure. This distributional assumption will be used subsequently.
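The log-normal mean exposure within an interval has a closed form, which is presumably what the comparison in the right-hand plot rests on. In the display below, μ and σ denote the mean and standard deviation of log-exposure, (a, b] is an exposure interval and Φ is the standard normal distribution function; this notation is introduced here for illustration and does not appear in the original.

```latex
E\left[X \mid a < X \le b\right]
  = \exp\!\left(\mu + \tfrac{\sigma^2}{2}\right)
    \frac{\Phi\!\left(\frac{\log b - \mu - \sigma^2}{\sigma}\right)
          - \Phi\!\left(\frac{\log a - \mu - \sigma^2}{\sigma}\right)}
         {\Phi\!\left(\frac{\log b - \mu}{\sigma}\right)
          - \Phi\!\left(\frac{\log a - \mu}{\sigma}\right)},
\qquad \log X \sim N(\mu, \sigma^2).
```

For the lowest interval with a = 0 the corresponding Φ terms are zero, and for an open upper interval they equal one.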

Figure 5 Painesville study. Left: density of exposure (black line), that is, person-years in each exposure group divided by the width of the exposure interval, together with a log-normal density (dashed green line). Right: mean exposure plotted against the log-normal mean exposure.

Instead of taking the mean exposure for each group we simulate a group-specific exposure based on the log-normal exposure distribution shown in Figure 5. That is, for each interval we simulate a censored log-normal exposure level. With this exposure we simulate data using the overdispersed log-linear Poisson model, as in the previous section, and refit the model. Based on the refitted model we derive the threshold limit value. We restrict our analyses to the log-linear Poisson model and repeat each simulation 1000 times. We only look at the lower (left) end of the confidence interval. We observe excess variation in the lower threshold limit value when exposure measurement error is taken into account, which is mirrored in the resulting confidence intervals shown in Table 10. The excess variation at the lower end of the confidence interval is slight, and the ECHA value of 1 μg/m3 is not included in the confidence interval. For completeness we also show in Figure 6 the variation of the estimated slope parameter with and without consideration of exposure measurement error. An increased estimation variability is clearly visible. We also investigated the effect of measurement error for the combined data. This, however, turns out to be numerically unstable, so that we do not report the result here. The instability might be due to the fact that the exposure distribution differs between the two studies, which is difficult to capture in the simulation design.
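Drawing an exposure level within a given interval can be sketched with an inverse-CDF step. The code reads "censored log-normal" as a log-normal distribution restricted to the interval, which is an assumption, and the function name `sample_exposure` as well as the parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def sample_exposure(a, b, mu, sigma, rng):
    """Draw one exposure from a log-normal(mu, sigma) restricted to (a, b]."""
    z = NormalDist(mu, sigma)  # distribution of log-exposure
    p_lo = z.cdf(math.log(a)) if a > 0 else 0.0
    p_hi = z.cdf(math.log(b)) if math.isfinite(b) else 1.0
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return math.exp(z.inv_cdf(u))

# Hypothetical interval bounds and log-normal parameters.
rng = random.Random(1)
intervals = [(0.0, 0.2), (0.2, 1.0), (1.0, 5.0)]
x_star = [sample_exposure(a, b, mu=-0.5, sigma=1.2, rng=rng) for a, b in intervals]
print(x_star)
```

In each simulation run the drawn values would replace the fixed group means as exposure levels before cases are simulated and the model is refitted, so that the spread of the resulting limit values also reflects the within-interval variation of exposure.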

Table 10 Left side of 95% confidence intervals for threshold limit values (in μg/m3 assuming 40 years of exposure) with and without variation of exposure level.
Figure 6 Painesville study: variation of the estimates based on the parametric model (2).

CONCLUSION

The European Chemicals Agency (ECHA) issued the threshold limit value of 1 μg/m3 as the exposure leading to an increased lung cancer risk of 4/1000. The ECHA document states that “the excess risk in the low exposure range might be an overestimate”. Supported by statistical means, we can state in this paper that with a confidence of 97.5%, and based on the Baltimore and Painesville studies, the exposure level of 1 μg/m3 leads to fewer than 4 additional lung cancer cases among 1000 workers. In this respect we can support the ECHA document, but we base our reasoning on statistical grounds and hence are more rigorous than the reasoning applied in the ECHA document itself.2 We emphasise, however, that the data are sparse and that additional uncertainty remains which cannot be captured with statistical tools. Finally, we emphasise that our (re-)analysis is based on published grouped data only. A more refined modelling would be possible with the original individual data.