Introduction

Moderate and high doses of ionising radiation are well established causes of most types of cancer1,2. There is emerging evidence, particularly for leukaemia and thyroid cancer, of risk at low dose (< 0.1 Gy) radiation3,4,5,6 (roughly 50 times the dose from background radiation in a year). For most other cancer endpoints it is necessary to assess risks via extrapolation from groups exposed at moderate and high levels of dose7,8,9,10,11,12,13. Such extrapolations, which are dependent on knowing the true dose–response relationship, as inferred from some reference moderate/high-dose data (very often the Japanese atomic bomb survivors), are subject to some uncertainty, not least that induced by systematic and random dosimetric errors that may be present in that moderate/high-dose data1,14. Extensive biostatistical research over the last 30 years have done much to develop understanding of this issue15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30 and in particular the role played by various types of dose measurement error31. Among the simplest methods of correction for dose error, regression calibration, which entails substitution of the conditional expectation of the true dose given the observed dose, is straightforward to apply, and is often used to correct for classical error31. However, it only takes account of the 1st order dose error terms in the Taylor expansion of the likelihood, and does not take account of correlations between dose errors32. It is also prone to bias when dose errors are large, or errors are differential, or the dose response has substantial curvature33. Other methods of correction for dose error, in particular Monte Carlo maximum likelihood (MCML)25,26,30,34, and fully Bayesian methods21,22,23,29, both of which take full account of the uncertainty in doses (in particular 2nd and higher order dose error terms in the Taylor expansion of the likelihood), can work better in these circumstances.

A variant of the commonly used method of regression calibration31 has been very recently proposed which is particularly suited to studies with substantial shared error with dose response non-linearity32. This so-called extended regression calibration (ERC) method can be used in settings where there is a mixture of Berkson and classical error32. In fits to synthetic datasets in which there is substantial upward curvature in the true dose response, and varying (and sometimes substantial) amounts of classical and Berkson error, the ERC method generally outperformed both standard regression calibration, MCML and unadjusted regression, particularly with respect to the coverage probabilities of the quadratic coefficient, and for larger magnitudes of the Berkson error, whether this is shared or unshared32.

A Bayesian model averaging (BMA) method has been also recently proposed, the so-called 2-dimensional Monte Carlo with Bayesian model averaging (2DMC with BMA) method28, which has been used in fits to radiation thyroid nodule data35. The so-called frequentist model averaging (FMA) model has also been recently proposed, although only fitted to simulated data36. In the present paper we shall assess the performance of a variant implementation of the 2DMC with BMA method, which is more closely aligned with standard implementations of BMA37, and FMA against ERC, making also comparisons with other methods of correction for dose error using simulated data. The simulated data used is exactly as in the previous report32.

Methods

Synthetic data used for assessing corrections for dose error

The methods and data exactly parallel those of the previous paper32, using publicly available Life Span Study (LSS) leukaemia data38 to guide construction of a number of artificial datasets. Specifically we used the person year distribution by bone marrow dose groups 0–0.07, 0.08–0.19, 0.20–0.99, 1.00–2.49, ≥ 2.50 Gy. The central estimates of dose we assumed are close to the person year weighted means of these groups, and as given in Supplement A Table A1, although for the uppermost dose group we assigned a central estimate of 2 Gy. A composite Berkson-classical error model was used in which the true dose \(D_{true,i,j}\) and the surrogate dose \(D_{surr,i,j}\) to individual \(i\) (in dose group \(k_{i}\)) in simulation \(j\) are given by:

$$D_{true,i,j} = D_{{cent,k_{i} }} \exp \left[ { - 0.5(\sigma_{share,Berkson}^{2} + \sigma_{unshare,Berkson}^{2} )} \right]\exp \left[ {\sigma_{share,Berkson} \varepsilon_{j} + \sigma_{unshare,Berkson} \delta_{i,j} } \right]$$
(1)
$$D_{surr,i,j} = D_{{cent,k_{i} }} \exp \left[ { - 0.5(\sigma_{share,Class}^{2} + \sigma_{unshare,Class}^{2} )} \right]\exp \left[ {\sigma_{share,Class} \mu_{j} + \sigma_{unshare,Class} \kappa_{i,j} } \right]$$
(2)

The variables \(\varepsilon_{j} ,\delta_{i,j} ,\mu_{j} ,\kappa_{i,j}\) are independent identically distributed \(N\left( {0,1} \right)\) random variables. It should be noted that the errors \(\varepsilon_{j} ,\mu_{j}\) are common to all individuals and are uniform within each sub-simulation nested within each meta-simulation, and hence give rise to a shared error structure. By comparison the \(\delta_{i,j} ,\kappa_{i,j}\) are chosen so as to be independent for all other individuals within each sub-simulation and within a meta-simulation, as well as independent of those in other sub-simulations/meta-simulations, and hence give rise to an unshared error structure. The factors \(D_{{cent,k_{i} }} \) are the central estimates of dose, as given in Supplement A Table A1. The factors \(\exp \left[ { - 0.5(\sigma_{share,Berkson}^{2} + \sigma_{unshare,Berkson}^{2} )} \right]\) and \(\exp \left[ { - 0.5(\sigma_{share,Class}^{2} + \sigma_{unshare,Class}^{2} )} \right]\) ensure that the distributions given by (1) and (2) have theoretical mean that coincides with the central estimates \(D_{{cent,k_{i} }}\). The model has the feature that when the Berkson error geometric standard deviations (GSDs) are set to 0 (\(\sigma_{share,Berkson} = \sigma_{unshare,Berkson} = 0\)) the model reduces to one with classical error (a mixture of shared and unshared); likewise when the classical error GSDs are set to 0 (\(\sigma_{share,Class} = \sigma_{unshare,Class} = 0\)) the model reduces to one with pure Berkson error (a mixture of shared and unshared).

Table 1 Coverage probability of profile likelihood (for extended regression calibration) or Bayesian posterior confidence intervals (for quasi-2DMC with BMA) or frequentist model averaging (FMA) for fits of linear-quadratic model.

We generated a number of different versions of the dose data, with \(\sigma_{share,Berkson}\), \(\sigma_{unshare,Berkson}\), \(\sigma_{share,Class}\), \(\sigma_{unshare,Class}\) taking values of 0.2 (20%) or 0.5 (50%). This individual dose data was then used to simulate the distribution of \(N = 250\) cancers for each of \(m = 1000\) sub-simulated datasets, using a model in which the assumed probability of being a case for individual \(i\) in simulation \(j\) (\(j = 1,...,m = 1000\)) is given by:

$$\exp [\kappa_{j} ][1 + \alpha D_{true,i,j} + \beta D_{true,i,j}^{2} ]$$
(3)

the scaling constant \(\kappa_{j}\) being chosen for each simulation (but not for the Bayesian model fits) to make these sum to 1. As previously, we assumed a linear-quadratic model, with coefficients \(\alpha = 0.25/{\text{Gy}},\beta = 2/{\text{Gy}}^{2}\), also a linear model with \(\alpha = 3/{\text{Gy}},\beta = 0/{\text{Gy}}^{2}\), both models derived from fitting a stratified linear-quadratic or linear model to the LSS leukaemia data38.

A total of n = 500 meta-simulations, consisting of ensembles of dose + cancer simulations were used. Within each meta-simulation a total of \(m = 1000\) sub-simulations were taken of each type of dose (true dose \(= D_{true,i,j}\), surrogate dose \(= D_{surr,i,j}\)). The true dose (\(= D_{true,i,j}\)) averaged over the 1000 sub-simulations within each meta-simulation will be used to generate the distribution of cancers by dose group within the meta-simulation, using the summed relative risks \(RR_{ij} = 1 + \alpha D_{true,i,j} + \beta D_{true,i,j}^{2}\). Within each of the n = 500 meta-simulations, this cancer distribution will be constant over the \(m = 1000\) sub-simulations that comprise it, but this distribution will of course change slightly between each of the \(n = 500\) meta-simulations. The \(n = 500\) meta-simulated dose + cancer ensembles were used to fit models and evaluate fitted model means and coverage probability. Having derived synthetic individual level data, for the purposes of model fitting, for all models except MCML and 2DMC with BMA, the data were then collapsed (summing cases, averaging doses) into the 5 dose groups used previously32. Poisson linear relative risk generalised linear models39 were fitted to this grouped data, with rates given by expression (3), using as offsets the previously-specified number per group32. Models were fitted using six separate methods (unadjusted, regression calibration, ERC, MCML, 2DMC and BMA, FMA). For ERC and the other methods previously used the methods of deriving doses and model fitting were as in our earlier paper32. It should be noted that for all except the unadjusted regressions the true (rather than surrogate) dose is used, so that classical error does not figure. Only for unadjusted regression is it the surrogate dose that is used.

We used a BMA method somewhat analogous to the 2DMC with BMA method of Kwon et al.28, using the full set of mean true doses per group as previously generated for MCML, the mean doses per group for each simulation being given by group means of the samples generated by expression (3), averaged over the \(m = 1000\) dose samples. The model was fitted using Bayesian Markov Chain Monte Carlo (MCMC) methods. Associated with the dose vector is a vector of probabilities \(p_{j} ,j = 1,...,1000\) which is generated using variables \(\lambda_{j} ,j = 1,...,999\), so that:

$$p_{j} = \frac{{\exp [\lambda_{j} ]}}{{1 + \sum\nolimits_{k = 1}^{999} {\exp [\lambda_{k} ]} }},j = 1,...,999,\;\;p_{1000} = \frac{1}{{1 + \sum\nolimits_{k = 1}^{999} {\exp [\lambda_{k} ]} }}$$
(4)

This is therefore quite close to the method proposed by Hoeting et al.37, and somewhat distinct from the formulation of 2DMC with BMA proposed by Kwon et al.28, as we discuss at greater length below. For this reason we shall describe our own method as quasi-2DMC with BMA. The standard formulation of BMA, as given by Hoeting et al.37, and which we employ, is based on the posterior probability:

$$\begin{aligned} & p(\alpha ,\beta ,\kappa |Y,X,(D_{true,k} )_{k = 1}^{m} ,(p_{j} )_{k = 1}^{m} ) \hfill \\ & = \sum\limits_{k = 1}^{m} {p(\alpha ,\beta ,\kappa |\gamma = k,Y,X,D_{true,\gamma } )p(\gamma = k|Y,X,(D_{true,k} )_{k = 1}^{m} ,(p_{j} )_{k = 1}^{m} )} \hfill \\ & = \sum\limits_{k = 1}^{m} {p(\alpha ,\beta ,\kappa |Y,X,D_{true,k} )p_{k} } \hfill \\ \end{aligned}$$
(5)

where \((p_{k} )_{k = 1}^{m}\) are given by Eq. (4). We fitted via successive application of Metropolis–Hastings samplers to (a) sample conditionally the model dose response parameters \(\alpha ,\beta ,\kappa\) conditionally on \((p_{k} )_{k = 1}^{m}\) and (b) sample conditionally the dose vector probability parameters \((\lambda_{j} )_{k = 1}^{m - 1}\) which determine \((p_{k} )_{k = 1}^{m}\) conditionally on \(\alpha ,\beta ,\kappa\). Kwon et al.28 did this slightly differently, sampling (a) the dose response parameters \(\alpha ,\beta ,\kappa\) conditional on \(k\), (b) sampling of \(k\) (via a multinomial distribution) conditional on \(\alpha ,\beta ,\kappa\) and \((p_{k} )_{k = 1}^{m}\) and (c) sampling of the parameters \((p_{k} )_{k = 1}^{m}\) (via a Dirichlet distribution) conditional on \(\alpha ,\beta ,\kappa\) and \(k\). Kwon et al.28 resorted to use of an approximate Monte Carlo sampler, the so-called stochastic approximation Monte Carlo (SAMC) method of Liang et al.40. Unfortunately Kwon et al. do not provide enough information to infer the precise form of SAMC that was used by them28, and for that reason we have adopted this alternative.

All main model parameters (\(\alpha ,\beta ,\kappa ,\lambda_{k}\)) had normal priors, with mean 0 and standard deviation (SD) 1000. The Metropolis–Hastings algorithm was used to generate samples from the posterior distribution. Random normal proposal distributions were assumed for all variables, with SD of 0.2 for \(\kappa\) and 1 for \(\alpha ,\beta\), and SD 2 for all \(\lambda_{k}\). The \(\lambda_{k}\) were proposed in blocks of 10. Two separate chains were used, in order to compute the Brooks-Gelman-Rubin (BGR) convergence statistic41,42. The first 1000 simulations were discarded, and a further 1000 simulations taken for sampling. The proposal SDs and number of burn-in sample were chosen to give mean BGR statistics (over the 500 simulated datasets) that were in all cases less than 1.03 and acceptance probabilities of about 30% for the main model parameters (\(\alpha ,\beta ,\kappa\)), suggesting good mixture and likelihood of chain convergence. For the ERC model confidence intervals (CI) were (as previously) derived using the profile likelihood39 and for the quasi-2DMC with BMA model Bayesian uncertainty intervals were derived.

The FMA model of Kwon et al.36 was also fitted. For each of \(j = 1,...,m = 1000\) dose vectors model (3) was fitted using Poisson regression (via maximum likelihood) and the Akaike Information Criterion (AIC)43, \(AIC_{j}\), computed, as well as the central estimate and profile likelihood 95% CI for each coefficient, \(\alpha_{j,MLE} (\alpha_{j,0.025} ,\alpha_{j,0.975} )\) and \(\beta_{j,MLE} (\beta_{j,0.025} ,\beta_{j,0.975} )\); from these were derived the estimated standard deviation, via \(SD(\alpha )_{j} = \min (\alpha_{j,MLE} - \alpha_{j,0.025} ,\alpha_{j,0.975} - \alpha_{j,MLE} )/1.96\) and \(SD(\beta )_{j} = \min (\beta_{j,MLE} - \beta_{j,0.025} ,\beta_{j,0.975} - \beta_{j,MLE} )/1.96\), in other words the minimum of the distance from each CI to the mean, divided by the asymptotic 97.5% centile (1.96) of the normal distribution, to recover the SD. For each fit \(k = 100\) simulations were taken from the respective normal distributions \(N(\alpha_{j,MLE} ,SD(\alpha )_{j}^{2} )\) and \(N(\beta_{j,MLE} ,SD(\beta )_{j}^{2} )\), and each such sample given an AIC-derived weight via:

$$\exp [ - AIC_{j} /2]/\sum\limits_{k = 1}^{1000} {\exp [ - AIC_{k} /2]}$$
(6)

[It should be noted that Eq. (6) differs from the formula mistakenly given by Kwon et al.36 in their paper, \(\exp [AIC_{j} /2]/\sum\limits_{k = 1}^{1000} {\exp [AIC_{k} /2]}\)]. The central estimate for each coefficient was taken as the AIC-derived-weighted sum of these samples, and 95% CI estimated from the 2.5% and 97.5% centiles of the AIC-derived-weighted samples. A variety of \(k\) in the range 100–1000 were used, yielding very similar results. We also tried using asymmetric confidence intervals, employing \(SD(\alpha )_{j,lower} = (\alpha_{j,MLE} - \alpha_{j,0.025} )/1.96\) and \(SD(\alpha )_{j,upper} = (\alpha_{j,0.975} - \alpha_{j,MLE} )/1.96\) to separately generate the samples above and below the central estimate \(\alpha_{j,MLE}\), and likewise for the \(\beta\) coefficient. However, this generally yielded badly biased estimates of both coefficients, because of occasional samples in which one or other CI was very large.

The Fortran 95-2003 program used to generate these datasets and perform Poisson and Bayesian MCMC model fitting, and the relevant steering files employed to control this program are given in online Supplement B. Using the mean coefficients for each model and error scenario over the 500 simulated datasets, \((\alpha_{mean} ,\beta_{mean} )\), the percentage mean bias in predicted excess relative risk (ERR) is calculated, via:

$$100\left[ {(\alpha_{mean} D_{pred} + \beta_{mean} D_{pred}^{2} )/(\alpha D_{pred} + \beta D_{pred}^{2} ) - 1} \right]$$
(7)

This was evaluated for two values of predicted dose, \(D_{pred} = 0.1{\text{ Gy}}\) and \(D_{pred} = 1{\text{ Gy}}\).

Results

As shown in Table 1, using the linear-quadratic model the coverage probabilities of the ERC method for the linear coefficient \(\alpha\) are near the desired 95% level, irrespective of the magnitudes of assumed Berkson error, whether shared or unshared. However, the ERC method yields coverage probabilities that are somewhat too low when shared and unshared Berkson errors are both large (with logarithmic SD = 50%), although otherwise it performs well (Table 1). It should be noted that classical error will have no effect on any of these models, as its only effect is on the unadjusted regression model (via sampling of the surrogate dose), and so the effect is not shown. By contrast the coverage probabilities of both the linear coefficient \(\alpha\) and the quadratic coefficient \(\beta\) for the quasi-2DMC with BMA method are generally much too low, and when shared Berkson error is large (50%) the coverage probabilities do not exceed 5% (Table 1). The coverage for the FMA method is generally better, and for the coefficient \(\alpha\) does not depart too markedly from the desired 95%; however, the coverage of the coefficient \(\beta\) tends to be too high, for any non-zero level of Berkson error, whether shared or unshared (Table 1).

Table 2 shows that for the linear model the coverage percentage is generally too high for ERC, MCML and FMA, and slightly too low for quasi-2DMC with BMA, but more or less correct for regression calibration.

Table 2 Coverage probability of profile likelihood (for extended regression calibration) or Bayesian posterior confidence intervals (for quasi-2DMC with BMA) or frequentist model averaging (FMA) for fits of linear model.

Table 3 shows the coefficient mean values, averaged over all 500 simulations, assuming a linear-quadratic model. A notable feature is that for larger values of Berkson error, the linear coefficient \(\alpha\) for quasi-2DMC with BMA is substantially overestimated, and the quadratic dose coefficient \(\beta\) substantially underestimated, both by factors of about 10. For ERC the estimates of the quadratic coefficient \(\beta\) are upwardly biased, but not by such large amounts. For FMA both coefficients have pronounced upward bias, particularly for large shared Berkson error (50%) (Table 3).

Table 3 Mean over n = 500 dose + cancer ensembles of regression coefficients in fits of linear-quadratic model, all with classical error (shared and unshared) of 20%.

Table 4 shows that the bias in ERR for ERC evaluated either at 0.1 Gy or 1 Gy does not exceed 30% in absolute value. Regression calibration performs somewhat worse, with bias ~ 60% when shared and unshared Berkson error are large (50%) for predictions at 1 Gy, although otherwise with bias under 30%. For all but the smallest shared and unshared Berkson errors (both 0% or 20%) MCML has bias ~ 35–70%, with bias particularly severe at 1 Gy. Quasi-2DMC with BMA performs somewhat worse, with bias in excess of ~ 30% and sometimes in excess of 100% when Berkson errors are large, and bias most pronounced at low dose. Unadjusted regression yields almost as severe bias, which exceeds 50% in many cases for predictions at 1 Gy, and when shared or unshared classical errors are large (50%) often exceeds 100% (Table 4). Bias for FMA is generally moderate to severe, and particularly bad (> 100%) when shared Berkson error is large (50%) (Table 4).

Table 4 Percentage bias in excess relative risk (ERR) using linear-quadratic model predicted at 0.1 Gy or 1 Gy with various sorts of dose error correction, using mean regression coefficients over n = 500 dose + cancer ensembles (as in Table 2).

Table 5 shows the coefficient mean values, averaged over all 500 simulations, for the linear model, and percentage bias. For most methods bias is modest, generally under ~ 10%. The only significant exception is FMA where the bias approaches 30% when shared Berkson errors are large (50%) (Table 5).

Table 5 Mean and percentage bias over n = 500 dose + cancer ensembles of regression coefficients in fits of linear model, all with classical error (shared and unshared) of 20%.

It should be noted that the behaviour of all regression methods in both scenarios (linear, linear-quadratic) is very similar when Berkson errors are 50%, irrespective of whether unshared Berkson errors are 0, 20% or 50% (Tables 1, 2, 3, 4, 5).

Discussion

We have demonstrated that the quasi-2DMC with BMA method performs well when a linear model is assumed and fitted, albeit with coverage for the linear coefficient that is slightly too low (~ 90%) (Table 2). However, it performs very poorly when a linear-quadratic model is assumed and fitted, with coverage probabilities both for the linear and quadratic dose coefficients that are under 5% when the magnitude of shared Berkson errors is large (50%) (Table 1). For the linear model the bias is generally modest, under 10% (Table 5). However, if a linear-quadratic model is assumed there is substantial bias (by a factor of 10) in estimates of both the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated (Table 3). FMA performs as well as the other method when a linear model is assumed (Table 2), and generally much better assuming a linear-quadratic model, although the coverage probability for the quadratic coefficient is uniformly too high (Table 1). However both linear and quadratic coefficients have pronounced upward bias, particularly when Berkson error is large (50%) (Table 3). By comparison the ERC method yields coverage probabilities that are too high when a linear model is fitted (Table 2), and too low when a linear-quadratic model is fitted with shared and unshared Berkson errors both large (50%) (Table 1), although otherwise it performs well, and coverage is generally better than for quasi-2DMC with BMA or FMA, particularly for the linear-quadratic model. As shown previously it generally outperforms all other methods (regression calibration, MCML, unadjusted regression) when a linear-quadratic model is assumed32. The upward bias in estimates of the \(\alpha\) coefficient and the downward bias in the estimates of the \(\beta\) coefficient, at least for larger magnitudes of error (Table 3) largely explains the poor coverage of quasi-2DMC with BMA in these cases (Table 1). The bias of the predicted ERR using the linear-quadratic model at a variety of doses is generally smallest for ERC, and largest (apart from unadjusted regression) for quasi-2DMC with BMA and for FMA, with standard regression calibration and MCML exhibiting bias in predicted ERR generally somewhat intermediate between the other two methods (Table 4). The fact that bias in ERR tends to be larger for a dose of 1 Gy (Tables 4, 5) relates to the fact that at higher assumed dose the contribution of the quadratic coefficient is relatively more important. However, this is not always the case, and for example for the quasi-2DMC with BMA method the inflated linear coefficient and much reduced quadratic coefficient (Table 3) generally result in bias being more severe at the lower dose (Table 4).

As noted above, the form of the quasi-2DMC with BMA model that we fit differs slightly from that employed by Kwon et al.28. Our method and that of Kwon et al.28 should be approximately equivalent, although the latter is considerably more computationally challenging, and may be the reason why Kwon et al.28 resorted to use of the SAMC method40, in order to get their method to work. As noted in the Methods, unfortunately Kwon et al. do not provide enough information to infer the precise form of SAMC that was used by them28, and it was for that reason we adopted this alternative, which is in any case possibly more computationally efficient. It is possible that the SAMC implementation used by Kwon et al.28 may behave differently from the more standard implementation of BMA given here. Kwon et al.28 report results of a simulation study that tested the 2DMC with BMA method against what they term “conventional regression”, which may have been regression calibration. They did not assess performance against MCML, and in all cases only a linear model was tested28 unlike the simulations given here and in a previous publication32, which used a linear-quadratic model, and in the present paper also a linear model. Kwon et al.28 report generally better performance of 2DMC with BMA against the regression calibration alternative. Their findings using the linear model are consistent with ours (Tables 2, 5). Kwon et al.36 tested FMA against the 2DMC method and against the so-called corrected information matrix (CIM) method44 and observed similar performance, in particular adequate coverage, of all three methods, although narrower CI were produced by FMA compared with CIM under a number of scenarios. However, in all cases only a linear model was tested36. Set against this, Stram et al. reported results of a simulation study45 which suggested that the 2DMC with BMA method will produce substantially upwardly biased estimates of risk, also that the coverage may be poor, somewhat confirming our own findings, at least using the linear-quadratic model; although we do not always find upward bias, and generally little bias when a linear model is assumed, the coverage of both regression coefficients for quasi-2DMC with BMA is poor for larger values of shared Berkson error when a linear-quadratic model is assumed (Tables 1, 3, 4). One possible reason for the bias that may occur in quasi-2DMC with BMA is that by chance a dose vector is chosen which results in good fit by the model (linear or linear-quadratic) but which nevertheless is substantially biased, resulting in substantial bias in the linear and/or quadratic coefficients. While in most circumstances (as in the present simulations) there is no information regarding the dose vector, it might occur that one has an informative prior for the \(p_{j}\), which would be expected to reduce the likelihood of such bias.

Dose error in radiation studies is unavoidable, even in experimental settings. It is particularly common in epidemiological studies, in particular those of occupationally exposed groups, where shared errors, resulting from group assignments of dose, dosimetry standardizations or possible variability resulting from application of, for example, biokinetic or environmental or biokinetic models result in certain shared unknown (and variable) parameters between individuals or groups. There have been extensive assessments of uncertainties in dose in these settings46,47,48. Methods for taking account of such uncertainties cannot always correct for them, but they at least enable error adjustment (e.g. to CI) to be made31. As previously discussed32 the defects in the standard type of regression calibration are well known, in particular that the method can break down when dose error is substantial31, as it is in many of our scenarios. It also fails to take account of shared errors. Perhaps because of this a number of methods have been recently developed that take shared error into account, in particular the 2DMC and BMA method28 and the CIM method44. The CIM method only applies to situations where there is pure Berkson error. Both 2DMC with BMA and CIM have been applied in number of settings, the former to analysis of thyroid nodules in nuclear weapons test exposed individuals35, and the latter to assessment of lung cancer risk in Russian Mayak nuclear workers49 and cataract risk in the US Radiologic Technologists50. In principle the simulation extrapolation (SIMEX) method51 can be applied in situations where there is shared (possibly combined with unshared) classical error, where the magnitudes of shared and unshared error are known. However, this was not part of the original formulation of SIMEX51. Possibly because of the restrictions on error structure and its extreme computational demands SIMEX has only rarely been used in radiation settings27,52.

Conclusions

Using methods and data that exactly parallel those of the previous paper32, differing only in that linear as well as linear-quadratic models were assumed, we have demonstrated that the quasi-2DMC with BMA method performs well when a linear model is assumed (Table 2), but very poorly when a linear-quadratic dose response is assumed, with coverage probabilities both for the linear and quadratic dose coefficients that are under 5% when the magnitude of shared Berkson error is moderate to large (Table 1). The bias with a linear model for this method is generally modest (under 10%) (Table 4), but for the linear-quadratic model bias is substantial (by a factor of 10) both for the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated (Table 3). FMA performs generally better, although when assuming a linear-quadratic model the coverage probability for the quadratic coefficient is uniformly too high (Table 1), as it is also for the linear coefficient assuming a linear model (Table 2). However both linear and quadratic coefficients using FMA have pronounced upward bias, particularly when Berkson error is large (50%) (Tables 3, 4, 5). By comparison the recently developed ERC method32 yields coverage probabilities that are too high when a linear model is assumed, and too low when a linear-quadratic model is assumed and shared and unshared Berkson errors are both large (50%), although otherwise it performs well, and coverage is generally better than for quasi-2DMC with BMA or FMA, particularly for the linear-quadratic model. The bias of the predicted ERR at a variety of doses is generally smallest for ERC, and largest for quasi-2DMC with BMA and FMA, with standard regression calibration and MCML exhibiting bias in predicted ERR generally somewhat intermediate between the other two methods (Tables 4, 5). In general ERC performs best in the scenarios presented, and should be the method of choice in situations where there may be substantial shared error, or suspected curvature in the dose response.