Abstract
Measurement error in exposure assessment is unavoidable. Statistical methods to correct for such errors rely upon a valid error model, particularly regarding the classification of classical and Berkson error, the structure and the size of the error. We provide a detailed list of sources of error in residential radon exposure assessment, stressing the importance of (a) the differentiation between classical and Berkson error and (b) the clear definitions of predictors and operationally defined predictors using the example of two German case–control studies on lung cancer and residential radon exposure. We give intuitive measures of error size and present evidence on both the error size and the multiplicative structure of the error from three data sets with repeated measurements of radon concentration. We conclude that modern exposure assessment should not only aim to be as accurate and precise as possible, but should also provide a model of the remaining measurement errors with clear differentiation of classical and Berkson components.
Introduction
Radon, a ubiquitous radioactive gas, is the second leading cause of lung cancer after smoking in the general population (NAS, 1994). Epidemiological studies on lung cancer and residential radon exposure have been conducted in many countries worldwide to obtain relative risk (RR) estimates and to describe the exposure–disease relationship. Despite the size and the quality of these studies, the results range from no increased risk to a significantly increased relative lung cancer risk of about 1.10 per 100 Bq/m^{3} radon gas concentration (Pershagen et al., 1994).
For example, in the Western and Eastern parts of Germany, two case–control studies were conducted during the 1990s including over 2500 patients diagnosed with lung cancer from hospitals (cases) and an adequate group of about 4000 disease-free participants recruited via population registry (population controls). The participants were interviewed with regard to their long-term residential, smoking, and occupational history. The radon concentrations in the bedroom and the living room of their homes were measured by alpha track detectors over 1 year. Based on these measurements and interview information on the time of the rooms' occupancy and on each home's residency, the residential radon exposure was assessed retrospectively. The reported RR estimates per 100 Bq/m^{3} and 95% confidence intervals were 0.98 (0.82, 1.17) and 1.04 (0.96, 1.12) for the West and the East study, respectively, based on measurements in the homes inhabited at index date and 0.97 (0.82, 1.14) and 1.10 (0.98, 1.24) for the West and the East study, respectively, based on all measurements in homes inhabited up to 15 years prior to interview (Wichmann et al., 1998, 1999; Kreienbrock et al., 2001; Kreuzer et al., 2003).
Measurement errors in exposure assessment are unavoidable, with residential radon exposure being no exception (Bäverstam and Swedjemark, 1991; Lubin et al., 1995), and induce bias in RR estimates. Methods to correct for such errors are available, but require a model for the error in the assessed exposure, and quite different results emerge depending on the error type (classical or Berkson, i.e. error independent of true exposure or independent of observed exposure), on the structure (additive or multiplicative), or on the size (Carroll et al., 1995). Using the example of residential radon exposure, we identify the error components and classify them with regard to classical or Berkson-type error. Further, we show that the applicability of an error component depends on the practical variable chosen to represent the “predictor” of the disease in the specific study, the “operationally defined predictor” (Carroll et al., 1995). We provide intuitive measures of the size of the error and analyse three data sets with repeated radon concentration measurements to provide further information on error structure and size.
Methods
Predictor and Operationally Defined Predictor
The error in the exposure assessment is the difference between the “observed exposure” and the “true exposure”. The components of error that are applicable depend upon the “predictor” and the “operationally defined predictor”. Generally, the predictor of interest is given by the epidemiological objective of the study. However, each investigator has to define a practical variable, which is measurable and a valid surrogate for the predictor: the operationally defined predictor.
In the German radon studies, the objective was to quantify the RR of lung cancer due to “residential radon exposure”. From a methodological point of view, any variable can be plugged as “exposure” into the “exposure–disease model”. However, from the biological point of view, several stages of the disease-causing process are distinguished and the term “exposure” is one of the three terms employed (Armstrong, 1990): (1) the concentration, c(t), a measure of the agent's density at time point t; (2) the exposure, a measure for the agent's mass accumulating during time period T in the environment of an individual, ∫_{T} c(t) dt, or, if the exposure is to have the same unit as the concentration and is thus given per time unit, (1/T) ∫_{T} c(t) dt; and (3) the effective organ dose from the exposure experienced during time period T.
Using the example of residential radon studies, we elucidate the complexity of this issue. The true radon gas concentration in an environment is the concentration of ^{222}Rn at a certain time point. The unit is Becquerel per cubic metre (Bq/m^{3}). However, what is actually measured is the average concentration during the exposure of the detector, RN(detector). In the German radon studies, the detectors were exposed for 1 year and the radon gas concentration in the ith home of a study participant is assessed as the average between bedroom and living room concentrations, weighted by the relative occupancy time, RN_{i}(detector)=0.5(w_{i}RN_{i}^{B}(detector)+(1−w_{i})RN_{i}^{L}(detector)), where w_{i} denotes the percentage of time spent in the bedroom. The lung cancer predictor true residential radon exposure is mostly defined as the environmental (external) exposure of an individual per year to radon gas in the residences inhabited during a time period T, which is relevant for the carcinogenesis at the index date, RN(T). A measurable proxy for this, that is, an operationally defined predictor, is derived by using RN_{i}(detector) as valid proxy for the average radon concentration during the residency of the ith home, RN_{i}(residency), and by computing the time-weighted mean (TWM) concentration, that is, the mean across all homes inhabited during the relevant time period, T, weighted by the residency time in the ith home, T_{i},
TWM(T) = Σ_{i} T_{i} RN_{i}(residency) / Σ_{i} T_{i}.
The unit is Becquerel-years per cubic metre and year, which equals Becquerel per cubic metre [Bq a/(m^{3} a)=Bq/m^{3}]. Another proxy is the cumulative radon exposure per year accounting for absolute home occupancy, that is, the percentage of time spent in the home, O_{i},
RNCUM(T) = Σ_{i} T_{i} O_{i} RN_{i}(residency) / Σ_{i} T_{i}.
The unit is, again, Bq/m^{3}; however, because O_{i} is on average 50%, RNCUM(T) is about half of TWM(T). A third proxy for residential radon exposure is the average radon concentration in the current home (i.e. the home inhabited at index date) during the time of residency, RN_{1}(residency), abbreviated by RN_{1}, if the residency time in the current home covers a good proportion of T. Naturally, the quantities RN_{i}(residency), T_{i}, and O_{i} can only be observed with a certain error. With the observed quantities denoted by RN_{i}(residency)^{*}, T_{i}^{*}, and O_{i}^{*}, the “observed proxies for residential radon exposure” are
TWM(T)^{*} = Σ_{i} T_{i}^{*} RN_{i}(residency)^{*} / Σ_{i} T_{i}^{*},
RNCUM(T)^{*} = Σ_{i} T_{i}^{*} O_{i}^{*} RN_{i}(residency)^{*} / Σ_{i} T_{i}^{*}, and
RN_{1}^{*} = RN_{1}(residency)^{*}.
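The two weighted proxies can be illustrated with a short Python sketch. It assumes that TWM(T) is the residency-time-weighted mean of RN_{i}(residency) and that RNCUM(T) additionally weights each home by its absolute occupancy O_{i}, as described in the text. The class name `Residence`, the function names and the two example homes are hypothetical and are not taken from the studies.

```python
from dataclasses import dataclass


@dataclass
class Residence:
    rn: float         # RN_i(residency): average radon concentration, Bq/m^3
    years: float      # T_i: residency time within the window T, in years
    occupancy: float  # O_i: absolute home occupancy as a fraction (0..1)


def twm(homes):
    """Time-weighted mean: sum(T_i * RN_i) / sum(T_i)."""
    total_years = sum(h.years for h in homes)
    return sum(h.years * h.rn for h in homes) / total_years


def rncum(homes):
    """Cumulative exposure per year: sum(T_i * O_i * RN_i) / sum(T_i)."""
    total_years = sum(h.years for h in homes)
    return sum(h.years * h.occupancy * h.rn for h in homes) / total_years


# Hypothetical residence history: two homes covering a 10-year window.
homes = [
    Residence(rn=40.0, years=6.0, occupancy=0.5),
    Residence(rn=100.0, years=4.0, occupancy=0.5),
]
twm_value = twm(homes)      # (6*40 + 4*100) / 10 = 64 Bq/m^3
rncum_value = rncum(homes)  # half of TWM when every O_i is 0.5: 32 Bq/m^3
```

With an occupancy of 50% in every home, the sketch reproduces the statement that RNCUM(T) is about half of TWM(T).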
An alternative lung cancer predictor to residential radon exposure is the alpha dose, that is, the energy imparted to the lung tissue by alpha particles from the radioactive decay of radon and radon progeny in the residences during the time period T, D(T), which differs from RN(T) by certain factors (Jacobi, 1989, 1964; ICRP, 1994): (a) the equilibrium factor describing the equilibrium between radon and radon progeny given the environmental conditions (temperature, compression), (b) a factor describing the amount of radon progeny activity deposited in the lungs (depending on particles in the air and the individual's inhalation depth and frequency), and (c) a factor describing the dose delivered to the sensitive cells in the lungs (depending on where the progeny are deposited and the depth of the sensitive cells).
Most radon studies use TWM(T) as operationally defined predictor for residential radon exposure, where T covers the 5–35 years prior to index date. Appropriate weighting of the radon gas concentrations depending on time since exposure is then necessary, since exposures dating back more than 15 years are said to lose potential to induce lung cancer (Lubin et al., 1994). The German studies were analysed based on two operationally defined predictors: RN_{1} and RNCUM(T), where T covers 5–15 years prior to index date, a homogeneous time period regarding the potential to cause cancer. For these three variables, TWM(T), RNCUM(T) and RN_{1}, as operationally defined predictors for the true predictors “residential radon exposure” or “alpha dose”, we establish a list of error components, formulate an error model, and provide, as far as possible, a sense of the plausible error size.
Error Models
In this work, we deal with random error (i.e. zero expectation), which is nondifferential towards disease status (i.e. structure and size independent of disease status) and homoscedastic (i.e. same structure and size for all observations). We elaborate particularly on the differences between classical-type error (i.e. statistically independent of the true variable) versus Berkson-type error (i.e. statistically independent of the observed variable) and between additive versus multiplicative structure. Figure 1 summarizes the notation.
Errors of the classical type arise when a quantity is measured by some device and repeated measurements vary around the true value. Error of the Berkson type arises when a group's average is assigned to each individual matching the group's characteristics. The group's average is thus the “measured value”, that is, the value that enters the analysis, and the individual latent value is the “true value”. Examples of Berkson error include the use of job-exposure matrix entries instead of individual exposure measurements or the use of environmental exposure measurements via fixed monitors instead of individual dose measured via personal dosimeters (Tosteson et al., 1989).
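The distinction can be made concrete with a small simulation. This is a minimal sketch under assumed normal distributions and arbitrary SDs (1.0 for the exposure, 0.5 for the error). None of the numbers come from the radon data, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma_x, sigma_e = 1.0, 0.5  # assumed SDs, chosen only for illustration

# Classical: observed = true + error, error independent of the TRUE value.
true_c = rng.normal(0.0, sigma_x, n)
obs_c = true_c + rng.normal(0.0, sigma_e, n)

# Berkson: true = observed + error, error independent of the OBSERVED value
# (e.g. a group average assigned to every member of the group).
obs_b = rng.normal(0.0, sigma_x, n)
true_b = obs_b + rng.normal(0.0, sigma_e, n)

# Correlation of the error (observed minus true, or true minus observed)
# with the true and with the observed variable.
corr_err_true_classical = np.corrcoef(obs_c - true_c, true_c)[0, 1]
corr_err_obs_classical = np.corrcoef(obs_c - true_c, obs_c)[0, 1]
corr_err_obs_berkson = np.corrcoef(true_b - obs_b, obs_b)[0, 1]
corr_err_true_berkson = np.corrcoef(true_b - obs_b, true_b)[0, 1]

# Classical error inflates the variance of the observed variable;
# Berkson error leaves the observed variable less variable than the truth.
var_ratio_classical = obs_c.var() / true_c.var()  # about 1.25
var_ratio_berkson = obs_b.var() / true_b.var()    # about 0.80
```

In this setup the classical error is uncorrelated with the true value but correlated with the observed value, and the Berkson error shows the mirror-image pattern.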
The difference between additive and multiplicative classical error is elucidated in Figure 2. For additive error, the spread of true exposure given measured exposure (vertical spread of dots) is constant for the full range of the exposure: The graph shows a “tube” (Figure 2a). For multiplicative error, the spread increases proportionally to measured exposure: The graph shows a “trumpet” (Figure 2b). Since the multiplicative error is additive on the log scale, all characteristics of the additive error are valid for the multiplicative error on the log scale: The plot of the log of true exposure versus the log of exposure measured with multiplicative error would provide the same picture as Figure 2a.
Measures of Error Size
The error size is usually given as the standard deviation (SD) of the error on the original scale for additive error, σ_{EA}, or on the log scale for multiplicative error, σ_{log EM}. For multiplicative error, alternatives are: (1) the geometric standard deviation (GSD) of the error, exp(σ_{log EM}), or (2) the coefficient of variation (CV) defined (a) as the error's SD on the original scale divided by the mean on the original scale, or (b) as the SD on the original scale divided by the geometric mean (GM), that is, the exponentiated mean of the log of exposure. We compute a conversion table for the different measures. Further, we provide an intuitive way to grasp the size of a classical error by presenting the range of measured predictor values that would most likely be observed given a true value, x. For multiplicative error, if error and true predictor are approximately lognormally distributed, 95% of the measured values on the original scale lie within [x/GSD^{2}, x·GSD^{2}]. For additive error, if error and true predictor are approximately normally distributed, 95% of the observed values lie within [x−2σ_{EA}, x+2σ_{EA}]. Finally, we present the error variance as a proportion of the exposure variance observed in the German radon studies (on the original scale for additive error, on a log scale for multiplicative error).
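The conversions between these measures can be sketched in a few lines of Python. The closed forms for the two CV definitions are the standard identities for a lognormally distributed error and are an assumption here, since the text defines the CVs only in words. The function names are hypothetical.

```python
import math


def gsd(sigma_log):
    """Geometric standard deviation: exp(sigma_log)."""
    return math.exp(sigma_log)


def cv_mean(sigma_log):
    """CV (a): SD / arithmetic mean of a lognormal error."""
    return math.sqrt(math.exp(sigma_log ** 2) - 1.0)


def cv_gm(sigma_log):
    """CV (b): SD / geometric mean of a lognormal error."""
    v = math.exp(sigma_log ** 2)
    return math.sqrt(v * (v - 1.0))


def range95(x, sigma_log):
    """Approximate 95% range of measured values given a true value x."""
    g2 = gsd(sigma_log) ** 2
    return x / g2, x * g2


# Error size 0.4 on the log scale and a true exposure of 50 Bq/m^3.
low, high = range95(50.0, 0.4)  # about 22 to 111 Bq/m^3
cv_a = cv_mean(0.4)             # about 0.42
cv_b = cv_gm(0.4)               # about 0.45
```

For small σ_{log EM} both CV definitions stay close to σ_{log EM} itself, and they drift apart as the error grows.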
Replicate Data
Three data sets with replicate radon concentration measurements conducted by alpha track detectors in German dwellings are available, each addressing a different issue. We exploit this information to provide evidence about the error structure (additive versus multiplicative) and error size by plotting the data and applying analysis-of-variance (ANOVA) models using PROC MIXED in SAS^{®}.
Bedroom/living room measurements: For each study participant in the analysed sample of the German case–control studies, 1-year radon gas concentration measurements in bedroom and living room of the current home are available. These internal data allow the estimation of the between-measurement variability, given that the differences between rooms can be controlled for. This analysis is solely based on the controls (i.e. about 8000 1-year measurements) to reflect the situation for the general public.
Year-by-year replicates: The German Federal Office for Radiation Protection has measured radon gas concentrations for several consecutive time periods, each covering about 1–2 months during 1995–2001, to monitor changes in radon concentrations over time. Two measurements were conducted under identical conditions for each time period in 11 arbitrarily selected houses including basements of laboratories and houses with very high radon levels in Schneeberg, an area of former uranium mining. We computed the time-weighted average radon concentration of consecutive time periods covering 12 months (i.e. 2·5·11=110 1-year measurements). These external data, kindly provided by R Lehmann, allow the estimation of the between-year variability, of the between-measurement variability, and of both combined.
Intercomparison study: In 1990/91, an intercomparison study was conducted to evaluate within- and between-laboratory variability of laboratories from different European countries measuring radon gas concentrations in five houses with concentrations typical of those expected in the then ongoing epidemiological radon studies (Poffijn et al., 1992; Kreienbrock et al., 1999). From these external data, the 6-month measurements from five detectors placed in each of five houses conducted by the German laboratory, the Biophysics Department of the University of Saarland, are utilized in this analysis (i.e. 25 6-month measurements) to estimate the between-measurement variability for the laboratory that conducted all measurements of the German case–control studies.
Statistical Models to Analyse the Replicate Data
Based on the intercomparison data, the size of the error from between-measurement variability is estimated by applying
log Z_{i,j} = HOUSE_{i} + ɛ_{i,j}   (1)
where Z_{i,j} denotes the jth measurement in the ith house and HOUSE_{i} the effect of the ith house, and by computing the SD of the residuals ɛ_{i,j} (j=1, …, 5, i=1, …, 5). Based on the bedroom/living room measurements, the same error size is estimated by applying
log Z_{i,j} = HOUSE_{i} + ROOM_{j} + ɛ_{i,j}   (2)
where Z_{i,j} denotes the jth measurement in the ith house, HOUSE_{i} the effect for the ith house, and ROOM_{j} the effect of the bedroom versus the living room, and by obtaining the SD of the residuals (j=0, living room, j=1, bedroom). We also explore whether the floor level difference explains most of the room effect. Based on the year-by-year data, both the size of the error from between-measurement variability and the size of the error from between-year variability are estimated by applying
log Z_{i,j,k} = HOUSE_{i} + HOUSE_{i} YEAR_{j} + ɛ_{i,j,k}   (3)
where Z_{i,j,k} denotes the kth measurement in the jth year for the ith house, HOUSE_{i} the effect of the ith house, HOUSE_{i} YEAR_{j} the effect of the jth year by house, and by deriving the SD of the residuals and the square root of the variance estimate of HOUSE_{i} YEAR_{j} (k=1, 2, j=1, …, 5, i=1, …, 11). An estimate of both errors combined is derived from
log Z_{i,j,k} = HOUSE_{i} + ɛ_{i,j,k}   (4)
by the SD of the residuals. By further including a fixed effect of the jth year, YEAR_{j}, in model (3), we test for a potential effect of the years independent of the house.
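The variance decomposition behind the year-by-year analysis can be sketched with NumPy as a stand-in for PROC MIXED. The sketch assumes a balanced design, measurements analysed on the log scale, and a method-of-moments (ANOVA) estimator in place of the likelihood-based estimation that PROC MIXED performs. The data are simulated with arbitrary SDs and are not the replicate data described above. The function name is hypothetical.

```python
import numpy as np


def nested_variance_components(log_z):
    """Estimate the between-measurement and between-year SDs.

    log_z has shape (houses, years, replicates); a balanced design is assumed.
    Returns (SD of residuals, SD of the year-within-house effect).
    """
    n_house, n_year, n_rep = log_z.shape
    cell_mean = log_z.mean(axis=2)        # house-by-year means
    house_mean = cell_mean.mean(axis=1)   # house means
    ss_resid = ((log_z - cell_mean[:, :, None]) ** 2).sum()
    ms_resid = ss_resid / (n_house * n_year * (n_rep - 1))
    ss_year = n_rep * ((cell_mean - house_mean[:, None]) ** 2).sum()
    ms_year = ss_year / (n_house * (n_year - 1))
    # E[ms_year] = var_resid + n_rep * var_year in a balanced nested design.
    var_year = max((ms_year - ms_resid) / n_rep, 0.0)
    return float(np.sqrt(ms_resid)), float(np.sqrt(var_year))


# Simulated log concentrations: house level + year-in-house effect + noise.
rng = np.random.default_rng(1)
n_house, n_year, n_rep = 400, 5, 2
house = rng.normal(5.0, 1.0, (n_house, 1, 1))
year = rng.normal(0.0, 0.5, (n_house, n_year, 1))
noise = rng.normal(0.0, 0.1, (n_house, n_year, n_rep))
sd_measurement, sd_year = nested_variance_components(house + year + noise)
```

With only 11 houses, as in the year-by-year data, such estimates would be far less stable than in this simulation, which uses 400 houses to make the recovery of the assumed SDs visible.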
Results
Identification of Error Components
In the following, we present a detailed list of sources of error in radon exposure assessment with special consideration of their applicability for the operationally defined predictors used in the German analyses, RN_{1} and RNCUM(T), and for TWM(T). We propose to distinguish four stages for assessing the predictor “residential radon exposure”, plus an additional fifth stage if “alpha dose” is the predictor of interest:

1) Estimating the average radon gas concentration in the ith home during the exposure of the detector, RN_{i}(detector).
2) Using (1) to estimate the average radon gas concentration in the ith home over the year in which the measurement took place, RN_{i}(year), that is, extrapolation to one year.
3) Using (2) to estimate the radon gas exposure of an individual over a certain time period prior to the measurement, RN_{i}(residency), that is, extrapolation to prior years.
4) Using (3) for the current home or for all homes inhabited during a certain time period T as operationally defined predictor for residential radon exposure, RN(T).
5) Using (4) as a surrogate for the true alpha dose, D(T).
The stages (1), (2) and (3) describe the deviation between the observed predictor and the operationally defined predictor, stage (4) the deviation between the operationally defined predictor and “exposure”, and stage (5) the deviation between “exposure” and “dose”.
Regarding stage (1), there is (a) the error from between-measurement variability, that is, the deviation between measurements obtained repeatedly at the same time and place.
A measurement by alpha track detectors involves the exposure of a small box of specific geometry containing a thin foil. The emitted alpha particles leave a small trace (track) on the foil. In order to count these tracks, the foil is etched. The specific number of counts of a randomly chosen area of the foil is obtained manually or by a computer program, and the number of counts per unit is then calculated. The exposure of the detector to radon is derived from the track density on the foil by taking into account the average background track density on similar foils and the sensitivity of the foil to radon exposure, determined by calibration. Thus, this error component includes the error from background track density (number of tracks observed on a detector not exposed to radon), the error from miscounting the number of tracks, the error from variations in track counting efficiency, the error from calibration, and the error from underestimating high exposure, when the tracks are so close together as to cause difficulty in distinguishing them after etching.
Further, there are (b) the error from between-laboratory variability (not applicable for the German studies, since all measurements were conducted by the same laboratory), (c) the error from between-detector-placement variability due to the variation of the radon concentrations depending on the placement in the room, and (d) the error from between-room variability due to the fact that radon concentrations in the rooms without measurements differ from the radon concentration in the living room, which was used as proxy for the concentrations in the other rooms (except the bedroom).
Regarding stage (2), there is (a) the error from between-season variability due to seasonal variation of the radon concentration, which is applicable if a measurement of less than a year is used to estimate the 1-year average (not applicable for the German studies, where only 1-year measurements were used). If seasonal correction is applied, (2a^{*}) an error from statistical uncertainty in estimating the correction factor remains and (2a^{**}) an error from assigning a group-matched correction factor is introduced. (One factor is assigned to all individuals with a certain seasonal pattern.)
Regarding stage (3), there is (a) the error from between-year variability arising from the year-by-year variation of radon concentrations due to differences in the weather and the habits of the occupants, and (b) the error from between-subphase variability due to the fact that the radon concentration during the measurement differs from the concentrations before radon-relevant alterations to the home (Gunby et al., 1993). We define a subphase as the period of time during which a house remains without radon-relevant changes. If the operationally defined predictor takes into account homes other than the current home (RNCUM(T) or TWM(T)), there is (c) the error from between-owner variability arising from the different ventilation habits of the current owners of the study subject's previous homes, which lead to conditions in the home during the measurement different from the conditions during the residency of the study subject. If a correction based on information on the change of the average radon concentration by certain house alterations or ventilation habits is performed (Gerken et al., 2000), an error from the statistical uncertainty of estimating the correction factor, (3b^{*}) or (3c^{*}), and an error from assigning a group-matched correction factor, (3b^{**}) or (3c^{**}), are introduced. (A constant multiplicative effect on the radon concentration is assigned to all houses with a certain pattern in house characteristics or ventilation differences.)
Regarding stage (4), there is (a) the error from the differences in the ventilation habit depending on room and daytime.
This error is due to the fact that the detectors measure average radon concentration for the full day, but the bedroom is occupied during the night and the other rooms during the day. If the bedroom is ventilated more during the day than at night, the measured bedroom concentration underestimates the concentration during the bedroom's occupancy; if a participant sleeps with the window open and the window is closed during the day, the measured radon concentration overestimates the concentration during the occupancy. This induces a random error, if it can be assumed that there is no systematic pattern in the day–night cycle of ventilating the rooms across all study participants (that is, some participants sleep with the window open, some with the window closed).
Further, there is (b) the error from between-environment variability due to the fact that the radon concentration in residential environments other than the principal home is usually not measured and is assumed to be as high, on average, as in the principal home. (This error is lessened in the German studies by including only subjects with home occupancy of at least 25%.) Note that it is “residential radon exposure” that is considered here, which does not include the exposure at the workplace. There are (c) the error from between-home variability (for RN_{1}) from using the radon concentration in the current home as proxy for the average radon concentration in all homes inhabited during the relevant time period, (d) the error from false recall of the residency time (for RNCUM(T) or TWM(T)), (e) the error from false recall of the relative bedroom occupancy, (f) the error from false recall of the absolute house occupancy (for RNCUM(T)), (g) the error from misspecifying the relevant exposure window (for RNCUM(T) or TWM(T)) due to the fact that a time period other than T may be relevant for the lung cancer genesis at index date, and (h) the error from ignoring the absolute house occupancy (for RN_{1} and TWM(T)).
Regarding stage (5), there are (a) the uncertainty in determining the equilibrium factor and (b) the error from between-person variability due to the fact that the lung doses of persons with the same radon and radon progeny exposure vary due to respiratory differences.
Classification of Error Components: Classical versus Berkson
In Table 1, the error components corresponding to each of the five stages are summarized indicating the dependence on the operationally defined predictor, the applicability to the German radon studies and the classification into Berkson or classical type. We used different arguments to classify errors as classical error or Berkson error:
Classical I: Repeated observations, given all other error components were nonexistent, would yield different values and vary about the true value: Applied for the components 1a, e, 2a, 3c, 4a, d–g.
If the radon gas concentration measurements were repeated under identical conditions (1a), if the measurement was repeated in a newly specified house (1e), if the time period of the detector exposure covered a different period of the year (2a), if the measurement was repeated in the same house with again a different owner (3c), if the measurement was repeated with different day–night variation in ventilation of the rooms (4a), if the participant was interviewed again (4d–f), or if the observation was repeated with a different exposure window (4g), the new observation would differ from the original.
Classical II: One measurement is used as proxy for the average (Repeated observations would vary about the average): Applied for the components 1d, 3a, b, 4b, c.
The measurement in the living room is a proxy for all rooms (1d); the measurement during 1 year is a proxy for the average over all relevant years (3a); the measurement of the current subphase is a proxy for the average over all subphases (3b); the measurement in the current principal home is a proxy for the average of all currently occupied homes (4b) or for the average of all principal homes inhabited during the relevant time period (4c).
Classical III: Uncertainty in the estimation of a correction factor (sampling error): Resampling, that is, the repetition of the observation with a different sample of participants, would yield different correction factors. Applied for 2a^{*}, 3b^{*}, c^{*}.
Berkson I: A group's observation is assigned to each individual in the group, but the individual's values differ within each group. Applied for 4h, 5a, b.
A certain level of RN_{1} or TWM(T) is assigned to a group of persons regardless of their absolute home occupancy (4h); a certain level of radon gas exposure is assigned to a group of persons regardless of the specific equilibrium factor in their environment (5a) or of the persons' specific respiratory characteristics (5b), which may cause differing exposure to radon progeny (for 5a) and differing lung dose (5a,b).
Berkson II: A correction factor derived for a group of individuals with certain characteristics in common is assigned to all individuals of this group: Applied for 2a^{**}, 3b^{**}, c^{**}.
A certain factor is assigned to all individuals with the same seasonal pattern (2a^{**}), to all individuals with the same radon-relevant house alterations (3b^{**}), or to all individuals with the same changes in ventilation habits between current house owner and study participant (3c^{**}).
Evidence of Multiplicative Error Structure
Information on the error from between-measurement variability under epidemiological conditions is provided by the bedroom and living room measurements of the German case–control studies. Plotting the measurements versus their mean within house (Figure 3) shows the “trumpet” on the original scale and the “tube” on the log scale, indicating a multiplicative structure of this error component (compare Figure 2). The analogous graph of the year-by-year data (Figure 4) presents a similar picture of a rather multiplicative structure of the error from between-year variability. However, the smaller number of measurements is clearly visible, and a glance at the unit of the axis labels, 1000 Bq/m^{3} instead of 1 Bq/m^{3} in Figure 3, shows that the radon concentrations encountered in these houses do not reflect the epidemiological situation. The intercomparison data provide, again, information on the error from between-measurement variability. Figure 5 displays the original data by house. (The analogous graph to Figures 3 and 4 is not displayed, since it would be uninformative due to sparse data.) It shows that the between-measurement variability is small, but slightly larger for the two most highly exposed houses (houses 3 and 4), hinting at a multiplicative error structure.
Error Size
The conversion table of several measures of multiplicative error sizes, Table 2, shows that the two definitions of the CV yield similar results for small errors, but that the difference increases rapidly for errors larger than 0.3. The SD of the log of the error is close to the CV defined as SD divided by the mean, with the difference increasing, again, with error size. Table 3 relates the SD of the log of the error to the percentage of the error variance compared to the observed radon exposure variance (on the log scale for multiplicative error). These table entries are data-dependent, that is, for a given percentage, the corresponding SD depends upon the study data and is here given for the German West study. (Similar results are obtained from the East study.) Table 4 shows the range of radon exposure values that is likely to be observed given a certain true radon exposure level and a certain classical error. For example, for an error of 0.4, measurements from 22 to 111 Bq/m^{3} can be expected to be observed given a true radon exposure of 50 Bq/m^{3}.
The results of the ANOVA of the replicate data are summarized in Table 5. It can be seen that the estimated size of the error from between-measurement variability is 0.07 (year-by-year data, ANOVA model (3)), 0.10 (intercomparison data, ANOVA model (1)), 0.28, or 0.33 (bedroom/living room measurements, ANOVA model (2)) depending on the analysed data set. In the case of the bedroom/living room measurements, we need to ascertain that the room difference is sufficiently controlled for. The measured radon concentrations in the bedrooms are overall about 10% (30%) higher than those in the living rooms in the West (East) study. Adding a fixed effect of the floor difference between the rooms did not influence the error size estimate, but reduced the effect of the bedroom to 5% (20%) in the West (East) study. Repeating the application of ANOVA model (2) for the sample reduced to only those individuals with bedroom and living room on the same floor yields similar results. The size of the error caused by between-year variability is estimated as 0.58 (year-by-year data, ANOVA model (3)). The estimate of the size of both error components combined, 0.55, is smaller than the “sum” of the error sizes (year-by-year data, ANOVA model (4)), hinting at a correlation of the two error components. Including a fixed effect of the year and graphical inspection (not shown) shows no effect (P-value=0.2) of the years and certainly no trend over the years. The results without house 11, the most influential, reported in parentheses in Table 5, show smaller estimates.
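For orientation, the “sum” of two error sizes can be worked out under the assumption that the two components are independent on the log scale, so that their variances add. This independence benchmark and the subscript labels are assumptions introduced here for illustration; the inputs are the two estimates from ANOVA model (3) quoted above.

```latex
\sigma_{\text{combined}}
  = \sqrt{\sigma_{\text{measurement}}^{2} + \sigma_{\text{year}}^{2}}
  = \sqrt{0.07^{2} + 0.58^{2}}
  = \sqrt{0.0049 + 0.3364}
  \approx 0.58
```

The directly estimated combined size of 0.55 lies slightly below this benchmark.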
Discussion
The importance of high-quality exposure assessment and the need to minimize errors in the exposure are well acknowledged in epidemiology. Whereas the statistical methodology for dealing with such errors in the estimation of disease risk has been available for about 20 years (Carroll et al., 1984; Rosner et al., 1989), the increasing awareness of their practical implications in epidemiology called for “a renaissance of measurement error” (Michels, 2001). Frequently, the discussions of published epidemiological studies include a note of caution regarding the involvement of measurement error in the exposures, hinting at a potential attenuation of risk estimates. However, there is often a lack of clear differentiation between classical and Berkson error despite their greatly different impact.
Figure 6 shows several theoretically observed dose–response curves. The x-axis shows the exposure, here mimicking the German radon study situation (range of 0–400 Bq/m^{3}), although this quantity could be any other continuous epidemiological exposure. The y-axis shows the RR that would be observed if the exposure was measured with a certain error and the RR was not corrected for it. Each curve shows the increase of the RR under the logistic model for increasing exposure, assuming one of four error models: additive classical error, multiplicative classical error, additive Berkson error, or multiplicative Berkson error. The curve labelled “none=add Berk” shows the true dose–response curve with an RR of 1.12 per 100 units of the exposure, that is, the RR under a normal logistic model without errors in the exposure. The curves are drawn based on the expected exposure given the observed exposure and given a certain error model (following the reasoning of the regression calibration method). This figure clearly indicates that classical error attenuates the dose–response curve, in the case of multiplicative error even inducing a spurious curvature; that additive Berkson error has no effect on the risk estimate; and that multiplicative Berkson error, if anything, slightly intensifies the dose–response relationship. However, the fact that the impact of the Berkson error on the risk estimate (point estimate) is negligible does not mean that the Berkson error can be ignored, since the estimate's precision may suffer severely.
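The regression calibration reasoning behind such curves can be sketched in a few lines of Python. This is an illustrative approximation, not the code used for Figure 6. It assumes normally distributed exposure and error for the additive models, log-normally distributed exposure and error (log error with mean zero) for the multiplicative models, and an RR expressed relative to a reference exposure. The function names and all distribution parameters in the example are hypothetical; only the true RR of 1.12 per 100 units is taken from the text.

```python
import math

TRUE_RR_PER_100 = 1.12                     # value quoted in the text
BETA = math.log(TRUE_RR_PER_100) / 100.0   # log-RR per unit of exposure

def calibrated_exposure(w, model, mu=0.0, var_x=1.0, var_u=0.0):
    """E[true exposure | observed exposure w] under one of four error models.

    mu, var_x (true exposure) and var_u (error) are on the original scale
    for additive models and on the log scale for multiplicative models.
    """
    if model == "additive_berkson":            # X = W + U, E[U] = 0
        return w
    if model == "multiplicative_berkson":      # X = W * U, log U ~ N(0, var_u)
        return w * math.exp(0.5 * var_u)
    lam = var_x / (var_x + var_u)              # reliability ratio
    if model == "additive_classical":          # W = X + U
        return mu + lam * (w - mu)
    if model == "multiplicative_classical":    # W = X * U
        cond_mean = mu + lam * (math.log(w) - mu)
        cond_var = lam * var_u
        return math.exp(cond_mean + 0.5 * cond_var)
    raise ValueError(f"unknown error model: {model}")

def observed_rr(w, w_ref, model, **params):
    """Uncorrected RR at observed exposure w relative to w_ref."""
    delta = (calibrated_exposure(w, model, **params)
             - calibrated_exposure(w_ref, model, **params))
    return math.exp(BETA * delta)

# Hypothetical parameters: half of the observed variance is error.
additive = dict(mu=50.0, var_x=50.0**2, var_u=50.0**2)
multiplicative = dict(mu=math.log(50.0), var_x=0.16, var_u=0.16)

rr_add_berkson = observed_rr(150.0, 50.0, "additive_berkson")
rr_add_classical = observed_rr(150.0, 50.0, "additive_classical", **additive)
rr_mult_classical = observed_rr(150.0, 50.0, "multiplicative_classical", **multiplicative)
rr_mult_berkson = observed_rr(150.0, 50.0, "multiplicative_berkson", var_u=0.16)
print(rr_add_berkson, rr_add_classical, rr_mult_classical, rr_mult_berkson)
```

Under these assumptions, the additive Berkson curve coincides with the true curve, both classical curves fall below it, and the multiplicative Berkson curve lies slightly above it, which matches the qualitative pattern described for Figure 6.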
In the statistical literature, new error correction methodology for error models combining both classical and Berkson-type error has recently been developed and applied (Reeves et al., 1998; Schafer et al., 2001; Heid, 2002). For the correct application of these methods, the foremost prerequisite is the meticulous identification of all sources of error in the exposure assessment, their correct classification (classical versus Berkson), and the collection of information on error structure (additive versus multiplicative) and size, which we provided for residential radon exposure assessment via air measurements with particular reference to the German lung cancer studies. Random error in the exposure assessment arising from the physical process of measuring radon gas concentrations was previously studied in great detail (Wrixon et al., 1988; Hardcastle and Miles, 1996), but for the errors in the epidemiological setting, only a very crude list has been described so far (Bäverstam and Swedjemark, 1991; Lubin et al., 1995). A detailed re-validation is of immediate interest in the light of the ongoing implementation of a new assessment procedure of residential radon exposure by measuring polonium in glass objects (e.g. Lagarde et al., 2002).
Further, we showed the usefulness of the concepts “predictor of interest” and “operationally defined predictor” and their impact on the applicability of error components and on the size and predominant type of the error. We found that “external residential radon exposure” as lung cancer predictor involves almost no Berkson error component, whereas the predictor “lung dose” introduces a Berkson error. Further, it became clear that, whatever the choice of the operationally defined predictor, there is a trade-off to be made: The TWM(T) radon concentration in all relevant homes, rather than the concentration in the current home, is “closer” to the predictor of interest, but more difficult to measure.
Next to the differentiation between classical and Berkson-type error, it is the error size that generally determines the impact of the measurement error on risk estimation. The computation formulas (Methods Section) and the conversion table (Table 2) should guide the reader through the various measures of error size. Additionally, the size of the error referred to as “percentage of the error variance compared to the observed exposure variance” (Table 3) is particularly valuable for putting the error size in the right perspective when considering real data. For classical error, this percentage describes the proportion of the observed radon exposure variance that is explained by the error and that would disappear if the variable was measured without error. For Berkson error, it is the proportion by which the true exposure variance exceeds the observed exposure variance and by which the observed exposure variance would increase if the error was eliminated. For example, a classical multiplicative error with SD (on the log-scale) of 0.4, which was suggested as reasonable for the German studies (Heid, 2002), explains 50% of the observed radon exposure variance (on the log-scale); a Berkson error of this size would indicate that the true exposure variance is 1.5 times as large as the one observed. For a classical error of 100%, all the observed exposure variance would be due to error and the true exposure variance would be zero, which is the reason why the classical error cannot exceed 100%. A 300% Berkson error indicates that the true exposure variance exceeds the observed exposure variance by three times the latter. This proportion is thus an indicator for what the error does to the exposure variance, but even more, for classical error, it is a measure for the impact of the error on risk estimates across data sets. For example, a multiplicative classical error with SD (on the log-scale) of 0.48, as estimated for the English radon study (Darby et al., 1998), explains about 20% of the observed radon exposure variance in this study, but would explain over 65% in the German studies. The impact of such an error is, hence, more severe in the German studies. Note that the Berkson error's impact does not depend on the exposure variance and is thus the same across data sets assuming the same underlying error size and risk (Heid et al., 2002), which is of interest in meta-analyses.
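The arithmetic behind these percentages can be reproduced with a short Python sketch. The function names are hypothetical, and the observed log-scale exposure variance of 0.32 for the German studies is not a reported value: it is back-calculated here from the statement that an error SD of 0.4 explains 50% of the observed variance.

```python
def error_variance_percentage(error_sd, observed_var):
    """Error variance as a percentage of the observed exposure variance."""
    return 100.0 * error_sd ** 2 / observed_var

def berkson_true_to_observed_ratio(error_sd, observed_var):
    """Ratio of true to observed exposure variance under Berkson error."""
    return 1.0 + error_sd ** 2 / observed_var

# Assumption: implied observed variance (log scale) in the German studies.
german_observed_var = 0.4 ** 2 / 0.50

share_04 = error_variance_percentage(0.4, german_observed_var)
ratio_04 = berkson_true_to_observed_ratio(0.4, german_observed_var)
share_048 = error_variance_percentage(0.48, german_observed_var)
print(f"classical, SD 0.40: {share_04:.0f}% of observed variance")
print(f"Berkson,   SD 0.40: true variance = {ratio_04:.1f} x observed")
print(f"classical, SD 0.48: {share_048:.0f}% of observed variance")
```

With the implied variance, the error SD of 0.48 from the English study corresponds to roughly 72% of the observed variance, in line with the “over 65%” stated for the German studies.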
A real example of the error size of two error components, the errors from between-measurement variability and from between-year variability, is given by three data sets with repeated measurements. The size of the error from between-measurement variability is 0.07 (year-by-year data), 0.10 (intercomparison data), and about 0.3 (bedroom/living room measurements), depending on the analysed data set. These differences indicate the importance of having a critical look at the replicate data. In the year-by-year data, the houses with very high radon concentrations were not representative of the epidemiological situation. The intercomparison data, while overcoming this problem, are rather sparse, and the laboratory personnel were aware that their results were being evaluated. The value of 0.1 obtained in a controlled exercise with a limited number of measurements can thus be viewed as a minimum error size for the epidemiological studies, where over 10,000 measurements were conducted by this laboratory. The bedroom/living room measurements have the advantage of being available for nearly all study participants and of being conducted under epidemiological field conditions. However, we showed that there are differences between the radon concentrations in the rooms that go beyond those due to different floor levels and for which we might not have been able to correct completely in the error size estimation. The error from between-year variability, estimated as 0.58 (year-by-year data), is quite large compared to the between-measurement error. However, again, the fact that these data are unrepresentative calls for caution. It should be noted that the estimated error variance of the combination of both between-year variability and between-measurement variability is smaller than the sum of both, which may well be due to correlations between the components. It is therefore of no use to estimate each error component's size separately and to take their sum as the total error size.
The multiplicative error model in the assessment of residential radon exposure is already established (Gunby et al., 1993; Lagarde et al., 1997; Darby et al., 1998). The replicate measurements provide further evidence of the multiplicativity when the mean is plotted against the single measurements (Figures 3 and 4).
Conclusions
We conclude that, generally in epidemiology, clear differentiation between classical and Berkson error components is essential in the assessment of error sources and for establishing an error model, a fact which we believe is not fully acknowledged. This differentiation is crucial due to the different impact of these two error types. The classical error can induce severe bias on the risk estimate; multiplicative classical error can even distort the dose–response curve. This bias can be reduced by using the mean of multiple measurements in the analysis, which requires internal replicates for each individual, or it can be corrected for by using the information from (internal or external) replicate measurements for a subgroup. Also, the spuriously narrow confidence intervals for the risk estimates that result without error correction in the presence of classical error in the exposure can be corrected. The analysis of our replicate data, their usefulness, and the struggle with their limitations motivate our recommendation for more internal repeated measurements in future epidemiological studies (e.g., in radon studies, more than one detector per room and repeated measurements over a series of years). At first glance, the Berkson error is less problematic, since it does not induce notable bias on the risk estimates. However, it weakens the precision of the estimates, which is often more difficult to correct for than in the classical situation due to the problem of grasping the extent of the Berkson error. For example, the lung dose is hard to measure and such a measurement would reintroduce classical error. Just replicating measurements does not help in the Berkson case. Put simply, classical error is related to the measurement process, whereas Berkson error is often a matter of defining the exposure: Using fixed monitors (e.g. using the distance of a home to the next power station as predictor instead of individual measurements), using measurements in the environment (e.g. residential radon exposure instead of lung dose), or using a person's affiliation to a group in order to use the exposure assigned to this group (e.g. using job-exposure matrices) instead of personal monitors is a question of how to define the exposure; it induces Berkson error.
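The remark that classical-error bias can be reduced by averaging replicate measurements can be illustrated with the reliability ratio, which approximates the factor by which a regression coefficient is attenuated. The sketch below is hedged in two ways: it assumes that the errors of the replicates are independent with equal variance, which the correlations discussed above may violate, and it uses illustrative log-scale variances (true variance 0.16, error variance 0.16) implied by the 50% example in the Discussion rather than estimated values. The function name is hypothetical.

```python
def attenuation_factor(true_var, error_var, n_replicates=1):
    """Reliability ratio when the mean of n independent replicates is used.

    Averaging n replicates with independent classical errors divides the
    error variance by n; values closer to 1 mean less attenuation.
    """
    return true_var / (true_var + error_var / n_replicates)

# Illustrative variances on the log scale (assumed, not estimated).
true_var, error_var = 0.16, 0.16
factors = {n: attenuation_factor(true_var, error_var, n) for n in (1, 2, 4)}
for n, factor in factors.items():
    print(f"{n} replicate(s): attenuation factor {factor:.2f}")
```

No comparable gain is available for Berkson error, because replicating the measurement does not bring the assigned exposure closer to the individual's true exposure.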
The general statement that “well-behaved” (random, non-differential, homoscedastic) errors attenuate regression coefficients applies only to the classical error. It should be kept in mind that considering more precise (or more relevant) exposures, and thus introducing more potential sources of error, does not necessarily increase the bias of the risk estimates (compare to Lubin et al., 1995). For example, extending the definition of the predictor from “residential radon exposure” to “lung dose” induces Berkson error and does not attenuate the risk estimate.
Assuming the sum of both error types' sizes to be known and varying the percentage of the Berkson error is not an option in such situations (see Mallick et al., 2002). We support instead a two-dimensional view of measurement error, that is, a classical-type dimension and a Berkson-type dimension, where the size of each dimension needs to be studied separately. The full error is represented in the continuum of a two-dimensional space (compare with Zeger et al., 2000). Modern exposure assessment should therefore not only aim to be as accurate and precise as possible, but should also provide a model of the measurement errors that unavoidably remain, with clear differentiation of classical and Berkson components.
References
Armstrong B.G. The effects of measurement errors on relative risk regression. Am J Epi 1990: 132(6): 1176–1184.
Bäverstam U., and Swedjemark G.A. Where are the errors when we estimate radon exposure in retrospect? Radiat Prot Dosim 1991: 36(2/4): 107–112.
Carroll R.J., Ruppert D., and Stefanski L.A. Measurement Error in Nonlinear Models. Chapman & Hall, London, 1995.
Carroll R.J., Spiegelmann C., Lan K.K., Bailey K.T., and Abbott R.D. On errors-in-variables for binary regression models. Biometrika 1984: 74: 19–26.
Darby S., Whitley E., Silcocks P., Thakrar B., Green M., Lomas P., Miles J., Reeves G., Fearn T., and Doll R. Risk of lung cancer associated with residential radon exposure in southwest England: a case–control study. Br J Cancer 1998: 78(3): 394–408.
Gerken M., Kreienbrock L., Wellmann J., Kreuzer M., and Wichmann H.E. Models for retrospective quantification of indoor radon exposure in case-control studies. Health Phys 2000: 78(3): 268–278.
Gunby J.A., Darby S.C., Miles J.C.H., Green B.M.R., and Cox D.R. Factors affecting indoor radon concentration in the United Kingdom. Health Phys 1993: 64: 2–12.
Hardcastle G.D., and Miles J.C.H. Ageing and fading of alpha particle tracks in CR-39 exposed to air. Radiat Prot Dosim 1996: 67: 295–298.
Heid I.M. Measurement error in exposure assessment: an error model and its impact on studies of lung cancer and residential radon exposure in Germany. PhD Thesis, 2002. http://edoc.ub.uni-muenchen.de/archive/00000522/.
Heid I.M., Küchenhoff H., Wellmann J., Gerken M., Kreienbrock L., and Wichmann H.E. On the potential of measurement error to induce differential bias on risk estimates: an example from radon epidemiology. Stat Med 2002: 21: 3261–3278.
International Commission on Radiological Protection (ICRP). Lung cancer risk from indoor exposures to radon daughters. ICRP Publ Nr. 50. Pergamon Press, New York, 1994.
Jacobi W. The dose to the human respiratory tract by inhalation of short-lived 222Rn- and 220Rn-decay products. Health Phys 1964: 10: 1163–1174.
Jacobi W. Dose to tissue and effective dose equivalent by inhalation of radon-222, radon-220 and their short-lived daughters. GSF-report S-626, Neuherberg, 1989.
Kreienbrock L., Kreuzer M., Gerken M., Dingerkus G., Wellmann J., Keller G., and Wichmann H.E. Case-control study on lung cancer and residential radon in West Germany. Am J Epidemiol 2001: 153(1): 42–52.
Kreienbrock L., Poffijn A., Tirmarche M., Feider M., Kies A., and Darby S.C. Intercomparison of passive radon-detectors under field conditions in epidemiological studies. Health Phys 1999: 76(5): 558–563.
Kreuzer M., Heinrich J., Wölke G., Schaffrath Rosario A., Gerken M., Wellmann J., Keller G., Kreienbrock L., and Wichmann H.E. Residential radon and risk of lung cancer in Eastern Germany. Epidemiology 2003: 14: 559–568.
Lagarde F., Falk R., Almren K., Nyberg F., Svensson H., and Pershagen G. Glass-based radon-exposure assessment and lung cancer risk. J Expo Anal Environ Epidemiol 2002: 12: 344–354.
Lagarde F., Pershagen G., Akerblom G., Axelson O., Bäverstam U., Damber L., Enflo A., Svartengren M., and Swedjemark G.A. Residential radon and lung cancer in Sweden: risk analysis accounting for random error in the exposure assessment. Health Phys 1997: 72: 269–276.
Lubin J.H., Boice J.D., Edling C.H., Hornung R., Howe G., Kunz E., Kusiak A., Morrison H.I., Radford E.P., Samet J.M., Tirmarche M., Woodward A., Xiang Y.S., and Pierce D.A. Radon and lung cancer risk: a joint analysis of 11 underground miner studies. NIH publication no. 94-3644, Rockville, MD, 1994.
Lubin J.H., Boice Jr J.D., and Samet J.M. Errors in exposure assessment, statistical power and the interpretation of residential radon studies. Radiat Res 1995: 144: 329–341.
Mallick B., Hoffmann F.O., and Carroll R.J. Semiparametric regression modeling with mixtures of Berkson and classical error, with application to fallout from the Nevada test site. Biometrics 2002: 58: 13–20.
Michels K.B. A renaissance for measurement error. Int J Epidemiol 2001: 30: 421–422.
National Academy of Sciences (NAS) National Research Council. Health effects of exposure to radon: time for reassessment? BEIR VI Report of the Committee on the Biological Effects of Ionizing Radiation, National Academy Press, Washington, DC, 1994.
Pershagen G., Axelson O., Clavensjö B., Damber L., Desai G., Enflo A., Lagarde F., Mellander H., Svartengren M., Swedjemark G.A., and Akerblom G. Residential radon exposure and lung cancer in Sweden. N Engl J Med 1994: 330: 159–164.
Poffijn A., Tirmarche M., Kreienbrock L., Kayser B., and Darby S.C. Radon and lung cancer: protocol and procedures of the multicentre studies in the Ardennes-Eifel region, Brittany, and the Massiv Central. Radiat Prot Dosim 1992: 45(Suppl 1/4): 651–656.
Reeves G.K., Cox D.R., Darby S.C., and Whitley E. Some aspects of measurement error in explanatory variables for continuous and binary regression models. Stat Med 1998: 17: 2157–2177.
Rosner B., Willett W.C., and Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 1989: 8: 1051–1069.
Schafer D.W., Lubin J.H., Ron E., Stovall M., and Carroll R.J. Thyroid cancer following scalp irradiation: a reanalysis accounting for uncertainty in dosimetry. Biometrics 2001: 57: 689–697.
Tosteson T.D., Stefanski L.A., and Schafer D.W. A measurement-error model for binary and ordinal regression. Stat Med 1989: 8: 1139–1147.
Wichmann H.E., Gerken M., Wellmann J., Kreuzer M., Kreienbrock L., Keller G., Wölke G., and Heinrich J. Lungenkrebsrisiko durch Radon in der Bundesrepublik Deutschland (Ost) - Thüringen und Sachsen (in German). Fortschritte in der Umweltmedizin. ecomed verlagsgesellschaft, 1999.
Wichmann H.E., Kreienbrock L., Kreuzer M., Gerken M., Dingerkus G., Wellmann J., and Keller G. Lungenkrebsrisiko durch Radon in der Bundesrepublik Deutschland (West) (in German). Fortschritte in der Umweltmedizin. ecomed verlagsgesellschaft, 1998.
Wrixon A.D., Green B.M.R., Lomas P.R.M., Miles J.C.H., Cliff K.D., Francis E.A., Driscoll C.M.H., James A.C., and O’Riordan M.X. Natural radiation exposure in UK dwellings. NRPB R190, 1988.
Zeger S.L., Thomas D., Dominici F., Samet J.M., Schwartz J., Dockery D., and Cohen A. Exposure measurement error in time-series studies of air pollution: Concepts and consequences. Environ Health Perspect 2000: 108(5): 419–426.
Keywords
 measurement error
 Berkson error
 error models
 error sources
 radon
 case–control studies.