Two dimensions of measurement error: Classical and Berkson error in residential radon exposure assessment

Abstract

Measurement error in exposure assessment is unavoidable. Statistical methods to correct for such errors rely upon a valid error model, particularly regarding the classification into classical and Berkson error, the structure of the error, and its size. Using the example of two German case–control studies on lung cancer and residential radon exposure, we provide a detailed list of sources of error in residential radon exposure assessment, stressing the importance of (a) the differentiation between classical and Berkson error and (b) clear definitions of predictors and operationally defined predictors. We give intuitive measures of error size and present evidence on both the error size and the multiplicative structure of the error from three data sets with repeated measurements of radon concentration. We conclude that modern exposure assessment should not only aim to be as accurate and precise as possible, but should also provide a model of the remaining measurement errors with a clear differentiation of classical and Berkson components.

Introduction

Radon, a ubiquitous radioactive gas, is the second leading cause of lung cancer after smoking in the general population (NAS, 1994). Epidemiological studies on lung cancer and residential radon exposure have been conducted in many countries worldwide to obtain relative risk (RR) estimates and to describe the exposure–disease relationship. Despite the size and quality of these studies, the results range from no increased risk to a significantly increased relative lung cancer risk of about 1.10 per 100 Bq/m3 radon gas concentration (Pershagen et al., 1994).

For example, in western and eastern Germany, two case–control studies were conducted during the 1990s, including over 2500 lung cancer patients recruited from hospitals (cases) and an adequate group of about 4000 disease-free participants recruited via population registries (population controls). The participants were interviewed about their long-term residential, smoking, and occupational histories. The radon concentrations in the bedroom and the living room of their homes were measured with alpha track detectors over 1 year. Based on these measurements and on interview information about the time spent in each room and the residency period in each home, residential radon exposure was assessed retrospectively. The reported RR estimates per 100 Bq/m3 (95% confidence intervals) were 0.98 (0.82, 1.17) and 1.04 (0.96, 1.12) for the West and the East study, respectively, based on measurements in the homes inhabited at index date, and 0.97 (0.82, 1.14) and 1.10 (0.98, 1.24), respectively, based on all measurements in homes inhabited up to 15 years prior to interview (Wichmann et al., 1998, 1999; Kreienbrock et al., 2001; Kreuzer et al., 2003).

Measurement errors in exposure assessment are unavoidable, and residential radon exposure is no exception (Bäverstam and Swedjemark, 1991; Lubin et al., 1995); such errors can bias RR estimates. Methods to correct for them are available but require a model for the error in the assessed exposure, and quite different results emerge depending on the error type (classical or Berkson, i.e. error independent of the true exposure or of the observed exposure, respectively), its structure (additive or multiplicative), and its size (Carroll et al., 1995). Using the example of residential radon exposure, we identify the error components and classify them as classical- or Berkson-type error. Further, we show that whether an error component applies depends on the practical variable chosen to represent the “predictor” of the disease in the specific study, the “operationally defined predictor” (Carroll et al., 1995). We provide intuitive measures of the size of the error and analyse three data sets with repeated radon concentration measurements to provide further information on error structure and size.

Methods

Predictor and Operationally Defined Predictor

The error in the exposure assessment is the difference between the “observed exposure” and the “true exposure”. Which components of error apply depends upon the “predictor” and the “operationally defined predictor”. Generally, the predictor of interest is given by the epidemiological objective of the study. However, each investigator has to define a practical variable that is measurable and a valid surrogate for the predictor: the operationally defined predictor.

In the German radon studies, the objective was to quantify the RR of lung cancer due to “residential radon exposure”. From a methodological point of view, any variable can be plugged into the “exposure–disease model” as “exposure”. From a biological point of view, however, several stages of the disease-causing process are distinguished, and “exposure” is one of three terms employed (Armstrong, 1990): (1) the concentration, c(t), a measure of the agent's density at time point t; (2) the exposure, a measure of the agent's mass accumulating in the environment of an individual during time period T, $\int_T c(t)\,dt$, or, if the exposure is to have the same unit as the concentration and is thus given per time unit, $\frac{1}{|T|}\int_T c(t)\,dt$, where |T| denotes the length of T; and (3) the effective organ dose from the exposure experienced during time period T.

Using the example of residential radon studies, we elucidate the complexity of this issue. The true radon gas concentration in an environment is the concentration of ²²²Rn at a certain time point; its unit is becquerel per cubic metre (Bq/m3). What is actually measured, however, is the average concentration during the exposure of the detector, RN(detector). In the German radon studies, the detectors were exposed for 1 year, and the radon gas concentration in the ith home of a study participant was assessed as the average of the bedroom and living room concentrations, weighted by the relative occupancy time, $RN_i(\text{detector}) = 0.5\,\big(w_i\,RN_{iB}(\text{detector}) + (1-w_i)\,RN_{iL}(\text{detector})\big)$, where wi denotes the percentage of time spent in the bedroom. The lung cancer predictor “true residential radon exposure” is usually defined as an individual's environmental (external) exposure per year to radon gas in the residences inhabited during a time period T that is relevant for carcinogenesis at the index date, RN(T). A measurable proxy for this, that is, an operationally defined predictor, is derived by using RNi(detector) as a valid proxy for the average radon concentration during residence in the ith home, RNi(residency), and by computing the time-weighted mean (TWM) concentration, that is, the mean across all homes inhabited during the relevant time period T, weighted by the residency time in the ith home, Ti,

$$TWM(T) = \frac{\sum_i T_i\, RN_i(\text{residency})}{\sum_i T_i}.$$

The unit is becquerel-years per cubic metre and year, which equals becquerel per cubic metre [Bq·a/(m3·a) = Bq/m3]. Another proxy is the cumulative radon exposure per year accounting for absolute home occupancy, that is, the percentage of time spent in the home, Oi,

$$RN_{CUM}(T) = \frac{\sum_i O_i\, T_i\, RN_i(\text{residency})}{\sum_i T_i}.$$

The unit is, again, Bq/m3; however, because Oi is on average 50%, RNCUM(T) is about half of TWM(T). A third proxy for residential radon exposure is the average radon concentration in the current home (i.e. the home inhabited at index date) during the time of residency, RN1(residency), abbreviated RN1, provided the residency time in the current home covers a substantial proportion of T. Naturally, the quantities RNi(residency), Ti, and Oi can only be observed with some error. With the observed quantities denoted by RNi(residency)*, Ti*, and Oi*, the “observed proxies for residential radon exposure” are
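
$$TWM(T)^* = \frac{\sum_i T_i^*\, RN_i(\text{residency})^*}{\sum_i T_i^*}, \qquad RN_{CUM}(T)^* = \frac{\sum_i O_i^*\, T_i^*\, RN_i(\text{residency})^*}{\sum_i T_i^*}, \qquad RN_1^* = RN_1(\text{residency})^*.$$
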
An alternative lung cancer predictor to residential radon exposure is the alpha dose, that is, the energy imparted to the lung tissue by alpha particles from the radioactive decay of radon and radon progeny in the residences during the time period T, D(T), which differs from RN(T) by certain factors (Jacobi, 1964, 1989; ICRP, 1994): (a) the equilibrium factor describing the equilibrium between radon and radon progeny given the environmental conditions (temperature, compression), (b) a factor describing the amount of radon progeny activity deposited in the lungs (depending on particles in the air and the individual's inhalation depth and frequency), and (c) a factor describing the dose delivered to the sensitive cells in the lungs (depending on where the progeny are deposited and the depth of the sensitive cells).

Most radon studies use TWM(T) as the operationally defined predictor for residential radon exposure, where T covers the period 5–35 years prior to index date. Appropriate weighting of the radon gas concentrations according to time since exposure is then necessary, since exposures more than 15 years in the past are thought to lose potential to induce lung cancer (Lubin et al., 1994). The German studies were analysed based on two operationally defined predictors, RN1 and RNCUM(T), where T covers 5–15 years prior to index date, a time period that is homogeneous regarding the potential to cause cancer. For these three variables, TWM(T), RNCUM(T), and RN1, as operationally defined predictors for the true predictors “residential radon exposure” or “alpha dose”, we establish a list of error components, formulate an error model, and provide, as far as possible, a sense of the plausible error size.
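The three operationally defined predictors can be sketched in a few lines of Python. This is a minimal illustration, not the studies' code: the `Home` record and its field names are hypothetical, the homes are assumed to cover the window T without gaps, the current home is assumed to be listed first, and TWM(T) and RNCUM(T) are taken to be means weighted by residency time Ti (RNCUM(T) additionally by occupancy Oi), as described above.

```python
from dataclasses import dataclass


@dataclass
class Home:
    """One home in a participant's residential history (hypothetical structure)."""
    rn_residency: float  # average radon concentration during residency, Bq/m3
    years: float         # residency time T_i within the window T, in years
    occupancy: float     # absolute home occupancy O_i, fraction of time at home


def twm(homes):
    """TWM(T): mean concentration weighted by residency time T_i."""
    total_years = sum(h.years for h in homes)
    return sum(h.years * h.rn_residency for h in homes) / total_years


def rn_cum(homes):
    """RNCUM(T): like TWM(T), but each home is also weighted by occupancy O_i."""
    total_years = sum(h.years for h in homes)
    return sum(h.occupancy * h.years * h.rn_residency for h in homes) / total_years


def rn1(homes):
    """RN1: concentration in the current home (assumed to be listed first)."""
    return homes[0].rn_residency


# Hypothetical 10-year window covered by two homes (values illustrative only).
history = [Home(rn_residency=80.0, years=6.0, occupancy=0.5),
           Home(rn_residency=40.0, years=4.0, occupancy=0.6)]
print(twm(history), rn_cum(history), rn1(history))
```

With occupancies of 50 to 60%, RNCUM(T) comes out at roughly half of TWM(T), in line with the relation between the two noted in the text.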

Error Models

In this work, we deal with random error (i.e. with zero expectation) that is nondifferential with respect to disease status (i.e. structure and size independent of disease status) and homoscedastic (i.e. same structure and size for all observations). We elaborate particularly on the differences between classical-type error (i.e. statistically independent of the true variable) and Berkson-type error (i.e. statistically independent of the observed variable), and between additive and multiplicative structure. Figure 1 summarizes the notation.

Figure 1

Notation of error model.

Errors of the classical type arise when a quantity is measured by some device and repeated measurements vary around the true value. Berkson-type error is involved when a group's average is assigned to each individual sharing the group's characteristics. The group's average is thus the “measured value”, that is, the value that enters the analysis, and the individual latent value is the “true value”. Examples of Berkson error include the use of job-exposure-matrix entries instead of individual exposure measurements, or the use of environmental exposure measurements from fixed monitors instead of individual doses measured with personal dosimeters (Tosteson et al., 1989).
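For reference, the four error models can be written compactly. The symbols are generic labels chosen here (X the true predictor, X* the observed predictor, E_A an additive and E_M a multiplicative error, ⊥ statistical independence) and may not match the notation of Figure 1:

```latex
\begin{aligned}
&\text{classical, additive:}       && X^* = X + E_A,     && E_A \perp X\\
&\text{classical, multiplicative:} && X^* = X \cdot E_M, && E_M \perp X\\
&\text{Berkson, additive:}         && X = X^* + E_A,     && E_A \perp X^*\\
&\text{Berkson, multiplicative:}   && X = X^* \cdot E_M, && E_M \perp X^*
\end{aligned}
```

Taking logarithms turns each multiplicative model into an additive one, for example log X* = log X + log E_M in the classical case.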

The difference between additive and multiplicative classical error is illustrated in Figure 2. For additive error, the spread of true exposure given measured exposure (vertical spread of dots) is constant over the full range of exposure, so the graph shows a “tube” (Figure 2a). For multiplicative error, the spread increases proportionally to measured exposure, so the graph shows a “trumpet” (Figure 2b). Since multiplicative error is additive on the log-scale, all characteristics of additive error hold for multiplicative error on the log-scale: a plot of the log of true exposure versus the log of exposure measured with multiplicative error would give the same picture as Figure 2a.
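The tube/trumpet distinction can be reproduced with simulated data. The following NumPy sketch uses hypothetical values (lognormal true exposures with median 50 Bq/m3, an additive error SD of 10 Bq/m3, a log-scale error SD of 0.3) and, unlike Figure 2, groups observations by the true value, where the spread of classical error is constant or proportional by construction. It reports the SD of the measured-minus-true difference in four equal-count bins from low to high exposure.

```python
import numpy as np


def spread_by_true_level(x_obs, x_true, n_bins=4):
    """SD of (x_obs - x_true) within equal-count bins of x_true, low to high."""
    order = np.argsort(x_true)
    return [float(np.std(x_obs[idx] - x_true[idx]))
            for idx in np.array_split(order, n_bins)]


rng = np.random.default_rng(1)
# Hypothetical true exposures, lognormal with median 50 Bq/m3 (illustrative only).
x_true = rng.lognormal(mean=np.log(50.0), sigma=0.6, size=200_000)

# Classical additive error (SD 10 Bq/m3): the spread is the same at every level.
x_add = x_true + rng.normal(0.0, 10.0, size=x_true.size)
# Classical multiplicative error (SD 0.3 on the log scale): spread grows with level.
x_mult = x_true * rng.lognormal(mean=0.0, sigma=0.3, size=x_true.size)

print("additive, original scale:      ", spread_by_true_level(x_add, x_true))
print("multiplicative, original scale:", spread_by_true_level(x_mult, x_true))
print("multiplicative, log scale:     ",
      spread_by_true_level(np.log(x_mult), np.log(x_true)))
```

The first line should show roughly equal spreads ("tube"), the second spreads that grow with exposure level ("trumpet"), and the third roughly equal spreads again once both variables are on the log-scale.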

Figure 2

Error models: True predictor versus predictor measured with (a) additive or (b) multiplicative error (Fictitious data).

Measures of Error Size

The error size is usually given as the standard deviation (SD) of the error on the original scale for additive error, σEA, or on the log-scale for multiplicative error, σlog EM. For multiplicative error, alternatives are (1) the geometric standard deviation (GSD) of the error, exp(σlog EM), or (2) the coefficient of variation (CV), defined (a) as the error's SD on the original scale divided by its mean on the original scale, $\mathrm{CV} = \sigma_{E_M}/\mu_{E_M}$, or (b) as $\sigma_{E_M}/\exp(\mu_{\log E_M})$, that is, the SD on the original scale divided by the geometric mean (GM, i.e. the exponentiated mean of the log). We compute a conversion table for the different measures. Further, we provide an intuitive way to grasp the size of a classical error by presenting the range of measured predictor values that would most likely be observed given a true value, x. For multiplicative error, if error and true predictor are approximately lognormally distributed, 95% of the measured values on the original scale lie within [x/GSD², x·GSD²]. For additive error, if error and true predictor are approximately normally distributed, 95% of the observed values lie within [x−2σEA, x+2σEA]. Finally, we present the error variance as a proportion of the exposure variance observed in the German radon studies (on the original scale for additive error, on the log-scale for multiplicative error).
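Under the assumption that the multiplicative error is lognormal, log E_M ~ N(μ, σ²), the alternative measures are simple functions of σ = σlog EM: GSD = exp(σ), SD/mean = √(exp(σ²) − 1), and SD/GM = √(exp(σ²)(exp(σ²) − 1)). The sketch below tabulates them for a few illustrative values of σ; it applies these standard lognormal identities and is not claimed to reproduce Table 2 exactly.

```python
import math


def multiplicative_error_measures(sigma_log):
    """Alternative size measures for a lognormal multiplicative error.

    Assumes log(E_M) ~ Normal(mu, sigma_log**2); the results do not depend on mu.
    Returns (GSD, CV as SD/mean, CV as SD/geometric mean).
    """
    v = math.exp(sigma_log ** 2)
    gsd = math.exp(sigma_log)          # geometric standard deviation
    cv_mean = math.sqrt(v - 1.0)       # definition (a): SD / arithmetic mean
    cv_gm = math.sqrt(v * (v - 1.0))   # definition (b): SD / geometric mean
    return gsd, cv_mean, cv_gm


for s in (0.1, 0.2, 0.3, 0.4, 0.6, 0.8):
    gsd, cv_a, cv_b = multiplicative_error_measures(s)
    print(f"sigma_log={s:.1f}  GSD={gsd:.3f}  CV(a)={cv_a:.3f}  CV(b)={cv_b:.3f}")
```

For small σ both CV definitions are close to σ itself; the gap between them widens as σ grows.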

Replicate Data

Three data sets with replicate radon concentration measurements made with alpha track detectors in German dwellings are available, each addressing a different issue. We exploit this information to provide evidence on the error structure (additive versus multiplicative) and error size by plotting the data and fitting analysis-of-variance (ANOVA) models with PROC MIXED in SAS®.

Bedroom/living room measurements: For each study participant in the analysed sample of the German case–control studies, 1-year radon gas concentration measurements in the bedroom and living room of the current home are available. These internal data allow the estimation of the between-measurement-variability, provided that the differences between rooms can be controlled for. This analysis is based solely on the controls (i.e. about 8000 1-year measurements) to reflect the situation for the general public.

Year-by-year replicates: The German Federal Office for Radiation Protection measured radon gas concentrations for several consecutive time periods, each covering about 1–2 months, during 1995–2001 to monitor changes in radon concentrations over time. Two measurements were conducted under identical conditions for each time period in 11 arbitrarily selected houses, including basements of laboratories and houses with very high radon levels in Schneeberg, an area of former uranium mining. We computed the time-weighted average radon concentration of consecutive time periods covering 12 months (i.e. 2·5·11=110 1-year measurements). These external data, kindly provided by R. Lehmann, allow the estimation of the between-year-variability, of the between-measurement-variability, and of both combined.
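The 1-year values from the year-by-year data are time-weighted averages of consecutive shorter periods. A minimal sketch, assuming each period's measurement represents the mean concentration over that period and that the periods tile the 12 months without gaps or overlaps; the function name and example values are hypothetical.

```python
def time_weighted_average(concentrations, durations_days):
    """Mean concentration over consecutive periods, weighted by period length.

    concentrations: mean radon concentration of each period (Bq/m3)
    durations_days: length of each period in days (same order)
    """
    if len(concentrations) != len(durations_days):
        raise ValueError("need exactly one duration per concentration")
    total_days = sum(durations_days)
    if total_days <= 0:
        raise ValueError("total duration must be positive")
    weighted = sum(c * d for c, d in zip(concentrations, durations_days))
    return weighted / total_days


# Hypothetical 12-month window made up of six periods of about 2 months each.
periods_bq_m3 = [120.0, 90.0, 60.0, 50.0, 80.0, 140.0]
periods_days = [61, 61, 61, 61, 60, 61]
print(time_weighted_average(periods_bq_m3, periods_days))
```

Weighting by duration matters because the periods differ in length; an unweighted mean of the period values would over-represent the shorter periods.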

Intercomparison study: In 1990/91, an intercomparison study was conducted to evaluate the within- and between-laboratory variability of laboratories from different European countries measuring radon gas concentrations in five houses with concentrations typical of those expected in the then ongoing epidemiological radon studies (Poffijn et al., 1992; Kreienbrock et al., 1999). From these external data, the 6-month measurements conducted by the German laboratory, the Biophysics Department of the University of Saarland, with five detectors placed in each of five houses are used in this analysis (i.e. 25 6-month measurements) to estimate the between-measurement-variability for the laboratory that conducted all measurements of the German case–control studies.

Statistical Models to Analyse the Replicate Data

Based on the intercomparison data, the size of the error from between-measurement-variability is estimated by applying

$$\log Z_{i,j} = \mu + \mathrm{HOUSE}_i + \varepsilon_{i,j} \tag{1}$$

where Zi,j denotes the jth measurement in the ith house and HOUSEi the effect of the ith house, and by computing the SD of the residuals εi,j (j = 1, …, 5; i = 1, …, 5). Based on the bedroom/living room measurements, the same error size is estimated by applying

$$\log Z_{i,j} = \mu + \mathrm{HOUSE}_i + \mathrm{ROOM}_j + \varepsilon_{i,j} \tag{2}$$

where Zi,j denotes the jth measurement in the ith house, HOUSEi the effect for the ith house, and ROOMj the effect of the bedroom versus the living room, and by obtaining the SD of the residuals (j=0, living room, j=1, bedroom). We also explore whether the floor level difference explains most of the room effect. Based on the year-by-year data, both the size of the error from between-measurement-variability and the size of the error from between-year-variability are estimated by applying
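
$$\log Z_{i,j,k} = \mu + \mathrm{HOUSE}_i + \mathrm{HOUSE}_i\,\mathrm{YEAR}_j + \varepsilon_{i,j,k} \tag{3}$$
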
where Zi,j,k denotes the kth measurement in the jth year for the ith house, HOUSEi the effect of the ith house, and HOUSEi YEARj the effect of the jth year by house, and by deriving the SD of the residuals and the square root of the variance estimate of HOUSEi YEARj (k = 1, 2; j = 2, …, 5; i = 1, …, 11). An estimate of both errors combined is derived from

$$\log Z_{i,j,k} = \mu + \mathrm{HOUSE}_i + \varepsilon_{i,j,k} \tag{4}$$

by the SD of the residuals. By further including a fixed effect of the jth year, YEARj, in model (3), we test for a potential effect of the years independent of the house.
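For balanced designs, the residual and variance-component estimates behind models (1) to (3) can also be obtained from simple method-of-moments (ANOVA) formulas. The NumPy sketch below illustrates this under stated assumptions and does not replace the PROC MIXED fits: it assumes log-transformed concentrations (multiplicative error), complete balanced arrays, one bedroom/living-room pair per house for model (2), and simulated data with hypothetical parameter values. For balanced data and positive estimates, such moment estimates should agree with REML.

```python
import numpy as np


def within_house_sd(log_z):
    """Model (1)-type estimate: residual SD after removing a house effect.

    log_z: array of shape (houses, replicates) holding log concentrations.
    """
    resid = log_z - log_z.mean(axis=1, keepdims=True)
    df = log_z.shape[0] * (log_z.shape[1] - 1)
    return float(np.sqrt((resid ** 2).sum() / df))


def paired_room_sd(log_bed, log_living):
    """Model (2)-type estimate for one bedroom/living-room pair per house.

    The house effect cancels in the within-house difference, the room effect is
    absorbed by its mean, and Var(difference) = 2 * sigma**2.
    """
    d = np.asarray(log_bed) - np.asarray(log_living)
    return float(np.std(d, ddof=1) / np.sqrt(2.0))


def nested_components(log_z):
    """Model (3)-type moment estimates for a balanced (houses, years, replicates)
    array: returns (SD between measurements, SD between years within house)."""
    n_house, n_year, n_rep = log_z.shape
    year_means = log_z.mean(axis=2)
    ms_rep = ((log_z - year_means[..., None]) ** 2).sum() / (
        n_house * n_year * (n_rep - 1))
    house_means = year_means.mean(axis=1, keepdims=True)
    ms_year = n_rep * ((year_means - house_means) ** 2).sum() / (
        n_house * (n_year - 1))
    var_year = max((ms_year - ms_rep) / n_rep, 0.0)  # truncate at zero
    return float(np.sqrt(ms_rep)), float(np.sqrt(var_year))


rng = np.random.default_rng(7)
# Hypothetical balanced design on the log scale: 2000 houses x 5 years x 2 replicates.
house = rng.normal(np.log(60.0), 0.8, size=(2000, 1, 1))
year = rng.normal(0.0, 0.5, size=(2000, 5, 1))
log_z = house + year + rng.normal(0.0, 0.1, size=(2000, 5, 2))
print(nested_components(log_z))
print(within_house_sd(log_z[:, 0, :]))

# Hypothetical room pairs: bedroom effect 0.1, measurement SD 0.3 (log scale).
h_pair = rng.normal(np.log(50.0), 0.8, size=4000)
log_bed = h_pair + 0.1 + rng.normal(0.0, 0.3, size=4000)
log_living = h_pair + rng.normal(0.0, 0.3, size=4000)
print(paired_room_sd(log_bed, log_living))
```

With these simulated inputs the estimates should land close to the generating values (0.1 between measurements, 0.5 between years, 0.3 for the room pairs).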

Results

Identification of Error Components

In the following, we present a detailed list of sources of error in radon exposure assessment with special consideration of their applicability for the operationally defined predictors used in the German analyses, RN1 and RNCUM(T), and for TWM(T). We propose to distinguish four stages for assessing the predictor “residential radon exposure”, plus an additional fifth stage if “alpha dose” is the predictor of interest:

1) Estimating the average radon gas concentration in the ith home during the exposure of the detector, RNi(detector).

2) Using (1) to estimate the average radon gas concentration in the ith home over the year in which the measurement took place, RNi(year), that is, extrapolation to one year.

3) Using (2) to estimate the radon gas exposure of an individual over a certain time period prior to the measurement, RNi(residency), that is, extrapolation to prior years.

4) Using (3) for the current home or for all homes inhabited during a certain time period T as the operationally defined predictor for residential radon exposure, RN(T).

5) Using (4) as a surrogate for the true alpha dose, D(T).

The stages (1), (2) and (3) describe the deviation between the observed predictor and the operationally defined predictor, stage (4) the deviation between the operationally defined predictor and “exposure”, and stage (5) the deviation between “exposure” and “dose”.

Regarding stage (1), there is (a) the error from between-measurement-variability, that is, the deviation between measurements obtained repeatedly at the same time and place.

A measurement by alpha track detectors involves the exposure of a small box of specific geometry containing a thin foil. The emitted alpha particles leave a small trace (track) on the foil. In order to count these tracks, the foil is etched. The number of tracks in a randomly chosen area of the foil is counted manually or by a computer program, and the number of tracks per unit area is then calculated. The exposure of the detector to radon is derived from this track density by taking into account the average background track density on similar foils and the sensitivity of the foil to radon exposure, determined by calibration. Thus, this error component includes the error from background track density (number of tracks observed on a detector not exposed to radon), the error from miscounting the number of tracks, the error from variations in track counting efficiency, the error from calibration, and the error from underestimating high exposures, when the tracks are so close together that they are difficult to distinguish after etching.
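The conversion from counted tracks to a mean concentration can be sketched as follows. This is a simplified illustration with hypothetical names and numbers: it subtracts the mean background track density, divides by a calibration factor (assumed here to be given in tracks/cm² per kBq·h/m³), and divides the resulting exposure by the exposure time. It does not model the saturation at high track densities mentioned above or any of the other error sources.

```python
HOURS_PER_YEAR = 8760.0


def detector_exposure(track_count, area_cm2, background_density, sensitivity):
    """Radon exposure (kBq h/m3) of an alpha track detector.

    track_count:        tracks counted on the evaluated foil area
    area_cm2:           evaluated foil area in cm^2
    background_density: mean tracks/cm^2 on unexposed foils of the same batch
    sensitivity:        calibration factor, tracks/cm^2 per kBq h/m3
    """
    net_density = track_count / area_cm2 - background_density
    return net_density / sensitivity


def mean_concentration(exposure_kbq_h_m3, hours):
    """Average radon concentration (Bq/m3) over the exposure period."""
    return 1000.0 * exposure_kbq_h_m3 / hours


# Hypothetical 1-year measurement (all numbers illustrative).
exposure = detector_exposure(track_count=1500, area_cm2=1.0,
                             background_density=20.0, sensitivity=2.0)
print(exposure, mean_concentration(exposure, HOURS_PER_YEAR))
```

Each input (count, background, calibration factor) carries its own uncertainty, which is why these sources are grouped together in error component (1a).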

Further, there are (b) the error from between-laboratory-variability (not applicable for the German studies, since all measurements were conducted by the same laboratory), (c) the error from between-detector-placement-variability due to the variation of the radon concentrations depending on the placement in the room, (d) the error from between-room-variability due to the fact that radon concentrations in the rooms without measurements differ from the radon concentration in the living room, which was used as proxy for the concentrations in the other rooms (except the bedroom).

Regarding stage (2), there is (a) the error from between-season-variability due to seasonal variation of the radon concentration, which applies if a measurement of less than a year is used to estimate the 1-year average (not applicable to the German studies, where only 1-year measurements were used). If seasonal correction is applied, an error from statistical uncertainty in estimating the correction factor remains (2a*), and an error from assigning a group-matched correction factor is introduced (2a**). (One factor is assigned to all individuals with a certain seasonal pattern.)

Regarding stage (3), there are (a) the error from between-year-variability, due to year-by-year variation of radon concentrations caused by differences in the weather and in the habits of the occupants, and (b) the error from between-subphase-variability, due to the fact that the radon concentration during the measurement differs from the concentrations before radon-relevant alterations to the home (Gunby et al., 1993); we define a subphase as a period of time during which a house remains without radon-relevant changes. If the operationally defined predictor takes into account homes other than the current home (RNCUM(T) or TWM(T)), there is also (c) the error from between-owner-variability, arising from the different ventilation habits of the current owners of the study subject's previous homes, which lead to conditions in the home during the measurement that differ from the conditions during the study subject's residency. If a correction based on information on the change of the average radon concentration due to certain house alterations or ventilation habits is performed (Gerken et al., 2000), an error from the statistical uncertainty of estimating the correction factor, (3b*) or (3c*), and an error from assigning a group-matched correction factor, (3b**) or (3c**), are introduced. (A constant multiplicative effect on the radon concentration is assigned to all houses with a certain pattern in house characteristics or ventilation differences.)

Regarding stage (4), there is (a) the error from the differences in the ventilation habit depending on room and daytime.

This error arises because the detectors measure the average radon concentration over the full day, whereas the bedroom is occupied during the night and the other rooms during the day. If the bedroom is ventilated more during the day than at night, the measured bedroom concentration underestimates the concentration during the bedroom's occupancy; if a participant sleeps with the window open and the window is closed during the day, the measured radon concentration overestimates the concentration during occupancy. This induces a random error if it can be assumed that there is no systematic pattern in the day–night cycle of ventilating the rooms across all study participants (that is, some participants sleep with the window open, some with the window closed).

Further, there is (b) the error from between-environment-variability, because the radon concentrations in residential environments other than the principal home are usually not measured and are assumed to be, on average, as high as in the principal home. (This error is lessened in the German studies by including only subjects with home occupancy of at least 25%.) Note that it is “residential radon exposure” that is considered here, which does not include exposure at the workplace. There are also (c) the error from between-home-variability (for RN1), from using the radon concentration in the current home as a proxy for the average radon concentration in all homes inhabited during the relevant time period, (d) the error from false recall of the residency time (for RNCUM(T) or TWM(T)), (e) the error from false recall of the relative bedroom occupancy, (f) the error from false recall of the absolute house occupancy (for RNCUM(T)), (g) the error from mis-specifying the relevant exposure window (for RNCUM(T) or TWM(T)), because a time period other than T may be relevant for lung cancer genesis at index date, and (h) the error from ignoring the absolute house occupancy (for RN1 and TWM(T)).

Regarding stage (5), there are (a) the uncertainty in determining the equilibrium factor and (b) the error from between-person-variability due to the fact that the lung doses of persons with the same radon and radon progeny exposure vary due to respiratory differences.

Classification of Error Components: Classical versus Berkson

In Table 1, the error components corresponding to each of the five stages are summarized indicating the dependence on the operationally defined predictor, the applicability to the German radon studies and the classification into Berkson or classical type. We used different arguments to classify errors as classical error or Berkson error:

Table 1 Components of error in assessing residential radon exposure or alpha dose, with applicability depending on the operationally defined predictor (RN1, TWM(T), RNCUM(T)) marked by x.

Classical I: Repeated observations, given all other error components were nonexistent, would yield different values and vary about the true value: Applied for the components 1a, e, 2a, 3c, 4a, d–g.

If the radon gas concentration measurements were repeated under identical conditions (1a), if the measurement was repeated in a newly specified house (1e), if the time period of the detector exposure covered a different period of the year (2a), if the measurement was repeated in the same house with again a different owner (3c), if the measurement was repeated with different day–night variation in ventilation of the rooms (4a), if the participant was interviewed again (4d–f), if the observation was repeated with a different exposure-window (4g), the new observation would differ from the original.

Classical II: One measurement is used as a proxy for the average (repeated observations would vary about the average): Applied for the components 1d, 3a, b, 4b, c.

The measurement in the living room is a proxy for all rooms (1d); the measurement during 1 year is a proxy for the average over all relevant years (3a); the measurement of the current subphase is a proxy for the average over all subphases (3b); the measurement in the current principal home is a proxy for the average of all currently occupied homes (4b) or for the average of all principal homes inhabited during the relevant time period (4c).

Classical III: Uncertainty in the estimation of a correction factor (sampling error): Resampling, that is, the repetition of the observation with a different sample of participants, would yield different correction factors. Applied for 2a*, 3b*, c*.

Berkson I: A group's observation is assigned to each individual in the group, but the individual's values differ within each group. Applied for 4h, 5a, b.

A certain level of RN1 or TWM(T) is assigned to a group of persons regardless of their absolute home occupancy (4h); a certain level of radon gas exposure is assigned to a group of persons regardless of the specific equilibrium factor in their environment (5a) or of the persons' specific respiratory characteristics (5b), which may cause differing exposure to radon progeny (5a) and differing lung dose (5a, b).

Berkson II: A correction factor derived for a group of individuals with certain characteristics in common is assigned to all individuals of this group: Applied for 2a**, 3b**, c**.

A certain factor is assigned to all individuals with the same seasonal pattern (2a**), to all individuals with the same radon-relevant house alterations (3b**), or to all individuals with the same changes in ventilation habits between current house owner and study participant (3c**).

Evidence of Multiplicative Error Structure

Information on the error from between-measurement-variability under epidemiological conditions is provided by the bedroom and living room measurements of the German case–control studies. Plotting the measurements versus their mean within house (Figure 3) shows the “trumpet” on the original scale and the “tube” on the log-scale, indicating a multiplicative structure of this error component (compare Figure 2). The analogous graph of the year-by-year data (Figure 4) presents a similar picture of a rather multiplicative structure of the error from between-year-variability. However, the smaller number of measurements is clearly visible, and a glance at the axis units (1000 Bq/m3 instead of 1 Bq/m3 as in Figure 3) shows that the radon concentrations encountered in these houses do not reflect the epidemiological situation. The intercomparison data provide, again, information on the error from between-measurement-variability. Figure 5 displays the original data by house. (The analogous graph to Figures 3 and 4 is not displayed, since it would be uninformative due to sparse data.) It shows that the between-measurement-variability is small, but slightly larger for the two most highly exposed houses (houses 3 and 4), hinting at a multiplicative error structure.

Figure 3

Bedroom/living room measurements: Radon concentrations for controls of the German East study with both rooms at the same floor versus the mean of the two measurements (a) on the original scale and (b) on the log-scale.

Figure 4

Year-by-year data: Radon concentrations (“corrected” for between-measurement error by taking the mean of the two measurements in the same year and home) versus the mean of these measurements in one house (a) on the original scale and (b) on the log-scale.

Figure 5

Intercomparison data: radon concentrations by house.

Error Size

The conversion table of several measures of multiplicative error size (Table 2) shows that the two definitions of the CV yield similar results for small errors, but that the difference increases rapidly for errors larger than 0.3. The SD of the log of the error is close to the CV defined as SD divided by the mean, with the difference again increasing with error size. Table 3 relates the SD of the log of the error to the error variance expressed as a percentage of the observed radon exposure variance (on the log-scale for multiplicative error). These table entries are data-dependent, that is, for a given percentage, the corresponding SD depends upon the study data and is given here for the German West study. (Similar results are obtained from the East study.) Table 4 shows the range of radon exposure values that is likely to be observed given a certain true radon exposure level and a certain classical error. For example, for an error of 0.4, measurements from 22 to 111 Bq/m3 can be expected given a true radon exposure of 50 Bq/m3.
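Two of the quantities above can be checked directly. Assuming lognormal multiplicative classical error, the 95% range used for Table 4 is [x/GSD², x·GSD²] with GSD = exp(σ), where σ is the SD of the log error; with x = 50 Bq/m3 and σ = 0.4 this gives the 22 to 111 Bq/m3 quoted in the text. The relation behind Table 3 is taken here to be σ = √(p · Var(log X*)) for an error variance share p; the observed log-variance used below is a hypothetical placeholder, not the West study's value.

```python
import math


def likely_range(x_true, sigma_log):
    """Approximate 95% range of observed values for a true value x_true under
    lognormal multiplicative classical error: [x / GSD^2, x * GSD^2]."""
    gsd_sq = math.exp(sigma_log) ** 2
    return x_true / gsd_sq, x_true * gsd_sq


def error_sd_for_share(share, var_log_observed):
    """SD of the log error whose variance is the fraction `share` of the
    observed variance of log exposure."""
    return math.sqrt(share * var_log_observed)


low, high = likely_range(50.0, 0.4)
print(round(low), round(high))  # 22 111

# Hypothetical observed log-exposure variance (placeholder, not study data).
print(error_sd_for_share(0.2, 0.5))
```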

Table 2 Several measures of size for multiplicative error.
Table 3 Error size for German case–control studies
Table 4 Error size: range of 95% of the observed radon exposures (on original scale) given a certain true radon exposure and error size.

The results of the ANOVA of the replicate data are summarized in Table 5. The estimated size of the error from between-measurement-variability is 0.07 (year-by-year data, ANOVA model (3)), 0.10 (intercomparison data, ANOVA model (1)), or 0.28 or 0.33 (bedroom/living room measurements, ANOVA model (2)), depending on the analysed data set. In the case of the bedroom/living room measurements, we need to ascertain that the room difference is sufficiently controlled for. The measured radon concentrations in the bedrooms are overall about 10% (30%) higher than those in the living rooms in the West (East) study. Adding a fixed effect of the floor difference between the rooms did not influence the error size estimate, but reduced the effect of the bedroom to 5% (20%) in the West (East) study. Repeating the application of ANOVA model (2) for the sample restricted to individuals with bedroom and living room on the same floor yields similar results. The size of the error caused by between-year-variability is estimated as 0.58 (year-by-year data, ANOVA model (3)). The estimate of the size of both error components combined, 0.55 (year-by-year data, ANOVA model (4)), is smaller than the “sum” of the error sizes, hinting at a correlation of the two error components. Neither including a fixed effect of the year nor graphical inspection (not shown) revealed an effect of the years (P-value=0.2), and certainly no trend over the years. Results excluding house 11, the most influential house, are reported in parentheses in Table 5 and show smaller estimates.

Table 5 Error size estimates from replicate data

Discussion

The importance of high-quality exposure assessment and the need to minimize errors in the exposure are well acknowledged in epidemiology. Whereas statistical methods for dealing with such errors in the estimation of disease risk have been available for some 20 years (Carroll et al., 1984; Rosner et al., 1989), the increasing awareness of their practical implications in epidemiology has called for “a renaissance of measurement error” (Michels, 2001). The discussion sections of published epidemiological studies frequently include a note of caution regarding measurement error in the exposures and a potential attenuation of risk estimates. However, classical and Berkson error are often not clearly differentiated, despite their very different impacts.

Figure 6 shows several theoretical observed dose–response curves. The x-axis shows the exposure, here mimicking the situation of the German radon studies (range of 0–400 Bq/m³), although it could be any other continuous epidemiological exposure. The y-axis shows the RR that would be observed if the exposure were measured with a certain error and the RR were not corrected for it. Each curve shows how the RR increases with exposure under the logistic model for one of four error models: additive classical, multiplicative classical, additive Berkson, or multiplicative Berkson error. The curve labelled “none=add Berk” shows the true dose–response curve with an RR of 1.12 per 100 units of exposure, that is, the RR under a standard logistic model without errors in the exposure; additive Berkson error yields the same curve. The curves are drawn based on the expected exposure given the observed exposure under a given error model (following the reasoning of the regression calibration method). The figure clearly indicates that classical error attenuates the dose–response curve (in the case of multiplicative error even inducing a spurious curvature), that additive Berkson error has no effect on the risk estimate, and that multiplicative Berkson error, if it has any effect, slightly intensifies the dose–response relationship. However, the fact that the impact of Berkson error on the risk (point) estimate is negligible does not mean that Berkson error can be ignored, since the precision of the estimate may suffer severely.

Figure 6 Theoretical observed dose–response when the dose is measured with additive (SD=50) or multiplicative (SD on log-scale=0.4) error of the classical or Berkson type and a true RR of 1.12 per 100 units (see “none”).
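To make the regression-calibration reasoning behind Figure 6 concrete, the sketch below computes E[T | X = x] under the four error models and converts it into an observed RR relative to zero exposure, using an RR of 1.12 per 100 units and the error sizes from the figure caption (SD 50 additive, SD 0.4 on the log-scale multiplicative). Several inputs are assumptions of this sketch and not the code used to draw the figure: the true exposure distributions (normal with mean 50 and SD 40 for the additive models, a crude choice for radon; lognormal with median 50 and log-scale SD 0.8 for the multiplicative ones), a zero-mean log error in the multiplicative Berkson model, and the rare-disease log-linear approximation of the logistic model.

```python
import math

BETA = math.log(1.12) / 100.0  # log RR per unit exposure (RR 1.12 per 100 units)

# Hypothetical true-exposure distributions (not the German study data):
MU, TAU = 50.0, 40.0                    # additive models: T ~ Normal(MU, TAU**2)
MU_LOG, TAU_LOG = math.log(50.0), 0.8   # multiplicative models: log T ~ Normal(MU_LOG, TAU_LOG**2)
SD_ADD, SD_LOG = 50.0, 0.4              # error sizes from the figure caption

def e_true_given_obs(x: float, model: str) -> float:
    """E[T | X = x] under simple error models (regression-calibration reasoning)."""
    if model in ("none", "add_berkson"):
        # Berkson: T = X + U with E[U] = 0, so E[T | X] = X (no error: T = X).
        return x
    if model == "add_classical":
        # Classical: X = T + U, T and U normal; E[T | X] shrinks x towards MU.
        lam = TAU**2 / (TAU**2 + SD_ADD**2)
        return MU + lam * (x - MU)
    if model == "mult_classical":
        # Classical: log X = log T + log U, all normal on the log-scale.
        lam = TAU_LOG**2 / (TAU_LOG**2 + SD_LOG**2)
        cond_var = lam * SD_LOG**2  # Var(log T | log X)
        # E[T | X = x] = exp(E[log T | log x] + cond_var / 2), a power of x with exponent lam < 1.
        return math.exp((1 - lam) * MU_LOG + cond_var / 2) * x**lam
    if model == "mult_berkson":
        # Berkson: T = X * U with log U ~ Normal(0, SD_LOG**2), so E[T | X] = X * E[U].
        return x * math.exp(SD_LOG**2 / 2)
    raise ValueError(f"unknown model: {model}")

def observed_rr(x: float, model: str) -> float:
    """Observed RR at exposure x relative to x = 0 (rare-disease log-linear approximation)."""
    return math.exp(BETA * (e_true_given_obs(x, model) - e_true_given_obs(0.0, model)))

for model in ("none", "add_classical", "mult_classical", "add_berkson", "mult_berkson"):
    print(model, [round(observed_rr(x, model), 3) for x in (100, 200, 400)])
```

With these hypothetical inputs, at x = 400 the additive Berkson curve coincides with the true curve (about 1.57), both classical curves fall below it, and the multiplicative Berkson curve lies slightly above it. The multiplicative classical curve is driven by a power of x with exponent below one, which is the source of the spurious curvature.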

In the statistical literature, new error correction methods for error models combining classical- and Berkson-type error have recently been developed and applied (Reeves et al., 1998; Schafer et al., 2001; Heid, 2002). The foremost prerequisite for the correct application of these methods is the meticulous identification of all sources of error in the exposure assessment, their correct classification (classical versus Berkson), and the collection of information on error structure (additive versus multiplicative) and size; we have provided this for residential radon exposure assessment via air measurements, with particular reference to the German lung cancer studies. Random error arising from the physical process of measuring radon gas concentrations has previously been studied in great detail (Wrixon et al., 1988; Hardcastle and Miles, 1996), but for the errors in the epidemiological setting only a very crude list has been described so far (Bäverstam and Swedjemark, 1991; Lubin et al., 1995). A detailed revalidation is of immediate interest in light of the ongoing implementation of a new procedure for assessing residential radon exposure by measuring polonium in glass objects (e.g., Lagarde et al., 2002).

Further, we showed the usefulness of the concepts “predictor of interest” and “operationally defined predictor” and their impact on which error components apply and on the size and predominant type of the error. We found that “external residential radon exposure” as a lung cancer predictor involves almost no Berkson error component, whereas the predictor “lung dose” introduces a Berkson error. It also became clear that choosing the operationally defined predictor involves a trade-off: the TWM(T) radon concentration in all relevant homes is “closer” to the predictor of interest than the concentration in the current home, but more difficult to measure.

Besides the differentiation between classical- and Berkson-type error, the error size generally has a strong influence on the impact of measurement error on risk estimation. The computation formulas (Methods Section) and the conversion table (Table 2) are intended to guide the reader through the various measures of error size. In addition, expressing the error size as the “percentage of the error variance compared to the observed exposure variance” (Table 3) is particularly valuable for putting the error size into perspective when considering real data. For classical error, this percentage is the proportion of the observed radon exposure variance that is due to the error and that would disappear if the variable were measured without error. For Berkson error, it is the proportion by which the true exposure variance exceeds the observed exposure variance, and thus by which the observed exposure variance would increase if the error were eliminated. For example, a classical multiplicative error with SD (on the log-scale) of 0.4, which was suggested as reasonable for the German studies (Heid, 2002), explains 50% of the observed radon exposure variance (on the log-scale); a Berkson error of this size would indicate that the true exposure variance is 1.5 times as large as the observed one. For a classical error of 100%, all of the observed exposure variance would be due to error and the true exposure variance would be zero; this is why the classical error percentage cannot exceed 100%. A Berkson error of 300%, by contrast, indicates that the true exposure variance exceeds the observed exposure variance by three times the latter, that is, it is four times as large. This proportion thus indicates what the error does to the exposure variance; moreover, for classical error, it measures the impact of the error on risk estimates in a way that is comparable across data sets.

For example, a multiplicative classical error with SD (on the log-scale) of 0.48, as estimated for the English radon study (Darby et al., 1998), explains about 20% of the observed radon exposure variance in that study, but would explain over 65% in the German studies. The impact of such an error is, hence, more severe in the German studies. Note that the impact of Berkson error does not depend on the exposure variance and is thus the same across data sets assuming the same underlying error size and risk (Heid et al., 2002), which is relevant for meta-analyses.
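The percentages discussed above follow from simple variance arithmetic on the log-scale. The sketch below assumes that classical error adds its variance to the true log-exposure variance (Var log X = Var log T + σ²), whereas Berkson error adds it the other way round (Var log T = Var log X + σ²). The observed log-scale variances are back-calculated from the figures quoted in the text, not taken from the study data, and the function names are illustrative.

```python
def error_percentage(sigma_log: float, var_log_observed: float) -> float:
    """Error variance as a percentage of the observed log-scale exposure variance."""
    return 100.0 * sigma_log**2 / var_log_observed

def berkson_true_to_observed_ratio(percentage: float) -> float:
    """Berkson error: Var(log T) / Var(log X) = 1 + percentage / 100."""
    return 1.0 + percentage / 100.0

# Observed log-scale variances back-calculated from the text (illustrative only):
VAR_GERMAN = 0.4**2 / 0.50    # SD 0.4 explains 50%   -> about 0.32
VAR_ENGLISH = 0.48**2 / 0.20  # SD 0.48 explains ~20% -> about 1.15

pct_04 = error_percentage(0.4, VAR_GERMAN)
pct_048 = error_percentage(0.48, VAR_GERMAN)
print(f"SD 0.40, German data: {pct_04:.0f}%, Berkson variance ratio {berkson_true_to_observed_ratio(pct_04):.2f}")
print(f"SD 0.48, German data: {pct_048:.0f}%")
```

With these inputs, SD 0.4 gives 50% and a Berkson variance ratio of 1.5, while SD 0.48 gives about 72% for the German studies, consistent with “over 65%”. The same error size thus consumes a much larger share of the smaller German exposure variance.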

Three data sets with repeated measurements provide real examples of the size of two error components, the error from between-measurement-variability and the error from between-year-variability. The size of the error from between-measurement-variability is 0.07 (year-by-year data), 0.10 (intercomparison data), and about 0.3 (bedroom/living room measurements), depending on the data set analysed. These differences show the importance of looking critically at the replicate data. In the year-by-year data, the houses with very high radon concentrations were not representative of the epidemiological situation. The intercomparison data, while overcoming this problem, are rather sparse, and the laboratory personnel were aware that their results were being evaluated. The value of 0.1 obtained in a controlled exercise with a limited number of measurements can thus be viewed as a minimum error size for the epidemiological studies, in which this laboratory conducted over 10,000 measurements. The bedroom/living room measurements have the advantage of being available for nearly all study participants and of having been conducted under epidemiological field conditions. However, we showed that the radon concentrations differ between the rooms beyond what is due to different floor levels, and we may not have been able to correct completely for these differences in the error size estimation. The error from between-year-variability, estimated as 0.58 (year-by-year data), is quite large compared with the error from between-measurement-variability. Again, however, the fact that these data are unrepresentative calls for caution. It should be noted that the estimated error variance of the combination of between-year-variability and between-measurement-variability is smaller than the sum of both, which may well be due to correlations between the components. It is therefore not appropriate to estimate each error component's size separately and to take their sum as the total error size.
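The comparison between the combined estimate and the “sum” of the components can be made explicit on the variance scale. This sketch assumes that independent error components on the log-scale add their variances, so the combined size expected under independence is sqrt(0.07² + 0.58²). It then reports the covariance term that would reconcile this with the combined estimate of 0.55 under a simple two-component additive model. Treating that model as the one behind ANOVA model (4), and ignoring the sampling uncertainty of all three estimates, are assumptions of this sketch.

```python
import math

sd_measurement = 0.07  # between-measurement-variability (year-by-year data)
sd_year = 0.58         # between-year-variability (year-by-year data)
sd_combined = 0.55     # both components combined (year-by-year data)

# If the components were independent, their variances (not their SDs) would add.
sd_if_independent = math.sqrt(sd_measurement**2 + sd_year**2)

# Assumed model: Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B); solve for Cov(A, B).
implied_cov = (sd_combined**2 - sd_measurement**2 - sd_year**2) / 2

print(f"expected if independent: {sd_if_independent:.3f}; estimated combined: {sd_combined:.2f}")
print(f"implied covariance under the assumed model: {implied_cov:.4f}")
```

The expected size under independence is about 0.584, larger than the estimated 0.55, so under this simple model the implied covariance between the components is negative.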

The multiplicative error model in the assessment of residential radon exposure is already established (Gunby et al., 1993; Lagarde et al., 1997; Darby et al., 1998). Plotting single replicate measurements against their mean (Figures 3 and 4) provides further evidence of multiplicativity.
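A simulated illustration of why such plots are informative: under multiplicative error, the spread between replicates grows with the level of the house on the original scale, but not on the log-scale. The simulation settings (3000 houses, two replicates each, error SD 0.3 on the log-scale, lognormal house levels) and the use of a correlation coefficient instead of a plot are assumptions of this sketch, not the procedure behind Figures 3 and 4.

```python
import math
import random
import statistics

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
sigma = 0.3  # hypothetical multiplicative error SD on the log-scale
raw_mean, raw_spread, log_mean, log_spread = [], [], [], []
for _ in range(3000):
    level = math.exp(rng.gauss(math.log(50.0), 0.8))  # hypothetical true house level
    x1 = level * math.exp(rng.gauss(0.0, sigma))
    x2 = level * math.exp(rng.gauss(0.0, sigma))
    raw_mean.append((x1 + x2) / 2)
    raw_spread.append(abs(x1 - x2))
    log_mean.append((math.log(x1) + math.log(x2)) / 2)
    log_spread.append(abs(math.log(x1) - math.log(x2)))

corr_raw = pearson(raw_mean, raw_spread)
corr_log = pearson(log_mean, log_spread)
print(f"original scale: corr(mean, |difference|) = {corr_raw:.2f}")  # clearly positive
print(f"log-scale:      corr(mean, |difference|) = {corr_log:.2f}")  # close to zero
```

Under additive error, the pattern would be reversed: a roughly constant spread on the original scale, shrinking on the log-scale as the level increases.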

Conclusions

We conclude that, generally in epidemiology, a clear differentiation between classical and Berkson error components is essential when assessing error sources and establishing an error model, a fact which we believe is not fully acknowledged. This differentiation is crucial because the two error types have different impacts. Classical error can induce severe bias in the risk estimate; multiplicative classical error can even distort the dose–response curve. This bias can be reduced by analysing the mean of multiple measurements, which requires internal replicates for each individual, or it can be corrected for by using information from (internal or external) replicate measurements for a subgroup. The spuriously narrow confidence intervals for the risk estimates that are obtained without error correction in the presence of classical error can also be corrected. The analysis of our replicate data, their usefulness, and the struggle with their limitations motivate our recommendation for more internal repeated measurements in future epidemiological studies (e.g., in radon studies, more than one detector per room and repeated measurements over a series of years).

At first glance, Berkson error is less problematic, since it does not induce notable bias in the risk estimates. However, it reduces the precision of the estimates, which is often more difficult to correct for than in the classical situation because the extent of the Berkson error is hard to grasp. For example, the lung dose is hard to measure, and such a measurement would reintroduce classical error. Merely replicating measurements does not help in the Berkson case. Put simply, classical error relates mainly to the measurement process, whereas Berkson error is often a matter of how the exposure is defined. Using fixed monitors (e.g., the distance of a home to the nearest power station as predictor instead of individual measurements), measurements in the environment (e.g., residential radon exposure instead of lung dose), or a person's membership of a group to assign the exposure of this group (e.g., job-exposure matrices) instead of personal monitors is a question of how the exposure is defined, and it induces Berkson error.
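The benefit of averaging replicates for classical error can be quantified with the usual regression-dilution factor λ = Var(true) / (Var(true) + σ²/k) for the mean of k replicates, which multiplies the slope of a regression on the (log-scale) exposure. This sketch assumes classical error on the log-scale with independent replicate errors; correlated components, such as between-year-variability, would not shrink with k in this way. The input values are illustrative, back-calculated from the example of an error SD of 0.4 explaining 50% of the observed log-scale variance, and the function name is hypothetical.

```python
def attenuation_factor(var_true: float, var_error: float, k: int = 1) -> float:
    """Regression-dilution factor when the mean of k independent replicates is used.

    Assumption: classical error on the log-scale with independent replicate errors,
    so averaging k replicates divides the error variance by k.
    """
    return var_true / (var_true + var_error / k)

# Illustrative values: error SD 0.4 explaining 50% of an observed log-variance of 0.32.
var_error = 0.4**2
var_true = 0.32 - var_error

for k in (1, 2, 4):
    print(f"k={k}: attenuation factor {attenuation_factor(var_true, var_error, k):.3f}")
```

With these inputs the factor rises from 0.500 for a single measurement to 0.667 for two and 0.800 for four replicates, so the bias shrinks but does not vanish.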

The general statement that “well-behaved” (random, non-differential, homoscedastic) errors attenuate regression coefficients applies only to classical error. It should be kept in mind that considering more precise (or more relevant) exposures, and thus introducing more potential sources of error, does not necessarily increase the bias of the risk estimates (compare Lubin et al., 1995). For example, extending the definition of the predictor from “residential radon exposure” to “lung dose” induces Berkson error and does not attenuate the risk estimate.

In such situations, assuming the sum of the sizes of both error types to be known and varying the percentage of Berkson error is not an option (see Mallick et al., 2002). Instead, we support a two-dimensional view of measurement error, that is, a classical-type dimension and a Berkson-type dimension, where the size of each dimension needs to be studied separately. The full error is represented in the continuum of a two-dimensional space (compare Zeger et al., 2000). Modern exposure assessment should therefore not only aim to be as accurate and precise as possible, but should also provide a model of the measurement errors that unavoidably remain, with clear differentiation of classical and Berkson components.
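Why the split between the two dimensions matters can be shown with a simple mixture formulation, assumed here for illustration: a latent value L links both quantities via log X = log L + C (classical part) and log T = log L + B (Berkson part), with L, B, and C independent and normal. The slope of E[log T | log X] is then Var(log L) / (Var(log L) + Var(C)), so it depends only on the classical dimension. The latent variance and total error variance below are hypothetical.

```python
def slope_attenuation(var_latent: float, var_classical: float) -> float:
    """Slope of E[log T | log X] in the assumed mixture model.

    log X = log L + C, log T = log L + B, with L, B, C independent and normal:
    Cov(log T, log X) = Var(log L) and Var(log X) = Var(log L) + Var(C),
    so the Berkson variance Var(B) does not enter the slope.
    """
    return var_latent / (var_latent + var_classical)

var_latent = 0.16       # hypothetical Var(log L)
total_error_var = 0.16  # the same total error variance in every scenario

for berkson_share in (0.0, 0.5, 1.0):
    var_b = berkson_share * total_error_var
    var_c = total_error_var - var_b
    factor = slope_attenuation(var_latent, var_c)
    print(f"Berkson share {berkson_share:.0%}: slope factor {factor:.3f}")
```

With the same total error variance, the slope factor ranges from 0.500 (all classical) through 0.667 (half and half) to 1.000 (all Berkson); knowing only the total therefore says little about the bias, which is why each dimension needs to be quantified separately.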

References

  1. Armstrong B.G. The effects of measurement errors on relative risk regression. Am J Epidemiol 1990: 132(6): 1176–1184.

  2. Bäverstam U., and Swedjemark G.A. Where are the errors when we estimate radon exposure in retrospect? Radiat Prot Dosim 1991: 36(2/4): 107–112.

  3. Carroll R.J., Ruppert D., and Stefanski L.A. Measurement Error in Nonlinear Models. Chapman & Hall, London, 1995.

  4. Carroll R.J., Spiegelman C., Lan K.K., Bailey K.T., and Abbott R.D. On errors-in-variables for binary regression models. Biometrika 1984: 71: 19–25.

  5. Darby S., Whitley E., Silcocks P., Thakrar B., Green M., Lomas P., Miles J., Reeves G., Fearn T., and Doll R. Risk of lung cancer associated with residential radon exposure in south-west England: a case–control study. Br J Cancer 1998: 78(3): 394–408.

  6. Gerken M., Kreienbrock L., Wellmann J., Kreuzer M., and Wichmann H.E. Models for retrospective quantification of indoor radon exposure in case-control studies. Health Phys 2000: 78(3): 268–278.

  7. Gunby J.A., Darby S.C., Miles J.C.H., Green B.M.R., and Cox D.R. Factors affecting indoor radon concentration in the United Kingdom. Health Phys 1993: 64: 2–12.

  8. Hardcastle G.D., and Miles J.C.H. Ageing and fading of alpha particle tracks in CR-39 exposed to air. Radiat Prot Dosim 1996: 67: 295–298.

  9. Heid I.M. Measurement error in exposure assessment: an error model and its impact on studies of lung cancer and residential radon exposure in Germany. PhD Thesis, 2002. http://edoc.ub.uni-muenchen.de/archive/00000522/.

  10. Heid I.M., Küchenhoff H., Wellmann J., Gerken M., Kreienbrock L., and Wichmann H.E. On the potential of measurement error to induce differential bias on risk estimates: an example from radon epidemiology. Stat Med 2002: 21: 3261–3278.

  11. International Commission on Radiological Protection (ICRP). Lung cancer risk from indoor exposures to radon daughters. ICRP Publication No. 50. Pergamon Press, New York, 1994.

  12. Jacobi W. The dose to the human respiratory tract by inhalation of short-lived 222Rn- and 220Rn-decay products. Health Phys 1964: 10: 1163–1174.

  13. Jacobi W. Dose to tissue and effective dose equivalent by inhalation of radon-222, radon-220 and their short-lived daughters. GSF-report S-626, Neuherberg, 1989.

  14. Kreienbrock L., Kreuzer M., Gerken M., Dingerkus G., Wellmann J., Keller G., and Wichmann H.E. Case-control study on lung cancer and residential radon in West Germany. Am J Epidemiol 2001: 153(1): 42–52.

  15. Kreienbrock L., Poffijn A., Tirmarche M., Feider M., Kies A., and Darby S.C. Intercomparison of passive radon-detectors under field conditions in epidemiological studies. Health Phys 1999: 76(5): 558–563.

  16. Kreuzer M., Heinrich J., Wölke G., Schaffrath Rosario A., Gerken M., Wellmann J., Keller G., Kreienbrock L., and Wichmann H.E. Residential radon and risk of lung cancer in Eastern Germany. Epidemiology 2003: 14: 559–568.

  17. Lagarde F., Falk R., Almren K., Nyberg F., Svensson H., and Pershagen G. Glass-based radon-exposure assessment and lung cancer risk. J Expo Anal Environ Epidemiol 2002: 12: 344–354.

  18. Lagarde F., Pershagen G., Akerblom G., Axelson O., Bäverstam U., Damber L., Enflo A., Svartengren M., and Swedjemark G.A. Residential radon and lung cancer in Sweden: risk analysis accounting for random error in the exposure assessment. Health Phys 1997: 72: 269–276.

  19. Lubin J.H., Boice J.D., Edling C.H., Hornung R., Howe G., Kunz E., Kusiak A., Morrison H.I., Radford E.P., Samet J.M., Tirmarche M., Woodward A., Xiang Y.S., and Pierce D.A. Radon and lung cancer risk: a joint analysis of 11 underground miner studies. NIH publication no. 94-3644, Rockville, MD, 1994.

  20. Lubin J.H., Boice Jr J.D., and Samet J.M. Errors in exposure assessment, statistical power and the interpretation of residential radon studies. Radiat Res 1995: 144: 329–341.

  21. Mallick B., Hoffman F.O., and Carroll R.J. Semiparametric regression modeling with mixtures of Berkson and classical error, with application to fallout from the Nevada test site. Biometrics 2002: 58: 13–20.

  22. Michels K.B. A renaissance for measurement error. Int J Epidemiol 2001: 30: 421–422.

  23. National Academy of Sciences (NAS) National Research Council. Health effects of exposure to radon: time for reassessment? BEIR VI Report of the Committee on the Biological Effects of Ionizing Radiation, National Academy Press, Washington, DC, 1994.

  24. Pershagen G., Axelson O., Clavensjö B., Damber L., Desai G., Enflo A., Lagarde F., Mellander H., Svartengren M., Swedjemark G.A., and Akerblom G. Residential radon exposure and lung cancer in Sweden. N Engl J Med 1994: 330: 159–164.

  25. Poffijn A., Tirmarche M., Kreienbrock L., Kayser B., and Darby S.C. Radon and lung cancer: protocol and procedures of the multi-centre studies in the Ardennes-Eifel region, Brittany, and the Massif Central. Radiat Prot Dosim 1992: 45(Suppl 1/4): 651–656.

  26. Reeves G.K., Cox D.R., Darby S.C., and Whitley E. Some aspects of measurement error in explanatory variables for continuous and binary regression models. Stat Med 1998: 17: 2157–2177.

  27. Rosner B., Willett W.C., and Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 1989: 8: 1051–1069.

  28. Schafer D.W., Lubin J.H., Ron E., Stovall M., and Carroll R.J. Thyroid cancer following scalp irradiation: a reanalysis accounting for uncertainty in dosimetry. Biometrics 2001: 57: 689–697.

  29. Tosteson T.D., Stefanski L.A., and Schafer D.W. A measurement-error model for binary and ordinal regression. Stat Med 1989: 8: 1139–1147.

  30. Wichmann H.E., Gerken M., Wellmann J., Kreuzer M., Kreienbrock L., Keller G., Wölke G., and Heinrich J. Lungenkrebsrisiko durch Radon in der Bundesrepublik Deutschland (Ost) - Thüringen und Sachsen [Lung cancer risk from radon in the Federal Republic of Germany (East): Thuringia and Saxony] (in German). Fortschritte in der Umweltmedizin [Advances in Environmental Medicine]. ecomed verlagsgesellschaft, 1999.

  31. Wichmann H.E., Kreienbrock L., Kreuzer M., Gerken M., Dingerkus G., Wellmann J., and Keller G. Lungenkrebsrisiko durch Radon in der Bundesrepublik Deutschland (West) [Lung cancer risk from radon in the Federal Republic of Germany (West)] (in German). Fortschritte in der Umweltmedizin [Advances in Environmental Medicine]. ecomed verlagsgesellschaft, 1998.

  32. Wrixon A.D., Green B.M.R., Lomas P.R.M., Miles J.C.H., Cliff K.D., Francis E.A., Driscoll C.M.H., James A.C., and O’Riordan M.X. Natural radiation exposure in UK dwellings. NRPB R-190, 1988.

  33. Zeger S.L., Thomas D., Dominici F., Samet J.M., Schwartz J., Dockery D., and Cohen A. Exposure measurement error in time-series studies of air pollution: Concepts and consequences. Environ Health Perspect 2000: 108(5): 419–426.


Author information

Correspondence to I M Heid.


Keywords

  • measurement error
  • Berkson error
  • error models
  • error sources
  • radon
  • case–control studies.
