Journal of Exposure Analysis and Environmental Epidemiology (2001) 11, 414–421. 10.1038/sj.jea.7500182

Improved non-negative estimation of variance components for exposure assessment

CHAVA PERETZ¹ and DAVID M STEINBERG²

  1. The Department of Epidemiology, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
  2. The Department of Statistics and Operations Research, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel

Correspondence: Chava Peretz, The Department of Epidemiology, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel. Tel.: +972-3-640-9867. Fax: +972-3-641-0555. E-mail: cperetz@post.tau.ac.il

Received 24 July 2001.

Abstract

Exposure data from hygiene surveys of pollutants can be analyzed with an analysis of variance (ANOVA) model that includes a random worker effect. Because workers are typically classified into homogeneous exposure groups, it is very common to obtain a zero or negative ANOVA estimate of the between-worker variance (sigmaB2). Negative estimates are not sensible and also pose problems for estimating the probability (theta) that, in a job group, a randomly selected worker's mean exposure exceeds the occupational exposure standard. Rappaport et al. therefore suggested replacing a non-positive estimate with an approximate one-sided 60% upper confidence bound. This article develops an alternative estimator, based on the upper tolerance interval suggested by Wang and Iyer. We compared the performance of the two methods using real data and simulations with respect to estimating both the between-worker variance and the probability of overexposure in balanced designs. We found that the method of Rappaport et al. has three main disadvantages: (i) the estimated sigmaB2 remains negative for some data sets; (ii) the estimator performs poorly in estimating sigmaB2 and theta with two repeated measures per worker and when the true sigmaB2 is quite small, which are quite common situations when studying exposure; (iii) the estimator can be extremely sensitive to small changes in the data. Our alternative estimator offers a solution to these problems.

Keywords:

ANOVA estimator, bias adjustment, exposure assessment, hygiene surveys, repeated measures, variance component

Introduction

Recently, analysis of variance (ANOVA) random effects models have been applied to data sets consisting of repeated measurements of pollutants within factories in order to identify determinants of exposure and estimate within- and between-worker variance components. The within-worker variance in these studies reflects day-to-day variations in the levels of exposure to pollutants, which often vary greatly. Between-worker variance, on the other hand, is often rather small due to the use of homogeneous exposure groups. Thus, the variance ratio lambda(=sigmaB2/sigmaW2) may be quite small. As a result, when analyzing data using ANOVA random effects models, it is very common to obtain a zero or negative estimate of the between-worker variance. In many applications, it is common practice to report such negative values as zeros.

The occurrence of negative or zero between-worker ANOVA variance estimates causes a number of problems. First, zero between-worker variance appears to be an unrealistic result since it implies that all workers have the same mean exposure. This contradicts common industrial hygiene experience. Furthermore, in exposure assessment in epidemiological studies and for hazard control, the probability theta of overexposure is often of more interest than the variance components themselves. This is the probability that in a job group, a randomly selected worker's mean exposure exceeds the occupational exposure standard, where the worker's mean exposure is relevant to the risk of chronic adverse health effects (Rappaport et al., 1995). The probability of overexposure depends on both sigmaB2 and sigmaW2. Common practice is to adopt a "plug in" approach in which sigmaB2 and sigmaW2 are estimated and their estimates are inserted into the formula for theta. This approach is impossible to employ when the estimate of sigmaB2 is zero or negative. Finally, the variance ratio should have implications for planning future sampling design. Small variance ratios imply that it may be advantageous to sample fewer individuals but at more time points.

The estimation of the probability of overexposure (point estimator) becomes meaningless when a zero or negative between-worker variance estimate appears. Therefore, it was suggested by Rappaport et al. (1995) to replace a negative or zero estimate with an approximate one-sided 60% upper bound, as derived from formulas of Williams and cited in Searle et al. (1992). This practice is based on empirical evidence that such a procedure has minimal impact on significance levels and statistical power. This proposal does have some drawbacks. Many negative ANOVA estimates are not adjusted to positive values and the estimator is very sensitive to small changes in the data.

This article develops an alternative — the bias-corrected variance component estimator — based on the upper tolerance interval suggested by Wang and Iyer (1994) to deal with the problem of negative variance component estimates. We compare the performance of the two methods using real data and simulations, focusing on the estimation of probabilities of overexposure (beyond standards) in balanced designs.

ANOVA method

We briefly review the ANOVA, or least squares (LS), method for estimating variance components in a balanced one-way random effects model. We denote: k=number of subjects in a group; n=number of repeated measurements obtained from each subject in the group:

$$MSB = \frac{n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2}{k-1}, \qquad MSW = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2}{k(n-1)}$$

where $y_{ij}$ is the jth measurement on subject i, $\bar{y}_{i\cdot}$ is the average of the n measurements on subject i, and $\bar{y}_{\cdot\cdot}$ is the overall average. MSB and MSW are the between-subject and within-subject mean squares.

The estimators of the between-subject (sigmaB2) and within-subject (sigmaW2) variance components are:

$$\hat{\sigma}_W^2 = MSW, \qquad \hat{\sigma}_B^2 = \frac{MSB - MSW}{n}$$

For more details, see Searle et al. (1992).
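The balanced one-way ANOVA estimators can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard mean-square definitions for a balanced design (MSB with k-1 and MSW with k(n-1) degrees of freedom); the function name `anova_variance_components` and the toy data are hypothetical and are not taken from the article.

```python
from statistics import mean


def anova_variance_components(data):
    """Return (MSB, MSW, sigma_b2_hat, sigma_w2_hat) for a balanced design.

    `data` is a list of k rows (subjects), each holding n repeated measurements.
    """
    k = len(data)
    n = len(data[0])
    if k < 2 or n < 2 or any(len(row) != n for row in data):
        raise ValueError("need a balanced design with k >= 2 and n >= 2")

    subject_means = [mean(row) for row in data]
    grand_mean = mean(subject_means)  # equals the overall mean when balanced

    # Between-subject mean square, k - 1 degrees of freedom.
    msb = n * sum((m - grand_mean) ** 2 for m in subject_means) / (k - 1)
    # Within-subject mean square, k(n - 1) degrees of freedom.
    msw = sum(
        (y - m) ** 2 for row, m in zip(data, subject_means) for y in row
    ) / (k * (n - 1))

    sigma_w2_hat = msw
    sigma_b2_hat = (msb - msw) / n  # can be zero or negative
    return msb, msw, sigma_b2_hat, sigma_w2_hat


# Toy data (hypothetical): three subjects with identical averages.
example = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
msb, msw, sigma_b2, sigma_w2 = anova_variance_components(example)
print(msb, msw, sigma_b2, sigma_w2)  # sigma_b2 is negative for this toy set
```

In the toy data all subjects share the same average, so MSB is zero while MSW is positive, and the between-subject estimate comes out negative. This is the situation the rest of the article addresses.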

An example from real data: lead exposure

Nineteen workers at two car battery producers in Israel were repeatedly measured to study their annual exposure to lead. They were randomly selected (9 workers in the first factory and 10 in the second) to represent those exposed to the main processes (details can be found elsewhere; Peretz et al., 1997). Ten hygiene surveys, with intervals of 3–7 weeks, were performed in each factory over the course of a year. Due to missing data (absence of workers, etc.), each worker had 6–10 repeated measures. We have taken the first six measures of each worker and estimated the variance components sigmaB2 and sigmaW2 for each factory. According to Israel's regulations for factories with exposure to lead, it is mandatory to conduct two hygiene surveys each year.

In order to highlight the sensitivity of the sigmaB2 estimator, we have created new data sets, each including just two repeated measures out of the six. In total, we had 15 sets of data with two repetitions for each factory. The exposure level was taken as a log transformation of the TLV¹ fraction (=log(concentration/TLV)) (Peretz et al., 1997). The TLV–TWA² standard for occupational lead exposure according to Israel's Regulations is 0.1 mg/m³.
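The figure of 15 data sets per factory matches the number of ways to choose two of the six measurement occasions. The short sketch below enumerates them. It assumes that the same pair of occasions is used for every worker within a data set, which is one reading of the text rather than something the article states outright.

```python
from itertools import combinations

# Indices 0..5 stand for the six retained measurement occasions.
occasion_pairs = list(combinations(range(6), 2))
n_sets = len(occasion_pairs)
print(n_sets)  # 15


def two_measure_subset(worker_rows, pair):
    """Keep only the two chosen occasions for every worker (hypothetical helper)."""
    first, second = pair
    return [[row[first], row[second]] for row in worker_rows]
```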

Table 1A shows summary measures of the estimators in each factory, in comparison to the original estimators (="accurate") based on six repetitions. It can be seen that a negative sigmaB2 estimate resulted from 40% of the series in the first factory (with true lambda=.17) and from 20% of the series in the second factory (with true lambda=.09). In addition, the ANOVA estimators for lambda were quite poor. This reinforces the importance of performing more than two repeated surveys per year. In practice, though, many surveys are limited to two measurements, as mentioned above for lead exposure, so the example also highlights the need for statistical methods that can cope with small samples. Table 1B shows summary measures of the estimators if four repeated surveys were performed in each factory. One can see the improvement in the estimation when doubling the number of repeated measurements per subject. The MSE is reduced by about 75% in the two factories.


Estimating theta in the presence of a negative ANOVA estimate of sigmaB2

Overexposure

For hazard control, the probability theta of overexposure is very important. We present here the basic equations for overexposure as derived by Rappaport et al. (1995). They followed the common assumption that the exposure xij of worker i on day j follows a log-normal distribution with:

$$y_{ij} = \ln(x_{ij}) = \mu_y + \alpha_i + \epsilon_{ij}, \qquad i = 1, \ldots, k; \; j = 1, \ldots, n$$

where muy is the mean of the overall logged exposure distribution in the group, alphai is a random effect for the ith worker and epsilonij is the within-worker random error.

It was furthermore assumed that alphai ~ N(0, sigmaB2), where the alphai's are all independent, and epsilonij ~ N(0, sigmaW2), where the epsilonij's are all independent. Here sigma2 = sigmaW2 + sigmaB2, where sigmaB2 is the variance between workers and sigmaW2 is the variance within workers.

This model is applied to homogeneous work groups consisting of workers who perform similar tasks and therefore should have similar exposures. A worker is considered overexposed if his mean value muxi (conditional on alphai) exceeds a standard limit (S). The probability theta that a randomly selected person from a work group is overexposed is thus:

$$\theta = P(\mu_{x_i} > S) = P\left(\exp\left(\mu_y + \alpha_i + \tfrac{1}{2}\sigma_W^2\right) > S\right) = 1 - \Phi\left(\frac{\ln S - \mu_y - \tfrac{1}{2}\sigma_W^2}{\sigma_B}\right)$$

where $\Phi$ denotes the standard normal cumulative distribution function.

The relationship between sigmaB2 and theta for different values of C = mux/S (for sigmaW2 = .5) is presented in Figure 1. It can be seen that when 0.5 ≤ C ≤ 1.0, theta has a maximum and then decreases, with little sensitivity to sigmaB2. Therefore, when theta is calculated from an estimate of sigmaB2, the estimate of theta is quite stable for sigmaB2 values that are slightly larger than zero. There is a problem in the estimation of theta when sigmaB2 is near zero, because theta is sensitive to sigmaB2 in that region and because the ANOVA estimate of sigmaB2 may be negative.
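The shape of this relationship can be explored numerically. The sketch below rests on two assumptions that the text does not spell out: that theta = 1 - Phi((ln S - muy - sigmaW2/2)/sigmaB), as in Rappaport et al. (1995), and that mux is the group mean exp(muy + (sigmaB2 + sigmaW2)/2) of the log-normal distribution. Under those assumptions, fixing C = mux/S gives theta = 1 - Phi((sigmaB2/2 - ln C)/sigmaB), in which sigmaW2 cancels. The function name `theta_for_fixed_c` and the grid of values are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def theta_for_fixed_c(sigma_b2, c):
    """Probability of overexposure for a given ratio C = mux / S.

    Assumes mux = exp(muy + (sigmaB2 + sigmaW2) / 2), so sigmaW2 drops out.
    """
    if sigma_b2 <= 0:
        raise ValueError("theta is undefined for a non-positive sigma_b2")
    z = (sigma_b2 / 2 - log(c)) / sqrt(sigma_b2)
    return 1 - NormalDist().cdf(z)


# Hypothetical grid of between-worker variances, with C = 0.75.
grid = [0.01, 0.05, 0.1, 0.25, 0.575, 1.0, 2.0, 4.0]
curve = [(s, theta_for_fixed_c(s, 0.75)) for s in grid]
for s, t in curve:
    print(f"sigma_b2={s:<6} theta={t:.4f}")
```

For C below 1, theta rises steeply from near zero as sigmaB2 leaves zero, peaks, and then declines slowly. This is why a zero or negative plug-in value for sigmaB2 is so disruptive.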

Figure 1. Relationship of the between-workers variance (Var-b) and the probability of overexposure (theta) for different values of C (= mux/Standard) for the group of workers (sigmaW2 = .50).

Since nowadays there is an emphasis on making the exposure groups as homogeneous as possible, we may be faced with applications that have small values of lambda(=sigmaB2/sigmaW2).

Rappaport et al. Method

Rappaport et al. (1995) recognized the problems of negative between-worker variance component estimates for estimating overexposure probabilities and for testing for compliance to standards. They proposed the following alternative estimator.

Use the ANOVA estimate $\hat{\sigma}_B^2$ if it is positive. Otherwise, substitute $\hat{\sigma}_{B,U}^2$ for $\hat{\sigma}_B^2$, where $\hat{\sigma}_{B,U}^2$ is an approximate 100(1-alpha)% upper confidence bound for sigmaB2, namely $P(\sigma_B^2 \le \hat{\sigma}_{B,U}^2) \approx 1 - \alpha$. The upper bound derived by Williams (1962), and cited in Searle et al. (1992), is:

$$\hat{\sigma}_{B,U}^2 = \frac{(k-1)\,(MSB - F_L \cdot MSW)}{n\,\chi^2_L}$$

where $F_L$ and $\chi^2_L$ are lower-tail percentage points, chosen according to the confidence level, of $F_{k-1,\,k(n-1)}$ and $\chi^2_{k-1}$, which represent random variables distributed as F with (k-1) numerator and k(n-1) denominator degrees of freedom and chi2 with (k-1) degrees of freedom, respectively. Rappaport et al. (1995) suggested using a 60% confidence bound. In a subsequent article, Lyles et al. (1997) used the same basic approach but with a 95%, rather than a 60%, approximate upper confidence bound for a negative ANOVA estimate $\hat{\sigma}_B^2$. Although the latter article dealt only with hypothesis testing, the 95% upper bound could also be used in estimating theta.
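The substitution rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the Williams-type upper bound has the form (k-1)(MSB - F_L·MSW)/(n·chi2_L), with F_L and chi2_L being lower-tail percentage points of the F and chi-square distributions named above. The percentage points are passed in as arguments because the Python standard library has no F or chi-square quantile functions. The function name and the numeric values below are hypothetical placeholders, not tabulated quantiles.

```python
def rappaport_style_estimate(msb, msw, k, n, f_low, chi2_low):
    """Return the ANOVA estimate if positive, otherwise the assumed upper bound.

    f_low:    lower-tail percentage point of F with (k-1, k(n-1)) df (user supplied)
    chi2_low: lower-tail percentage point of chi-square with (k-1) df (user supplied)
    """
    anova_estimate = (msb - msw) / n
    if anova_estimate > 0:
        return anova_estimate
    # Assumed form of the Williams bound: subtracts only f_low * MSW from MSB.
    return (k - 1) * (msb - f_low * msw) / (n * chi2_low)


# Placeholder percentage points for k = 10, n = 2 (illustrative values only).
F_LOW, CHI2_LOW = 0.85, 7.36

adjusted = rappaport_style_estimate(0.9, 1.0, 10, 2, F_LOW, CHI2_LOW)
still_negative = rappaport_style_estimate(0.8, 1.0, 10, 2, F_LOW, CHI2_LOW)
unchanged = rappaport_style_estimate(1.5, 1.0, 10, 2, F_LOW, CHI2_LOW)
print(adjusted, still_negative, unchanged)
```

With these placeholder values, a data set whose ratio MSB/MSW falls below `f_low` still yields a negative value after the adjustment, which is the first drawback discussed in the next subsection.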

Some Drawbacks to the Rappaport et al. Estimator

We note here two problems with the between-worker variance component estimator proposed by Rappaport et al. First, the adjustment made to negative ANOVA estimates is often insufficient to produce a positive estimate. We illustrate this feature later in a simulation study.

Second, the fact that Rappaport et al.'s estimator only corrects negative ANOVA estimates makes it very sensitive to small changes in the data. According to Rappaport et al., when lambda (= sigmaB2/sigmaW2) is approximately 0.1–0.2, negative estimated values of sigmaB2 could be observed as much as 30–40% of the time when k = 10 and 2 ≤ n ≤ 4. This probability can be reduced by increasing the sample size; however, in reality, many occupational hygiene groups are of this order of magnitude, having two to four repeated measurements (Kromhout et al., 1993).

We illustrate the sensitivity of Rappaport et al.'s estimator with a simple example using simulated data with k = 10, n = 2, sigmaB2 = .1 and sigmaW2 = 1. First, a random set was generated and then, gradually, eight slight changes were made to create eight further sets, each with the same worker averages but with increasingly larger within-worker residuals.

Table 2 presents the variance component estimates according to ANOVA and Rappaport et al.'s method with the 60% confidence bound.


From step 7 on, the ANOVA estimate of sigmaB2 was negative. The Rappaport et al. estimate for sigmaB2 makes a sudden jump from very small to very large values at step 7 and thus is quite sensitive to small changes in the study data. A slight increase in the within-workers mean square could change a positive ANOVA estimate to a negative one, thus sharply increasing Rappaport et al.'s estimate. This change could lead to a much larger estimate of theta. Since the error term in exposure measurements is already known to vary greatly over time (contributing to the within-worker variability), measuring the same exposure group at different times can easily produce negative estimates of sigmaB2.
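The construction behind this sensitivity example can be imitated as follows. The sketch generates one data set with k = 10, n = 2, sigmaB2 = .1 and sigmaW2 = 1, then builds eight further sets that keep every worker's average but inflate the deviations around it. The seed, the 5% inflation step and the helper names are assumptions made for illustration, and the output will not reproduce Table 2.

```python
import random
from statistics import mean


def mean_squares(data):
    """Return (MSB, MSW) for a balanced list of k rows with n values each."""
    k, n = len(data), len(data[0])
    worker_means = [mean(row) for row in data]
    grand_mean = mean(worker_means)
    msb = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
    msw = sum(
        (y - m) ** 2 for row, m in zip(data, worker_means) for y in row
    ) / (k * (n - 1))
    return msb, msw


def inflate_residuals(data, scale):
    """Keep each worker's average; multiply deviations from it by `scale`."""
    return [[mean(row) + scale * (y - mean(row)) for y in row] for row in data]


rng = random.Random(2001)  # arbitrary seed
k, n, sigma_b2, sigma_w2 = 10, 2, 0.1, 1.0
base = []
for _ in range(k):
    alpha = rng.gauss(0.0, sigma_b2 ** 0.5)  # worker effect
    base.append([alpha + rng.gauss(0.0, sigma_w2 ** 0.5) for _ in range(n)])

steps = []
for step in range(9):  # the original set plus eight modified sets
    scale = 1.0 + 0.05 * step
    msb, msw = mean_squares(inflate_residuals(base, scale))
    steps.append((scale, msb, msw, (msb - msw) / n))

for scale, msb, msw, estimate in steps:
    print(f"scale={scale:.2f} MSB={msb:.3f} MSW={msw:.3f} ANOVA={estimate:.3f}")
```

Because the worker averages are untouched, MSB stays fixed while MSW grows with the square of the scale factor, so the ANOVA estimate falls steadily and eventually turns negative if the inflation continues.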

Bias-Adjusted Variance Component Estimation (BAVCE)

We suggest an alternative estimate to overcome some of the limitations of the estimator proposed by Rappaport et al. Our method, which we call BAVCE, is based on the upper tolerance interval suggested by Wang and Iyer (1994).

It takes account of the fact that an upper confidence bound will typically be biased high and multiplies by a factor that attempts to adjust for this bias. The estimator is defined as follows:

[Equation not recoverable from the source.]

where

[Equation not recoverable from the source.]

where gamma is the confidence level (which we have taken to be .95). [Remaining symbol definitions not recoverable from the source.]

The BAVCE estimator, like the others (Rappaport et al., 1995; Lyles et al., 1997), reduces the frequency of negative or zero estimates by subtracting less than the full value of MSW from MSB. However, their use of an upper confidence bound as an estimator almost guarantees an overestimate of sigmaB2. The factor omega2 in the BAVCE attempts to correct the upward bias. To see how the bias correction works, we present an approximation to the expected value of the estimator by assuming that phi = 1 - [sigmaW2/(n·sigmaB2 + sigmaW2)] is known. Then:

[Equation not recoverable from the source.]

and

[Equation not recoverable from the source.]

The bias correction is implemented by using a "plug-in" estimator of phi in which the observed mean squares replace their expected values.
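The plug-in step can be made concrete. In the balanced one-way model the expected mean squares are E(MSW) = sigmaW2 and E(MSB) = n·sigmaB2 + sigmaW2, so phi = 1 - sigmaW2/(n·sigmaB2 + sigmaW2) = 1 - E(MSW)/E(MSB), and replacing the expectations by the observed mean squares gives 1 - MSW/MSB. The sketch below shows only this substitution. The function names are hypothetical, and any truncation the authors may apply when MSW exceeds MSB is not known from the text and is not reproduced here.

```python
def phi_true(n, sigma_b2, sigma_w2):
    """phi = 1 - sigmaW2 / (n * sigmaB2 + sigmaW2), from the model parameters."""
    return 1 - sigma_w2 / (n * sigma_b2 + sigma_w2)


def phi_plugin(msb, msw):
    """Plug-in version: observed mean squares replace their expected values."""
    return 1 - msw / msb


# Values of phi for sigmaW2 = 1 and sigmaB2 = .2 at n = 2, 3, 4.
phi_values = {n: phi_true(n, 0.2, 1.0) for n in (2, 3, 4)}
print(phi_values)

# If the observed mean squares happened to equal their expectations for n = 2
# (MSB = 1.4, MSW = 1.0), the plug-in value would coincide with phi.
print(phi_plugin(1.4, 1.0))
```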

Comparison of estimators on simulated data

Simulated Data

Simulations were run to compare the different estimators of sigmaB2 and theta. The estimators of sigmaB2 were the ANOVA estimator, the estimator of Rappaport et al. with a 60% bound (method 1) and with a 95% bound (method 1A) and the BAVCE proposed here (method 2). The estimators of theta were generated by plugging the estimators of sigmaB2 along with the ANOVA estimator of sigmaW2 and the sample average into Eq. (1) in Section 4. The simulations covered three different practical settings defined by the number of repetitions (n) and the number of subjects (k):

  1. 1000 data sets for k=10, n=2 (20000 observations);
  2. 1000 data sets for k=10, n=3 (30000 observations); and
  3. 1000 data sets for k=10, n=4 (40000 observations).

In addition, we examined several different values of sigmaB2. The within-subject variance sigmaW2 was held constant at 1 in all the simulations. Original values for phi for n=2,3,4 were computed from Eq. (1). When the least squares estimate of the between-workers variance component sigmaB2 was negative, method 1 modified it to a larger value. The method 2 estimator increased all the sigmaB2 estimates, not just the negative ones.
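A simulation of this kind can be sketched with the standard library alone. The code below covers only the ANOVA estimator and counts how often it is zero or negative in each setting. Setting muy to 0 (it does not affect the variance estimates), the seed and the function name are assumptions made for illustration. The sketch does not implement method 1, 1A or 2, and its output will not match the article's tables exactly.

```python
import random
from statistics import mean


def negative_fraction(k, n, sigma_b2, sigma_w2, n_sets=1000, seed=0):
    """Fraction of simulated data sets whose ANOVA estimate of sigmaB2 is <= 0."""
    rng = random.Random(seed)
    sd_b, sd_w = sigma_b2 ** 0.5, sigma_w2 ** 0.5
    negatives = 0
    for _ in range(n_sets):
        data = []
        for _ in range(k):
            alpha = rng.gauss(0.0, sd_b)  # random worker effect
            data.append([alpha + rng.gauss(0.0, sd_w) for _ in range(n)])
        worker_means = [mean(row) for row in data]
        grand_mean = mean(worker_means)
        msb = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
        msw = sum(
            (y - m) ** 2 for row, m in zip(data, worker_means) for y in row
        ) / (k * (n - 1))
        if msb <= msw:  # equivalent to (MSB - MSW) / n <= 0
            negatives += 1
    return negatives / n_sets


results = {
    (n, s): negative_fraction(10, n, s, 1.0)
    for n in (2, 3, 4)
    for s in (0.2, 0.05)
}
for (n, s), frac in sorted(results.items()):
    print(f"n={n} sigma_b2={s}: fraction non-positive = {frac:.3f}")
```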

Comparison of Estimators

Tables 3 and 4 present the estimators based on the simulated data for n=2,3,4, when original sigmaB2=.2 (Table 3) or sigmaB2=.05 (Table 4), which are representative of the results that we found for all the values of sigmaB2.



Tables 3A and 4A relate to the estimators when negative ANOVA estimates were found. Tables 3B and 4B relate to the estimators when positive ANOVA estimates were found. As was found previously, the ANOVA estimator of sigmaB2 was often negative for the cases we studied. In Table 3A (sigmaB2=.2, lambda=.2), we can see that more than 40% of the data sets for n=2,3,4 resulted in a negative sigmaB2 ANOVA estimate and in Table 4A (sigmaB2=.05, lambda=.05), we can see that the percentage was higher, over 50%.

A serious problem with Rappaport et al.'s method is that many negative estimates of sigmaB2 remained negative. The problem was especially acute with the 60% confidence bound. Even with sigmaB2=.20 and four replications per subject, almost 30% of the negative ANOVA estimates remained negative with this method. Using their method with a 95% confidence bound reduced the problem but did not eliminate it, with 7–10% of the negative ANOVA estimates remaining negative. Our method was much more successful in this regard. Negative estimates are automatically adjusted to 0 and these occurred in less than 4% of the cases with negative ANOVA estimates in all the settings we examined.

In conclusion, there is an estimation problem using method 1 when n=2 or 3 and sigmaB2/sigmaW2 is less than .20.

An example

Survey on Pig Farmers' Exposure to Inhalable Endotoxins

In a study of 200 pig farmers from the south of the Netherlands, exposure to inhalable dust and endotoxins was monitored by personal sampling. Exposure was measured during one work shift on a randomly chosen day of the week, 1 day during the summer of 1991 and 1 day during the winter of 1992. Outdoor temperature was obtained from a monitoring station in the south of the Netherlands. Task activity patterns on the day of measurement and farm characteristics were also recorded (Preller et al., 1995). For the purpose of this paper, we use only the endotoxin exposure data for the 153 of the 200 farmers who had two measurements (the rest had some failure in the measuring process for one measurement). For the whole study population (n=153), the following estimates were calculated and they were considered to be the accurate parameters for the pig farmers: sigmaB2 = .13, sigmaW2 = .64, theta = .32 and muy = 7.81. We have taken the standard to be 8.29 (Standard = log(4000 ng/m³) = 8.29) for this example.
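As a plausibility check on these figures, theta can be recomputed from the reported parameters. The sketch assumes the overexposure formula of Rappaport et al. (1995), theta = 1 - Phi((ln S - muy - sigmaW2/2)/sigmaB). The function name is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def overexposure_probability(mu_y, sigma_b2, sigma_w2, log_standard):
    """theta = 1 - Phi((ln S - mu_y - sigma_w2 / 2) / sigma_b)."""
    z = (log_standard - mu_y - sigma_w2 / 2) / sqrt(sigma_b2)
    return 1 - NormalDist().cdf(z)


log_standard = round(log(4000), 2)  # natural log of 4000 ng/m3, rounded as in the text
theta = overexposure_probability(7.81, 0.13, 0.64, log_standard)
print(log_standard, round(theta, 3))
```

The result is roughly one third, close to the reported theta = .32. The small gap is consistent with the rounding of the published parameter values.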

We compared the different estimators of sigmaB2 by generating 100 subsamples. Each farmer was included in or excluded from a particular sample by drawing a binomial (Bernoulli) random variable with probability 0.1 for inclusion.
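The subsampling scheme can be sketched as below. Farmer identifiers are replaced by the integers 0 to 152, and the seed and function name are arbitrary choices for illustration. The actual exposure data are not reproduced.

```python
import random


def draw_subsamples(farmer_ids, n_subsamples=100, p_include=0.1, seed=0):
    """Include each farmer in each subsample independently with probability p_include."""
    rng = random.Random(seed)
    return [
        [farmer for farmer in farmer_ids if rng.random() < p_include]
        for _ in range(n_subsamples)
    ]


subsamples = draw_subsamples(list(range(153)))
average_size = sum(len(s) for s in subsamples) / len(subsamples)
print(len(subsamples), average_size)
```

With 153 farmers and an inclusion probability of 0.1, the expected subsample size is 15.3, so the size varies from one subsample to the next around that value.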

For the 100 subsamples, the mean ± SD of the muy values was 7.81 ± 0.15. The same parameters were estimated while sigmaB2 was estimated by the different methods (see Table 5).


In this example, only about 20% of the series resulted in negative ANOVA estimates. Thus, one might expect that our method, which always corrects the ANOVA estimate of sigmaB2, might be less successful. Nonetheless, for theta, our estimate performed better than that of Rappaport et al., with a smaller SD, especially for the samples with a positive ANOVA estimate. For sigmaB2, Rappaport et al.'s estimate over the 100 samples seemed to perform better than our method.

This conclusion differs from our previous conclusion regarding the simulated data because of the different sample sizes. Here, on average, 15 subjects were included in each sample, while in our previous samples we had only 10 subjects per sample.

Rappaport et al.'s estimator of sigmaB2 was more accurate with a 60% bound than with a 95% bound.

Discussion

The use of Rappaport et al.'s approach for assessing compliance for hazard control is a new application. It has inherent statistical considerations and takes into account the variance components of the hazardous exposure based on real-life data sets and should be recommended for use. However, since it is a new tool, caution and further study are needed. In exposure data sets, ANOVA estimators for between-variance components are quite often negative (see sensitivity analysis).

The common practice of changing such negative values to zeroes prevents the application of popular plug-in estimators in compliance assessment and it also appears to be an unrealistic result since it implies that all workers have the same mean exposure.

The modified variance component estimator for negative values proposed by Rappaport et al. (1995) and Lyles et al. (1997) has three main disadvantages:

  1. It remains negative for some data sets.
  2. It performs poorly in estimating sigmaB2 and also theta when n = 2 and when the original sigmaB2 is quite small and 0.5 × Standard ≤ mux ≤ 1.0 × Standard, which are quite common situations when studying exposure.
  3. Discontinuous behavior: small changes in the data set can make the ANOVA estimator negative, resulting in the use of the modification, which may cause a large change in the conclusions of a study.

In this paper, we have proposed an alternative variance component estimator, the BAVCE, to cope with the problem of negative and zero between-worker ANOVA estimates. Our modification seems to react better than the estimator of Rappaport et al. as can be seen in the tables from our simulations and the simulated subsets of data.

We think that further thought should be given to analysis of data from unbalanced designs, which are common in real-life exposure data sets due to absence of workers and changes in work practices.

Here, exposure was measured in industry and agriculture. The same ideas can be applied to environmental exposure within the community.

Notes

1 TLV=threshold limit value; a health-based concentration to which nearly all workers may be exposed without adverse effect.

2 TLV–TWA=threshold limit value, with respect to 8-h time-weighted average, that should not be exceeded during any part of the working day.

References

  1. Kromhout, H., Symanski, E., and Rappaport, S. M. A comprehensive evaluation of within- and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg. (1993) 37: 253–270.
  2. Lyles, R. H., Kupper, L. L., and Rappaport, S. M. Assessing regulatory compliance via the balanced one-way random effects ANOVA model. J Agric, Biol Environ Stat. (1997) 2: 64–86.
  3. Peretz, C., Goldberg, P., Kahan, E., Grady, S., and Goren, A. The variability of exposure over time: a prospective longitudinal study. Ann Occup Hyg. (1997) 41: 485–500.
  4. Preller, L., Kromhout, H., Heederik, D., and Tielen, M. Modeling long-term average exposure in occupational exposure–response analysis. Scand J Work, Environ Health. (1995) 21: 504–512.
  5. Rappaport, S. M., Lyles, R. H., and Kupper, L. L. An exposure assessment strategy accounting for within- and between-worker sources of variability. Ann Occup Hyg. (1995) 39: 469–495.
  6. Searle, S. R., Casella, G., and McCulloch, C. E. Variance Components. Wiley, New York. 1992.
  7. Wang, C. M., and Iyer, H. K. Tolerance intervals for the distribution of true values in the presence of measurement errors. Technometrics. (1994) 36: 162–170.
  8. Williams, J. S. A confidence interval for variance components. Biometrika. (1962) 49: 278–281.

Acknowledgements

We acknowledge Liesbeth Preller from Wageningen University, the Netherlands, for providing the pig farmers exposure data set.
