A new method of longitudinal diary assembly for human exposure modeling

Abstract

Human exposure time-series modeling requires longitudinal time–activity diaries to evaluate the sequence of concentrations encountered, and hence, pollutant exposure for the simulated individuals. However, most of the available data on human activities are from cross-sectional surveys that typically sample 1 day per person. A procedure is needed for combining cross-sectional activity data into multiple-day (longitudinal) sequences that can capture day-to-day variability in human exposures. Properly accounting for intra- and interindividual variability in these sequences can have a significant effect on exposure estimates and on the resulting health risk assessments. This paper describes a new method of developing such longitudinal sequences, based on ranking 1-day activity diaries with respect to a user-chosen key variable. Two statistics, “D” and “A”, are targeted. The D statistic reflects the relative importance of within- and between-person variance with respect to the key variable. The A statistic quantifies the day-to-day (lag-one) autocorrelation. The user selects appropriate target values for both D and A. The new method then stochastically assembles longitudinal diaries that collectively meet these targets. On the basis of numerous simulations, the D and A targets are closely attained for exposure analysis periods >30 days in duration, and reasonably well for shorter simulation periods. Longitudinal diary data from a field study suggest that D and A are stable over time, and perhaps over cohorts as well. The new method can be used with any cohort definitions and diary pool assignments, making it easily adaptable to most exposure models. Implementation of the new method in its basic form is described, and various extensions beyond the basic form are discussed.

References

  • Beaton G.H. Nutrient requirements and population data. Proc Nutr Soc 1988: 47: 63–78.

  • Chason-Tabor S., Rimm E.B., Stampfer M.J., Spiegelman D., Colditz G.A., Giovannucci E., Ascherio A., and Willett W.C. Reproducibility and validity of a self-administered physical activity questionnaire for male health professionals. Epidemiology 1996: 7: 81–86.

  • Geyh A.S., Xue J., Özkaynak H., and Spengler J.D. The Harvard Southern California Chronic Ozone Exposure study: assessing ozone exposure of grade-school-age children in two southern California communities. Environ Health Perspect 2000: 108: 265–270.

  • Graham S., and McCurdy T. Developing meaningful cohorts for human exposure models. J Expo Anal Environ Epidemiol 2004: 14: 23–43.

  • Harris I.R., Burch B.D., and St Laurent R.T. A blended estimator for a measure of agreement with a gold standard. J Agric Biol Environ Stat 2001: 6: 326–339.

  • Johnson N.L., Kotz S., and Balakrishnan N. Continuous Univariate Distributions, Volume 2. 2nd edn. John Wiley and Sons, New York, 1995.

  • Johnson T. Recent advances in the estimation of population exposure to mobile source pollutants. J Expo Anal Environ Epidemiol 1995: 5: 551–571.

  • Koch G.G. Intraclass correlation coefficient. In: Kotz S., and Johnson N.L. (Eds.). Encyclopedia of Statistical Sciences, Vol. 4. John Wiley and Sons, New York, 1983.

  • Lee K., Yanagisawa Y., Spengler J.D., and Davis R. Assessment of precision of a passive sampler by duplicate measurements. Environ Int 1995: 21: 407–412.

  • McCurdy T. Estimating human exposure to motor vehicle pollutants using the NEM series of models: lessons to be learned. J Expo Anal Environ Epidemiol 1995: 5: 533–550.

  • McCurdy T. Modeling the dose profile in human exposure assessments: ozone as an example. Rev Toxicol 1997: 1: 3–23.

  • McCurdy T. Conceptual basis for multi-route intake dose modeling using an energy expenditure approach. J Expo Anal Environ Epidemiol 2000: 10: 86–97.

  • McCurdy T., Glen G., Smith L., and Lakkadi Y. The National Exposure Research Laboratory's consolidated human activity database. J Expo Anal Environ Epidemiol 2000: 10: 566–578.

  • Pas E.I. Weekly travel–activity behavior. Transportation 1988: 15: 89–109.

  • Srivastava M.S. Estimation of the intraclass correlation coefficient. Ann Hum Genet 1993: 57: 159–165.

  • St Jeor S.T., Guthrie H.A., and Jones M.B. Variability in nutrient intake in a 28-day period. J Am Diet Assoc 1983: 83: 155–162.

  • St Laurent R.T. Evaluating agreement with a gold standard in method comparison studies. Biometrics 1998: 54: 537–545.

  • US Environmental Protection Agency. Total Risk Integrated Methodology (TRIM) — Air Pollutants Exposure Model Documentation (TRIM.Expo/APEX, Version 4) Volume I: User's Guide. Office of Air Quality Planning and Standards, US Environmental Protection Agency, Research Triangle Park, NC, 2006a. Available at: http://www.epa.gov/ttn/fera/human_apex.html.

  • US Environmental Protection Agency. Total Risk Integrated Methodology (TRIM) — Air Pollutants Exposure Model Documentation (TRIM.Expo/APEX, Version 4) Volume II: Technical Support Document. Office of Air Quality Planning and Standards, US Environmental Protection Agency, Research Triangle Park, NC, 2006b. Available at: http://www.epa.gov/ttn/fera/human_apex.html.

  • Xue J., McCurdy T., Spengler J., and Özkaynak H. Understanding variability in the time spent in selected locations for 7–12 year old children. J Expo Anal Environ Epidemiol 2004: 14: 222–233.

Acknowledgements

The work reported here was funded by the US Environmental Protection Agency under contract numbers EP-D-05-065 and 68-D-00-206 to Alion Science and Technology Inc. Its contents are solely the responsibility of the authors and do not necessarily represent official views of the agency. The paper has been subjected to the agency's review process and has been approved for publication. Mention of trade names or commercial products does not constitute an endorsement or recommendation for use. We gratefully acknowledge the input of Harvey Richmond of EPA's Office of Air Quality Planning and Standards and Dr. Jianping Xue of EPA's National Exposure Research Laboratory, as well as the careful consideration of the manuscript provided by two anonymous reviewers. We are also grateful to Dr. Jack D Spengler of Harvard University's School of Public Health for allowing us to analyze human activity data from the Harvard Southern California Chronic Ozone Exposure Study. We also acknowledge the monetary and intellectual support on this project provided to us by Dr. Larry Cupitt, associate director of EPA's National Exposure Research Laboratory.

Author information

Corresponding author

Correspondence to Graham Glen.

Appendix A

Distributional requirements for matching D and A targets

The distributions in Eqs. (8), (9), (10) and (11) in the implementation section were selected to meet various criteria determined by the choices for D and A. The specific choices for these distributions may be altered, as long as the new ones also meet these criteria.

Beta distributions form a general family that is well suited to these purposes. Many of their properties are discussed in Johnson et al. (1995). The probability density function (pdf) for a beta distribution with bounds at “min” and “max” and shape parameters “a” and “b” is

$$ f(x) = \frac{(x-\min)^{a-1}\,(\max-x)^{b-1}}{B(a,b)\,(\max-\min)^{a+b-1}}, \qquad \min \le x \le \max, \tag{A-1} $$

where $B(a,b)$ is the beta function.

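In the text, this is referred to as “Beta (min, max, a, b).” The mean and variance of this beta distribution are

$$ \mu = \min + (\max-\min)\,\frac{a}{a+b}, \tag{A-2} $$

$$ \sigma^2 = (\max-\min)^2\,\frac{ab}{(a+b)^2\,(a+b+1)}. \tag{A-3} $$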

A uniform distribution is a special case of a beta distribution with both shape parameters equal to one (a=b=1). It follows that a uniform distribution has a mean and variance of

$$ \mu = \frac{\min+\max}{2}, \tag{A-4} $$

$$ \sigma^2 = \frac{(\max-\min)^2}{12}. \tag{A-5} $$

The x-scores are scaled ranks bounded by zero and one for any key variable. Over a large number of persons, the x-scores should match a uniform (0, 1) distribution as closely as possible, and therefore they should have a mean of 1/2 and a variance of 1/12.
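The beta moments and the uniform special case described above can be checked with exact rational arithmetic. The following is a minimal sketch; the function name `beta_mean_var` is a hypothetical helper introduced here for illustration, not part of any exposure model.

```python
from fractions import Fraction

def beta_mean_var(lo, hi, a, b):
    """Mean and variance of a beta distribution on [lo, hi] with shapes a, b.

    Standard results for the four-parameter beta distribution.
    """
    width = hi - lo
    mean = lo + width * a / (a + b)
    var = width ** 2 * a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# A uniform(0, 1) distribution is Beta(0, 1, 1, 1).
uniform_mean, uniform_var = beta_mean_var(
    Fraction(0), Fraction(1), Fraction(1), Fraction(1)
)
print(uniform_mean, uniform_var)
```

Using `Fraction` inputs keeps the result exact, so the target values of 1/2 and 1/12 for the x-scores can be compared without floating-point tolerance.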

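From Eq. (2), the variance of the $T_i$ across persons is $\sigma_b^2$. Hence, with $\sigma^2 = 1/12$ denoting the total variance of the x-scores,

$$ \sigma_b^2 = D\,\sigma^2 = \frac{D}{12}. \tag{A-6} $$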

Thus the distribution from which $T_i$ is drawn requires a mean of 1/2 and a variance of D/12. Considerations based on replacing the key variable by another whose rankings are exactly reversed require that the distribution of the $T_i$ be symmetric about the midpoint 1/2.

A beta distribution that is symmetric about its midpoint requires a=b. If the bounds are at (1−w)/2 and (1+w)/2 and both shape parameters equal “α,” the beta will have a mean of 1/2 and a variance of D/12 if

$$ w = \sqrt{\frac{D\,(2\alpha+1)}{3}}. \tag{A-7} $$

The width of the distribution must satisfy $w \le 1$. All choices satisfying Eq. (A-7) with $0 < \alpha < 3/(2D) - 1/2$ will produce longitudinal diaries that meet the stated constraints (mean = 1/2 and variance = D/12). A convenient solution is to take $\alpha = 1$, in which case the beta distribution reduces to the uniform distribution in Eq. (8). Other choices of $\alpha$ are possible and in some cases have been implemented, for example in the Air Pollutants Exposure model (APEX; US EPA, 2006a, b). It is also possible to use distributions other than beta for the $T_i$.
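The width condition can be illustrated numerically. The sketch below assumes the symmetric-beta variance $w^2/(4(2\alpha+1))$, which gives $w = \sqrt{D(2\alpha+1)/3}$ for a variance of D/12. The names `t_width` and `draw_T` are hypothetical helpers, and the values D = 0.3 and α = 2 are arbitrary illustrations.

```python
import math
import random

def t_width(D, alpha=1.0):
    """Width w of a symmetric beta for T_i with variance D/12."""
    w = math.sqrt(D * (2.0 * alpha + 1.0) / 3.0)
    if w > 1.0:
        raise ValueError("alpha too large for this D: width would exceed 1")
    return w

def draw_T(D, alpha, rng):
    """Draw one T_i from Beta((1-w)/2, (1+w)/2, alpha, alpha)."""
    w = t_width(D, alpha)
    return (1.0 - w) / 2.0 + w * rng.betavariate(alpha, alpha)

rng = random.Random(1)  # fixed seed so the run is repeatable
D, alpha = 0.3, 2.0     # illustrative values only
sample = [draw_T(D, alpha, rng) for _ in range(50_000)]
mean_T = sum(sample) / len(sample)
var_T = sum((t - mean_T) ** 2 for t in sample) / len(sample)
print(f"mean={mean_T:.4f}  variance={var_T:.5f}  target={D / 12:.5f}")
```

With α = 1 the width reduces to √D, the uniform case. Larger α concentrates the $T_i$ near 1/2 and needs a wider interval to keep the same variance, which is why α is bounded above for a given D.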

The distribution from which the x-scores are drawn must satisfy several requirements. First, the mean for each person must equal $T_i$. Second, the within-person variance, averaged over all persons, must equal $\sigma_w^2 = \sigma^2 - \sigma_b^2 = \sigma^2(1-D) = (1-D)/12$. Third, over all persons, the x-scores must have a mean of 1/2 and a variance of 1/12, so that each diary pool is sampled without bias. The distribution in Eq. (9) has shape parameters $a = 2T_i/(1-D)$ and $b = 2(1-T_i)/(1-D)$. Using Eqs. (A-2) and (A-3), the mean is $T_i$ and the variance is $T_i(1-T_i)(1-D)/(3-D)$. To find $\sigma_w^2$, the average of this variance, weighted by the pdf $p(T_i)$ of the $T_i$, is needed:

$$ \sigma_w^2 = \int \frac{T_i\,(1-T_i)\,(1-D)}{3-D}\;p(T_i)\,dT_i. \tag{A-8} $$

For $T_i$ uniformly distributed between $(1-\sqrt{D})/2$ and $(1+\sqrt{D})/2$, the integral in (A-8) is easily found to be (1−D)/12. In fact, it can be shown that any beta distribution satisfying Eq. (A-7) will produce the same average value for $\sigma_w^2$.

The expected mean x-score over all persons is evidently 1/2, since the mean $T_i$ is 1/2 and the mean x-score for each person is expected to be $T_i$. The variance of the x-scores can also be evaluated by direct integration, but it is sufficient to note that the total variance in x-scores must equal the sum of the within- and between-person variances, and these are (1−D)/12 and D/12, respectively. Therefore, the overall variance of the x-scores is 1/12 when Eq. (9) is used, the same as for a uniform distribution.
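The variance decomposition above can be checked by simulation. This is a minimal sketch under stated assumptions: $T_i$ is drawn uniformly between $(1-\sqrt{D})/2$ and $(1+\sqrt{D})/2$, the daily x-scores are drawn independently (no autocorrelation) from a beta on (0, 1) with shape parameters $2T_i/(1-D)$ and $2(1-T_i)/(1-D)$, and the bounds of 0 and 1 are assumed because x-scores are scaled ranks. The function names and the value D = 0.25 are hypothetical.

```python
import math
import random

def simulate_x_scores(D, n_persons, n_days, seed=0):
    """Draw T_i per person, then independent daily x-scores centred on T_i."""
    rng = random.Random(seed)
    half_width = math.sqrt(D) / 2.0
    diaries = []
    for _ in range(n_persons):
        T = rng.uniform(0.5 - half_width, 0.5 + half_width)
        a = 2.0 * T / (1.0 - D)          # first shape parameter
        b = 2.0 * (1.0 - T) / (1.0 - D)  # second shape parameter
        diaries.append([rng.betavariate(a, b) for _ in range(n_days)])
    return diaries

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs, ddof=0):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

D = 0.25  # illustrative target only
diaries = simulate_x_scores(D, n_persons=2000, n_days=50)
all_x = [x for person in diaries for x in person]

overall_mean = mean(all_x)                                  # target 1/2
overall_var = variance(all_x)                               # target 1/12
within_var = mean([variance(p, ddof=1) for p in diaries])   # target (1-D)/12

print(f"overall mean {overall_mean:.4f}")
print(f"overall var  {overall_var:.4f}  (1/12 = {1 / 12:.4f})")
print(f"within var   {within_var:.4f}  ((1-D)/12 = {(1 - D) / 12:.4f})")
```

The within-person variance uses the unbiased (n−1) estimator so that its average over persons is comparable with (1−D)/12 even for short diaries.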

If autocorrelation is intended, then $A_i$ must be chosen for each individual. In the southern California study discussed earlier, the variance in $A_i$ was 0.04 for all variables analyzed. For a symmetric beta distribution with equal shape parameters a=b, the variance is

$$ \sigma^2 = \frac{(\max-\min)^2}{4\,(2a+1)}. \tag{A-9} $$

Taking (max−min)=1 and a=21/8, the variance is 1/25=0.04. Thus, the distribution in Eq. (10) matches the observed variance in $A_i$ in the southern California data, for all A between −0.5 and 0.5. This study had no examples of A outside this range, so the simplest assumption was made, namely that the shape parameters do not change.
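The arithmetic behind the choice a = 21/8 can be confirmed exactly. This short sketch uses the symmetric-beta variance (max−min)²/(4(2a+1)); `symmetric_beta_var` is a hypothetical helper name.

```python
from fractions import Fraction

def symmetric_beta_var(width, a):
    """Variance of a beta distribution of the given width with a = b."""
    return width ** 2 / (4 * (2 * a + 1))

var_A = symmetric_beta_var(Fraction(1), Fraction(21, 8))
print(var_A, float(var_A))
```

Because 2a+1 = 25/4 when a = 21/8, the denominator 4(2a+1) is exactly 25.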

Just as the x-scores correspond to diary rankings or percentiles rather than actual quantities of the key variable, the autocorrelation is also measured on ranks. Eq. (7) gives the definition of the autocorrelation. In this case, each $x_{ij}$ is the rank of the x-score assigned on day “j,” relative to the set of x-scores assigned to that person. The actual ranks are called $R_j$, while the corresponding scaled ranks $u_j$ are limited to the range (0 to 1). The two are related by $R_j = M u_j + \tfrac{1}{2}$ and $u_j = (R_j - \tfrac{1}{2})/M$. The $T_i$ in Eq. (7) is replaced by the mean scaled rank, which is 1/2. The variance in the scaled ranks is $(1 - M^{-2})/12$, which for large M is very close to 1/12. The denominator in Eq. (7) is N times the variance, or about N/12. Hence, the requirement becomes

$$ A_i \approx \frac{12}{N} \sum_{j} \left(u_j - \tfrac{1}{2}\right)\left(u_{j+1} - \tfrac{1}{2}\right). \tag{A-10} $$

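The expectation value for the product $(u_j - \tfrac{1}{2})(u_{j+1} - \tfrac{1}{2})$ should therefore be $A_i/12$, in the limit of large N. For a given $u_j$, the expectation value of $(u_j - \tfrac{1}{2})(u_{j+1} - \tfrac{1}{2})$ is given by

$$ E\!\left[\left(u_j - \tfrac{1}{2}\right)\left(u_{j+1} - \tfrac{1}{2}\right) \,\middle|\, u_j\right] = \left(u_j - \tfrac{1}{2}\right)\left(E[u_{j+1} \mid u_j] - \tfrac{1}{2}\right). \tag{A-11} $$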

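Consider the beta distribution in Eq. (11) in the main text. The expectation value $E[u_{j+1} \mid u_j]$ for this beta is just its mean, which is the ratio of the first shape parameter to the sum of the two shape parameters. Writing the two shape parameters of Eq. (11) as $a$ and $b$,

$$ E[u_{j+1} \mid u_j] = \frac{a}{a+b}. \tag{A-12} $$

Hence,

$$ E\!\left[\left(u_j - \tfrac{1}{2}\right)\left(u_{j+1} - \tfrac{1}{2}\right) \,\middle|\, u_j\right] = \left(u_j - \tfrac{1}{2}\right)\left(\frac{a}{a+b} - \tfrac{1}{2}\right). \tag{A-13} $$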

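This needs to be averaged over all possible $u_j$. The $u_j$ take on discrete values of $(k - \tfrac{1}{2})/M$ for all “k” from 1 to M, with each value being equally likely. The average $u_j$ is 1/2, and the average $u_j^2$ is $1/4 + (1 - M^{-2})/12$, which is very close to 1/3 for any sizeable M. Hence,

$$ E\!\left[\left(u_j - \tfrac{1}{2}\right)\left(u_{j+1} - \tfrac{1}{2}\right)\right] \approx \frac{A_i}{12}, \tag{A-14} $$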

as is necessary. Therefore, the beta distribution in Eq. (11) leads to the correct autocorrelation in the limit of long simulations.
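The moments of the scaled ranks used in this argument can be verified exactly for any M. The sketch below also illustrates the averaging step under a loudly stated assumption: the conditional mean of the next scaled rank is taken to be linear, $\tfrac{1}{2} + A(u_j - \tfrac{1}{2})$. That form is hypothetical and chosen only because it satisfies the requirement; the actual shape parameters are those of Eq. (11) in the main text, which is not reproduced here. All function names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def scaled_ranks(M):
    """Scaled ranks u = (k - 1/2)/M for k = 1..M, as exact fractions."""
    return [Fraction(2 * k - 1, 2 * M) for k in range(1, M + 1)]

def rank_moments(M):
    """Mean of u and mean of u^2 over the M equally likely scaled ranks."""
    u = scaled_ranks(M)
    return sum(u) / M, sum(x * x for x in u) / M

def mean_lag_product(M, A):
    """Average of (u_j - 1/2)(E[u_{j+1} | u_j] - 1/2) over all u_j.

    ASSUMPTION (not from the source): E[u_{j+1} | u_j] = 1/2 + A (u_j - 1/2).
    """
    total = Fraction(0)
    for x in scaled_ranks(M):
        cond_mean = HALF + A * (x - HALF)  # hypothetical linear form
        total += (x - HALF) * (cond_mean - HALF)
    return total / M

M, A = 30, Fraction(2, 5)  # illustrative values only
mean_u, mean_u2 = rank_moments(M)
lag_product = mean_lag_product(M, A)
print(mean_u, mean_u2, lag_product, A / 12)
```

Under this assumed form the average product equals $A(1 - M^{-2})/12$, which approaches A/12 as M grows, in line with the large-M limit described in the text.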

About this article

Cite this article

Glen, G., Smith, L., Isaacs, K. et al. A new method of longitudinal diary assembly for human exposure modeling. J Expo Sci Environ Epidemiol 18, 299–311 (2008). https://doi.org/10.1038/sj.jes.7500595
