
Compare the marginal effects for environmental exposure and biomonitoring data with repeated measurements and values below the limit of detection

Abstract

Background

Environmental exposure and biomonitoring data with repeated measurements from environmental and occupational studies are commonly right-skewed and subject to limits of detection (LOD). However, the small-sample properties of existing models have not been examined for highly skewed data with non-detects and repeated measurements.

Objective

Marginal modeling provides an alternative for analyzing longitudinal and clustered data, in which the parameters are interpreted with respect to marginal, or population-averaged, means.

Methods

We outlined the theory of three marginal models: generalized estimating equations (GEE), quadratic inference functions (QIF), and the generalized method of moments (GMM). With these approaches, we proposed incorporating fill-in methods, including single- and multiple-value imputation techniques, so that any measurement below the limit of detection is assigned a value.
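As a rough illustration of a single-value fill-in, the sketch below replaces each non-detect with a fixed fraction of the LOD. The abstract does not say which substitution rules were used, so the two rules shown (LOD/2 and LOD/√2, the latter discussed by Hornung and Reed [1]) are assumptions taken from common practice, and the function name `fill_in_nondetects` and its `rule` labels are hypothetical.

```python
import math


def fill_in_nondetects(values, lod, rule="sqrt2"):
    """Assign a single substituted value to every measurement below the LOD.

    values : list of measurements; None marks a reported non-detect
    lod    : limit of detection (same units as the measurements)
    rule   : "half" for LOD/2 or "sqrt2" for LOD/sqrt(2)
             (assumed conventions, not necessarily the paper's choices)
    """
    substitutes = {"half": lod / 2.0, "sqrt2": lod / math.sqrt(2.0)}
    if rule not in substitutes:
        raise ValueError(f"unknown rule: {rule!r}")
    fill = substitutes[rule]
    # Detected values are kept; non-detects receive the substituted value.
    return [fill if (v is None or v < lod) else v for v in values]


# Hypothetical repeated measurements on one worker, LOD = 0.5
filled = fill_in_nondetects([0.3, None, 2.0, 0.5], lod=0.5, rule="half")
```

The filled-in series can then be passed to any of the marginal estimators; a multiple-imputation variant would draw several plausible values per non-detect instead of one fixed value.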

Results

We demonstrated that the GEE method estimates the regression parameters well in small samples, while QIF and GMM perform better in large-sample settings, where their parameter estimates are consistent and have relatively smaller mean squared error. No single fill-in method can be deemed superior, as each has its own merits.

Impact

  • Marginal modeling is employed for the first time to analyze repeated-measures data with non-detects; only the mean structure needs to be correctly specified to obtain consistent parameter estimates. After replacing non-detects through substitution methods and applying small-sample bias corrections, we found in a simulation study that the estimating approaches used in the marginal models each have advantages under a wide range of sample sizes. We also applied the models to longitudinal and clustered working examples.
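The idea that only the mean structure must be specified can be illustrated with the simplest marginal model: a linear GEE with an identity link and an independence working correlation. In that special case the point estimate equals ordinary least squares, and within-cluster correlation is handled by a cluster-robust (sandwich) covariance. The sketch below is a minimal, uncorrected version for one covariate. It is not the authors' implementation, it omits the small-sample bias corrections mentioned above, and the function name `gee_independence` and the toy data are hypothetical.

```python
def gee_independence(clusters):
    """Linear GEE with independence working correlation (intercept + one slope).

    clusters : list of clusters, each a list of (x, y) pairs
               (e.g. repeated measurements on one subject)
    Returns ((beta0, beta1), cov), where cov is the 2x2 sandwich covariance.
    """
    # "Bread": X'X and X'y accumulated over all observations, rows (1, x).
    n = sx = sxx = sy = sxy = 0.0
    for cluster in clusters:
        for x, y in cluster:
            n += 1.0
            sx += x
            sxx += x * x
            sy += y
            sxy += x * y
    det = n * sxx - sx * sx
    if det == 0.0:
        raise ValueError("covariate has no variation; model is not identified")
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # "Meat": sum over clusters of s_i s_i', with s_i = X_i' (y_i - X_i beta).
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for cluster in clusters:
        s0 = sum(y - b0 - b1 * x for x, y in cluster)
        s1 = sum(x * (y - b0 - b1 * x) for x, y in cluster)
        meat[0][0] += s0 * s0
        meat[0][1] += s0 * s1
        meat[1][0] += s1 * s0
        meat[1][1] += s1 * s1

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    cov = matmul(matmul(inv, meat), inv)  # bread * meat * bread
    return (b0, b1), cov


# Hypothetical data: three subjects with two or three repeated measurements.
toy = [[(0.0, 1.2), (1.0, 2.9)],
       [(0.0, 0.8), (1.0, 3.1), (2.0, 5.2)],
       [(1.0, 3.0), (2.0, 4.7)]]
beta, cov = gee_independence(toy)
```

Structured working correlations, QIF, and GMM change how the cluster-level score contributions are weighted and combined, but the same mean model is used throughout.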


Fig. 1: Comparisons of empirical biases.
Fig. 2: Comparisons of empirical mean squared errors.
Fig. 3: Comparisons of relative efficiencies.
Fig. 4: Comparisons of coverage probabilities.
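For readers unfamiliar with the simulation metrics named in these captions, the sketch below shows the standard definitions of empirical bias, empirical mean squared error, and coverage probability computed over simulation replicates. The function names and the example numbers are hypothetical. Relative efficiency is left out because its exact definition in the paper is not given here.

```python
from statistics import fmean


def empirical_bias(estimates, truth):
    """Average estimate across replicates minus the true parameter value."""
    return fmean(estimates) - truth


def empirical_mse(estimates, truth):
    """Average squared distance between each estimate and the true value."""
    return fmean((e - truth) ** 2 for e in estimates)


def coverage_probability(intervals, truth):
    """Share of replicate confidence intervals (lower, upper) containing truth."""
    return fmean(1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals)


# Hypothetical output of three simulation replicates for a true slope of 1.0
slopes = [0.9, 1.1, 1.3]
cis = [(0.5, 1.3), (0.7, 1.5), (1.05, 1.55)]
bias = empirical_bias(slopes, truth=1.0)
mse = empirical_mse(slopes, truth=1.0)
coverage = coverage_probability(cis, truth=1.0)
```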


Data availability

Detailed information on the two working examples can be found in the selected articles [32, 33]. The simulation and application R code and functions for implementing the proposed approaches in this manuscript are presented in the Supplementary Material or can be requested from I-Chen Chen.

References

  1. Hornung RW, Reed LD. Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg. 1990;5:46–51.

  2. Burstyn I, Teschke K. Studying the determinants of exposure: a review of methods. Am Ind Hyg Assoc J. 1999;60:57–72.

  3. Lubin JH, Colt JS, Camann D, Davis S, Cerhan JR, Severson RK, et al. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect. 2004;112:1691–6.

  4. Huybrechts T, Thas O, Dewulf J, Van Langenhove H. How to estimate moments and quantiles of environmental data sets with nondetected observations? A case study on volatile organic compounds in marine water samples. J Chromatogr A. 2002;975:123–33.

  5. Baccarelli A, Pfeiffer R, Consonni D, Pesatori AC, Bonzini M, Patterson DG Jr, et al. Handling of dioxin measurement data in the presence of nondetectable values: overview of available methods and their application in the Seveso chloracne study. Chemosphere. 2005;60:898–906.

  6. Amemiya T. Regression analysis when the dependent variable is truncated normal. Econometrica. 1973;41:997–1016.

  7. Helsel DR. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere. 2006;65:2434–9.

  8. Hewett P, Ganser GH. A comparison of several methods for analyzing censored data. Ann Occup Hyg. 2007;51:611–32.

  9. Gilliom RJ, Helsel DR. Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques. Water Resour Res. 1986;22:135–46.

  10. Helsel DR, Cohn TA. Estimation of descriptive statistics for multiply censored water quality data. Water Resour Res. 1988;24:1997–2004.

  11. Shoari N, Dubé JS, Chenouri S. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification. Chemosphere. 2015;138:599–608.

  12. Ganser GH, Hewett P. An accurate substitution method for analyzing censored data. J Occup Environ Hyg. 2010;7:233–44.

  13. Pleil JD. QQ-plots for assessing distributions of biomarker measurements and generating defensible summary statistics. J Breath Res. 2016;10:035001.

  14. Pleil JD. Imputing defensible values for left-censored ‘below level of quantitation’ (LoQ) biomarker measurements. J Breath Res. 2016;10:045001.

  15. Thiébaut R, Jacqmin-Gadda H. Mixed models for longitudinal left-censored repeated measures. Comput Methods Prog Biomed. 2004;74:255–60.

  16. Thiébaut R, Guedj J, Jacqmin-Gadda H, Chêne G, Trimoulet P, Neau D, et al. Estimation of dynamical model parameters taking into account undetectable marker values. BMC Med Res Methodol. 2006;6:38.

  17. Vaida F, Liu L. Fast implementation for normal mixed effects models with censored response. J Comput Graph Stat. 2009;18:797–817.

  18. Jin Y, Hein MJ, Deddens JA, Hines CJ. Analysis of lognormally distributed exposure data with repeated measures and values below the limit of detection using SAS. Ann Occup Hyg. 2011;55:97–112.

  19. Leidel NA, Busch KA, Lynch JR. Occupational exposure sampling strategy manual (DHEW [NIOSH] publication no. 77-173). Cincinnati, OH: National Institute for Occupational Safety and Health; 1977.

  20. Helsel DR. Less than obvious: statistical treatment of data below the detection limit. Environ Sci Technol. 1990;24:1766–74.

  21. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.

  22. Wang YG, Carey V. Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance. Biometrika. 2003;90:29–41.

  23. Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50:1029–54.

  24. Qu A, Lindsay BG, Li B. Improving generalised estimating equations using quadratic inference functions. Biometrika. 2000;87:823–36.

  25. Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001;57:126–34.

  26. Westgate PM. A bias-corrected covariance estimate for improved inference with quadratic inference functions. Stat Med. 2012;31:4003–22.

  27. Westgate PM. A bias correction for covariance estimators to improve inference with generalized estimating equations that use an unstructured correlation matrix. Stat Med. 2013;32:2850–8.

  28. Westgate PM. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: A study on its applicability with structured correlation matrices. J Stat Comput Simul. 2016;86:1891–1900.

  29. Chen IC, Westgate PM. Improved methods for the marginal analysis of longitudinal data in the presence of time-dependent covariates. Stat Med. 2017;36:2533–46.

  30. Ford WP, Westgate PM. Improved standard error estimator for maintaining the validity of inference in cluster randomized trials with a small number of clusters. Biometrical J. 2017;59:478–95.

  31. Ford WP, Westgate PM. A comparison of bias-corrected empirical covariance estimators with generalized estimating equations in small-sample longitudinal study settings. Stat Med. 2018;37:4318–29.

  32. Hines CJ, Deddens JA. Determinants of chlorpyrifos exposures and urinary 3,5,6-trichloro-2-pyridinol levels among termiticide applicators. Ann Occup Hyg. 2001;45:309–21.

  33. Estill CF, Slone J, Mayer AC, Chen IC, Zhou M, La Guardia MJ, et al. Assessment of Triphenyl Phosphate (TPhP) exposure to nail salon workers by air, hand wipe, and urine analysis. Int J Hyg Environ Health. 2021;231:113630.

  34. Windmeijer F. A finite sample correction for the variance of linear efficient two-step GMM estimators. J Econ. 2005;126:25–51.

  35. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96:1387–96.

  36. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. Hoboken, New Jersey: John Wiley and Sons; 2002.

  37. Hargarten PM, Wheeler DC. miWQS: Multiple Imputation Using Weighted Quantile Sum Regression. R package version 0.4.4; 2021. https://CRAN.R-project.org/package=miWQS

  38. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022. https://www.R-project.org

  39. Jacqmin-Gadda H, Thiébaut R. Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics. 2000;1:355–68.

  40. Hin LY, Wang YG. Working-correlation-structure identification in generalized estimating equations. Stat Med. 2009;28:642–58.

  41. Westgate PM. Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach. Biometrical J. 2014;56:461–76.

  42. Diggle PJ, Heagerty PJ, Liang KY, Zeger SL. The Analysis of Longitudinal Data. 2nd ed. New York: Oxford University Press; 2002.

  43. Newey WK, Smith RJ. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica. 2004;72:219–55.

  44. Westgate PM. A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions. Stat Probab Lett. 2013;83:1553–8.

  45. SAS Institute Inc. SAS/STAT 9.3 User's Guide. Cary, NC: SAS Institute Inc.; 2011.

  46. Chen IC, Westgate PM. A novel approach to selecting classification types for time-dependent covariates for the marginal analysis of longitudinal data. Stat Methods Med Res. 2018;28:3176–86.

  47. Chen LS, Prentice RL, Wang P. A penalized EM algorithm incorporating missing data mechanism for gaussian parameter estimation. Biometrics. 2014;70:312–22.


Acknowledgements

We would like to thank the people from the Division of Field Studies and Engineering at CDC’s National Institute for Occupational Safety and Health who assisted in the study. The findings and conclusions in this manuscript are those of the authors and do not necessarily represent the official position of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention.

Author information

Contributions

ICC was responsible for designing the statistical methods, conducting the simulation study, analyzing two real-world datasets, interpreting the simulation and application results, producing tables and figures, drafting the initial manuscript, revising the manuscript, and approving the final version of the manuscript. SJB contributed to the interpretation of the simulation and application results, revised the manuscript, provided feedback, and approved the final version. CFE contributed to data curation and extraction, revised the manuscript, provided feedback, and approved the final version. Additionally, Whitney F. Tanner and Yu-Cheng Chen reviewed the paper and provided helpful feedback.

Corresponding author

Correspondence to I-Chen Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, IC., Bertke, S.J. & Estill, C.F. Compare the marginal effects for environmental exposure and biomonitoring data with repeated measurements and values below the limit of detection. J Expo Sci Environ Epidemiol (2024). https://doi.org/10.1038/s41370-024-00640-7

  • DOI: https://doi.org/10.1038/s41370-024-00640-7
