Abstract
Background
Environmental exposure and biomonitoring data with repeated measurements from environmental and occupational studies are commonly right-skewed and subject to limits of detection (LOD). However, the small-sample properties of existing models have not been examined for highly skewed data with non-detects and repeated measurements.
Objective
Marginal modeling provides an alternative for analyzing longitudinal and clustered data, in which the parameters are interpreted with respect to marginal, or population-averaged, means.
Methods
We outlined the theory behind three marginal models: generalized estimating equations (GEE), quadratic inference functions (QIF), and the generalized method of moments (GMM). With these approaches, we proposed incorporating fill-in methods, including single and multiple value imputation techniques, so that any measurement below the limit of detection is assigned a value.
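As an illustration of the single-value fill-in idea, the following minimal Python sketch replaces non-detects with a fixed substitute before any marginal model is fitted. The two rules shown, LOD/2 and LOD/√2, are common substitution choices in this literature and are assumed here for illustration; the abstract does not state which values the authors used. The function name, the rule labels, and the example data are hypothetical.

```python
import math

def fill_in_nondetects(values, lod, rule="lod_sqrt2"):
    """Replace non-detects with a single substituted value.

    A non-detect is represented either as None or as a reported value
    below the limit of detection (lod). The rule names are illustrative
    labels, not taken from the paper.
    """
    substitutes = {
        "lod_half": lod / 2.0,               # LOD / 2
        "lod_sqrt2": lod / math.sqrt(2.0),   # LOD / sqrt(2)
    }
    if rule not in substitutes:
        raise ValueError(f"unknown substitution rule: {rule}")
    fill = substitutes[rule]
    return [fill if v is None or v < lod else v for v in values]

# Hypothetical repeated measurements for two subjects, with LOD = 0.5.
measurements = {
    "subject_1": [0.8, None, 2.4],
    "subject_2": [None, 0.3, 1.1],
}
filled = {
    subject: fill_in_nondetects(series, lod=0.5, rule="lod_half")
    for subject, series in measurements.items()
}
```

The filled series can then be passed to whichever marginal model is being fitted. A multiple imputation variant would instead draw several plausible values below the LOD for each non-detect and combine the resulting estimates.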
Results
We demonstrated that the GEE method works well for estimating the regression parameters in small samples, whereas QIF and GMM perform better in large-sample settings, where their parameter estimates are consistent and have relatively smaller mean squared error. No specific fill-in method can be deemed superior, as each has its own merits.
Impact
Marginal modeling is employed for the first time to analyze repeated measures data with non-detects; only the mean structure needs to be correctly specified to obtain consistent parameter estimates. After replacing non-detects through substitution methods and applying small-sample bias corrections, we found in a simulation study that the estimation approaches used in the marginal models each have their own advantages across a wide range of sample sizes. We also applied the models to longitudinal and clustered working examples.
Data availability
Detailed information on the two working examples can be found in the cited articles [32, 33]. The R code and functions for the simulation and applications implementing the proposed approaches are provided in the Supplementary Material or can be requested from I-Chen Chen.
References
Hornung RW, Reed LD. Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg. 1990;5:46–51.
Burstyn I, Teschke K. Studying the determinants of exposure: a review of methods. Am Ind Hyg Assoc J. 1999;60:57–72.
Lubin JH, Colt JS, Camann D, Davis S, Cerhan JR, Severson RK, et al. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect. 2004;112:1691–6.
Huybrechts T, Thas O, Dewulf J, Van Langenhove H. How to estimate moments and quantiles of environmental data sets with nondetected observations? A case study on volatile organic compounds in marine water samples. J Chromatogr A. 2002;975:123–33.
Baccarelli A, Pfeiffer R, Consonni D, Pesatori AC, Bonzini M, Patterson DG Jr, et al. Handling of dioxin measurement data in the presence of nondetectable values: overview of available methods and their application in the Seveso chloracne study. Chemosphere. 2005;60:898–906.
Amemiya T. Regression analysis when the dependent variable is truncated normal. Econometrica. 1973;41:997–1016.
Helsel DR. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere. 2006;65:2434–9.
Hewett P, Ganser GH. A comparison of several methods for analyzing censored data. Ann Occup Hyg. 2007;51:611–32.
Gilliom RJ, Helsel DR. Estimation of distributional parameters for censored trace level water quality data 1. estimation techniques. Water Resour Res. 1986;22:135–46.
Helsel DR, Cohn TA. Estimation of descriptive statistics for multiply censored water quality data. Water Resour Res. 1988;24:1997–2004.
Shoari N, Dubé JS, Chenouri S. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification. Chemosphere. 2015;138:599–608.
Ganser GH, Hewett P. An accurate substitution method for analyzing censored data. J Occup Environ Hyg. 2010;7:233–44.
Pleil JD. QQ-plots for assessing distributions of biomarker measurements and generating defensible summary statistics. J Breath Res. 2016;10:035001.
Pleil JD. Imputing defensible values for left-censored ‘below level of quantitation’ (LoQ) biomarker measurements. J Breath Res. 2016;10:045001.
Thiébaut R, Jacqmin-Gadda H. Mixed models for longitudinal left-censored repeated measures. Comput Methods Prog Biomed. 2004;74:255–60.
Thiébaut R, Guedj J, Jacqmin-Gadda H, Chêne G, Trimoulet P, Neau D, et al. Estimation of dynamical model parameters taking into account undetectable marker values. BMC Med Res Methodol. 2006;6:38.
Vaida F, Liu L. Fast implementation for normal mixed effects models with censored response. J Comput Graph Stat. 2009;18:797–817.
Jin Y, Hein MJ, Deddens JA, Hines CJ. Analysis of lognormally distributed exposure data with repeated measures and values below the limit of detection using SAS. Ann Occup Hyg. 2011;55:97–112.
Leidel NA, Busch KA, Lynch JR. Occupational exposure sampling strategy manual (DHEW [NIOSH] publication no. 77-173). Cincinnati, OH: National Institute for Occupational Safety and Health; 1977.
Helsel DR. Less than obvious: statistical treatment of data below the detection limit. Environ Sci Technol. 1990;24:1766–74.
Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Wang YG, Carey V. Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance. Biometrika. 2003;90:29–41.
Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50:1029–54.
Qu A, Lindsay BG, Li B. Improving generalised estimating equations using quadratic inference functions. Biometrika. 2000;87:823–36.
Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001;57:126–34.
Westgate PM. A bias-corrected covariance estimate for improved inference with quadratic inference functions. Stat Med. 2012;31:4003–22.
Westgate PM. A bias correction for covariance estimators to improve inference with generalized estimating equations that use an unstructured correlation matrix. Stat Med. 2013;32:2850–8.
Westgate PM. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: A study on its applicability with structured correlation matrices. J Stat Comput Simul. 2016;86:1891–1900.
Chen IC, Westgate PM. Improved methods for the marginal analysis of longitudinal data in the presence of time-dependent covariates. Stat Med. 2017;36:2533–46.
Ford WP, Westgate PM. Improved standard error estimator for maintaining the validity of inference in cluster randomized trials with a small number of clusters. Biometrical J. 2017;59:478–95.
Ford WP, Westgate PM. A comparison of bias-corrected empirical covariance estimators with generalized estimating equations in small-sample longitudinal study settings. Stat Med. 2018;37:4318–29.
Hines CJ, Deddens JA. Determinants of chlorpyrifos exposures and urinary 3,5,6-trichloro-2-pyridinol levels among termiticide applicators. Ann Occup Hyg. 2001;45:309–21.
Estill CF, Slone J, Mayer AC, Chen IC, Zhou M, La Guardia MJ, et al. Assessment of Triphenyl Phosphate (TPhP) exposure to nail salon workers by air, hand wipe, and urine analysis. Int J Hyg Environ Health. 2021;231:113630.
Windmeijer F. A finite sample correction for the variance of linear efficient two-step GMM estimators. J Econ. 2005;126:25–51.
Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96:1387–96.
Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. Hoboken, New Jersey: John Wiley and Sons; 2002.
Hargarten PM, Wheeler DC. miWQS: Multiple Imputation Using Weighted Quantile Sum Regression. R package version 0.4.4; 2021. https://CRAN.R-project.org/package=miWQS
R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022. https://www.R-project.org
Jacqmin-Gadda H, Thiébaut R. Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics. 2000;1:355–68.
Hin LY, Wang YG. Working-correlation-structure identification in generalized estimating equations. Stat Med. 2009;28:642–58.
Westgate PM. Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach. Biometrical J. 2014;56:461–76.
Diggle PJ, Heagerty PJ, Liang KY, Zeger SL. The Analysis of Longitudinal Data, 2nd ed. New York: Oxford University Press; 2002.
Newey WK, Smith RJ. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica. 2004;72:219–55.
Westgate PM. A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions. Stat Probab Lett. 2013;83:1553–8.
SAS Institute Inc. SAS/STAT 9.3 User's Guide. Cary, NC: SAS Institute Inc.; 2011.
Chen IC, Westgate PM. A novel approach to selecting classification types for time-dependent covariates for the marginal analysis of longitudinal data. Stat Methods Med Res. 2018;28:3176–86.
Chen LS, Prentice RL, Wang P. A penalized EM algorithm incorporating missing data mechanism for gaussian parameter estimation. Biometrics. 2014;70:312–22.
Acknowledgements
We would like to thank the people from the Division of Field Studies and Engineering at CDC’s National Institute for Occupational Safety and Health who assisted in the study. The findings and conclusions in this manuscript are those of the authors and do not necessarily represent the official position of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention.
Author information
Contributions
ICC was responsible for designing the statistical methods, conducting the simulation study, analyzing two real-world datasets, interpreting the simulation and application results, producing tables and figures, drafting the initial manuscript, revising the manuscript, and approving the final version of the manuscript. SJB contributed to the interpretation of the simulation and application results, revised the manuscript, provided feedback, and approved the final version. CFE contributed to data curation and extraction, revised the manuscript, provided feedback, and approved the final version. Additionally, Whitney F. Tanner and Yu-Cheng Chen reviewed the paper and provided helpful feedback.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, IC., Bertke, S.J. & Estill, C.F. Compare the marginal effects for environmental exposure and biomonitoring data with repeated measurements and values below the limit of detection. J Expo Sci Environ Epidemiol (2024). https://doi.org/10.1038/s41370-024-00640-7
DOI: https://doi.org/10.1038/s41370-024-00640-7