Abstract
Background
In the first stage of a two-stage study, the researcher uses a statistical model to impute the unobserved exposures. In the second stage, the imputed exposures serve as covariates in epidemiological models. Imputation errors in the first stage act as measurement errors in the second stage, and thus bias exposure effect estimates.
Objective
This study aims to improve the estimation of exposure effects by sharing information between the first and second stages.
Methods
At the heart of our estimator is the observation that not all second-stage observations are equally important to impute. We therefore borrow ideas from optimal-experimental-design theory to identify the individuals of higher importance. We then improve the imputation for these individuals using ideas from the machine-learning literature on domain adaptation.
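The two ingredients named above can be illustrated with a minimal sketch. This is not the authors' estimator. It assumes, purely for illustration, that importance is measured by the leverage of each second-stage observation, and that adaptation is done by kernel-weighting the first-stage training points before a weighted least-squares fit. The function names, the Gaussian kernel, the bandwidth, and the simulated data are all hypothetical.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, via a thin QR."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def kernel_importance_weights(S, T, q, bandwidth=1.0):
    """Weight each first-stage point by its Gaussian-kernel proximity to
    the second-stage points, scaled by their importance q (an assumption)."""
    d2 = ((S[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * bandwidth**2))
    w = K @ q
    return w / w.mean()  # normalise so the weights average to 1

def weighted_least_squares(A, y, w):
    """Solve min_b sum_i w_i (y_i - A_i b)^2 by rescaling rows."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(0)
S = rng.normal(size=(200, 2))            # first-stage (monitor) features
T = rng.normal(loc=0.5, size=(50, 2))    # second-stage (subject) features
exposure = 1.0 + S @ np.array([2.0, -1.0])  # noiseless, for checking only

X2 = np.column_stack([np.ones(len(T)), T])  # hypothetical second-stage design
q = leverage_scores(X2)                     # importance of each subject
w = kernel_importance_weights(S, T, q)      # weights for first-stage points

A = np.column_stack([np.ones(len(S)), S])
coef = weighted_least_squares(A, exposure, w)  # importance-weighted imputation model
```

With real, noisy data the weights shift the fit toward the regions of feature space that matter most for the second-stage analysis. The noiseless exposure is used here only so that the fitted coefficients can be checked against the known values.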
Results
Our simulations confirm that our exposure effect estimates are more accurate than those obtained with the current best practice. An empirical demonstration yields smaller estimates of the PM effect on hyperglycemia risk, with tighter confidence bands.
Significance
Sharing information between the environmental scientist and the epidemiologist improves health effect estimates. Our estimator is a principled approach to harnessing this information exchange, and may be applied to any two-stage study.
Data availability
The data are not available for replication because of privacy issues.
Acknowledgements
The authors wish to thank Dr. Raanan Raz and Dr. Lena Novack for their comments and ideas.
Funding
The results reported herein correspond to specific aims of grant no. 900/16 to JDR from the Israel Science Foundation.
Author information
Contributions
RS conceived the presented idea. RS developed the theory with support from JDR, and performed the computations. JDR and IK verified the analytical methods and supervised the findings of this work. RS wrote the manuscript with support from JDR and IK. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Sarafian, R., Kloog, I. & Rosenblatt, J.D. Optimal-design domain-adaptation for exposure prediction in two-stage epidemiological studies. J Expo Sci Environ Epidemiol 33, 963–970 (2023). https://doi.org/10.1038/s41370-022-00438-5
DOI: https://doi.org/10.1038/s41370-022-00438-5