Abstract
Unmeasured confounding threatens the validity of observational studies. Negative control variables (NCs) are variables that either do not cause the outcome of interest or are not caused by the exposure of interest; they are increasingly available from emerging sensing technologies and digitized health records. Under appropriate assumptions, NCs can be used to adjust for unmeasured confounding bias. This Primer explains the assumptions behind NC-based adjustment for unmeasured confounding bias and how to implement it. Among the method’s broad applications in public health research, the Primer focuses on time series studies of environmental exposures (air pollution, wildfires and heat) and health outcomes. Three types of unmeasured confounding in time series studies are considered: time-invariant confounders with time-invariant confounding effects; time-invariant confounders with time-modified confounding effects; and time-varying confounders with immediate and/or lagged confounding effects. For each type of confounding, case studies are used to give guidance on how to select NCs. Finally, challenges and opportunities are described to help catalyse additional methodological developments.
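The idea that an NC can be used to adjust for unmeasured confounding bias can be illustrated with a small simulation. The sketch below is not the Primer's method; it is a deliberately simplified linear example that assumes additive equi-confounding, meaning the unmeasured confounder U shifts the outcome Y and a negative control outcome W by the same amount. All variable names, coefficients and the sample size are hypothetical choices made for illustration, and the simulated days are treated as independent.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def simulate(n_days=20000, true_effect=0.5, seed=0):
    """Simulate exposure A, outcome Y and negative control outcome W.

    U is never handed to the analyst. It drives A, Y and W, and its
    effect on Y equals its effect on W (assumed equi-confounding).
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n_days)                       # unmeasured confounder
    a = 0.8 * u + rng.normal(size=n_days)             # exposure
    y = true_effect * a + u + rng.normal(size=n_days)  # outcome
    w = u + rng.normal(size=n_days)                   # NCO: not caused by A
    return a, y, w


a, y, w = simulate()

naive = ols_slope(a, y)      # confounded estimate of the effect of A on Y
nco_bias = ols_slope(a, w)   # A cannot cause W, so this slope is pure bias
adjusted = naive - nco_bias  # valid only under the equi-confounding assumption

print(f"naive={naive:.3f}  nco_bias={nco_bias:.3f}  adjusted={adjusted:.3f}")
```

A slope of W on A that is clearly different from zero signals unmeasured confounding, because A is known not to cause W. Subtracting it recovers the simulated effect only because U was constructed to act identically on Y and W; when that assumption fails, the subtraction can leave residual bias or overcorrect.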
Acknowledgements
J.K.H. thanks the National Institute of Environmental Health Sciences (T32 ES 7069), Sloan Foundation (G-2020-13946) and Environmental Protection Agency (CR-83467701) for financial support. E.J.T.T. thanks the National Institutes of Health (NIH) (R01AG065276), National Cancer Institute (NCI) (R01CA222147), General Medical Sciences (R01GM139926) and National Institute of Allergy and Infectious Diseases (R01AI27271) for financial support. F.D. thanks the NIH (R01ES026217, R01MD012769, R01ES028033, 5R01AG060232-03, 1R01ES030616, 1R01AG066793, 1R01ES029950, 1R01ES 034373-01) and Sloan Foundation (G-2020-13946) for financial support.
Author information
Contributions
Introduction (J.K.H. and F.D.); Experimentation (J.K.H., E.J.T.T. and F.D.); Results (J.K.H. and E.J.T.T.); Applications (J.K.H. and F.D.); Reproducibility and data deposition (J.K.H. and F.D.); Limitations and optimizations (J.K.H. and E.J.T.T.); Outlook (J.K.H., E.J.T.T. and F.D.); Overview of the Primer (all authors).
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Methods Primers thanks Sara Levintow; W. Dana Flanders; and William Henley, who co-reviewed with Sharlene Alauddin, for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
AirNow: https://gispub.epa.gov/airnow
GRIDMET: https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_GRIDMET
Highest September temperatures in Napa: https://www.extremeweatherwatch.com/cities/napa/month-september/highest-temperatures
NASA: https://data.nasa.gov/
National Center of Health Statistics: https://data.cdc.gov/
NOAA: https://data.noaa.gov/
Glossary
- Asymptotically unbiased
An estimator of a parameter is asymptotically unbiased if its expectation converges to the true value of the parameter as the sample size tends to infinity.
- Backdoor criterion
A graphical test: a set of variables U satisfies the backdoor criterion relative to an ordered pair of variables (A, Y) in a directed acyclic graph (DAG) if no node in U is a descendant of A and U blocks every path between A and Y that contains an arrow into A.
- Categorical variable
A characteristic that is not measured on a numerical scale but instead takes one of a limited set of categories. Categorical variables can be either nominal or ordinal.
- Causal inference
The process of using data to uncover causal relationships between variables.
- Directed acyclic graph
(DAG). A graph consists of a set of vertices (nodes) and a set of edges that connect some pairs of vertices. If every edge is an arrow that points from one vertex to another, the graph is directed. A DAG is a directed graph with no directed cycles.
- Negative control exposure
(NCE). A variable Z is an NCE if it is known a priori not to cause outcome Y, and the association between Z and Y is subject to the same unmeasured confounding mechanism as that between exposure A and outcome Y.
- Negative control outcome
(NCO). A variable W is an NCO if it is known a priori not to be caused by exposure A, and the association between A and W is subject to the same unmeasured confounding mechanism as that between exposure A and outcome Y.
- Non-differential error
The measurement error of a confounder is said to be non-differential if the measured confounder is conditionally independent of the exposure and outcome, given the true confounder.
- Statistical inference
The process of using a sample to make inferences about a population.
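The NCE definition above can be made concrete for a time series. The sketch below uses tomorrow's exposure as the NCE for today's outcome, since a future exposure cannot cause a past outcome. It is a minimal simulation under assumed conditions: an autocorrelated unmeasured confounder U, linear effects, and hypothetical coefficients, function names and series length.

```python
import numpy as np


def ar1(n, phi, rng):
    """Generate an AR(1) series x[t] = phi * x[t-1] + noise."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


def future_exposure_coef(confounded, n_days=20000, seed=1):
    """Coefficient on A[t+1] when regressing Y[t] on A[t] and A[t+1].

    A[t+1] serves as a negative control exposure for Y[t]. A coefficient
    away from zero points to an unmeasured, autocorrelated confounder.
    """
    rng = np.random.default_rng(seed)
    u = ar1(n_days, 0.7, rng)              # unmeasured confounder
    strength = 1.0 if confounded else 0.0  # switch confounding on or off
    a = strength * u + rng.normal(size=n_days)
    y = 0.5 * a + strength * u + rng.normal(size=n_days)

    # Design matrix: intercept, current exposure, next-day exposure.
    design = np.column_stack([np.ones(n_days - 1), a[:-1], a[1:]])
    coef, *_ = np.linalg.lstsq(design, y[:-1], rcond=None)
    return float(coef[2])


print("with confounding:   ", round(future_exposure_coef(True), 3))
print("without confounding:", round(future_exposure_coef(False), 3))
```

With confounding switched on, the coefficient on next-day exposure is clearly positive even though it has no causal effect on today's outcome; with confounding switched off, it stays close to zero. The check only has power when the confounder is correlated over time, which is why U is simulated as an AR(1) process here.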
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, J.K., Tchetgen Tchetgen, E.J. & Dominici, F. Using negative controls to adjust for unmeasured confounding bias in time series studies. Nat Rev Methods Primers 3, 66 (2023). https://doi.org/10.1038/s43586-023-00249-4
DOI: https://doi.org/10.1038/s43586-023-00249-4