Latent classes for chemical mixtures analyses in epidemiology: an example using phthalate and phenol exposure biomarkers in pregnant women


Latent class analysis (LCA), although minimally applied to the statistical analysis of mixtures, may serve as a useful tool for identifying individuals with shared real-life profiles of chemical exposures. Knowledge of these groupings and their risk of adverse outcomes has the potential to inform targeted public health prevention strategies. This example applies LCA to identify clusters of pregnant women from a case–control study within the LIFECODES birth cohort with shared exposure patterns across a panel of urinary phthalate metabolites and parabens, and to evaluate the association between cluster membership and urinary oxidative stress biomarkers. LCA identified four classes of individuals: “low exposure,” “low phthalates, high parabens,” “high phthalates, low parabens,” and “high exposure.” Class membership was associated with several demographic characteristics. Compared with “low exposure,” women classified as having “high exposure” had elevated urinary concentrations of the oxidative stress biomarkers 8-hydroxydeoxyguanosine (19% higher, 95% confidence interval [CI] = 7, 32%) and 8-isoprostane (31% higher, 95% CI = −5, 64%). However, contrast examinations indicated that associations between oxidative stress biomarkers and “high exposure” were not statistically different from those with “high phthalates, low parabens,” suggesting a minimal effect of higher paraben exposure in the presence of high phthalates. The presented example offers verification of latent class assignments through application to an additional data set as well as a comparison to another unsupervised clustering approach, k-means clustering. Compared with k-means clustering, LCA may be more easily implemented, more consistent, and more able to provide interpretable output.
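As a rough illustration of the latent class idea described in the abstract, the following Python sketch fits a latent class model to binary exposure indicators using expectation-maximization. It is not the authors' code. The helper name `fit_lca`, the use of dichotomized (above-cutoff) indicators, the two-class setup, the simulated data, and all parameter values are assumptions made purely for illustration.

```python
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary (0/1) indicators.

    Hypothetical helper for illustration; returns (weights, probs, post):
    class prevalences, per-class item probabilities, and each row's
    posterior class membership probabilities from the last E-step.
    """
    rng = random.Random(seed)
    n_rows, n_items = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values break the symmetry between classes.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each row,
        # assuming indicators are independent within a class.
        post = []
        for row in data:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for j, x in enumerate(row):
                    p *= probs[k][j] if x else 1.0 - probs[k][j]
                joint.append(p)
            total = sum(joint)
            post.append([p / total for p in joint])
        # M-step: update prevalences and item probabilities.
        for k in range(n_classes):
            nk = max(sum(r[k] for r in post), 1e-12)
            weights[k] = nk / n_rows
            for j in range(n_items):
                num = sum(r[k] * row[j] for r, row in zip(post, data))
                # Keep estimates strictly inside (0, 1).
                probs[k][j] = min(max(num / nk, 1e-6), 1.0 - 1e-6)
    return weights, probs, post


# Simulated data (not from the study): 200 "low" and 200 "high" rows,
# each with 4 indicators of an above-cutoff biomarker concentration.
sim = random.Random(1)
data = [[1 if sim.random() < p else 0 for _ in range(4)]
        for p in [0.1] * 200 + [0.9] * 200]

weights, probs, post = fit_lca(data, n_classes=2)
# Assign each row to its most probable class (modal assignment).
assigned = [max(range(2), key=lambda k: r[k]) for r in post]
print([round(w, 2) for w in weights])
print([[round(p, 2) for p in cls] for cls in probs])
```

Class labels are arbitrary in such a model, so the "low" and "high" classes are identified only by inspecting the estimated item probabilities. EM can also stop at a local maximum, so in practice a fit like this would be repeated from several starting seeds.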



Code availability

Accompanying code for the LCA methods is available at GitHub repository “LCAmix” from user “carrollrm.” This is available as an R markdown file to lead viewers through a simple example of performing these methods.




This research was supported by the Intramural Research Program of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (Z1AES103321). Additional funding was provided by NIEHS (R01ES018872 and R01ES029531).

Author information



Corresponding author

Correspondence to Kelly K. Ferguson.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Carroll, R., White, A.J., Keil, A.P. et al. Latent classes for chemical mixtures analyses in epidemiology: an example using phthalate and phenol exposure biomarkers in pregnant women. J Expo Sci Environ Epidemiol 30, 149–159 (2020).



Keywords

  • Latent class models
  • Mixtures methods
  • Phthalates
  • Phenols
  • Oxidative stress

