Introduction

Non-Hispanic Black Americans have longer leukocyte telomere length (LTL) than non-Hispanic White Americans, despite having greater exposure to risk factors for short LTL (e.g., low socioeconomic status (SES) and interpersonal discrimination), as well as increased risk of health-related outcomes associated with short telomeres (e.g., cardiovascular disease and premature mortality)1. While some have argued that race differences in LTL are due, at least in part, to genetic variation2,3,4, emerging evidence reviewed below (and discussed in1) points to a potential environmental explanation for the counterintuitive finding that Black Americans have longer LTL than White Americans. Using data from the National Health and Nutrition Examination Survey (NHANES), we examine whether Black/White differences in LTL are partially mediated by differences in exposure to persistent organic pollutants.

Persistent organic pollutants and LTL

Telomeres are the protective caps at the ends of chromosomes that promote chromosomal stability5. Due to the end replication problem, telomeres shorten every time a cell divides6. DNA replication stress and oxidative damage also contribute to telomere shortening5,7. Telomerase can counteract shortening by elongating and protecting telomeres8, but this enzyme is kept downregulated in normal human cells5. Once telomeres become critically shortened, cellular senescence is triggered, causing cells to lose the ability to grow and divide9,10. Thus, telomere shortening is considered a hallmark indicator of cellular aging11,12. Recent Mendelian randomization studies suggest that short LTL may be a causal determinant of degenerative diseases, including cardiovascular disease and Alzheimer’s disease, while long LTL may be a causal determinant of some types of cancer, including lung, bladder, endometrial, and testicular cancer13,14,15,16,17,18,19,20,21.

A growing body of epidemiologic and experimental evidence suggests that LTL can be a sensitive endpoint for environmental chemicals, such as persistent organic pollutants (POPs), that are capable of promoting carcinogenesis. Polychlorinated biphenyls (PCBs), furans, and dioxins are ubiquitous POPs that bioconcentrate in the food chain and accumulate in adipose tissue. PCBs were once widely manufactured and used as coolants or lubricants, whereas furans and dioxins are unintentionally produced as industrial byproducts in a variety of commercial and industrial settings. Although PCB production and use were banned in the late 1970s in the United States, most people continue to be exposed at low levels due to the long half-lives of these chemicals both in the environment and in the human body. Human exposure to PCBs can occur by living close to PCB-contaminated waste sites, ingestion of contaminated food, ingestion and/or inhalation of contaminated indoor dust and air, and from occupational environments22. Furans and dioxins are currently regulated as hazardous air pollutants, but they continue to be produced during most forms of combustion, including burning of municipal and medical waste and as part of industrial processes23.

To date, most epidemiologic studies have observed a positive association between exposure to POPs and LTL. For example, previous research using nationally representative data from NHANES has shown that exposure to PCBs, which are classified as human carcinogens by the International Agency for Research on Cancer22,24, is associated with longer LTL25,26,27,28. These cross-sectional associations were observed both for individual chemicals (i.e., PCB congeners, dioxins, furans) as well as their mixtures27. Associations between POPs and longer LTL have also been observed in more highly exposed populations, such as residents of Anniston, Alabama, where PCBs were historically manufactured29,30. Population-based studies in Korea31 and Iran32,33 have reported similar findings. In contrast to these results, some previous studies have found that exposure to POPs is associated with shorter LTL. Age-adjusted median LTL in lymphocytes, but not granulocytes, was shorter in German workers occupationally exposed to PCBs compared to healthy controls34. In the only longitudinal study to date, serum concentrations of PCB 153 were associated with increased relative LTL shortening over ten years of follow-up in a subset of elderly respondents from the Helsinki Birth Cohort35.

A recent experimental study is consistent with the epidemiologic evidence suggesting that exposures to PCBs and dioxins are associated with longer LTL. In the only in vivo study to date, researchers found that exposure to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and several PCB congeners, both alone and in combination, resulted in increased telomere length. Specifically, rats exposed to PCBs and TCDD had longer telomeres in their liver and lung tissues as well as altered expression of genes related to telomere maintenance in their livers36.

Although the mechanisms linking exposure to POPs with longer LTL have yet to be fully elucidated, previous in vitro studies suggest that upregulation of telomerase may play a role. Telomerase is an enzyme that elongates telomeres and is hypothesized to confer uncontrolled replicative ability on a cell37. Two previous studies found that exposure to PCBs activates the proto-oncogene c-myc38,39, which is involved in activation of the telomerase reverse transcriptase (TERT) gene40. In another study, expression of TERT was upregulated in human choriocarcinoma cells treated with dioxin41. Drawing on previous work demonstrating that dioxins are potent aryl hydrocarbon receptor (AhR) agonists42, the authors suggested that AhR activation may mediate the association between exposure to dioxins and upregulation of telomerase41.

Race differences in exposure to carcinogens

Evidence from multiple sources indicates that Black Americans have greater exposure to POPs than White Americans. Analysis of nationally representative data from NHANES indicates that Black participants have the highest PCB exposures of all racial/ethnic groups, and racial disparities in exposure are particularly pronounced for older adults who may have been exposed before the ban on production and use43. While fish consumption is an important source of PCB exposure in the general population, this exposure route does not explain the relatively high PCB body burden in Black adults44. Among 765 adults from Anniston, Alabama, where PCBs were historically manufactured, the geometric mean of the summed PCB congeners was more than 2.5 times higher for Black participants than for White participants (866 ng/g lipid vs. 331 ng/g lipid); this difference was not explained by age, sex, body mass index (BMI), or smoking status30. Results from the Anniston Community Health Survey also revealed that Black participants had a significantly higher average total dioxin toxic equivalencies (TEQ) score than White participants (33.1 vs. 19.2 pg/g lipid) after adjusting for age and sex45. Finally, in a preliminary analysis of data from the NIH-AARP Diet and Health Study linked with the US EPA database of 4,478 historical sources of polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/F), investigators found that Black Americans were nearly three times as likely as White Americans to live within 5 km of a PCDD/F emitting facility46.

These data suggest that social-structural drivers may contribute to racial disparities in POP exposures. Residential segregation in the US remains deeply entrenched, despite laws prohibiting discrimination in housing on the basis of race47, and residents of predominantly Black neighborhoods are significantly more likely to be exposed to environmental hazards, including carcinogens48,49. Research suggests that Black Americans with high SES are only slightly less likely than those with low SES to live in residentially segregated, low-income areas50,51. Previous research has also shown that Black Americans are more likely to be exposed to carcinogens at work52,53.

Hypotheses

This study brings together three separate lines of evidence to examine the hypothesis that Black/White differences in LTL are explained by differences in exposure to carcinogens. First, previous research has consistently shown that Black Americans have longer LTL than White Americans, although the reasons why remain unclear (see Needham et al.1 for a review). Next, there is growing evidence that exposure to POPs is associated with longer LTL26,27,28,29,30,31,32,33. Finally, prior studies have found that Black adults have greater exposure to POPs than White adults35,44,45,46. We use mediation analysis to quantify the indirect effect of race/ethnicity on LTL through exposure to PCBs, furans, and dioxins. We present single mediator models, followed by five multivariate-mediator methods, including four that use different approaches to account for the highly correlated nature of environmental chemicals.

Methods

Data collection and study population

NHANES consists of cross-sectional national surveys conducted by the U.S. Centers for Disease Control and Prevention (CDC) to monitor the health and nutritional status of the population55. NHANES provides information about its Ethics Review Board approval, and all experimental protocols were approved by the Institutional Review Board at the CDC55. Informed consent was obtained during the interviews, and all methods were carried out in accordance with relevant guidelines and regulations. Detailed documentation for the NHANES protocol and methods can be found online. We use data from the 1999–2000 and 2001–2002 cycles of NHANES for this secondary data analysis. Data are anonymized by NHANES prior to use and are freely available to the public. Because our study used a publicly available data set that does not include information that can be used to identify individuals, it is not considered human subjects research and, therefore, does not require approval from the University of Michigan IRB.

NHANES uses a complex, multistage, probability sampling design to obtain a sample that is representative of the US civilian noninstitutionalized population. Older adults (aged 60+) and Black and Hispanic Americans are oversampled in order to produce reliable statistics for these groups. NHANES 1999–2002 includes 21,004 respondents aged two months and older. PCBs, furans, and dioxins were measured in a subset of 4,821 respondents. For this analysis, we excluded respondents whose self-reported race/ethnicity was not Black or White (n = 1,743) or who were under the age of 20 (n = 910), since DNA samples are only available for those aged 20 and older. We also removed participants with at least one missing exposure among the subset of POPs of interest (n = 838; described further below). Respondents with missing blood cell count and distribution variables (n = 8), serum cotinine (n = 13), LTL (n = 57), and education (n = 1) were also removed. Our final analytic sample consists of 1,251 (321 Black and 930 White) study participants (see Supplemental Fig. 1).
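The sample flow described above can be checked with simple arithmetic. The sketch below is illustrative only: the counts are those reported in the text, the variable names are hypothetical, and it assumes the exclusions were applied sequentially to non-overlapping groups of respondents.

```python
# Counts as reported in the text; names are illustrative, not NHANES variable names.
measured = 4821  # respondents with PCB, furan, and dioxin measurements

# Assumption: exclusions are sequential, so each count refers to respondents
# still in the sample after the previous step.
exclusions = [
    ("race/ethnicity not Black or White", 1743),
    ("under age 20", 910),
    ("missing at least one POP of interest", 838),
    ("missing blood cell count/distribution", 8),
    ("missing serum cotinine", 13),
    ("missing LTL", 57),
    ("missing education", 1),
]

remaining = measured
for reason, n in exclusions:
    remaining -= n
    print(f"{reason:<40} -{n:>5}  remaining: {remaining}")

final_n = remaining
n_black, n_white = 321, 930  # analytic sample by race, as reported
```

Under these assumptions the running total ends at the reported analytic sample size, and the two race-specific counts sum to the same figure.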

Measures

Outcome variable: telomere length (Y)

Aliquots of purified DNA from respondents who consented specifically to future genetic research were provided by the laboratory at the Division of Health and Nutrition Examination Surveys, National Center for Health Statistics, Centers for Disease Control and Prevention. Five 96-well quality control plates representing 5% of the complete set were also provided, and duplicate samples were blinded. DNA was extracted from purified whole blood using the Puregene (D-50K) kit protocol (Gentra Systems, Inc., Minneapolis, Minnesota) and stored at –80 °C. The telomere length assay was performed in the Blackburn Laboratory at the University of California, San Francisco, using the quantitative polymerase chain reaction (qPCR) method to measure telomere length relative to standard reference DNA (T/S ratio)56,57.

The telomere thermal cycling profile consisted of cycling for T (telomeric) PCR: 96 °C for 1 min; denature at 96 °C for 1 s, anneal/extend at 54 °C for 60 s, with fluorescence data collection, 30 cycles. Cycling for S (single copy gene) PCR consisted of the following: 96 °C for 1 min; denature at 95 °C for 15 s, anneal at 58 °C for 1 s, extend at 72 °C for 20 s, 8 cycles; followed by denature at 96 °C for 1 s, anneal at 58 °C for 1 s, extend at 72 °C for 20 s, hold at 83 °C for 5 s with data collection, 35 cycles. The primers for the telomere PCR were tel1b [5′-CGGTTT(GTTTGG)5GTT-3′], used at a final concentration of 100 nM, and tel2b [5′-GGCTTG(CCTTAC)5CCT-3′], used at a final concentration of 900 nM. The primers for the single-copy gene (human beta-globin) PCR were hbg1 (5′-GCTTCTGACACAACTGTGTTCACTAGC-3′), used at a final concentration of 300 nM, and hbg2 (5′-CACCAACTTCATCCACGTTCACC-3′), used at a final concentration of 700 nM. The final reaction mix contained 20 mM Tris-hydrochloride (HCl), pH 8.4; 50 mM potassium chloride (KCl); 200 μM each deoxynucleotide (dNTP); 1% dimethyl sulfoxide (DMSO); 0.4 × SYBR Green I; 22 ng Escherichia coli DNA per reaction; and 0.4 units of Platinum Taq DNA polymerase (Invitrogen Inc.) per 11-μL reaction.

Each sample was assayed three times on three different days. Samples were assayed on duplicate wells, producing six data points. Sample plates were assayed in groups of three plates, and no two plates were grouped together more than once. Each assay plate contained 96 control wells with eight control DNA samples. Assay runs with eight or more invalid control wells were excluded from further analysis (< 1% of runs). Control DNA values were used to normalize between-run variability. Runs with more than four control DNA values falling outside 2.5 standard deviations from the mean for all assay runs were excluded from further analysis (< 6% of runs). For each sample, potential outliers were identified and excluded from further analysis (< 2% of samples). Finally, the mean and standard deviation of the T/S ratio were calculated normally, excluding potential outliers. DNA samples were coded, and the lab was blinded to all other measurements in the study. The CDC conducted a quality control review before linking the telomere data to the NHANES public use data files. If more than 5% of the duplicate samples on the quality control plates were discordant with their pair in the complete set, then the variant failed quality control58. The measure of LTL used in this study is an average across all leukocyte cell types, and LTL is log-transformed prior to analysis.

Exposure variable: race/ethnicity (A)

Race and ethnicity were self-reported and NHANES staff recoded these into five categories: non-Hispanic Black, non-Hispanic White, Mexican American, other Hispanic, and other race, including multi-racial. In line with previous work59,60, we consider self-reported race/ethnicity to be a variable that reflects an individual’s position within the racialized social hierarchy. As such, it is an important determinant of exposure to a broad array of health-related risks and protective factors, including exposures related to environmental racism. Though racial and ethnic categories are socially constructed, we acknowledge that these categories may also reflect differences in genetic ancestry. We chose to limit this analysis to individuals who self-identify as Black or White because differences in LTL by race/ethnicity are greatest for these groups and because the counterintuitive Black/White differences have yet to be explained.

Mediator variables: PCBs, furans, and dioxins (M)

Analytic methods for quantification of PCBs, furans, and dioxins in serum have been described in detail elsewhere26. Briefly, congeners were extracted from serum specimens using a C18 solid phase extraction and measured using high-resolution gas chromatography/mass spectrometry61,62. The analytical runs for each chemical were blinded and included quality control samples. Limits of detection (LOD) were typically around 2 ng/g but varied by serum volume. The LOD range for each chemical is reported in Mitro et al.26 The NHANES dataset includes flag variables indicating whether each observation was above or below the sample-specific limit of detection. Since NHANES does not provide a single lipid concentration variable for each individual, we calculated serum lipids using the Phillips short formula based on cholesterol and triglycerides data63.
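The serum lipid calculation mentioned above can be sketched as follows. The coefficients are those of the commonly cited short formula in mg/dL (total lipids = 2.27 × total cholesterol + triglycerides + 62.3); they are an assumption here and should be checked against reference 63. The function name and the example inputs are hypothetical.

```python
import math

def total_lipids_mg_dl(total_cholesterol_mg_dl, triglycerides_mg_dl):
    """Estimate total serum lipids (mg/dL) from cholesterol and triglycerides.

    Assumes the commonly cited short-formula coefficients; verify against
    the cited source before reuse.
    """
    return 2.27 * total_cholesterol_mg_dl + triglycerides_mg_dl + 62.3

# Hypothetical respondent: total cholesterol 200 mg/dL, triglycerides 150 mg/dL.
lipids = total_lipids_mg_dl(200.0, 150.0)

# The analysis adjusts for lipids on the log scale.
log_lipids = math.log(lipids)
print(round(lipids, 1), round(log_lipids, 3))
```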

Adjustment covariates (Z)

We considered adjustment for the following potential confounders of the exposure-outcome (A-Y) and mediator-outcome (M-Y) relationships: standardized age (linear and quadratic terms), sex, educational attainment, serum cotinine (log-scale), and lipids (log-scale)64. We also adjusted for white blood cell count, percent lymphocytes, percent neutrophils, percent eosinophils, percent basophils, and percent monocytes. Finally, due to potential batch effects across survey years, we controlled for the survey cycle (1999–2000 vs. 2001–2002) in all models.

Statistical analysis

Multiple imputation procedure

The NHANES dataset includes an indicator variable for whether the concentration was above or below the LOD. In this analysis, we excluded environmental chemicals when more than 50% of the observations were below their respective detection limits in either the 1999–2000 or the 2001–2002 cycle. Thus, we considered 17 chemicals with less than 50% below their respective detection limits in the following analyses (as was done for PCBs in Mitro et al.26). Given the high percent of non-detects, we performed multiple imputation instead of using the simple imputation of non-detects with LOD divided by square root of two, which was provided by the CDC. Non-detects for the 17 environmental chemicals were multiply imputed using an iterative application of censored likelihood multiple imputation65. The imputation strategy generated 10 imputed datasets for each survey cycle separately to account for potential batch effects between the two NHANES cycles. Environmental chemicals were ordered by the percent below their respective detection limits in the 1999–2000 cycle and imputed sequentially from the lowest percent below LOD to the highest percent below LOD. In the imputation and mediation models described next, all chemicals were log-transformed66. Imputation models for each log-transformed environmental chemical were conditional on log-transformed LTL, race, education, age, sex, log-transformed serum cotinine, log-transformed blood composition variables and lipids, and all previously imputed, log-transformed environmental chemicals. Imputation quality for each chemical was visually assessed. Of the 17 imputed environmental chemicals, the final two imputed chemicals, PCB 156 and PCB 99, had unusually low imputed concentrations in the 1999–2000 cycle. We therefore decided to exclude PCB 156 and PCB 99 from all analyses. This left 15 total environmental chemicals for all subsequent analyses.
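To illustrate the difference between single-value substitution and imputation under censoring, the sketch below draws log-scale values for a non-detect from a normal distribution truncated above at log(LOD). It is a simplified stand-in, not the iterative censored likelihood procedure of reference 65: the mean and standard deviation are hypothetical fixed values, whereas the procedure described above conditions on LTL, covariates, and previously imputed chemicals. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def lod_over_sqrt2(lod):
    """Single-value substitution for a non-detect: LOD / sqrt(2)."""
    return lod / math.sqrt(2)

def draw_log_nondetect(mu, sigma, lod, rng):
    """Draw one log-concentration from N(mu, sigma^2) truncated above at log(LOD)."""
    std = NormalDist()
    upper = std.cdf((math.log(lod) - mu) / sigma)  # P(log-concentration < log LOD)
    u = upper * max(rng.random(), 1e-12)           # uniform on (0, upper)
    return mu + sigma * std.inv_cdf(u)             # inverse-CDF sampling

rng = random.Random(2024)
lod = 2.0             # ng/g, the typical LOD mentioned in the text
mu, sigma = 1.0, 0.8  # hypothetical log-scale mean and SD (would come from a fitted model)

# Ten draws, mirroring the ten imputed datasets per survey cycle.
draws = [draw_log_nondetect(mu, sigma, lod, rng) for _ in range(10)]
substituted = math.log(lod_over_sqrt2(lod))
```

Substitution assigns every non-detect the same value, while the draws vary across imputations and always fall below log(LOD), which is what lets multiple imputation propagate uncertainty about the censored values.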

Descriptive statistics and exploratory analysis

We started by summarizing the marginal distributions of LTL, environmental chemicals, and covariates, in the full sample and stratified by race. The goal of exploratory analyses stratified by race was to check that the unadjusted associations between variables of interest and race obtained from our study population were concordant with associations previously reported in the literature. Next, we conducted exploratory analysis of the existing correlation structure among the POPs. The heat map in Supplemental Fig. 2 shows non-negligible pairwise correlations among the POPs, which suggests that an appropriate approach to mediation will need to handle multicollinearity.

Single mediator approach

Our primary analytical goal was to estimate the indirect effect of race on LTL that is mediated through PCB, furan, and dioxin exposure (Fig. 1). First, we assessed the relationship between race and LTL to ensure there existed a non-null total effect (direct effect + indirect effect). More rigorously, we fit the Y|A,Z model, \(Y_i = \mu_0 + \mu_a A_i + \boldsymbol{Z}_i^{T}\boldsymbol{\mu}_{Z} + \epsilon_{i\mu}\), where \(Y_i\) is log-transformed LTL for the \(i\)-th individual, \(A_i\) is the binary indicator of Black/White race for the \(i\)-th individual, \(\boldsymbol{Z}_i\) is the vector of adjustment covariates for the \(i\)-th individual, and \(\epsilon_{i\mu} \sim N(0, \sigma_{\mu}^{2})\) is the residual error for the \(i\)-th individual. We tested whether \(\mu_a\), the total effect of race, is significantly different from zero using a Wald test. The direct and indirect effect estimates were then used to ascertain how much of the race and LTL association is explained through exposure to PCBs, furans, and dioxins.

Figure 1

Conceptual models being considered in this work. Panel (A) shows the total effect model (conditional on adjustment covariates). Panel (B) shows the mediation model. The bolded line and corresponding coefficient show the direct effect. For each chemical, \(\alpha_{aj}\) and \(\beta_{mj}\) (\(j = 1, \ldots, 15\) contaminants) show the relationship between race and the chemical and the relationship between the chemical and LTL, respectively.

To calculate the direct and indirect effects, we used the approach of Baron and Kenny67. We first constructed single mediator models for race and each environmental mediator separately, adjusting for the potential confounding variables and adjustment covariates. More specifically, we first constructed the M|A,Z models, which are of the form \(M_{ij} = \alpha_{0j} + \alpha_{aj} A_i + \boldsymbol{Z}_i^{T}\boldsymbol{\alpha}_{zj} + \epsilon_{ij\alpha}\) for the \(j\)-th environmental mediator, where \(M_{ij}\) denotes the \(j\)-th environmental mediator for the \(i\)-th individual and \(\epsilon_{ij\alpha} \sim N(0, \sigma_{j\alpha}^{2})\) is the residual error for the \(i\)-th individual and \(j\)-th environmental mediator. The second set of models were the Y|M,A,Z models, which are of the form \(Y_i = \beta_{0j} + \beta_{aj} A_i + \beta_{mj} M_{ij} + \boldsymbol{Z}_i^{T}\boldsymbol{\beta}_{zj} + \epsilon_{ij\beta}\) for the \(j\)-th environmental mediator, where \(\epsilon_{ij\beta} \sim N(0, \sigma_{j\beta}^{2})\) is the residual error for the \(i\)-th individual. Substituting the M|A,Z models into the corresponding Y|M,A,Z models, the total effect can be expressed as \(\mu_a = \beta_{aj} + \alpha_{aj}\beta_{mj}\), where \(\beta_{aj}\) denotes the direct effect of race that does not operate through the \(j\)-th environmental mediator and \(\alpha_{aj}\beta_{mj}\) denotes the indirect effect through the \(j\)-th environmental mediator. Sobel’s tests with a Benjamini–Hochberg correction to adjust for multiple testing were used to determine whether the indirect effect estimates \(\widehat{\alpha}_{aj}\widehat{\beta}_{mj}\) were significantly different from zero, providing evidence that part of the total effect is mediated through environmental mediator \(j\)68,69. We also calculated the percent mediated as \(100 \times \alpha_{aj}\beta_{mj}/\mu_a\). The Y|A,Z model, the M|A,Z models, and the Y|M,A,Z models all accounted for the NHANES sampling design, including survey weights and stratified cluster sampling, using the survey package in R70.
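The product-of-coefficients calculation described above can be sketched in a few lines. This is a minimal illustration, not the survey-weighted R analysis used in the study: it uses the first-order Sobel standard error with a normal reference distribution, and every coefficient, standard error, and p-value shown is hypothetical.

```python
import math
from statistics import NormalDist

def sobel(a, se_a, b, se_b):
    """Indirect effect a*b with first-order Sobel standard error and two-sided p-value."""
    ie = a * b
    se = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    z = ie / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return ie, se, z, p

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # step down from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def percent_mediated(indirect, total):
    """Percent of the total effect explained by the indirect effect."""
    return 100 * indirect / total

# Hypothetical estimates for one mediator: alpha_aj (race -> chemical)
# and beta_mj (chemical -> log LTL), each with its standard error.
ie, se, z, p = sobel(a=0.5, se_a=0.1, b=0.04, se_b=0.01)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])  # hypothetical raw p-values
```

In practice the inputs would come from the fitted M|A,Z and Y|M,A,Z models, with standard errors that reflect the survey design.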

Note that, in order for the direct and indirect estimates from the Baron and Kenny method to have a causal interpretation, the following assumptions must hold71,72:

  1. No unmeasured A \(\to\) Y (race-telomere) confounding.
  2. No unmeasured M \(\to\) Y (POPs-telomere) confounding conditional on A.
  3. No unmeasured A \(\to\) M (race-POPs) confounding.
  4. No M \(\to\) Y confounder that is caused by A (POP-telomere caused by race).

We will discuss the plausibility of these assumptions and the interpretation of our results in the causal context in the discussion section. The assumption of no exposure-mediator interaction is not strictly required for estimation of indirect effects, though we explore this relationship in sensitivity analyses.

Multivariate mediation methods

Following the single mediator analysis, we considered several multivariate mediation analysis methods, which quantify a global mediation effect of all POPs (see Table 2 for a comparison of all multivariate mediation methods used in this article). The most common of these approaches involves fitting an unpenalized linear regression outcome model using least squares, \(Y_i = \beta_0 + \beta_a A_i + \boldsymbol{M}_i^{T}\boldsymbol{\beta}_m + \boldsymbol{Z}_i^{T}\boldsymbol{\beta}_z + \epsilon_{i\beta}\), along with the mediator model \(\boldsymbol{M}_i = \boldsymbol{\alpha}_0 + A_i\boldsymbol{\alpha}_a + \boldsymbol{\alpha}_z\boldsymbol{Z}_i + \boldsymbol{\epsilon}_{i\alpha}\), where \(\boldsymbol{M}_i\) is a vector of candidate environmental mediators for the \(i\)-th individual, \(\epsilon_{i\beta} \sim N(0, \sigma_{\beta}^{2})\), and \(\boldsymbol{\epsilon}_{i\alpha} \sim MVN(\boldsymbol{0}, \boldsymbol{\Sigma}_m)\). Here \(\boldsymbol{\alpha}_a^{T}\boldsymbol{\beta}_m\) is the indirect effect, \(\beta_a\) is the direct effect, and \(\beta_a + \boldsymbol{\alpha}_a^{T}\boldsymbol{\beta}_m\) is the total effect, where \(\boldsymbol{\alpha}_0\) and \(\boldsymbol{\alpha}_a\) are vectors and \(\boldsymbol{\alpha}_z\) is a matrix. However, this approach does not mitigate the variance inflation induced by the highly collinear structure of serum POP concentrations in the outcome model.
To address the issue of variance inflation, we next considered a ridge-penalized linear regression outcome model, which utilizes a penalty term to shrink estimated coefficients toward zero73. For both of these outcome models, there were several challenges. One challenge specific to the ridge-penalized outcome model is the lack of consensus for obtaining analytical inferential quantities such as confidence intervals and p-values when the tuning parameter in the ridge penalty is selected via cross-validation. To circumvent this issue, we used the bootstrap to obtain confidence intervals and p-values for both the unpenalized outcome model and the ridge-penalized outcome model. The other major issue is that, to our knowledge, there is no existing R software to implement multivariate regression models fully adjusted for the complex survey design of NHANES. Therefore, we were not able to account for complex survey design for these two methods.
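As a worked step, the decomposition of the total effect in the multivariate setting follows from substituting the mediator model into the outcome model and taking expectations, using the notation defined above:

\[
\begin{aligned}
E[Y_i \mid A_i, \boldsymbol{Z}_i]
&= \beta_0 + \beta_a A_i + E[\boldsymbol{M}_i \mid A_i, \boldsymbol{Z}_i]^{T}\boldsymbol{\beta}_m + \boldsymbol{Z}_i^{T}\boldsymbol{\beta}_z \\
&= \left(\beta_0 + \boldsymbol{\alpha}_0^{T}\boldsymbol{\beta}_m\right) + \left(\beta_a + \boldsymbol{\alpha}_a^{T}\boldsymbol{\beta}_m\right) A_i + \boldsymbol{Z}_i^{T}\left(\boldsymbol{\beta}_z + \boldsymbol{\alpha}_z^{T}\boldsymbol{\beta}_m\right)
\end{aligned}
\]

The coefficient on \(A_i\) is the total effect, \(\beta_a + \boldsymbol{\alpha}_a^{T}\boldsymbol{\beta}_m\). For the ridge-penalized outcome model, one possible form of the objective is sketched below. Which coefficients are penalized is not stated in the text, so penalizing only \(\boldsymbol{\beta}_m\) is an assumption made here for illustration, and \(\lambda \ge 0\) denotes the tuning parameter selected by cross-validation:

\[
\min_{\beta_0,\, \beta_a,\, \boldsymbol{\beta}_m,\, \boldsymbol{\beta}_z}\;
\sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_a A_i - \boldsymbol{M}_i^{T}\boldsymbol{\beta}_m - \boldsymbol{Z}_i^{T}\boldsymbol{\beta}_z\right)^{2}
+ \lambda \lVert \boldsymbol{\beta}_m \rVert_2^{2}
\]

Setting \(\lambda = 0\) recovers the unpenalized least squares fit, while larger values shrink the mediator coefficients toward zero and reduce their variance at the cost of some bias.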

Environmental mediator summary scores

To deal with the challenge of incorporating survey design elements, we considered three additional analytical methods based on summary score constructions, where a linear combination of the environmental chemicals was used as the mediating variable. We collapsed the information contained in all environmental chemical measures into summary scores, which were then used as candidate mediators in the single mediator framework so that the NHANES complex survey design could be fully incorporated. The purpose of constructing three distinct environmental chemical scores was to check the sensitivity of the results to different methods of constructing the summary scores. The environmental chemical scores were defined as a linear combination of the chemical concentrations \(\sum_{j=1}^{15} w_j M_{ij}\), where the weights \(w_j\) were determined using the following three approaches: (i) principal components analysis (PCA), (ii) the first principal direction of mediation (PDM)74, and (iii) the Toxic Equivalency Quotient (TEQ) score.

PCA is a dimension reduction technique that constructs linear combinations of environmental chemicals such that the linear combinations are uncorrelated with one another. Each linear combination is called a principal component, with the first principal component explaining the most variability. Each subsequent principal component explains the most remaining variability that is unexplained by the previous principal components. The weights for the PCA-based score were determined by the first principal component in the environmental chemical space. PDM is similar to PCA, with the difference being that the first principal direction of mediation also takes the outcome model into account while deriving the weights. Rather than being derived in a data-adaptive manner, the weights for the TEQ score are based on a priori toxicologic information75. Specifically, the World Health Organization’s (WHO) well-established toxic equivalency factors (TEFs) incorporate additional biological information76. TEFs are a measure of potency in reference to the chemical 2,3,7,8-tetrachlorodibenzo-p-dioxin. From these relative potencies, we used the TEQ as created in Mitro et al., which represents a composite score of potency-weighted exposure26. The relevant contaminants with an available TEF were weighted using these values, and each individual’s weighted average was used as a single mediator in the model. Contaminants without an available TEF were excluded from the construction of the TEQ score.
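As an illustration of how a PCA-based summary score can be formed, the sketch below estimates first-principal-component weights by power iteration on the sample covariance matrix and applies them as a linear combination. It is a toy example under stated assumptions: the data are made up, the function names are hypothetical, and whether the study used the covariance or the correlation matrix is not specified in the text.

```python
import math

def first_pc_weights(rows, iters=500):
    """Approximate first principal component weights via power iteration.

    rows: list of observations, each a list of (log) chemical concentrations.
    Assumes the covariance (not correlation) matrix is decomposed.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
           for j in range(p)]
    w = [1.0 / math.sqrt(p)] * p          # start from equal weights
    for _ in range(iters):
        v = [sum(cov[j][k] * w[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]         # renormalize to unit length
    return w

def summary_score(row, weights):
    """Linear combination sum_j w_j * M_ij for one individual."""
    return sum(wj * mj for wj, mj in zip(weights, row))

# Made-up log-concentrations for five individuals and three chemicals.
data = [
    [1.0, 2.0, 0.5],
    [2.0, 3.1, 1.4],
    [3.0, 3.9, 2.6],
    [4.0, 5.2, 3.5],
    [5.0, 6.0, 4.4],
]
weights = first_pc_weights(data)
scores = [summary_score(row, weights) for row in data]
```

The same `summary_score` function applies to any fixed set of weights, so a score based on a priori potency weights would differ only in how `weights` is obtained.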

Sensitivity analyses

We repeated the single mediator models with the inclusion of an exposure-mediator interaction term. In the multivariate setting, we also looked at the interaction between the PCA-based score and race. The purpose of this sensitivity analysis was to check if a potential interaction exists and to verify the mediation model mean structure assumptions. Next, we restricted the PCA summary score method to each subclass of POPs in our dataset (PCBs, non-ortho PCBs, non-dioxin-like PCBs, dioxins, furans). The purpose of this analysis was to understand which chemical classes contribute most to the global indirect effect. Lastly, we assessed the sensitivity of our results to unmeasured confounding using the mediation E-value for our PCA summary score method, which can be calculated using the "EValue" R package or corresponding online calculator77.
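For orientation, the standard E-value for an estimate on the risk ratio scale is RR + sqrt(RR × (RR − 1)), applied to the inverse when RR is below 1. The sketch below implements only that formula. Converting a linear-model coefficient to an approximate risk ratio is a separate step handled by the software cited above and is not shown; the function name and example values are hypothetical.

```python
import math

def e_value(rr):
    """E-value for a point estimate expressed as a risk ratio."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr                      # work with the ratio above 1
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical risk ratios, not results from this study.
for rr in (1.0, 1.2, 2.0, 0.5):
    print(rr, round(e_value(rr), 3))
```

A larger E-value means that stronger unmeasured confounding would be needed to explain away the estimate, while an E-value of 1 means no confounding is needed.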

Results

We present descriptive statistics for all study variables in Table 1. In total, there were 930 White respondents and 321 Black respondents in our analytic sample. Of these, 53.3% of the White respondents and 54.8% of the Black respondents were female, and the Black respondents (Mean: 45.9; SD: 16.7) were younger than the White respondents (Mean: 52.5; SD: 19.8). Supplemental Fig. 2 shows the high collinearity among POPs both within and across chemical classes. The pairwise correlations between the PCBs range between 0.51 and 0.98. The dioxins and furans had pairwise correlations ranging from 0.33 to 0.80. The pairwise correlations between PCBs and dioxins/furans range from 0.25 to 0.72.

Table 1 NHANES sample characteristics, variables, and notation to be used in the analyses. Counts (%) and mean (SD) or median and interquartile range (IQR) are presented for the exposure and outcome variables.

To determine whether environmental chemicals mediate Black/White differences in LTL, we began by estimating the total effect of race on LTL. As shown in Supplemental Table 1, Black respondents had significantly longer LTL than White respondents in covariate-adjusted models (b = 0.054; CI 0.009, 0.099; p = 0.018). Next, we estimated indirect effects of the 15 potential mediators using the single mediator approach, with correction for multiple testing. In the M|A,Z models, Black respondents had significantly higher levels of 11 out of 15 environmental chemicals examined, including seven PCBs, two furans, and two dioxins (see Fig. 2). The chemicals that showed the greatest differences between racial groups were PCB 187, PCB 138, PCB 153, PCB 118, and PCB 126. In the Y|M,A,Z models, 10 of the 15 environmental chemicals were significantly associated with LTL (see Fig. 3). While each of the nine PCBs was significantly associated with longer LTL, only one of the three dioxins was significantly associated with longer LTL, and none of the three furans was significantly associated with LTL.
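Because LTL is log-transformed, the total-effect coefficient reported above can be read as an approximate percent difference. The sketch below back-calculates that percent difference, along with an approximate standard error and p-value, from the reported estimate and interval. It assumes the interval is a 95% interval built with a normal critical value of 1.96; the survey-weighted analysis may use a different reference distribution, so small discrepancies from the reported p-value are expected.

```python
import math
from statistics import NormalDist

b, ci_low, ci_high = 0.054, 0.009, 0.099  # reported total effect and interval

# On the log scale, exp(b) - 1 is the proportional difference in LTL.
pct_longer = 100 * (math.exp(b) - 1)

# Assumption: symmetric 95% interval with a normal critical value of 1.96.
se_approx = (ci_high - ci_low) / (2 * 1.96)
z = b / se_approx
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"{pct_longer:.1f}% longer LTL; z = {z:.2f}; p = {p_normal:.3f}")
```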

Figure 2

Results of the effect of self-reported Black race on exposure to environmental chemicals. Point estimates and 95% confidence intervals are shown.

Figure 3

Results of the effect of environmental chemicals on leukocyte telomere length, controlling for self-reported race. Point estimates and 95% confidence intervals are shown.

Indirect effects were calculated from the M|A,Z and Y|M,A,Z models. As shown in Fig. 4, after correction for multiple testing the mediation analysis identified significant indirect effects (IE) of race on LTL through PCB 118 (IE = 0.015; 95% CI 0.005, 0.026; 28.6% mediated), PCB 138 (IE = 0.018; CI 0.008, 0.029; 34.0% mediated), PCB 153 (IE = 0.019; CI 0.009, 0.029; 34.5% mediated), PCB 170 (IE = 0.011; CI 0.003, 0.019; 20.5% mediated), PCB 180 (IE = 0.010; CI 0.002, 0.018; 18.4% mediated), and PCB 187 (IE = 0.024; CI 0.012, 0.035; 43.4% mediated). The direct effect of race on LTL was not statistically significant in models adjusting for PCBs that had significant indirect effects (see Fig. 5). In the sensitivity analysis, we did not observe any significant interaction effects between the environmental chemical mediators and race (results not shown).
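To make the single mediator calculation concrete, the following is a minimal sketch of a product-of-coefficients indirect effect computed from an M|A,Z model and a Y|M,A,Z model. It is an illustration under stated assumptions rather than the analysis code used here: the data are simulated, the function names (`ols`, `single_mediator_effects`) are hypothetical, and survey weights, multiple-testing correction, and confidence intervals are all omitted.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def single_mediator_effects(a, m, y, z):
    """Product-of-coefficients mediation for exposure a, mediator m, outcome y, covariate z."""
    ones = np.ones(len(a))
    # Mediator model M | A, Z: coefficient on the exposure
    alpha = ols(np.column_stack([ones, a, z]), m)[1]
    # Outcome model Y | M, A, Z: coefficients on the mediator and the exposure
    coefs = ols(np.column_stack([ones, m, a, z]), y)
    beta, direct = coefs[1], coefs[2]
    indirect = alpha * beta
    return {
        "indirect": indirect,
        "direct": direct,
        "pct_mediated": 100.0 * indirect / (indirect + direct),
    }

# Simulated data (assumption): true indirect effect 0.5 * 0.04 = 0.02, true direct effect 0.03
rng = np.random.default_rng(0)
n = 20000
a = rng.integers(0, 2, n).astype(float)   # binary exposure
z = rng.normal(size=n)                    # one measured confounder
m = 0.5 * a + 0.3 * z + rng.normal(size=n)
y = 0.04 * m + 0.03 * a + 0.1 * z + rng.normal(scale=0.2, size=n)

result = single_mediator_effects(a, m, y, z)
print(result)
```

In this sketch the percent mediated is the indirect effect divided by the sum of the indirect and direct effects, which equals the total effect when the outcome model is linear and has no exposure-mediator interaction.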

Figure 4

Results of the indirect effect of self-reported Black race on leukocyte telomere length through environmental chemicals. Point estimates and 95% confidence intervals are shown.

Figure 5

Results of the direct effect of self-reported Black race on leukocyte telomere length, controlling for environmental chemicals. Point estimates and 95% confidence intervals are shown.

Given the limitations of the single mediator approach when examining a large number of highly collinear environmental chemicals as potential mediators, we also estimated indirect effects using multivariate and summary score mediator approaches. As shown in Tables 2 and 3, the unpenalized linear regression model estimated that 20.9% of the total effect of race on LTL was mediated by exposure to PCBs, furans, and dioxins (IE = 0.011; CI −0.004, 0.025), while the ridge penalized model estimated that 26.0% of the total effect was mediated by the full set of 15 environmental chemicals (IE = 0.013; CI 0.001, 0.023).
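The contrast between unpenalized and ridge-penalized multivariate mediation can be illustrated with a small simulation. The sketch below is hypothetical: it assumes the joint indirect effect is the sum over mediators of (exposure-to-mediator coefficient) × (mediator-to-outcome coefficient), penalizes only the mediator coefficients in the outcome model, and uses an arbitrary penalty value. The function names, simulated data, and penalty are illustrative assumptions, not the specification used in this analysis.

```python
import numpy as np

def ridge_fit(X, y, penalized, lam):
    """Minimize ||y - Xb||^2 + lam * sum of b_j^2 over the penalized column indices."""
    d = np.zeros(X.shape[1])
    d[penalized] = 1.0  # intercept and exposure stay unpenalized
    return np.linalg.solve(X.T @ X + lam * np.diag(d), X.T @ y)

def multi_mediator_indirect(a, M, y, lam):
    """Joint indirect effect sum_j alpha_j * beta_j across the columns of M."""
    n, k = M.shape
    ones = np.ones(n)
    # alpha_j: exposure coefficient from each mediator model M_j | A
    alpha = np.linalg.lstsq(np.column_stack([ones, a]), M, rcond=None)[0][1]
    # beta_j: mediator coefficients from the outcome model Y | M, A
    XY = np.column_stack([ones, a, M])
    coefs = ridge_fit(XY, y, penalized=np.arange(2, 2 + k), lam=lam)
    beta = coefs[2:]
    return float(alpha @ beta), beta

# Simulated, highly collinear mediators driven by one shared latent exposure (assumption)
rng = np.random.default_rng(1)
n, k = 1000, 6
a = rng.integers(0, 2, n).astype(float)
latent = rng.normal(size=n)
M = 0.5 * a[:, None] + latent[:, None] + rng.normal(scale=0.1, size=(n, k))
y = 0.01 * M.sum(axis=1) + 0.03 * a + rng.normal(scale=0.2, size=n)

ie_ols, beta_ols = multi_mediator_indirect(a, M, y, lam=0.0)      # unpenalized
ie_ridge, beta_ridge = multi_mediator_indirect(a, M, y, lam=50.0)  # ridge-penalized
print("OLS   IE:", ie_ols, "| coefficient norm:", np.linalg.norm(beta_ols))
print("Ridge IE:", ie_ridge, "| coefficient norm:", np.linalg.norm(beta_ridge))
```

With strongly correlated mediators, the unpenalized mediator coefficients tend to be individually unstable, whereas the penalty shrinks them toward a shared value. In practice the penalty would be chosen by cross-validation or a similar data-driven procedure rather than fixed in advance.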

Table 2 Overview of analytical mediation models and methods considered in this analysis.
Table 3 Indirect effect and direct effect summary. Presented results include the indirect effect, direct effect, percent indirect effect, and corresponding confidence interval and p-values for the indirect effects across different mediation models.

Next, we examined three summary score methods (see Table 4 for a list of weights corresponding to each summary score). Because all of the weights for the PCA-based and TEQ-based exposure scores were non-negative, higher values of these scores represent higher cumulative POP exposure. However, because the weights for the PDM-based exposure score are in both positive and negative directions, the interpretation of that exposure score is less straightforward. PCA (IE = 0.019; CI 0.009, 0.029; p = 0.001; 34.8% mediated) and TEQ (IE = 0.016; CI 0.005, 0.026; p = 0.003; 28.8% mediated) showed significant indirect effects, while PDM showed a non-significant indirect effect (IE = 0.000; CI −0.001, 0.002; 0.6% mediated). In the PCA score model, the inclusion of an interaction term between race and the PCA-based score in the outcome model did not result in a better model fit compared to an outcome model without the race by PCA score interaction term (p = 0.980). In sensitivity analyses, we examined the PCA approach within environmental chemical subclasses (see Table 4 for weights). We found a significant indirect effect of PCBs (IE = 0.020; CI 0.010, 0.029; p = 0.001; 36.2% mediated) but no significant indirect effects of furans or dioxins (see Table 5). Furthermore, we subdivided the PCBs into non-ortho-PCBs (IE = 0.008; CI 0.001, 0.015; p = 0.025; 14.4% mediated) and non-dioxin-like PCBs (IE = 0.018; CI 0.009, 0.028; p = 0.001; 25.9% mediated), which revealed that both subclasses were associated with a significant mediation effect.
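As an illustration of how a PCA-based exposure score can be formed, the sketch below standardizes a set of correlated exposures and projects them onto the first principal component. It assumes simulated data and a hypothetical helper `pca_score`; the actual weights used in this analysis are those reported in Table 4 and are not reproduced here.

```python
import numpy as np

def pca_score(X):
    """Return (score, weights) for the first principal component of standardized X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of vt are the principal axes; vt[0] is the first component's loading vector
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    w = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # higher scores correspond to higher overall exposure
    if w.sum() < 0:
        w = -w
    return Xs @ w, w

# Simulated exposures sharing one latent source (assumption), so all pairs correlate positively
rng = np.random.default_rng(2)
n, k = 2000, 5
latent = rng.normal(size=n)
X = latent[:, None] + rng.normal(scale=0.5, size=(n, k))

score, weights = pca_score(X)
print("weights:", np.round(weights, 3))
print("correlation of score with latent exposure:", np.corrcoef(score, latent)[0, 1])
```

When every pairwise correlation is positive, as in this simulation, the first-component weights share a common sign, which is what allows the score to be read as a cumulative exposure index. The resulting score can then be used as a single mediator.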

Table 4 Exposure score weights for different analytic models considered.
Table 5 PCA subclass sensitivity analysis and results including direct, indirect effect, percent indirect effect, and confidence intervals and p-values for each subclass mediation model.

Our sensitivity analysis of the PCA mediation model included a calculation of the E-value, which estimates the minimum strength of association that an unmeasured confounder would need to have to attenuate the mediation effect to the null. Based on our estimates, the E-value was 1.35 (lower bound 1.22), meaning that an unmeasured confounder would need to be associated with both LTL and the PCA-based summary score of environmental chemicals by a risk ratio of at least 1.35 to reduce the estimated indirect effect to zero.
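For readers unfamiliar with the E-value, the sketch below implements the standard closed-form expression on the risk ratio scale, E = RR + sqrt(RR × (RR − 1)) for RR ≥ 1, together with its algebraic inverse. How the indirect effect on continuous LTL was converted to an approximate risk ratio is not described in this section, so that conversion step is deliberately left out; the function names are hypothetical.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio; ratios below 1 are inverted first."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

def rr_from_e_value(e: float) -> float:
    """Invert E = RR + sqrt(RR * (RR - 1)) to get RR = E^2 / (2E - 1), for E >= 1."""
    if e < 1:
        raise ValueError("an E-value cannot be below 1")
    return e * e / (2.0 * e - 1.0)

# Approximate risk ratio implied by the reported E-value of 1.35
implied_rr = rr_from_e_value(1.35)
print(round(implied_rr, 3))
```

Under this formula, an E-value of 1.35 corresponds to an approximate risk ratio of about 1.07 for the estimated effect, which indicates that a fairly modest unmeasured confounder could in principle account for it.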

Discussion

The purpose of this study was to determine whether exposure to POPs helps explain why Black Americans have longer LTL than White Americans, despite having more risk factors for short LTL1. Previous research using nationally representative NHANES data has shown that Black Americans are exposed to higher levels of POPs than White Americans44. While differences are not explained by behavioral factors, such as diet or smoking30,44, a growing body of evidence suggests social structural factors like residential48,49 and occupational52,53 segregation are important contributors to racial disparities in exposure to environmental chemicals78. Given recent observational and experimental studies linking exposure to POPs with longer LTL26,27,28,29,30,31,32,33,36, we hypothesized that Black/White differences in LTL are explained by differences in exposure to PCBs, furans, and dioxins. Using various analytic methods, we found support for the hypothesis that exposure to POPs, and PCBs in particular, partially mediates Black/White differences in LTL.

Although previous research has established that Black Americans have higher exposure to POPs35,44,45,46 and that exposure to POPs is associated with longer LTL26,27,28,29,30,31,32,33, no prior studies have considered the extent to which differences in exposure to environmental chemicals contribute to race differences in LTL. In single-pollutant models, we found significant IEs of race on LTL through five non-dioxin-like PCBs (138, 153, 170, 180, and 187) and one dioxin-like PCB (118). The estimated percent mediated ranged from 18 to 43%. Given the highly correlated nature of environmental chemicals, we also explored multivariate and summary score mediator approaches. We found evidence of significant IEs in ridge regression and models examining summative exposure scores (from PCA and the TEQ score). The estimated percent mediated ranged from 26 to 35% using these approaches. Sensitivity analyses examining principal components within chemical subclasses revealed significant IEs of PCBs (both non-ortho PCBs and non-dioxin-like PCBs) but not furans or dioxins. These findings suggest that exposure to PCBs is a potentially modifiable mechanism underlying Black/White differences in LTL in the US. Given evidence of causal associations between longer telomere length and some types of cancer13,14,15,16,17,18,19,20,21, it is important to identify modifiable risk factors for long LTL.

Although the various statistical approaches used in this study produced similar results overall, we observed some potentially important differences across methods. Of the multivariate and summary score mediation approaches, four out of the five methods estimated a percent mediated of greater than 20%, indicating general agreement across methods. However, multivariate mediation approaches that do not directly address the impact of variance inflation due to correlated POPs (i.e., unpenalized linear regression and PDM) showed either no global mediation effect or an attenuated global mediation effect compared to other methods. As a result, OLS-based approaches should be used with caution when jointly modeling highly collinear environmental chemicals. Alternatively, when considering a joint analysis of the mediation effect attributable to exposure mixtures, we recommend either penalizing the regression coefficients (through a ridge, adaptive elastic net, or sparse group lasso penalty, for example), or collapsing exposures into summary scores with the weights selected in a data-adaptive way (as with principal components analysis). We did not consider high-dimensional multivariate mediation methods79,80,81,82 because we only had 15 mediators in our analysis.

Strengths, limitations, and directions for future research

This study has several strengths, including the data source and analysis methods. First, the analysis was conducted on nationally representative data, which minimizes selection bias and helps ensure that results are generalizable to Black and White adults living in the US. Historically, much research regarding differences in LTL has been based on smaller samples with less sample diversity. NHANES is one of few studies with data on LTL and POPs for Black and White adults. The Anniston, Alabama sample is another, but it includes a smaller, more highly exposed population29. Another strength of our analysis is the methodological rigor of our mediation analysis. While not all existing mediation models could incorporate survey weights and sampling design elements, we have clearly laid out which methods have the capability to account for the weighting, strata, and PSU variables. It is possible that generalizability and correct inference are more limited when these variables cannot be included. Despite each method having limitations, the concordant results across methods with and without survey design elements strengthen our overall conclusions.

Despite its strengths, this study has several limitations. First, NHANES is a cross-sectional study, which means that we only have measures of POPs and LTL at one point in time. Given the long half-life of POPs in blood, it is possible that our measurements represent chronic exposure. However, these measures may not capture chemical exposures in early life, which may be an etiologically relevant time period. Studies with longitudinal data—including studies with children and adolescents—are needed to determine whether change in exposure to POPs is associated with change in LTL.

Additional limitations related to our exposure data include an inability to account for exposure measurement error at the statistical modeling stage and detection limit issues, which substantially reduce the set of exposures that can be used in the imputation and mediation models. Here we measured POPs, but as a future direction, it is important to include other chemicals (e.g., metals) that have previously been associated with LTL83 and race/ethnicity84,85.

Other potential limitations of this study relate to the mediation analyses. Mediation methods have been proposed under the traditional causal steps approach67 and the potential outcomes formulation71,72. To interpret the results of mediation analysis as causal effects, a number of assumptions must be met. Although we explored the possibility of exposure-mediator interaction, accounted for measured confounders of all relationships, and assessed the robustness of our results using the E-value, failure to account for unmeasured confounders could lead to non-identifiability of causal mediation effects in our observed data. For example, genetic factors that are causally related to telomere length may also be associated with self-reported race. A recent study using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program identified 59 genetic variants associated with telomere length (estimated from whole-genome sequences) in an ancestrally diverse sample including people of European and African ancestry86. There was little evidence of effect size heterogeneity across ancestry groups, but the frequency of the variants differed for Black and White participants, suggesting that some of the observed Black/White differences in telomere length may be due to genetic factors. Unfortunately, NHANES does not have all of the genetic data needed to construct the TOPMed telomere length polygenic trait score. Thus, we cannot rule out the possibility of unmeasured confounding due to genetic determinants of telomere length.

In addition to the possibility of unmeasured confounding by genetic factors, our models may have violated the assumption that no confounders of mediator-outcome associations were affected by the exposure. Given that self-reported race is an important determinant of educational attainment in the US context, controlling for education as a potential confounder of associations between POPs and LTL risks violating one of the assumptions of causal mediation.

A causal interpretation also requires correctly specifying the temporal order of the variables. Here race precedes POP exposure, and it is unlikely that LTL would impact serum chemical concentrations. Thus, we are confident that the temporal order is correctly specified, despite the use of cross-sectional data. A final potential limitation of the mediation analysis concerns our exposure variable. Some causal inference scholars have argued that race coefficients cannot be interpreted causally because there is no reasonable hypothetical intervention on race87,88,89. However, this argument is not universally accepted. Because race is a social construction rather than a biological category, a number of scholars reject the claim that race is not manipulable90,91,92. The meaning of race is culturally defined90, and both race classifications and race relations vary across place and time91,92. Others have called into question the assumption that causal claims are only reasonable in the context of well-defined interventions90,91,93. Although we believe that the use of self-reported race is justified in this analysis, we also recognize the importance of expanding our model to include more proximate causes of telomere length that are amenable to standard policy interventions94.

Conclusions

Black/White differences in LTL are partially explained by differences in exposure to POPs, a potentially modifiable environmental risk factor. More work is needed to understand the complex relationships between environmental racism, chemical exposures, LTL, and cancer disparities. Although our results suggest that mediation analysis is a useful technique for identifying environmental mechanisms underlying race differences in LTL, differences across methods arising from the way in which variance inflation is handled suggest that OLS-based approaches should be used with caution for correlated exposure data. Researchers who wish to study environmental chemicals as mediators should explore appropriate analytic tools for their research question, including considerations about the collinear nature of their data and the capability of various methods to account for survey design elements when necessary.