Gaussian graphical modeling of the serum exposome and metabolome reveals interactions between environmental chemicals and endogenous metabolites

Given the complex exposures from both exogenous and endogenous sources that an individual experiences during life, exposome-wide association studies that interrogate levels of small molecules in biospecimens have been proposed for discovering causes of chronic diseases. We conducted a study to explore associations between environmental chemicals and endogenous molecules using Gaussian graphical models (GGMs) of non-targeted metabolomics data measured in a cohort of California women firefighters and office workers. GGMs revealed many exposure-metabolite associations, including that exposures to mono-hydroxyisononyl phthalate, ethyl paraben and 4-ethylbenzoic acid were associated with metabolites involved in steroid hormone biosynthesis, and perfluoroalkyl substances were linked to bile acids—hormones that regulate cholesterol and glucose metabolism—and inflammatory signaling molecules. Some hypotheses generated from these findings were confirmed by analysis of data from the National Health and Nutrition Examination Survey. Taken together, our findings demonstrate a novel approach to discovering associations between chemical exposures and biological processes of potential relevance for disease causation.


Exposures to PFHxS and metabolic syndrome (MetS).
We observed that exposures to PFHxS were associated with changes in bile acid metabolism in women FF, and so we assessed whether exposure to this environmental chemical was linked to changes in clinical measures related to cholesterol homeostasis and energy balance in NHANES. We tested for associations between serum PFHxS concentrations and MetS and individual components of MetS (Fig. 4) in women 20-79 years of age enrolled in NHANES 2003-2014.
Of the adult women in the NHANES sample, 28.3% with measured PFHxS met NCEP/ATPIII criteria for MetS. No clear associations were observed for PFHxS and MetS. Since we found that exposures to this chemical were associated with a microbial-derived bile acid in our GGMs, we also tested whether the differences in gut Blue and red nodes represent endogenous metabolites and environmental chemicals, respectively. Edges connecting nodes represent significant partial correlations at FDR < 10%. www.nature.com/scientificreports/ microbiota can confound associations with MetS. However, further adjusting for recent use of antibiotics as a proxy of microbial composition had no effect on the odds ratios (ORs) for MetS (data not shown). Weak associations were found for PFHxS and individual components of MetS, with only the association with low HDL cholesterol reaching statistical significance (adj. OR for Q4 versus Q1: 0.43; 95% CI 0.21, 0.88). Further adjusting for recent use of antibiotics had no effect on the odds ratios (ORs) for MetS components and did not change the effect estimates (data not shown).

Exposures to PFAS and parabens and inflammatory responses.
Because we observed significant partial correlations between PFOS, PFHxS, ethyl paraben and butyl paraben and inflammatory signaling molecules in women FF and OW, we tested whether these exposures are associated with clinical markers of inflammatory responses in adult women in NHANES by exploring associations between serum or urinary concentration of these chemicals and clinical markers of inflammation and immune system function ( Table 2).

Discussion
To our knowledge, this is the first study to combine the serum exposome and metabolome using GGMs. The main goal of this study was to explore the links between the chemical exposome and the metabolome and generate hypotheses about possible health effects of exposures to a complex mixture of environmental chemicals. We www.nature.com/scientificreports/ computed GGMs of non-targeted LC-HRMS data to map direct associations between small molecules. After controlling for multiple testing (FDR < 0.1), we observed many direct associations, including metabolite-metabolite, chemical-metabolite and chemical-chemical associations. We found that most of the significant associations between metabolites corresponded to known biochemical reactions, supporting previous studies on the ability of GGMs to reconstruct metabolic pathways from MS-based metabolomics data [17][18][19] . Some metabolite-metabolite associations seem to also reflect molecules originating from the same source of exposures. For example, tryptophan was significantly correlated with arachidonic acid and myristic acid. Table 1. Summary of chemical-metabolite associations, biological changes and biological role of metabolites. a Metabolomics Standard Initiative (MSI) level of identification; level 1: match based on accurate mass (± 10 ppm), fragmentation pattern and retention time with authentic standards; level 2: match based on accurate mass (± 10 ppm) and fragmentation pattern using mass spectra from public metabolomics libraries or in-silico fragmentation software; level 3: match based on accurate mass (± 10 ppm) only. b Biological role of metabolite was inferred from the Metab2MeSH web application that annotates compounds with MeSH terms and HMDB database. c Partial correlation coefficient. www.nature.com/scientificreports/ Although these metabolites are involved in different metabolic pathways, they possibly originated from dietary sources since they can co-occur within the same food component, such as eggs or meat 25 . When stratifying by occupation, most of metabolite-metabolite associations remained unchanged, while several chemical exposure-metabolite associations were only significant in FF. For example, the association between PFHxS and sulfolithocholylglycine was significant in FF (PCC = 0.12, p = 2.6 × 10 -5 ), but not observed in the whole cohort (PCC = 0.05, p = 0.01) or in OW (PC C = − 0.03, p = 0.02). The instability of certain associations is possibly due to inter-occupation variability in the levels of chemical exposures as well as the small sample size. LC-MS/MS analysis of PFAS revealed that serum levels of PFHxS were higher in FF [median = 4.1 ng/mL and interquartile range (IQR) = 5.7 ng/mL] compared to OW (median = 2.5 ng/mL and IQR = 3.8 ng/mL) 20 .
GGMs revealed that certain environmental chemicals were linked to endogenous metabolites involved in cholesterol homeostasis, energy metabolism, inflammation and steroid hormones biosynthesis. We found direct associations between 4-hydroxyactophenone, PFHxS and bile acids, including sulfolithocholylglycine, lithocholic acid glycine and chenodeoxycholic acid (CDCA) in women FF. PFHxS is a perfluoroalkyl substance (PFAS) used as an additive in a wide range of consumer products and food packaging due to surfactant and stain resistant properties 26 . PFAS are also components of firefighting foam and firefighter's protective clothing, so these are possible sources of exposure for women FF. A previous study of firefighters found positive associations between serum concentrations of PFOS and PFHxS and the number of years using firefighting foam 27 . As previously mentioned, we also observed higher serum PFHxS in women FF in this cohort, compared to the control group of OW 20 .
Bile acids (BAs) are cholesterol-derived signaling molecules with hormonal actions that regulate lipid, glucose and energy metabolism by activating several nuclear receptors, including the constitutive androstane receptor . Exposure-metabolites subnetworks identified by the GGMs inferred from the women FF metabolomics dataset. The subnetwork A represents associations of phenols and phthalates with metabolites involved in steroid hormone biosynthesis. The subnetwork B shows associations between 4-hydroxyacetophenone, PFOS and PFHxS and metabolites involved bile acids, arachidonic and vitamin D metabolism. Blue and red nodes represent endogenous metabolites and environmental chemicals, respectively. Plain and dashed edges connecting nodes represent positive partial correlations and negative partial correlations, respectively. Edges connecting nodes represent significant partial correlations at FDR < 10%. www.nature.com/scientificreports/ (CAR), pregnane X receptor (PXR), farnesoid X receptor (FXR) and vitamin D receptor (VDR), as well as G-protein coupled receptors 24,28,29 . For example, FXR activation has been found to induce expression of peroxisome proliferator activated receptor (PPAR)α and promote oxidation of lipids 30,31 . Previous research suggests that modulation of BAs levels may be an effective strategy for the treatment of components of MetS 32 . Based on the correlations between PFHxS and BAs in FF, we hypothesized that exposure to PFHxS is associated with MetS in women. When assessing components of MetS in NHANES, we found higher levels of PFHxS to be associated with decreased odds of HDL cholesterol < 50 mg/dL in adult women 20-79 years of age enrolled in NHANES. However, PFHxS had neutral effects on the prevalence of the MetS in adult women. A previous study showed that increased levels of serum PFHxS were associated with a lower prevalence of central obesity among male and female adults participants of NHANES 1999-2000 and 2003-2004 33 , but we did not see that association in our NHANES analysis of female adults. Although the study by Lin et al. 33 found that increased serum PFHxS were correlated with decreased odds of HDL cholesterol < 50 mg/dL, the association did not reach statistical significance. Another study found that early-life exposure to PFHxS was inversely associated with serum concentrations of leptin-a marker of adiposity and metabolic dysfunction -among children 34 . Similar to our results, these studies observed no evidence for an adverse effect of PFHxS on the prevalence of MetS.
BAs also mediate inflammatory and immunomodulatory actions 35,36 . BAs have detergent properties, in that they dissolve fats, and they can cause membrane perturbations, resulting in production of pro-inflammatory molecules and reactive oxygen and nitrogen species which damage DNA, contributing to mutation and apoptosis 37,38 .
The immune system orchestrates a network of biological processes to recognize pathogens and fight infections. The deregulation of immune functions can lead to increased susceptibility to infectious, autoimmune (e.g. rheumatoid arthritis, type 1 diabetes, lupus…) and inflammatory diseases (e.g. asthma, eczema, allergic rhinitis…), and cancer 39 . In addition to finding PFHxS correlated with BAs in our cohort, we also observed significant associations between PFOS, ethyl paraben and butyl paraben, and endogenous molecules involved in inflammatory reactions. Consistent with our GGM findings for PFOS, we found that among adult women in NHANES, serum PFOS concentrations were positively associated with inflammatory and immune reactions (positive associations  www.nature.com/scientificreports/ with CRP and lymphocyte count), while PFHxS was negatively associated with the absolute neutrophil count. Also consistent with our GGM findings, ethyl paraben and butyl paraben were negatively associated with inflammation markers in NHANES. Butyl paraben was also associated with a decrease in white blood cell count, while ethyl paraben did not affect markers of immunity. In previous cross-sectional studies, no significant associations were found between serum PFOS and CRP, and conflicting results were observed for other immune-related health conditions 40 . However, a recent prospective study has observed that elevated PFHxS concentrations at age 7 decreased anti-diphtheria and anti-tetanus antibody concentrations at age 13 41 . Parabens are widely used as antimicrobial preservatives in personal care products, pharmaceuticals, and food and beverages 42,43 . Parabens can interact with hormone receptors and have been detected in breast tissue 44,45 . Some parabens have demonstrated adverse effects on male reproduction at higher doses 46 . A previous study among pregnant women found that urinary concentrations of propyl paraben were associated with lower CRP concentrations 47 . Propyl paraben and butyl paraben have also been reported to increase the prevalence of aeroallergen sensitization in children 48,49 . The underlying mechanisms that mediate the effects of PFAS on cholesterol and glucose homeostasis, inflammation and immune response are thought to be related to PPAR-α,γ activation and fatty oxidation pathways 50-52 . www.nature.com/scientificreports/ The sub-networks of direct associations found in the present study seem to reinforce these relationships, which may possibly be mediated by PPAR-induced disruption of BAs metabolism and transport. Studies on the association between PFAS and BAs are limited. A previous animal study has demonstrated that mice treated with PFDA resulted in increased serum BAs, probably due to PPAR-α-driven down-regulation of BA transporters involved in the enterohepatic circulation, including the Na + -taurocholate cotransporting polypeptide (NTCP) 53 .
Recent studies have shown that PFHxS is a substrate for NTCP in both human and rat hepatocytes 54,55 . A case study involving eight subjects has found that oral treatment with BAs sequestrant increased fecal elimination of PFHxS 56 . Although no human study supporting the paraben-BAs association was found, in vitro studies have observed that paraben acts as a strong obesogenic compound through PPAR-γ-activation at high concentrations (high µM range) in murine 3T3-L1 cells 57,58 .
Since the BAs associated with PFHxS are produced by intestinal bacteria 38 , we hypothesized that the gut microflora composition may confound associations between these chemicals and effects on metabolic function and inflammation. But adjusting for recent use of antibiotics-as a proxy for loss of bacterial diversity-did not modify the associations. Further studies employing direct measurements of gut bacteria diversity (e.g. targeted measurements of microbial-derived metabolites or 16S rRNA sequencing to characterize microbial communities) are needed to decipher the possible role of the microbiome on the association between chemical exposures and health outcomes.
Many exposure-metabolite associations were not further tested because chemical exposures and/or health outcomes hypothesized were not measured in NHANES. This included direct associations of phthalates with steroid hormones, and metabolites involved in oxidative stress and inflammatory response ( Fig. 3A and Table 1). The associations we observed are discussed below in the context of relevant studies from the literature. Phthalates are primarily used as plasticizers in automotive, building materials and consumers products 59 and known endocrine disrupting compounds that disrupt fetal testosterone and insulin-like hormone 3 (insl3) synthesis by an unknown mechanism 60 . In our study, mono(hydroxyisononyl) phthalate (MHINP)-a major oxidative metabolite of diisononyl phthalate (DINP)-was negatively associated with one steroid hormone metabolite (11β-hydroxyprogesterone). Epidemiological studies have reported that phthalate exposure is associated with decreased levels of steroid hormones in males and females [61][62][63][64] . Meeker and Ferguson observed a suggestive negative association between DINP exposure and serum testosterone in women 40-60 years of age enrolled in NHANES 63 . Several phthalates affected steroid hormone synthesis pathways in human adrenocortical carcinoma cell lines (H295R assay), and different phthalates affected different parts of the pathway 65 . Phthalate exposure has also been linked with inflammation and oxidative stress [66][67][68][69][70] . Among phthalate metabolites associated with lipid peroxidation in our study, urinary levels of mono-(2-ethyl)-hexyl phthalate (MEHP) [metabolite of di(2ethylhexyl) phthalate (DEHP)] have been positively associated with increased serum levels of gamma glutamyltransferase (GGT)-a sensitive biomarker of oxidative stress-while mono-isobutyl phthalate (MiBP) [metabolite of dibutyl phthalate (DBP)] was linked to serum levels of CRP in participants enrolled in NHANES 66 .
This study has several limitations. First, results are correlative in nature and from a small sample size. Second, due to the cross-sectional study design, we are unable to determine whether higher exposures to these chemicals precede changes in levels of endogenous metabolites or are a consequence of metabolic perturbations due to changes in health conditions, diet or exposures to other environmental factors. Third, because no health outcomes were measured in the occupational cohort, exposure-metabolite-outcome associations were tested in another cohort (i.e. NHANES). These two populations probably exhibit significant differences in terms of chemical exposure levels. For example, median serum concentrations of PFHxS in both FF (4.05 ng/mL) and OW (2.55 ng/mL) were twice as high as those measured in NHANES (1.6 ng/mL for cycles 2003-2014) 20 . As such, our NHANES analysis may underestimate or overestimate the associations between chemical exposures and hypothesized health outcomes (e.g. MetS or inflammation) in the occupational cohort. Fourth, due to the LC-HRMS analysis in negative ionization mode, most of endogenous molecules tentatively identified are lipids or hormones. Also, we only identified potential chemicals in the cohort by using an in-house database of 700 environmental chemicals 12 , and so we may have missed other chemical-metabolite interactions that are present in the cohort. We may have missed other important molecules that are more easily detected in positive ionization mode, such as amino acids or vitamins, and involved in other metabolic pathways that would have generated additional exposure-outcome hypotheses.
Despite these limitations, the present study has several strengths. First, this study employed a data-driven approach that combines simultaneous measurements of a wide spectrum of environmental chemicals and endogenous metabolites detected in serum to generate hypotheses related to possible health effects of chemical exposures. This approach provides a comprehensive exploration of the impact of chemicals, individually or in combination, on critical metabolic processes. Second, exposomics and metabolomics data were combined using GGMs that provide identification of direct associations between chemicals and metabolic pathways. Pearson correlation coefficients are generally high in omics data and may lead to identification of many indirect associations that are not biologically relevant. GGMs that are based on partial correlation coefficients provide an estimate of the conditional dependencies between variables and limit the selection of indirect associations. Third, some hypotheses generated from the non-targeted GGM analyses were tested using data from a representative sample of the U.S. population (i.e. NHANES) comprising adult women from diverse racial/ethnic background and socio-economic status, and some of these findings were consistent. For example, analysis of NHANES data confirmed associations found in GGMs between exposures to PFHxS and PFOS and cholesterol metabolism and inflammation. Fourth, GGMs revealed that exposures to previously unstudied chemicals (e.g. 4-hexyloxyphenol or certain phthalates metabolites) may potentially have adverse effects on critical metabolic pathways such as metabolism of arachidonic acid and linoleic acid involved in lipid peroxidation and inflammatory and immune responses as well as steroid hormone biosynthesis. www.nature.com/scientificreports/

Conclusion
In conclusion, we used GGMs, which have previously been shown to identify high partial correlations between endogenous small molecules that correspond to known metabolic reactions, to identify associations between tentatively identified exogenous chemicals and endogenous molecules in serum from a cohort of California women firefighters and office workers. These GGMs revealed many exposure-metabolite associations, including that exposures to mono-hydroxyisononyl phthalate, ethyl paraben and 4-ethylbenzoic acid were associated with metabolites involved in steroid hormone biosynthesis, and PFASs were linked to BAs and inflammatory signaling molecules. Some hypotheses generated from these findings were further supported by analysis of data from NHANES, specifically the cross-sectional associations between some PFASs and cholesterol metabolism and inflammation. However, many hypotheses generated from this work were not further tested because chemical exposures and/or health outcomes were not measured in NHANES. Taken together, our findings demonstrate a novel approach to discovering associations between chemical exposures and upstream biological processes potentially involved in disease. We encourage further studies to apply these techniques in other cohorts and to further evaluate associations between chemical exposures and health outcomes that we have reported here.

Methods
Study population. This study was conducted using LC-QTOF/MS data available from an established cohort of California women workers known as the Women Firefighter Biomonitoring Collaborative (WFBC). The WFBC is a cross-sectional study designed to measure and compare exposures to potential breast carcinogens and other endocrine disrupting compounds (EDCs) in 69 San Francisco (SF) women firefighters (FF) and 74 female controls among office workers (OW) from the City of San Francisco. Detailed description of this cohort has been published elsewhere 12,20 . Demographic characteristics and breast cancer risk factor information, including menopausal status, history of hormone replacement therapy, and reproductive history were collected using questionnaires during in-person interviews. Body Mass Index (BMI; kg m −2 ) was calculated from participants' height and weight measured at the time of the in-person interview. A 50 mL blood sample was collected by a certified phlebotomist. Written informed consent was obtained from all participants. All research was performed in accordance with relevant guidelines/regulation. The study following protocols were approved by the Institutional Review Board of the University of California, Berkeley (# 2013-07-5512).
Overall, the FF and OW cohorts were similar in terms of age, race/ethnicity, BMI, parity, and hormone use, as previously reported 12 . However, the household income for women FF was significantly higher compared to OW. There were significantly more pre-menopausal women in the FF group and women FF had a higher proportion with a 4-year college as their highest degree, while a higher proportion of OW had additional degrees beyond 4-year college degrees.

Non-targeted LC-QTOF/MS metabolomics analysis.
Non-targeted analysis of serum samples was performed as previously described 71 . Briefly, 250 µL of serum sample was spiked with 2.5 µL of 1 µg/mL of internal standard (2.5 ng BPA-d16) and centrifuged at 3000 rpm for 10 min. Analytes were extracted using solidphase extraction (SPE; Waters Oasis HLB 10 mg, 1 cc). Extracts were dried under a stream of nitrogen gas and reconstituted in 250 µL of 10% methanol.
Extracts were analyzed on a LC-QTOF/MS system consisting of an LC 1260 autosampler, pumps and a QTOF/MS 6550 (Agilent, Santa Cruz, CA, USA). Analytes were separated by a reversed-phase method using a C18 column (Agilent Poroshell 120, 2.1 mm × 100 mm, 2.7 µm particle size) maintained at 55 °C. Mobile phase A consisted of water with 0.05% ammonium acetate (pH = 7.8) and mobile phase B consisted of methanol with 0.05% ammonium acetate (pH = 7.8). The elution gradient employed was: 0-0.5 min, 5% B; 1.5 min, 30% B; 4.5 min, 70% B; 7.5-10 min, 100% B; 10.01-14 min, 5% B. The injection volume was 50 µL. Analyses were performed with a QTOF/MS operating in negative electrospray ionization mode (ESI-). Ions were collected in the m/z 80-600 range at high resolution for eluates coming out of the LC from 1 to 12 min. Using the Auto MS/MS mode (information-dependent acquisition), a product ion scan (MS/MS) of the three most abundant peaks at high resolution was triggered each time a precursor ion with an intensity of ≥ 500 counts/second was generated in the TOF-MS scan using a collision voltage ranging from 0 to 40 V depending of ions m/z. The LC-QTOF/MS analysis produces a total ion chromatogram for each sample, which includes the following: the accurate mass of each unique compound (expressed as m/z of their respective anion), peak area, retention time and spectral data on the parent and fragment ions, including isotopic pattern. Data processing. Exposome annotation. Chemical exposures were identified from non-targeted LC-QTOF/MS data as previously described 12,71 . Briefly, all detected m/z were matched with those from an in-house MS database of environmental chemicals with a mass tolerance value of 10 ppm. The in-house database consists of more 700 chemicals: (1) environmental organic acids including parabens and paraben metabolites, phthalates and phthalate metabolites and pesticides and pesticide metabolites; (2) chemicals that increase breast cancer risk including mammary carcinogens and mammary gland developmental disruptors 72,73 ; (3) known firefightingrelated occupational exposures including perfluorinated compounds found in firefighting foams, polychlorinated and polybrominated dioxins and furans and other flame retardants. A list of tentatively annotated chemicals was generated for all samples, with corresponding exact mass, retention time, mass error, peak area, chemical formula, and match scores. Then, we performed retention time correction using an in-house R script in order to align LC-QTOF/MS data. We assigned the chemicals with the same formula to be two different entities if the difference in retention time of two adjacent chemicals was greater than 0.16 min. The exposome data matrix consisted of peak area and m/z of 620 unique chemicals which matched to 300 chemical formulas 12  www.nature.com/scientificreports/ Metabolome annotation. LC-QTOF/MS metabolomics data were pre-processed using the R package "XCMS" 74 for peak detection, retention time correction and peak alignment (the R script used for data pre-processing can be found in Supplementary Information). After pre-processing, a data matrix containing retention time, massto-charge ratio (m/z) and intensity of features was generated. Metabolomics features were annotated using the R package "xMSannotator" 75 . We used the Human Metabolome Database 3.6 (HMDB) that contains detailed molecular information about 42,632 small molecules 76 as the reference library for annotation of metabolomics features. We retained only annotated molecules with a confidence score equal to 3 (the highest score of confidence) and classified by HMDB as previously detected and quantified in biological matrices (the R script used for annotation of metabolomics features can be found in Supplementary Information). After this process, the metabolome data matrix contained m/z and intensity of 90 annotated molecules (Supplementary Table S5). The identification level of environmental chemicals and endogenous metabolites was reported as proposed by the Metabolic Standards Initiative 21 : level 1: match based on accurate mass (± 10 ppm), fragmentation pattern and relative retention time with authentic standards; level 2: match based on accurate mass (± 10 ppm) and fragmentation pattern using mass spectra from public metabolomics libraries or in-silico fragmentation software 77 .; level 3: match based on accurate mass (± 10 ppm) using public metabolomics libraries. Most small molecules tentatively identified in this study corresponded to level 2 or level 3 annotation. Eight environmental chemicals were validated as level 1 annotation (Supporting Information Table S1).
Statistical analysis. Gaussian graphical modeling (GGM). GGMs were built by combining the exposome data matrix with the metabolome data matrix. We first excluded molecules with more than 50% missing values. The filtered data matrix contained 69 samples × 145 molecules (55 environmental chemicals and 90 endogenous metabolites), 74 samples × 139 molecules (49 environmental chemicals and 90 endogenous metabolites), and 143 samples × 142 molecules (52 environmental chemicals and 90 endogenous metabolites) for women FF, OW and the whole cohort, respectively. Missing values imputation, data normalization and transformation were conducted using MetaboAnalyst 3.0 78 . Remaining missing values were imputed using the k-nearest neighbor (KNN) method 79 . Exposome data (i.e. peak area) and metabolome data (i.e. metabolite intensity) were normalized, separately, by the sum of peak area or peak intensity of each sample (i.e. sample normalization. Peak area or peak intensity of each molecule was divided by the sum of peak area or peak intensity of each sample) to reduce analytical variations. Then, normalized data were generalized log-transformed (i.e. feature normalization), and exposome data and metabolome data were merged into one dataset. . We computed GGMs using the ggm.estimator.pcor function from the R package "GeneNet" 80 . We considered partial correlations between two molecules to be significant if the resulting p value was below the False Discovery Rate (FDR) threshold of 0.1 (for FF p < 1.79 × 10 -4 , for OW p < 1.21 × 10 -4 and for the whole cohort p < 1.71 × 10 -4 ). The GGM networks were constructed and visualized with Cytoscape 3.8.0 81 using "organic" as layout. Edges connecting nodes represent significant partial correlations. Partial correlation coefficients were used as the edge attributes.
Testing of exposome-metabolome associations. For each significant exposure-metabolite partial correlation, we used several approaches to attempt to validate the associations by checking whether a similar type of chemicaleffect relationship has been reported elsewhere, for example in National Health and Nutrition Examination Survey (NHANES) or in PubMed.
We used NHANES to further support or not the association found in our occupational cohort if (1) the chemical of interest and (2) one marker of the biological pathway perturbed were measured in at least one NHANES cycle. NHANES is an ongoing cross-sectional study of the civilian noninstitutionalized U.S. population designed to collect data on dietary and health factors (ref, see Supplementary Information for more details). NHANES study protocols were approved by the National Center for Health Statistics' Research Ethics Review Board. Since our study population consists of women workers, we selected only non-pregnant women adults (20-79 years of age) enrolled in NHANES. All NHANES data (demographics, examination data, laboratory data, and questionnaire data) were downloaded using the R package "RNHANES" 82 . Full description of exposure measurements, outcomes and covariates in NHANES can be found in Supplementary Information.
To test our hypothesis that certain environmental chemicals measured in women's serum affect the level of inflammatory signaling molecules, we explored the relationships between serum or urinary concentrations of each chemical and serum markers of inflammation [C-reactive protein (CRP), and the complete blood count which measured the number of white blood cells, lymphocytes, neutrophils, monocytes, eosinophils, basophils and platelets] in adult women 20-79 years of age. Details about measurements of environmental chemicals and markers of inflammation in NHANES can be found in Supplementary Information. Since we were interested only in the effects of chemicals on chronic inflammation, we excluded participants who reported poor health status, or acute infection at the time of examination, including head or chests colds, stomach or intestinal illness, or flu, pneumonia, or ear infection. We further excluded individuals with CRP concentrations > 10 mg/L because these extreme values likely reflect acute inflammation 83,84 . We constructed multivariable linear regression models with natural log-transformed CRP or absolute blood cells count as the dependent variable and one natural logtransformed chemical divided by 100 as a predictor (e.g. [ln(PFOS)]/100).
To test the hypothesis that exposures to PFHxS are associated with metabolic syndrome (MetS) through alteration of bile acid metabolism, we evaluated the relationships between serum PFHxS and MetS in women adults 20-79 years of age. Data were pooled from NHANES 2003-2014 for PFHxS. We excluded participants with fasting time < 8 h. We used the National Cholesterol Education Program's Adult Treatment Panel III report (NCEP/ATPIII) to define MetS 85  www.nature.com/scientificreports/ blood glucose ≥ 100 mg/dL or treatment for diabetes (further details can be found in Supplementary Information. To assess associations between chemical measurements and MetS, we used logistic regression models to estimate adjusted odds ratio (ORs) and corresponding 95% CIs. Serum concentrations of PFHxS were natural log-transformed to address skewness. All models were adjusted for likely sources of confounding, including age (years, continuous), race and ethnicity (non-Hispanic white, non-Hispanic black, Mexican American, multiracial or other), and poverty/income ratio [PIR (the ratio of self-reported family income to the family's appropriate threshold value), divided into tertiles]. When urinary concentrations of chemicals were used as predictors, we further adjusted for urinary creatinine (continuous, natural log-transformed) to adjust for urine dilution. For models with inflammatory markers as dependent variables, we further adjusted for serum cotinine (natural log-transformed, continuous exposure) as a measure of exposure to tobacco smoke. For models with MetS and individual components of MetS as dependent variables, we further adjusted for self-reported physical activity (none, moderate or vigorous), smoking status (never, past or current), and total caloric intake derived from the average of 2-day 24 recalls and divided into quartiles. Association estimates for models with non-persistent chemicals (i.e. ethyl paraben or propyl paraben) as predictors were reported before and after adjustment for body mass index (BMI) since it can be argued that BMI can act as a confounder or a mediator. For chemicals associated with bile acids in GGM networks (i.e. PFHxS), ORs and association estimates were also reported before and after further adjustment for recent use of antibiotics (in the past 30 days, dichotomous) since the gut microbiota is a possible confounder of associations between exposures to these chemicals and outcome variables related to MetS and inflammation.
Data were analyzed using the R package "survey" to obtain estimates of association or ORs and 95% CIs accounting for the complex NHANES sampling design. We also used the weights to adjust for the oversampling of certain population subgroups and to account for non-response and non-coverage in NHANES. When multiple NHANES cycles were combined, we recalculated new sample weights for each participant by dividing the 2-year sample weights provided by the number of cycles combined. All tests were two sided, and p < 0.05 was the level of significance. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.