An exposome-wide association study on body mass index in adolescents using the National Health and Nutrition Examination Survey (NHANES) 2003–2004 and 2013–2014 data

Excess weight is a public health challenge affecting millions worldwide, including younger age groups. The human exposome concept presents a novel opportunity to comprehensively characterize all non-genetic disease determinants at susceptible time windows. This study aimed to describe the association between multiple lifestyle and clinical exposures and body mass index (BMI) in adolescents using the exposome framework. We conducted an exposome-wide association (ExWAS) study using U.S. National Health and Nutrition Examination Survey (NHANES) 2003–2004 wave for discovery of associations between study population characteristics and zBMI, and used the 2013–2014 wave to replicate analysis. We included non-diabetic and non-pregnant adolescents aged 12–18 years. We performed univariable and multivariable linear regression analysis adjusted for age, sex, race/ethnicity, household smoking, and income to poverty ratio, and corrected for false-discovery rate (FDR). A total of 1899 and 1224 participants were eligible from 2003–2004 and 2013–2014 survey waves. Weighted proportions of overweight were 18.4% and 18.5% whereas those for obese were 18.1% and 20.6% in 2003–2004 and 2013–2014, respectively. Retained exposure agents included 75 laboratory (clinical and biomarkers of environmental chemical exposures) and 64 lifestyle (63 dietary and 1 physical activity) variables. After FDR correction, univariable regression identified 27 and 12 predictors in discovery and replication datasets, respectively, while multivariable regression identified 22 and 9 predictors in discovery and replication datasets, respectively. Six were significant in both datasets: alanine aminotransferase, gamma glutamyl transferase, segmented neutrophils number, triglycerides; uric acid and white blood cell count. In this ExWAS study using NHANES data, we described associations between zBMI, nutritional, clinical and environmental factors in adolescents. Future studies are warranted to investigate the role of the identified predictors as early-stage biomarkers of increased BMI and associated pathologies among adolescents and to replicate findings to other populations.


Materials and methods
Study population. We used the National Health and Nutrition Examination Survey (NHANES) data collected in the 2003-2004 and the 2013-2014 waves. We included participants between 12 and 18 years old at the time of the interview, who reported not being pregnant and not being told by a doctor or health professional they have diabetes, with non-missing BMI value. The selection of two surveys 10 years apart aimed to allow the description of exposome profile variations among adolescents in the United States.
The NHANES is a program of studies conducted every 2 years by the U.S. Center for Disease Control and Prevention (CDC) in a nationally representative multistage probability sample of non-institutionalized population living in the United States 23 . Each of the cross-sectional datasets includes demographic, dietary and healthrelated questions collected during the interview, in addition to laboratory tests, medical, dental and physiological measurements collected during the physical examination at the mobile examination centers 23 . All methods were performed in accordance with the relevant guidelines and regulations. The NCHS Ethics Review Board approved the survey protocols and written parental informed consent and child assent for participants ages 12-18 years were obtained for examination at the mobile examination center 24 .
Study outcome. The study outcome was age-and sex-standardized body mass index (BMI) z-scores calculated using standard deviation scores from the U.S. CDC 2000 growth chart 25 . To describe weight status category, we used the 2010 CDC terminology for children overweight and obesity. The BMI z-score percentiles were used to classify children as being "underweight" (< 5th percentile); "healthy weight" (5th percentile < BMI z-score < 85th percentile); "overweight" (85th percentile ≤ BMI z-score ≤ 95th percentile); and "obese" (> 95th percentile) 26 .
Selection of explanatory variables. We screened the data files from each of the five NHANES sections (i.e., demographics; dietary; examination; laboratory; and questionnaire data) of 2003-2004 and 2013-2014 surveys. In the initial screening, we selected variables: (i) available in both surveys with similar response format; (ii) targeting adolescents (12-18 years old); (iii) being relevant to the objective of the study (i.e., excluding: language spoken at home, audiometry examinations, dermatology, etc.); and (iv) for the laboratory variables, we included only parameters measured in individual samples (i.e., excluding brominated flame retardants (BFRs) that were measured in pooled samples). As a result, we retained 139 variables belonging to the following NHANES sections: laboratory data (75, i.e., clinical and biomarkers of environmental chemical exposures), dietary (63), and questionnaire (1) (Fig. 1  www.nature.com/scientificreports/ Laboratory variables comprised the following measurements: serum and urinary albumin, serum cotinine, standard biochemistry, complete blood count and glycohemoglobin (n = 45); urinary measurements of phthalates (n = 9); perfluoroalkyl and polyfluoroalkyl substances (polyfluoroalkyl chemicals) (n = 7); urinary polyaromatic hydrocarbons (n = 6); blood cadmium, lead and mercury (n = 3); urinary environmental phenols (n = 1); urinary arsenic (n = 1); urinary iodine (n = 1); urinary mercury (n = 1); urinary perchlorate (n = 1). The 63 dietary variables referred to those collecting information about the dietary intake consumed during the 24-h period prior to the interview. The physical activity variable was retained from the questionnaire section.
Variables transformation. All variables were evaluated as either continuous or categorical. For dietary and laboratory continuous variables, zero values were substituted with 1e −10 ; and all values were log-transformed, then scaled and centered. When necessary, categorical variables were recoded for a harmonized coding between the two survey datasets. For physical activity, a positive answer to vigorous physical activity "over past 30 days" (in 2003-2004 dataset) and "during a typical week" (in 2013-2014 dataset) were treated similarly. Particularly, participant's educational level was categorized according to NHANES coding available in 2003-2004 demographic questionnaire: "Less than high school", "High school including GED" and "More than high school". We kept the grouping of race/ethnicity as included in NHANES questionnaire: Mexican American, Non-Hispanic White, Non-Hispanic Black, other Hispanic and other ethnicities. Age and poverty income ratio (PIR)-a ratio of family income to poverty threshold-was treated as a continuous variable. Household smoking was coded as "Yes/No"; with an answer of 1 or more smoking household members recoded as "Yes" in the 2013-2014 dataset to match the variable assessment in the 2003-2004 dataset.

Statistical analysis.
To account for the complex survey design, non-response and post-stratification, survey-weighted analysis was conducted using the appropriate sample weight as per NHANES analytical guidelines 27 . For each survey dataset, we created survey design datasets comprising the variables retained from each section along with their assigned sample weight: the demographic variables with the interview weight (wtint2yr); the laboratory variables and the physical activity with the Mobile Examination Center (MEC) exam weight (wtmec2yr); and the dietary variables with the dietary day 1 sample weight (wtdrd1). Specifically, for 2003-2004 dataset, polyfluoroalkyl, arsenic and mercury variables using sub-sample A weight (wtsa2yr); phthalates and polyaromatic hydrocarbons variables using sub-sample B weight (wtsb2yr); environmental phenols, iodine and perchlorate variables using sub-sample C weight (wtsc2yr). As for 2013-2014 dataset, sub-sample A weight was used for arsenic, iodine, urine mercury, polyaromatic hydrocarbons and perchlorate variables; subsample B weight was used for environmental phenols, polyfluoroalkyl chemicals and phthalates data. For cadmium, lead and blood mercury variables, the MEC exam weight (wtmec2yr) and blood metal weight (wtsh2yr) were used for 2003-2004 and 2013-2014 datasets, respectively. For correlation between laboratory (standard biochemistry) and dietary explanatory variables, we applied the least common denominator approach and hence we used the weights of the dietary variables.
We considered the 2003-2004 dataset for discovery of associations between study population characteristics and zBMI, and replicated the analysis on the 2013-2014 dataset. Descriptive statistics and correlations between all explanatory variables were calculated separately for the discovery and replication datasets. Survey-weighted regressions were conducted to explore the associations between each of the explanatory variables and BMI z-score (univariable) and after adjusting for age, sex, race/ethnicity, household smoking and income to poverty ratio www.nature.com/scientificreports/ (multivariable). We applied the Benjamini-Hochberg method 28 to generate false discovery rate (FDR) adjusted p-values separately for the univariable and multivariable analyses per dataset (FDR correction). Predictors with FDR adjusted p-value < 0.05 were considered significant. Next, we identified predictors that were commonly significant in the multivariable analysis of both the discovery and replication datasets.
To evaluate the possible interaction between sex, the statistically significant predictors retained in the previous step and zBMI, we repeated the multivariable analysis by adding the interaction term, separately for the discovery and replication datasets.
Additionally, a sensitivity analysis was conducted by excluding adolescents classified as obese and repeating univariable and multivariable regression on each of 2003-2004 and 2013-2014 datasets. All of the analysis was conducted using R version 4.1.2 29 , and the R studio version 2022.02.1 30 . The scripts used in the analysis as well as the detailed output of the analysis are available in the Supplementary Information. All survey-weighted descriptive and explanatory analyses were conducted using R survey package [31][32][33] . Data, scripts and outputs are available here: https:// doi. org/ 10. 5281/ zenodo. 65589 79.
Ethics approval and consent to participate. The NCHS Ethics Review Board approved the survey protocols and informed consent was obtained for all subjects.

Consent for publication.
No individual data that consent for publication shall be acquired is present in this manuscript.  (Table 1). No statistically significant (p > 0.05) differences were noted in the distribution of BMI categories between sexes in both survey waves (Supplementary Information S1).

Exploratory ExWAS analysis using the 2003-2004 and 2013-2014 dataset. The univariate
zBMI regression models for each of the 139 variables in the 2003-2004 dataset resulted in 54 predictors with a p-value < 0.05, of which only 27 were significant after FDR correction: 18 variables of laboratory variables (alanine aminotransferase (ALT); albumin; bicarbonate; total bilirubin; cholesterol; gamma glutamyl transferase (GGT); serum iron lactate dehydrogenase; lymphocyte number; mean cell hemoglobin; mean cell volume; platelet count; red blood cell count; segmented neutrophils number; sodium ; triglycerides; serum uric acid, and white blood cell count) and 9 dietary variables (folate as dietary folate equivalents; iron; lutein and zeaxanthin; riboflavin (vitamin B2); thiamin (vitamin B1); total folate; total sugars; vitamin B6, and retinol). All significant predictors (FDR adjusted p-value < 0.05) were negatively associated with zBMI, except for the following: ALT; cholesterol; GGT; lactate dehydrogenase; number of lymphocytes; platelet count; red blood cell count; segmented neutrophils number; triglycerides; serum uric acid, and white blood cell count (Supplementary Information S2).
Multivariable regression resulted in 45 predictors below the p-value cutoff of 0.05 of which only 12 significant after FDR correction: 11 laboratory variables (ALT; GGT; mean cell hemoglobin; mean cell volume; phosphorus; platelet count; red blood cell count; segmented neutrophils number; triglycerides; serum uric acid; white blood cell count) and 1 dietary variable (riboflavin (Vitamin B2)). Eight of these significant predictors were positively associated with BMI z-score (ALT; GGT); platelet count; red blood cell count; segmented neutrophils number; triglycerides; uric acid and white blood cell count) ( Table 2).
For the replication dataset (2013-2014), the univariable regression models resulted in 39 predictors with a p-value < 0.05 of which only 22 were significant after FDR correction: 17 laboratory variables (ALT; albumin; GGT; globulin; serum iron; lactate dehydrogenase; lymphocyte number; mean cell hemoglobin; mean cell volume; monocyte number; platelet count; red cell distribution width; segmented neutrophils number; total bilirubin; triglycerides; serum uric acid; white blood cell count); 2 PAHs (1-hydroxyphenanthrene; 2-hydroxynaphthalene); 2 polyfluoroalkyl (perfluorodecanoic acid; perfluorobutane sulfonic acid) and 1 dietary variable (total sugars) (Supplementary Information S2). Multivariable regression resulted in 27 significant predictors of which only 9 laboratory variables were significant after FDR correction: 2-hydroxynaphthalene; ALT; GGT; lactate dehydrogenase; monocyte number; segmented neutrophils number; triglycerides; serum uric acid and white blood cell count ( Table 2). All of these 9 variables had a positive significant association with zBMI. Volcano plots showing the distribution of the model estimates per predictor used in the ExWAS analysis are available in Figures 3 and 4.
Six variables were commonly significant after FDR correction (FDR-adjusted p-values < 0.05) in multivariable analysis of both the discovery and replication datasets: ALT; GGT; segmented neutrophils number; triglycerides; serum uric acid and white blood cell count. In the second model for multivariable analysis that accounts weak coefficients were found overall; a maximum coefficient of r = 0.217 was noted for the correlation between selenium and blood urea nitrogen. Looking specifically at the correlation between serum uric acid and both laboratory and dietary predictors; the correlation between uric acid and each of serum creatinine and hemoglobin showed the highest coefficients of 0.48 and 0.44, respectively, followed by a coefficient of 0.33 for the correlation between uric acid and GGT. On the other hand, the correlation between GGT and ALT had a coefficient of 0.52. A strong correlation was found between mean cell hemoglobin and mean cell volume (r = 0.94). A medium correlation was found for the following associations: albumin and total calcium (r = 0.56); albumin and total protein (r = 0.56); total bilirubin

Discussion
We conducted an exposome-wide association study exploring 139 explanatory variables with respect to sexspecific BMI-for-age in non-diabetic and non-pregnant adolescents aged 12-18 years old in the 2003-2004 NHANES survey and used the 2013-2014 survey for validation. After adjusting for age, sex, race-ethnicity, household smoking, and poverty income ratio, we found ALT, GGT, segmented neutrophils number, triglycerides, serum uric acid and white blood cell count to have statistically significant associations with zBMI after adjusting for FDR in both the discovery and replication datasets, being in line with literature findings for adolescents. Uric acid is the end product of purine metabolism in humans and its magnitude depends on dietary purines (from animal proteins, meat, seafood, beer and fructose sources), the degradation of endogenous purines as well as the renal and intestinal excretion of urate 34 . Although serum uric acid levels increase differently by sex from birth till adolescence, there is still no universally accepted threshold for defining hyperuricemia, or excess concentrations of serum uric acid in children and adolescents 35 . Elevated levels of uric acid have been associated with obesity and non-communicable diseases, such as kidney and cardiovascular diseases in children and adolescents 35,36 . Recent literature emphasized the association between uric acid and metabolic syndrome (MetS) outcomes in children and adolescents, such as glucose intolerance, central obesity, hypertension, and dyslipidemia [36][37][38] . Uric acid was associated with the prevalence of metabolic syndrome and its components, as shown in a cross-sectional analysis of 1370 adolescents (12-17 years of age) using data from NHANES 1999-2002; the unweighted prevalence of metabolic syndrome was ≈ 21% in the highest quartile (> 339 μmol/L) as compared to ≈ 10% in the third quartile (≤ 339 μmol/L), ≈ 4% in the second quartile (≤ 291 μmol/L) and < 1% among participants in the unweighted lowest quartile of serum concentrations of uric acid (≤ 250 μmol/L) 39     www.nature.com/scientificreports/ acid in both discovery (estimate = 0.452) and replication (estimate = 0.707) datasets. After excluding non-obese participants, the association between zBMI and uric acid was no longer statistically significant, although it still showed the strongest effect size in both discovery and replication subsets (adjusted estimates of 0.24 and 0.39, respectively). Thus, our findings highlight the pathogenic role of elevated concentrations of uric acid in young obese age groups, as showcased in different studies. In a case-control study conducted in Italy among 120 children and adolescents with primary obesity (zBMI ≥ 97th percentile) and 50 healthy controls, carotid intimamedia thickness was significantly correlated (r = 0.61; 95% CI 0.58-0.64) with the fourth quartile of uric acid among obese children regardless of the presence of metabolic syndrome, defined in the study as ≥ 3 or more of the following criteria: obesity, hypertension, low HDL cholesterol, elevated triglycerides, and impaired fasting glucose and/or insulin resistance 40 . On the other hand, the association between serum uric acid and cardiovascular www.nature.com/scientificreports/ diseases, irrespectively of BMI, has also been documented in a study conducted on an 1999-2006 unweighted NHANES sample of 12-17 years old adolescents; after adjusting for age, sex, race/ethnicity and BMI, the odds of having elevated blood pressure (mean systolic and/or diastolic blood pressure percentile ≥ 95th percentile) was 1.38 (95% CI 1.16-1.65) for each 0.1 mg/dL increase in uric acid 41 . In a randomized, double-blinded trial among pre-hypertensive obese adolescents (11-17 years old), patients treated with two mechanisms of urate reduction (allopurinol and probenecid) did not continue to gain weight during the 3-months study period and showed a similar and significant reduction in their systolic blood pressure by 10.2 mmHg and their diastolic by 9.0 mmHg in the two treatment groups as compared to the placebo group; thus, highlighting the role of uric acid as a biochemical mediator of increased blood pressure 42 .
The observed association of both uric acid and GGT with obesity and other cardiovascular risk factors has been previously documented in the literature. In a cross-sectional study of 2067 children and adolescents (6-20 years) in Hong Kong, a combined effect of the upper quartiles of both uric acid and GGT on obesity, low high-density lipoprotein cholesterol (HDL-C) levels and high blood pressure (adjusted odds ratios ranged from 1.63 to 5.82, all p < 0.005) was documented 37 . GGT, a liver enzyme implicated in the degradation of glutathione is associated with BMI, total cholesterol, diabetes mellitus (all components of MetS) as well as cardiovascular disease and all-cause mortality in adults 43 . The correlation between GGT and MetS and hypertension among younger age groups was demonstrated in a 10-year longitudinal study in Taiwan, where subjects (10-15 years) with higher baseline levels of GGT were at least twice more likely to develop MetS and hypertension during the follow-up period 44 . In our analysis, we showed a positive correlation, albeit weak, between uric acid and GGT in both discovery and replication datasets, without adjustment for zBMI.
Also, ALT had a statistically significant association with zBMI in the multivariable analysis using both the discovery and replication datasets. ALT is a liver enzyme related to fat liver accumulation and considered a useful biomarker for non-alcoholic fatty liver disease (NAFLD) 45 . The association between serum ALT and zBMI was previously documented in a study among adolescents (12-18 years) from NHANES III (1988)(1989)(1990)(1991)(1992)(1993)(1994), in which overweight and obese study subjects were three to six times more likely to have higher levels of ALT (> 30 U/L) as compared to those with normal weight 46 . Similarly, a significant correlation between ALT, zBMI and metabolic syndrome was found among 5411 adolescents aged 12-19 years from NHANES 1999-2014; yet, with no significance increase in the prevalence of increased ALT over time 47 . On the other hand, elevated serum ALT levels (> 40 U/L) were also associated with markers of metabolic syndrome, as demonstrated in a study among adolescents 10-19 years old from the Korean National Health and Nutrition Examination Survey in 1998 48 .
Elevated triglycerides or hypertriglyceridemia is common among obese children and adolescents, and this component of metabolic syndrome is a known biomarker of cardiovascular disease risk 49 . In our analysis, a positive association between triglycerides and zBMI was found in each of the discovery (estimate = 0.285) and replication datasets (estimate = 0.444), but not in the sensitivity analysis in non-obese adolescents. This positive association found in our ExWAS study between triglycerides and zBMI is in line with the findings of a study on abnormal lipid levels among adolescents (12-19 years) in NHANES 1999NHANES -2006, in which 22% of overweight and 43% of obese had at least one abnormal lipid level including elevated triglycerides, the most common lipid abnormality associated with excess weight 50 .
Our analysis also showed the association between zBMI and inflammatory markers, such as white blood cell count and segmented neutrophils number. Such findings were also documented among adolescents [51][52][53] , suggesting that obesity-induced inflammation could start in childhood 54 .
None of the dietary variables remained significant at multivariable analysis in the discovery and replication datasets. Additionally, weak correlations were found between the laboratory and dietary variables in each of the discovery and replication datasets. On the other hand, none of the biomarkers of environmental chemical exposures remained significant after FDR correction at multivariable analysis in the discovery and replication datasets. Particularly, 2-hydroxynaphthalene was no longer significant at multivariable analysis in the replication dataset after FDR correction. Although the implicated biological mechanisms remain unclear, further studies are warranted to better understand the role of PAHs and other environmental pollutants in obesity pathogenesis.
The associations found between uric acid, GGT or ALT with zBMI using the whole study population were no longer statistically significant in the sensitivity analysis that included non-obese adolescents. This observation warrants for further investigation on the potential use of these biochemical parameters as biomarkers in the early stages of obesogenesis in adolescence or childhood. The sex specific trends observed in the association between the three aforementioned biochemical parameters and zBMI are also worth of detailed investigation in other population studies as they might be useful in future obesity screening and prevention programs in adolescence and/or earlier life stages.
The strength of this study lies in the agnostic nature of the ExWAS approach which allows for the simultaneous assessment of multiple parameters and their associations with different outcomes. The NHANES dataset is considered representative of the U.S. population; in effect, the weighted estimates of overweight and obesity prevalence in this U.S. study population (12)(13)(14)(15)(16)(17)(18) 55 . The created models are considered as robust, being tested on two NHANES datasets, 10 years apart. Another strong feature of this ExWAS study was the inclusion of variables belonging to all exposome domains; the general external (individual household income), the specific external (dietary variables; education; household smoking and physical activity) and the internal domain (intrinsic and laboratory variables). Yet, in order to fully explore the exposome's utility, it is encouraged to include additional groups of environmental components in relation to the studied health outcome 17 .
Due to the cross-sectional study design of NHANES, causal associations cannot be established and although the approach was as inclusive as possible, not all exposome parameters were available or could be included, www.nature.com/scientificreports/ e.g. chemical exposure data were not fully available in these NHANES surveys. On the other hand, the lack of observed associations between dietary factors and BMI might be interpreted by the use of a self-reported recall of the food items consumed during the past 24 hours; which might be subject to recall bias, under-reporting, over reporting, or omission of foods 56 . Additionally, our analysis did not account for the "dietary recall status" variable and hence subjects classified as not having a "reliable recall" were not excluded. In this study, we used only dietary intake measurements for 1 day as an adequate metric of describing population mean nutrient intake. Earlier studies have shown no significant differences between the 2 days' dietary intake measurements available in NHANES for estimating population means 57,58 . However, multiple measures of daily dietary intake are recommended, particularly in exposome studies, to ensure sufficient capture of the within-and between-subject variability 59 . Another possible limitation is the use of only two surveys out of the total available NHANES year surveys; ExWAS studies integrating additional NHANES datasets as well as a bigger number of environmental exposure variables are warranted to improve our knowledge in the environmental determinants of obesogenesis. Moreover, we focused on the analysis of parameters such as dietary habits that are more directly linked with obesity and not in the analysis of external drivers of variability in these parameters such as mental health, food security, socioeconomic determinants. Although the poverty income ratio was used in the models as a proxy of socioeconomic status, more detailed exposome domain assessments including external parameters in the future will allow for the identification of effective population programs/interventions that tackle adolescent obesity. Adolescence is a critical life window of susceptibility to metabolic diseases, such as, overweight and obesity during which ongoing children's development may be perturbed by a suite of environmental stressors, including lifestyle/behavioral factors and dietary habits 58 . The methodological framework of the human exposome and its tools allow for a comprehensive assessment of multiple factors with respect to disease outcomes through an agnostic, untargeted and hypothesis-generating approach. The NHANES-based discovered and replicated predictors of zBMI among U.S. adolescents seem to be in line with the global literature and further highlight their importance as potential early-stage biomarkers of excess weight. Additional studies at younger age groups are warranted to better explore a broader suite of lifestyle, dietary and environmental determinants in association with these biomarkers of effect and their implications with metabolic disease process.

Data and materials availability
The R code, scripts and output are made publicly available and they have been submitted on the Journal's portal. There is supplementary visualization material based on these datasets, which can be easily customized to accommodate other NHANES surveys.

Code availability
The code used for the analysis is available in the Supplementary Material. Data, scripts and outputs are available here: https:// doi. org/ 10. 5281/ zenodo. 65589 79.