A scalable workflow for the human exposome

Complementing the genome with an understanding of the human exposome is an important challenge for contemporary science and technology. Tens of thousands of chemicals are used in commerce, yet cost for targeted environmental chemical analysis limits surveillance to a few hundred known hazards. To overcome limitations which prevent scaling to thousands of chemicals, we developed a single-step express liquid extraction (XLE), gas chromatography high-resolution mass spectrometry (GC-HRMS) analysis and computational pipeline to operationalize the human exposome. We show that the workflow supports quantification of environmental chemicals in small human plasma (200 µL) and tissue (≤ 100 mg) samples. The method also provides high resolution, sensitivity and selectivity for exposome epidemiology of mass spectral features without a priori knowledge of chemical identity. The simplicity of the method can facilitate harmonization of environmental biomonitoring between laboratories and enable population level human exposome research with limited sample volume.


Introduction
Humans have cumulative lifelong exposure to a million or more commercial, occupational and environmental chemicals (Figure 1A). Forty-seven percent of the 86,405 chemicals registered with the United States Toxic Substances Control Act (TSCA) inventory as of June 2020 are actively manufactured, processed or imported (1), and each of these has manufacturing impurities and conversion products. Mass spectrometry (MS) provides a powerful chemical analysis platform, and targeted assays are available or possible for almost any chemical. Major, unmet analytical challenges exist for exposome research, however, as a consequence of the number of environmental chemicals and metabolites, chemical diversity, low abundance (2) and lack of readily available authentic standards (3-5). As a result, few chemicals are routinely biomonitored in humans, and many of the commercial chemicals, along with legacy pollutants from prior commercial uses, biotransformation products and impurities, exist as "dark matter" of the human exposome (2).
We focus here on an analytical workflow to operationalize untargeted environmental biomonitoring to gain information on known as well as unknown exposures for human exposome research. In contrast to targeted MS analysis, which is developed to measure specific chemicals, untargeted exposome analysis includes measures of known chemicals which are "identified" by MS criteria, and also other MS signals which are unidentified because they have not been associated with known chemicals by MS criteria (6). These un-identified signals also include chemical contaminants that are known and uncharacterized, as well as reaction products that are effectively unknown to science; capability to measure these unidentified chemicals in biologic samples is essential to enable population health studies of the dark matter of the exposome.
To address the challenge to measure large numbers of identified as well as un-identified environmental chemicals in human samples, we sought to develop a workflow with gas chromatography-high-resolution mass spectrometry (GC-HRMS) which minimizes operator and instrument variation and can be applied consistently for untargeted analysis of tens of thousands of samples. GC-coupled analysis is important because many environmental chemicals are hydrophobic, semi-volatile and do not ionize well with popular liquid chromatography (LC)-MS methods. GC-MS is robust, with universally applicable retention time indices and highly reproducible spectra for database development (7,8). The high mass accuracy and mass resolution of GC-HRMS (9) in full scan mode further enhances resolving power to obtain extensive chemical coverage in complex biological matrices. As opposed to more targeted acquisitions with single-ion-monitoring (SIM) or data-independent-acquisitions (DIA), collection of all spectral features enables measurement of known targets based upon libraries of authentic standards, while reproducibly measuring and preserving information for unidentified MS features.
With recognition that GC-HRMS in full-scan mode provides reproducibility, extensive coverage, and quantifiable data, we focused on key obstacles to implementation of GC-HRMS in exposome epidemiology, specifically variability in sample extraction and automated extraction and assembly of the complex data. In LC-HRMS, a single-step sample extraction procedure (10) improved delivery of data following FAIR principles (11), especially interoperability of data, by eliminating differences due to multistep sample processing. As a result, LC-HRMS is transforming environmental health research through delivery of omics scale exposure and biologic response data with improved sensitivity, throughput and affordability (12-14). In contrast, workflows for targeted GC-MS of environmental chemicals, such as polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), polybrominated biphenyls (PBBs) and chlorinated pesticides, use multistep processing to remove biologic matrix effects and enrich the targeted chemicals (Figure 1B) (15,16). Losses of semi-volatile chemicals can occur from dry-down steps following solid-phase extraction (SPE) and liquid-liquid extraction (LLE).
Variable loss and contamination can also occur at each processing step. For targeted analyses, inclusion of stable isotopic internal standards overcomes these limitations. For unidentified MS features, however, variability in recovery or loss cannot be evaluated directly (Figure 1B), limiting use for discovery of unidentified environmental chemicals associated with health outcomes.
In the present study, we developed a single-step sample preparation method, which we term express liquid extraction (XLE), for use with GC-HRMS to minimize recovery variability and provide extensive coverage of both identified and unidentified environmental chemicals.
We evaluated chemical recovery and used the National Institute of Standards and Technology (NIST) Standard Reference Material-1958 (SRM-1958) to test quantification of environmental chemicals with stable isotopic standards. We established validity of single-point quantification by reference standardization (17) and show that XLE with GC-HRMS supports quantification of environmental chemicals in diverse human samples, including plasma, lung, thyroid and stool.
We further show a computational workflow which enables untargeted analysis of identified and unidentified environmental chemicals in a form suitable for exposome epidemiology.

Results
Sample preparation for GC-HRMS analysis. Starting initially with a QuEChERS procedure (19,20), we systematically varied solvent composition, volume and extraction time to obtain a simplified procedure with a minimal number of steps and possibilities for contamination and variability in recovery (Figure 2A). In testing these procedures, results showed that, at the low levels of abundance in many samples, analyses were subject to contamination by environmental chemicals in solvents and reagents used for QuEChERS, emphasizing the need for blank analyses in quality control (Figure 2A). The final XLE procedure, which allowed high reproducibility, minimal contamination, high sample throughput and maximal coverage of chemicals, used formic acid and hexane:ethyl acetate (2:1) with internal standards, vigorous shaking, centrifugation, and transfer of the organic phase to a new tube containing pure MgSO4 to remove water. Total ion intensity, calculated as the sum of all MS peak intensities, showed that the signals in saline extracted by XLE (i.e., the method blank) matched the baseline signals found in directly injected isooctane solvent (Figure 2B).
Validation of XLE quantification using standard reference material. High recovery of [...] Chemicals with lower recovery, such as p,p'-DDE, can be quantified as long as sample processing is consistent between operators and at different processing times. We evaluated interoperability and found that comparable results (no significant differences; all raw P > 0.05, one-way ANOVA) were obtained with the procedure at two different times, 7 months apart, and by two different operators (Figure 2D).
We evaluated quantification using XLE by testing 54 different chemicals (PCBs, PBDEs, chlorinated pesticides) in SRM-1958 using external calibration curves (0.05 to 2 ng/mL) and comparing to the known concentrations (21). We identified all 43 PCBs that are reported in the [...] were reproducibly quantified in this experiment (Figure 2E and Supplemental Table S1).
Therefore, XLE provides sufficient recovery to support accurate absolute quantification of a broad range of environmental chemicals. Overall, XLE supported measurement of 71 out of the 73 chemicals that are in the ng/kg range in SRM-1958 (Figure 2F and Supplemental Figure S1).
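The external calibration step described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares line through six calibration levels spanning 0.05 to 2 ng/mL; the specific levels, the synthetic intensities and the function names `fit_line` and `quantify` are hypothetical and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(intensity, slope, intercept):
    """Invert the calibration line to estimate a concentration (ng/mL)."""
    return (intensity - intercept) / slope


# Hypothetical calibration levels (ng/mL) within the 0.05 to 2 ng/mL range.
levels = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
# Synthetic, perfectly linear detector response (assumption for illustration).
intensities = [1000.0 * c + 20.0 for c in levels]

slope, intercept = fit_line(levels, intensities)
estimate = quantify(770.0, slope, intercept)  # intensity of an unknown sample
print(f"slope={slope:.1f}, intercept={intercept:.1f}, estimate={estimate:.3f} ng/mL")
```

In practice the fitted line would be built from the most abundant m/z fragment of each authentic standard, and any recovery correction would be applied separately.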
Reference standardization for XLE-based exposome analysis. Absolute quantification of chemicals in human samples is often complicated by ion suppression effects of the biologic matrix on chemical detection by mass spectrometry. Stable isotope dilution addresses both the recovery issues described above and matrix effects on ionization efficiency and is therefore ideal for quantification in targeted chemical analysis. Use of stable isotopic standards is not practical, however, for untargeted analysis of large numbers of environmental chemicals or for un-identified environmental chemicals. For LC-HRMS analyses, single-point quantification by reference standardization has been established as a useful alternative. To provide this utility for GC-HRMS (Figure 3A), we tested reference standardization (18) as a simple and practical approach in untargeted exposomics to estimate chemical concentrations using a single-point calibration. We analyzed 20 human plasma samples and performed reference standardization of 17 chemicals based on SRM-1957 and SRM-1958 that were processed in parallel to the plasma samples. The selected chemicals included 7 PCBs, 7 chlorinated pesticides and 3 PBDEs that are detected in human biomonitoring studies (22,23). To provide additional confirmation, the chemicals were quantified by external standard curves (ranging from 0.05 to 2 ng/mL) with recovery efficiency determined relative to spiked internal standards (Figure 3B).
Among the 20 plasma samples, the measured concentrations from the two quantification methods were similar (Figure 3B), with |ρ| > 80% for all 17 chemicals and |ρ| > 90% for 14 chemicals (Figure 3C, Supplemental Table S2). Thus, the results validate use of reference standardization as a simple approach for quantification in a high throughput XLE workflow with GC-HRMS.
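As a rough illustration of single-point reference standardization and of the comparison against curve-based values, the sketch below scales each sample intensity by the ratio of the known reference concentration to the reference intensity measured in the same batch, then computes a Pearson correlation between the two sets of estimates. It assumes a linear response through the origin; all numbers and the function names `reference_standardize` and `pearson` are hypothetical.

```python
import math


def reference_standardize(sample_intensity, reference_intensity, reference_conc):
    """Single-point estimate: concentration scales linearly with intensity."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return sample_intensity / reference_intensity * reference_conc


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for one chemical: a pooled reference material with a
# known concentration, processed and analyzed in the same batch as the samples.
reference_conc = 0.40        # ng/g, assumed certified value
reference_intensity = 8000.0
sample_intensities = [4000.0, 2000.0, 12000.0, 6400.0]

single_point = [
    reference_standardize(i, reference_intensity, reference_conc)
    for i in sample_intensities
]
# Hypothetical curve-based estimates for the same samples, for comparison.
curve_based = [0.21, 0.09, 0.62, 0.33]
rho = pearson(single_point, curve_based)
print([round(c, 3) for c in single_point], round(rho, 3))
```

The design choice here is that the reference material carries the calibration, so no per-chemical standard curve is needed in each batch; the trade-off is reliance on the linearity assumption.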

Application of the XLE workflow for analysis of environmental chemicals in human samples. We analyzed 80 archival samples from individuals (57 females, 23 males; aged 41 to 68 y) without known disease or occupational or environmental exposures of concern as a pilot to test the utility of XLE in large-scale human biomonitoring studies. For targeted environmental chemical analysis, we selected 378 chemicals from an in-house database for which dilution conditions (0.05 to 2 ng/mL) were relevant for general-population analyses (Supplemental Table S3). Using a requirement for at least 3 co-eluting accurate mass m/z features (5 ppm) [...] Table S5). The commonly detected chemicals in human plasma were detected less frequently in the lung. For the 11 lungs, p,p'-DDE was detected in eight, PCB-153 in five, PBDE-47 and PCB-138 in four and PCB-180 in three. Although the plasma samples were from non-diseased individuals and the lungs were from both diseased and non-diseased individuals, HCA results suggest that environmental chemical profiles in human lung may be very different from plasma. Indeed, quantification showed that levels of PCB-18, PCB-28 and HCB, which are relatively volatile, were 3- to 10-fold higher in the lung than in the plasma (PCB-18: 0.033 vs 0.004 ng/g, PCB-28: 0.050 vs 0.005 ng/g, HCB: 0.102 vs 0.032 ng/g, Figure 4C), indicating a potential contribution of inhalation exposure to the more volatile environmental contaminants.
In the small number of thyroids that were analyzed with XLE, 14 environmental chemicals were quantified (Supplemental Table S6). The most prevalent was p,p'-DDE, detected in 4 out of 5 thyroid samples, with a median concentration of 2.20 ng/g. The amounts of individual chemicals were highly variable among the individuals, and the small number of samples precludes any generalization. Nevertheless, HCA of the correlation matrix showed that the high correlation among chemicals measured in the thyroid samples was similar to that in the lung and plasma (Figure 4B).
Human stool samples, as a noninvasive matrix, have unique value in exposome research (24,25) but have not been extensively studied for environmental chemical exposures. For lipophilic and unabsorbed dietary environmental chemicals, stool is a primary route of elimination (25) and can therefore provide useful information on body burden and clearance of chemicals (24). In a pilot analysis of six human stool samples, we detected 52 and quantified 21 environmental chemicals, with HCB found in all samples (Supplemental Table S7).
Quantification of HCB showed a median concentration of 0.057 ng/g. HCA of the correlation matrix showed that co-exposures of chemicals are likely, as shown in the plasma, lung and thyroid [...]
Suspect screening with chemical databases enables collection of information on known chemicals but not on other chemicals, contaminants and transformation products. In studies of LC-HRMS of human plasma, half of the m/z features associated with health outcomes are unidentified (14). In principle, statistical and bioinformatics analyses can be performed on accurate mass MS features obtained from GC-HRMS without chemical identification, and these features can then be aligned to index chemicals to define retention, providing the characterization needed for identification by database searching or deposition into data libraries (Figure 5A).
We tested the feasibility of XLE with GC-HRMS to capture information on unidentified m/z in a usable form for entry into a data library. We selected 4,747 m/z features detected within

Discussion
XLE with GC-HRMS provides a high-resolution exposomics workflow to address a critical need for public health research, namely, an affordable, interoperable method for population research to study health effects of the extensive number of low abundance chemicals to which humans are exposed, using a single biological sample. As much as 85% of chronic disease is determined by the exposome (29,30).
As previously shown for LC-HRMS data, reference standardization provides a practical approach for quantification relative to pooled reference material that is processed and analyzed concurrently with unknown samples. Reference standardization assumes a linear relationship between analyte concentration and instrument response, which was validated with authentic standards for the 378 chemicals reported here. We showed that this single-point calibration method performs comparably to quantification by traditional six-point calibration for GC-HRMS (Supplemental Table S2) as well as relative to internal stable isotopes.
Human specimens other than blood samples are increasingly available from biorepositories and provide important opportunities for exposome research, as they may allow for assessments of exposure within target tissues of toxicity (31,32). Historically, white adipose tissues were sampled as a storage and effector site for persistent environmental pollutants (33).
As environmental factors are now recognized to contribute to the origins and expressions of many human diseases, more extensive analysis of clinically annotated tissue samples is needed.
The present results showing quantification of multi-class chemicals demonstrate that XLE and reference standardization provide a generalizable approach for human specimens. Compared to human plasma, the lung had higher levels of the more volatile PCBs and HCB and lower levels of the less volatile chemicals, indicating the potential effects of respiratory exposure on organ chemical profiles (Figure 4C). On the other hand, subjects with severe obesity showed lower plasma levels of common persistent pollutants than subjects with normal BMI (P<0.01 for HCB, PCB-138, 153, 170 and 180), indicating a negative effect of body fat on circulating levels of lipophilic chemicals (34). Stool had relatively low environmental chemical content but is important as a route for elimination of lipophilic chemicals from fat reservoirs (24) and as a source of information on recent exposures from diet (35). The results obtained reflect inter-organ interactions in chemical uptake, distribution and clearance, and stress the value of XLE with GC-HRMS for analysis of diverse human tissue types.
XLE with GC-HRMS provides an important step forward for human exposome research by providing a method to maximize capture of information on unidentified chemicals in human samples. Use of such information is facilitated by available annotation and library search software and algorithms, such as RAMclustR and MS-Finder. These tools cluster spectral ions and predict potential structure information on unidentified spectral features. Even without identification, statistical tests can be applied to detect mass spectral features associated with health outcomes. A rigorous interoperable data reporting structure will enable research community efforts to advance chemical identification.
In conclusion, XLE with GC-HRMS addresses a critical need for methods to deliver omics-scale biomonitoring data for exposome epidemiology in an automated, high-throughput and affordable manner. The method provides measures for both known and unidentified MS features to test for associations with disease. For known chemicals, an automated workflow integrates computational methods for data extraction, pre-processing and spectral annotation; with reference standardization, the method supports quantification of environmental chemicals.
Tests with plasma, lung, thyroid and stool samples showed that the method is suitable for multiple sample types. The simplicity of the method facilitates harmonization of exposome analyses, enabling development of cumulative human exposome databases to include information on tens of thousands of chemical exposures in tens of thousands of individuals.

Materials and Methods
Standards and reference materials. As an initial screening, we purchased and examined spectra and chromatographic information for over 900 authentic chemical standards (1 to 20 ng/mL), in the form of single chemicals or mixtures of ≤40 chemicals, from Cambridge Isotopes (Tewksbury, MA) and AccuStandard (New Haven, CT). A total of 556 chemicals showed high detection sensitivity and linearity (|ρ|>0.98 over 0 to 20 ng/mL); 378 of these, analyzed under dilution conditions in isooctane (0.05 to 2 ng/mL) relevant for human analyses, were entered into a data library (Supplemental Table S3). In addition, two 13 [...]
Data extraction and pre-processing. Raw data were examined by checking signal-to-noise ratio, peak shape and spectral information for surrogate and internal standards using a 5 ppm m/z tolerance and 30 s retention time window in Xcalibur Qual Browser software.
TraceFinder software version 4.1 (Thermo Fisher Scientific) was tested with mixtures of standards but had difficulty detecting >250 chemicals simultaneously. Thus, data extraction was performed by XCMS (36) to generate about 40,000 chemical features identified by spectral m/z and retention time; extraction with apLCMS (37) generated more than 200,000 m/z features, which were considered too many for current needs. Data were pre-filtered to retain around 25,000 features that had average peak intensities for non-blank samples that were 10-fold greater than saline method blanks.
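The blank-based pre-filter can be sketched as follows. This is a minimal illustration, assuming each feature carries a list of intensities from non-blank samples and a list from saline method blanks; the data layout, feature identifiers and the function name `filter_by_blank` are hypothetical.

```python
from statistics import fmean


def filter_by_blank(features, fold=10.0):
    """Return ids of features whose mean sample intensity exceeds
    `fold` times the mean intensity in the method blanks."""
    kept = []
    for feature_id, data in features.items():
        sample_mean = fmean(data["samples"])
        blank_mean = fmean(data["blanks"])
        if sample_mean > fold * blank_mean:
            kept.append(feature_id)
    return kept


# Hypothetical features keyed by an "m/z_retention time" identifier.
features = {
    "mz325.88_rt912": {"samples": [5200.0, 4800.0, 6100.0], "blanks": [120.0, 90.0]},
    "mz149.02_rt305": {"samples": [900.0, 1100.0, 1000.0], "blanks": [800.0, 950.0]},
    "mz283.93_rt640": {"samples": [300.0, 250.0, 410.0], "blanks": [0.0, 0.0]},
}

kept = filter_by_blank(features)
print(kept)
```

A feature absent from the blanks (mean of zero) passes whenever it has any positive signal in samples; a real pipeline might add a minimum-intensity floor to avoid retaining noise.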
For targeted quantitation, we used the library of 378 chemical standards consisting of the spectral information (the five most abundant m/z) and retention time (Supplemental Table S3).
Features were selected with tolerance of 5 ppm m/z and 30 s retention time, and further clustered by RAMclustR (26) based on feature similarity in retention time and correlation across samples. Features were matched to chemical spectra for identification, and intensities of the most abundant m/z fragments were used for quantification. Alternatively, as an untargeted approach, features were clustered with RAMclustR without matching to target chemicals to support biostatistics and bioinformatics analysis before chemical annotation and identification.
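The library-matching step can be illustrated with a short sketch. It assumes the tolerances are applied symmetrically (±5 ppm in m/z, ±30 s in retention time) and that a chemical is reported only when at least three library ions are found, following the co-elution requirement stated in the Results. The `Feature` class, the function names and all m/z and retention time values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float         # accurate mass m/z
    rt: float         # retention time in seconds
    intensity: float  # peak intensity


def ppm_error(observed, expected):
    """Signed mass error in parts per million."""
    return (observed - expected) / expected * 1e6


def match_chemical(features, library_mzs, library_rt,
                   ppm_tol=5.0, rt_tol=30.0, min_ions=3):
    """Return the best-matching feature for each library ion, or an empty
    list when fewer than `min_ions` library ions are found."""
    matched = []
    for target in library_mzs:
        hits = [
            f for f in features
            if abs(ppm_error(f.mz, target)) <= ppm_tol
            and abs(f.rt - library_rt) <= rt_tol
        ]
        if hits:
            # Keep the most intense candidate for this library ion.
            matched.append(max(hits, key=lambda f: f.intensity))
    return matched if len(matched) >= min_ions else []


# Hypothetical library entry: five most abundant m/z and a retention time.
library_mzs = [359.8415, 361.8385, 289.9038, 287.9067, 357.8444]
library_rt = 900.0

features = [
    Feature(359.8420, 905.0, 9000.0),   # within 5 ppm and 30 s
    Feature(361.8380, 898.0, 8500.0),   # within 5 ppm and 30 s
    Feature(289.9040, 902.0, 3000.0),   # within 5 ppm and 30 s
    Feature(359.8500, 901.0, 7000.0),   # m/z error too large
    Feature(287.9067, 1000.0, 2000.0),  # retention time too far
]

matches = match_chemical(features, library_mzs, library_rt)
print(len(matches), max(matches, key=lambda f: f.intensity).mz)
```

The most intense matched ion would then serve as the quantification ion; grouping by retention time and cross-sample correlation, as done by RAMclustR, is a separate step not shown here.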

Metabolite quantification in NIST reference serum using external standard curves. Recoveries of each recovery standard were determined after normalizing to [13C12]PCB-28 and [...] Table S3). Absolute quantification of chemicals in SRM-1958 was determined from external calibration curves using the most abundant m/z fragment in full scan mode. Welch's t test was used to determine significant differences between two groups, and one-way ANOVA was used among multiple groups. The Mann-Whitney U test was used as a nonparametric test when the normality assumption failed by the Shapiro-Wilk test (P < 0.05). Pearson's correlation analysis was performed with SigmaPlot 14.0 (Systat Software, Inc). All other bioinformatics analyses were performed in RStudio version 1.1.447 (RStudio, Inc). The significance level was P < 0.05 for all tests.

Figure 5. A. Data from untargeted analyses can be used directly for biostatistical and bioinformatics analyses of relationships to health markers without chemical identification. By defining the mass spectral signals (accurate mass m/z, retention time and ion intensity) relative to known index chemicals, results on unidentified signals can be incorporated into exposome reference databases and used for subsequent investigation, such as database searches. B. Application of tools such as RAMclustR (26) to untargeted data allows co-eluting m/z features to be studied as possible products derived from an unidentified chemical. In this example of an analysis of human plasma (n=60), unidentified signals within a two-minute retention time interval are clustered into spectra and color-coded by cluster. Size of circles reflects raw intensities. C. Clustered m/z spectra are likely to include unidentified environmental chemicals and can be used for discovery of unidentified chemical structures. Examples are presented showing putative molecular formulas assigned to spectra by MS-Finder.

Figure 6. Application of XLE with GC-HRMS and reference standardization provides a framework for high-throughput exposome research. The analytic workflow with single-step extraction and analysis along with pooled reference materials provides a simple, automatable method for measurement of low abundance chemicals in biological materials. Left: The framework is anchored to accurate mass m/z signals, which are clustered according to retention time and intensity correlations. Middle: These can be aligned relative to index chemicals and quantified relative to standard reference materials, thereby providing key criteria for interoperability and reproducibility. Right: Both targeted and untargeted chemical data obtained by this workflow are suitable for entry into a cumulative data library to support exposome research.