Metabolomic similarities between bronchoalveolar lavage fluid and plasma in humans and mice

This observational study catalogues the overlap in metabolites between matched bronchoalveolar lavage fluid (BALF) and plasma, identifies the degree of congruence between these metabolomes in human and mouse, and determines how molecules may change in response to cigarette smoke (CS) exposure. Matched BALF and plasma were collected from mice (ambient air or CS-exposed) and humans (current or former smokers) and analyzed using mass spectrometry. There were 1,155 compounds in common in all four sample types; fatty acyls and glycerophospholipids strongly overlapped between groups. In humans and mice, more than half of the metabolites present in BALF were also present in plasma. Mouse BALF and human BALF had a significant positive correlation, with 2,040 metabolites in common, suggesting that mouse models can be used to interrogate changes in the human lung metabolome. Although statistical power was limited by the small sample size in the mouse study, the BALF metabolome appeared to be more affected by CS than plasma. CS-exposed mice showed increased plasma and BALF glycerolipids and glycerophospholipids. This is the first report cataloguing the metabolites present across mouse and human BALF and plasma. Findings are relevant to translational studies in which mouse models are used to examine human disease, and in which plasma may be interrogated in lieu of BALF or lung tissue.


Results
Catalogue and overlap of the plasma and BALF metabolomes in mice and humans. We used metabolomics to develop a catalogue of plasma and BALF metabolites from both mice and humans and detected a total of 7,654 unique metabolites across all samples, including both smoking and non-smoking samples (Fig. 1A). We found more than 4,000 metabolites in human and mouse plasma and approximately 3,000 metabolites in human and mouse BALF (Fig. 1A). The majority of metabolites in both fluids were lipids, with only a minority (~10%) being aqueous metabolites. This is consistent with our previous results 14 and is likely because lipids are a major constituent of biological membranes 15 .
Overall, there was at least 50% overlap in metabolites between pairs of the four groups (Fig. 1A), as follows: 62.3% of human BALF metabolites were also present in mouse BALF, 67.1% of mouse BALF metabolites were also present in mouse plasma, 52.2% of human BALF metabolites were also present in human plasma, and 57.3% of mouse plasma metabolites were also present in human plasma. There were 2,040 compounds common to mouse and human BALF (Fig. 1A); 1,846 were lipids, of which 1,075 were annotated by database identification, and 194 were aqueous molecules, of which 87 were annotated by database identification. Based on these annotations, carnitines, purines, amino acids, peptides, sphingolipids, and glycerophospholipids were common to both the human and mouse BALF samples. Table 1 includes a representative list of these metabolites; a comprehensive list is available in Supplemental Table S4. A total of 2,478 compounds were common to mouse and human plasma (Fig. 1A). Of these plasma metabolites, 2,208 were lipids, of which 1,383 were annotated using a database, and 270 were aqueous molecules, of which 213 were annotated using a database. Many of these are signaling molecules, including LysoPCs, ceramides, and diglycerides (Table 1 and Supplemental Table S5). Other groups of metabolites common to mouse and human plasma included carnitines, amino acids, carbohydrates, sphingolipids, steroids, and vitamin D2.
There were 1,155 metabolites common to all four groups (Fig. 1A), representing 84 biochemical classes (Fig. 1C). In metabolomics, important biological changes may be found in biochemical groups of molecules as well as in individual species. For example, 112 glycerophospholipids and 104 fatty acyls were found in all four sample types. Similarly, over 120 glycerolipid, sterol lipid, sphingolipid, and prenol lipid molecules were common to all four sample types (Fig. 1C). Conversely, there was relatively little overlap among purines, pyridines, and pyrimidines. This may reflect biology or platform limitations.
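The overlap percentages above are directional: each is the fraction of one group's metabolites that is also detected in a second group, so the human BALF to mouse BALF value (62.3%) need not equal the value in the reverse direction. A minimal Python sketch of these set calculations, assuming each sample group is represented as a set of compound identifiers (the IDs and group contents below are hypothetical toy values, not study data), is:

```python
from itertools import permutations


def pct_overlap(a: set[str], b: set[str]) -> float:
    """Percent of the metabolites in `a` that are also detected in `b` (asymmetric)."""
    return 100.0 * len(a & b) / len(a) if a else 0.0


def overlap_table(groups: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Directional overlap for every ordered pair of sample groups."""
    return {(x, y): pct_overlap(groups[x], groups[y]) for x, y in permutations(groups, 2)}


def common_to_all(groups: dict[str, set[str]]) -> set[str]:
    """Metabolites detected in every sample group."""
    return set.intersection(*groups.values())


# Toy example with hypothetical compound IDs.
groups = {
    "mouse_BALF": {"c1", "c2", "c3", "c4"},
    "human_BALF": {"c1", "c2", "c5"},
    "mouse_plasma": {"c1", "c2", "c3", "c6"},
    "human_plasma": {"c1", "c2", "c5", "c6", "c7"},
}
table = overlap_table(groups)
print(round(table[("human_BALF", "mouse_BALF")], 1))  # 66.7
print(table[("mouse_BALF", "human_BALF")])            # 50.0
print(sorted(common_to_all(groups)))                  # ['c1', 'c2']
```

The same 2 shared compounds give 66.7% in one direction and 50.0% in the other, which is why each percentage in the text names the group it is expressed relative to.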

Metabolite correlations across species and biological fluids.
To establish levels of congruency between sample types, Spearman's rank correlation coefficient was used (Fig. 1B). This comparison included only molecules that were detected in at least 20% of all samples. This filter level was chosen because of the small sample size, to reduce false positives while avoiding over-filtering the data and potentially missing important metabolites. There was no significant correlation between human BALF and mouse plasma (r = 0.0374, p = 0.794) or between human plasma and mouse BALF (r = −0.0703, p = 0.624). This suggests that these samples have dissimilar metabolomes, in spite of having over 50% of their metabolites in common (Fig. 1A). Mouse plasma and human plasma were not correlated (r = 0.0975, p = 0.496). Mouse BALF and mouse plasma (r = 0.149, p = 0.135) and human BALF and human plasma (r = 0.222, p = 0.275) showed positive correlations that did not reach statistical significance. However, mouse BALF and human BALF were significantly positively correlated (r = 0.326, p = 0.0195), indicating that these samples have similar metabolomes.
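The exact input to the correlations in Fig. 1B is not spelled out here. One plausible reading, assumed in the sketch below, is that each metabolite passing the 20% detection filter is summarized by its mean abundance within each sample type, and Spearman's rho is then computed across the metabolites detected in both types. The samples-by-metabolites table layout and the function names are hypothetical; `scipy.stats.spearmanr` supplies rho and its p-value.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def detection_filter(abund: pd.DataFrame, min_frac: float = 0.2) -> pd.DataFrame:
    """Keep metabolites (columns) detected in at least `min_frac` of all samples (rows).
    NaN or non-positive abundances count as 'not detected'."""
    detected = (abund.notna() & (abund > 0)).astype(float)
    return abund.loc[:, detected.mean(axis=0) >= min_frac]


def group_spearman(abund: pd.DataFrame, groups: pd.Series, g1: str, g2: str,
                   min_frac: float = 0.2) -> tuple[float, float, int]:
    """Spearman correlation of per-metabolite mean abundance between two sample types."""
    filt = detection_filter(abund, min_frac)
    mean1 = filt[groups == g1].mean(axis=0)  # NaN if never detected in g1
    mean2 = filt[groups == g2].mean(axis=0)
    shared = mean1.notna() & mean2.notna()
    rho, p = spearmanr(mean1[shared], mean2[shared])
    return float(rho), float(p), int(shared.sum())


# Toy data: 4 samples x 5 metabolites; m5 is never detected and is filtered out.
abund = pd.DataFrame(
    {"m1": [1, 1, 10, 10], "m2": [2, 2, 20, 20], "m3": [3, 3, 30, 30],
     "m4": [4, 4, 40, 40], "m5": [np.nan] * 4},
    index=["s1", "s2", "s3", "s4"],
)
groups = pd.Series(["mouse_BALF", "mouse_BALF", "human_BALF", "human_BALF"], index=abund.index)
rho, p, n = group_spearman(abund, groups, "mouse_BALF", "human_BALF")
print(round(rho, 6), n)  # 1.0 4
```

Because Spearman's rho uses ranks, it is insensitive to the overall dilution difference between sample types, which is one reason a rank-based coefficient suits comparisons across BALF and plasma.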
We examined the distribution of compounds in the closely correlated biofluids. The metabolites along the green diagonal lines in the scatter plots of mouse BALF and human BALF (Fig. 1D) showed the strongest positive correlations. A similar positive correlation was observed between mouse plasma and mouse BALF (Fig. 1E). There was weak correlation between human plasma and human BALF (Fig. 1F). The majority of these compounds are listed in Supplemental Tables S4 and S5. Examples of highly correlated molecules from Fig. 1D-F include phosphatidylinositols (PI), phosphatidylserines (PS), diglycerides (DG), sterol lipids such as cholesterol and Δ8,14-sterol, and fatty acids such as eicosanedioic acid and pentadecylic acid.
We next focused on individual metabolites that may correlate between BALF and plasma, irrespective of species. Of 298 annotated metabolites, about half were positively correlated. Figure 2 shows the correlation plot for a subset of these metabolites that had high abundance across all sample types and were diverse in metabolite class. L-acetylcarnitine was the only metabolite that was negatively correlated between BALF and plasma. In addition, BALF acetylcarnitine positively correlated with nine plasma metabolites, while plasma acetylcarnitine negatively correlated with 18 BALF metabolites. Twenty-four other metabolites in this subset were positively correlated between BALF and plasma. These included L-homocysteic acid, octadecanoyl-carnitine, N-undecanoylglycine, LysoPE(18:0), LysoPC(20:4), MG(18:0), PC(32:0), PC(34:0), PE(40:7), and PI(38:5).
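A correlation plot such as Fig. 2 can be built from a matrix of Spearman coefficients between every BALF metabolite and every plasma metabolite, computed across matched individuals. The sketch below assumes two tables with one row per matched subject or mouse (the same row index in both) and one column per annotated metabolite; the function name and toy values are hypothetical. Entries pairing a metabolite with itself indicate whether that compound tracks across fluids, while the remaining entries capture cross-metabolite associations like those described for acetylcarnitine.

```python
import pandas as pd


def cross_fluid_spearman(balf: pd.DataFrame, plasma: pd.DataFrame) -> pd.DataFrame:
    """Spearman rho between every BALF metabolite (rows) and every plasma
    metabolite (columns), computed across individuals present in both tables."""
    balf = balf.add_prefix("BALF:")
    plasma = plasma.add_prefix("plasma:")
    joined = balf.join(plasma, how="inner")  # keep matched individuals only
    corr = joined.corr(method="spearman")    # pairwise-complete observations
    return corr.loc[balf.columns, plasma.columns]


# Toy matched samples (hypothetical values, not study data).
idx = ["id1", "id2", "id3", "id4"]
balf = pd.DataFrame({"acetylcarnitine": [1.0, 2.0, 3.0, 4.0]}, index=idx)
plasma = pd.DataFrame({"acetylcarnitine": [8.0, 6.0, 4.0, 2.0],
                       "LysoPC(20:4)": [2.0, 4.0, 6.0, 9.0]}, index=idx)
print(cross_fluid_spearman(balf, plasma).round(2))
```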
Unique metabolites across BALF and plasma in mice and humans. While plasma metabolites can conceivably be used as proxies for lung metabolites, it is also important to determine which compounds are unique to each biofluid and species. Therefore, we determined the compounds that were present in only a single sample type. Unique metabolites were detected in each of the four sample groups: 506 in mouse BALF, 603 in human BALF, 960 in mouse plasma, and 1,478 in human plasma (Fig. 1A). Thus, 33.9% of human plasma metabolites were found only in human plasma. Similarly, 22.2% of mouse plasma metabolites were found only in mouse plasma, 18.4% of human BALF metabolites only in human BALF, and 16.7% of mouse BALF metabolites only in mouse BALF. Table 2 lists representative unique annotated metabolites detected in each group.
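A metabolite counts as unique when it is detected in one sample type and in none of the other three, so each percentage above is the size of that set difference divided by the group's total number of metabolites. A short sketch using the same kind of hypothetical compound-ID sets as a toy input:

```python
def unique_metabolites(groups: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each group, the metabolites not detected in any other group."""
    out = {}
    for name, mets in groups.items():
        others = set().union(*(m for n, m in groups.items() if n != name))
        out[name] = mets - others
    return out


def pct_unique(groups: dict[str, set[str]]) -> dict[str, float]:
    """Unique metabolites as a percentage of each group's total."""
    uniq = unique_metabolites(groups)
    return {n: 100.0 * len(uniq[n]) / len(groups[n]) for n in groups}


# Toy example with hypothetical compound IDs (not study data).
groups = {
    "mouse_BALF": {"c1", "c2", "c3", "c4"},
    "human_BALF": {"c1", "c2", "c5"},
    "mouse_plasma": {"c1", "c2", "c3", "c6"},
    "human_plasma": {"c1", "c2", "c5", "c6", "c7"},
}
print(pct_unique(groups))
# {'mouse_BALF': 25.0, 'human_BALF': 0.0, 'mouse_plasma': 0.0, 'human_plasma': 20.0}
```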
Distribution of compound classes across sample types. Next, we determined whether distinct classes of compounds were found predominantly in any biofluid. Sixty compound classes were tested using a test of proportions (described in methods); thirteen showed significant differences across the groups in the proportion of compounds detected in the class (Fig. 3). The most represented metabolite classes common to all four groups were prenol lipids, fatty acyls, and glycerophospholipids (Fig. 3B). The prenol lipids range from quinones, hydroxyquinones, C20 isoprenoids, and retinoids to triterpenoids and terpene glycosides. Examples from these prenol lipid classes include coenzymes; vitamins A, E, and K; retinoic acid; and plant-related metabolites in plasma such as acetylursolic acid. The fatty acyls include octadecanoids and fatty acyl glycosides. The glycerophospholipids include phosphatidylethanolamines (PE), phosphatidylcholines (PC), phosphatidylserines (PS), phosphatidylglycerols (PG), and phosphatidylinositols (PI). Because lipids were enriched from BALF and plasma during sample preparation, and the LC-MS method was optimized to detect and separate lipids, a large number of lipid species were identified. Figure 3A shows that benzopyrans, peptides, amino acids, sterol lipids, sphingolipids, and glycerophospholipids were highly represented (p < 0.001) in human plasma compared to the other sample groups. Isoindoles were detected only in human plasma. Carbonyl compounds, glycerolipids, and fatty acyls were highly represented (p < 0.05) in human BALF. Benzopyrans were present in human BALF, mouse BALF, and human plasma but absent in mouse plasma (Fig. 3A). Benzoxepines were highly represented in mouse plasma (p < 0.001) but absent in human BALF and mouse BALF.
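The proportions test is not detailed in this section. For a single compound class, a common way to compare a proportion across more than two groups is a chi-square test on a 2 × k table of (in class, not in class) counts, which for k > 2 groups is what R's `prop.test` computes. The sketch below assumes that test and uses `scipy.stats.chi2_contingency`; the function name and counts are hypothetical, and any correction across the 60 tested classes is not shown.

```python
from scipy.stats import chi2_contingency


def class_proportion_test(in_class: dict[str, int], totals: dict[str, int]):
    """Chi-square test that one compound class makes up the same fraction of
    annotated metabolites in every sample group (2 x k contingency table)."""
    names = list(totals)
    table = [
        [in_class[g] for g in names],              # metabolites in the class
        [totals[g] - in_class[g] for g in names],  # all other annotated metabolites
    ]
    chi2, p, dof, _expected = chi2_contingency(table)
    proportions = {g: in_class[g] / totals[g] for g in names}
    return proportions, chi2, p


# Hypothetical counts for one class across the four groups.
props, chi2, p = class_proportion_test(
    in_class={"human_BALF": 40, "mouse_BALF": 12, "human_plasma": 15, "mouse_plasma": 10},
    totals={"human_BALF": 500, "mouse_BALF": 500, "human_plasma": 500, "mouse_plasma": 500},
)
print(props["human_BALF"], p < 0.05)
```

With four groups the table has three degrees of freedom, so SciPy applies no Yates continuity correction, matching the behavior of `prop.test` for more than two groups.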
Cigarette smoke-induced metabolome changes in BALF and plasma. The BALF and plasma metabolomes were compared in mice exposed to ambient air or cigarette smoke for 1 day (n = 7 mice per group) to determine the congruence of metabolite changes due to acute CS exposure. There were 124 plasma metabolites and 380 BALF metabolites that were differentially regulated and database-annotated in smoke-exposed versus air-exposed mice (Storey with Bootstrapping multiple testing correction, q < 0.1); 48 of these differentially regulated metabolites were common to both biofluids. Their degree of congruence is presented as a heat map in Fig. 4A. Thirty compounds showed the same direction of regulation (concordant) in BALF and plasma, and 18 showed opposite directions (discordant). Overall, the following changes were observed in response to smoking: glycerophospholipids and glycerolipids were up-regulated in BALF and plasma. Two anandamides and two sphingolipids were down-regulated in both BALF and plasma. Leucine, two steroids, and two vitamin D3 metabolites were down-regulated in BALF but up-regulated in plasma. Ubiquinol-8 and linoleyl carnitine were up-regulated in BALF and down-regulated in plasma. We then explored the effect of smoking on the global BALF and plasma metabolomes (Fig. 4B-E). Because of the small sample size, only qualitative analysis could be conducted. In summary, current cigarette smoking appears to be associated with additional detected metabolites compared with the former-smoking or non-smoking groups. For example, there were 3,626 compounds in CS-exposed versus 3,495 in air-exposed mouse BALF; 2,741 compounds in current versus 2,188 in former smokers in human BALF; and 3,584 compounds in current versus 2,632 in former smokers in human plasma.
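The differential analysis used MPP's Storey procedure with bootstrapping, which estimates the proportion of true null hypotheses (pi0) using a bootstrap choice of the tuning parameter lambda. The sketch below is a simplification that fixes lambda = 0.5 instead of bootstrapping it, followed by the concordance step described above: metabolites with q < 0.1 in both fluids are split by whether their log fold changes share a sign. Function names, the dictionary input layout, and all values are hypothetical.

```python
import numpy as np


def storey_qvalues(pvals, lam: float = 0.5) -> np.ndarray:
    """Storey q-values with a fixed lambda (a bootstrap would choose lambda instead)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    # Estimated proportion of true null hypotheses, capped at 1.
    pi0 = min(1.0, float(np.mean(p > lam)) / (1.0 - lam))
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    q_sorted = pi0 * m * p[order] / ranks
    # Enforce monotonicity: each q-value is the minimum of the raw values at or above its rank.
    q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def concordance(balf: dict, plasma: dict, q_cut: float = 0.1):
    """Inputs map metabolite -> (log2 fold change, q-value) for smoke vs air.
    Returns metabolites significant in both fluids, split by direction agreement."""
    sig_balf = {k for k, (_, q) in balf.items() if q < q_cut}
    sig_plasma = {k for k, (_, q) in plasma.items() if q < q_cut}
    shared = sig_balf & sig_plasma
    concordant = sorted(k for k in shared if np.sign(balf[k][0]) == np.sign(plasma[k][0]))
    discordant = sorted(shared.difference(concordant))
    return concordant, discordant


print(storey_qvalues([0.01, 0.02, 0.03, 0.8, 0.9]).round(2))
# Hypothetical fold changes and q-values (not study data).
balf = {"A": (1.2, 0.01), "B": (-0.5, 0.05), "C": (0.7, 0.50)}
plasma = {"A": (0.3, 0.02), "B": (0.4, 0.08), "C": (0.9, 0.01)}
print(concordance(balf, plasma))  # (['A'], ['B'])
```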

Discussion
This study used LC/MS-based metabolomics to catalogue compounds in mouse BALF, mouse plasma, human BALF, and human plasma. These compounds were compared to determine overlap amongst the groups and to identify concordant and discordant changes in BALF and plasma in a mouse model of CS exposure. Overall, we found at least 50% overlap in metabolites between pairs of sample groups. Lipids were more prevalent than aqueous small molecules such as amino acids and purines; this could be due to sensitivity limitations. A recent study by Peng et al. detected 250 aqueous compounds in rat BALF 16 . This is consistent with the 275 and 331 aqueous molecules detected in our aqueous fraction of human BALF and mouse BALF, respectively. Their study, like ours, identified metabolites belonging to amino acid and purine metabolite classes; these findings are consistent with other studies that analyzed human BALF 17 . Some of the compounds detected in both BALF and plasma in our study include acetylcarnitine, carnitine, creatine, MG(18:0), leucine, and hypoxanthine. Dysregulation of these metabolites has previously been reported in the BALF of mice 12 , rats 16 , and humans 17 in association with asthma, COPD 13 , and/or acute respiratory distress syndrome (ARDS). The presence of these molecules in both BALF and plasma suggests that plasma could be used as a surrogate for BALF, thereby providing a non-invasive fluid for studying these lung diseases. Their dysregulation in both mice and humans also suggests that mice may be useful models for studying human lung disease, including emphysema, as demonstrated in this cigarette smoking model.
Table 2. Representative metabolites unique to mouse and human BALF and plasma. This list contains randomly selected metabolites that were annotated using an in-house database comprising METLIN, HMDB, Lipid Maps, and KEGG. Scores ≥70 out of a possible 100 and mass errors ≤10 ppm were used as annotation thresholds. Annotations were based on exact mass and isotope ratios. DG: diglyceride, PE: phosphatidylethanolamine, PIP: phosphatidylinositol phosphate, PS: phosphatidylserine.
Signaling molecules such as LysoPCs, ceramides, and diglycerides were common to mouse and human plasma; dysregulated plasma levels of these molecules have been reported in airway diseases such as asthma [18][19][20] , in human COPD plasma 8 , and upon exposure to CS in an animal model 7 . In addition, amino acids, sphingolipids, and vitamin D are associated with lung diseases. Examples include serum and plasma vitamin D deficiency in asthma 21,22 , perturbation of serum amino acids in COPD 23 , perturbation of plasma and CSF amino acids in smokers 24 , and increased lung tissue sphingolipids in cystic fibrosis 25 . Collectively, these results suggest that many metabolites are conserved across species and biological fluids. Additional studies in larger, disease-specific cohorts are necessary to understand the roles of these common compounds in disease and to determine whether plasma metabolites can act as non-invasive surrogates for lung tissue or BALF metabolites.
Next, we determined the correlation in metabolites between species and biofluids. Mouse BALF and human BALF were positively correlated. Because BALF metabolites reflect the lining of the airways, it was not surprising that BALF from mice and humans was the most similar. Mouse plasma and mouse BALF also showed a positive, although non-significant, correlation. This may be due to the controlled environment of the mice, including identical feeding and cage conditions. The presence of large numbers of exogenous metabolites in human plasma may also explain why human plasma did not correlate strongly with any of the other tested fluids. Nevertheless, several individual metabolites were correlated between BALF and plasma, although these findings require validation in larger cohorts.
We then explored unique metabolites from each sample group. Many of the unique metabolites (MSI level 2 putative identifications) in human plasma may be attributed to exogenous metabolites from diet, xenobiotics, medications, and environmental exposures 26 , which are better controlled in mouse studies. Odd-chain lipids have historically been suggested to be bacterial in origin; however, recent studies have noted their presence in plant and mammalian species 27 . These odd-chain lipids are associated with diseases including cardiovascular and peroxisomal disorders 28 . Exposure to CS may also influence the levels of certain endogenous metabolites. Alternatively, cigarette smoke is known to cause adduct formation and chemical modification, although, to our knowledge, this has not been widely reported for small molecules. Overall, the large number of unique metabolites could be explained by both endogenous and exogenous components.
In many disease conditions, changes are seen in a compound class rather than in a single molecule. Therefore, the distribution of metabolite classes was examined across the four sample groups. Lipid classes such as glycerophospholipids were the most highly represented in all sample types. This is expected, as glycerophospholipids are a major component of cellular membranes 29 . The abundance of lipids in human plasma is consistent with previous studies of human plasma 30 . Many compounds in these lipid classes play crucial roles in disease and inflammation [31][32][33][34][35][36] . Sphingolipids, for example, are associated with CS-induced injury and COPD 8,37,38 . Benzopyrans and isoindoles were predominant in human plasma. Some benzopyrans exhibit anti-inflammatory properties through inhibition of prostaglandin E2 production. Isoindoles are natural products with diverse biological activities, including anticancer and antimicrobial properties 39 . Based on their presence in both plasma and BALF and their relationship to disease, these molecules are potentially good proxy candidates.
We then compared the global metabolite profiles of human and mouse plasma and BALF following CS exposure. Qualitatively, the BALF and plasma of current cigarette smokers contained more metabolites than those of former or non-smokers. In addition, the presence of unique metabolites in the smoking groups of mouse BALF, human BALF, and human plasma points to the introduction of exogenous metabolites into these metabolomes, potentially from cigarette additives. Results also suggest that smoking may deplete certain metabolites while enhancing others. Many of the 599 additives in cigarettes and the 4,000 chemical compounds in tobacco smoke 40,41 may have contributed to the BALF and plasma metabolomes.
Lastly, we investigated changes in the metabolome due to CS exposure in matched mouse biofluids to determine whether plasma reflects changes occurring in the lung. About three times as many changes were observed in BALF as in plasma (380 versus 124); this is expected, since BALF is closer to the site of injury (i.e., the lung). Many of these metabolite changes were common to both BALF and plasma; 30 compounds changed in the same direction in both biofluids, suggesting that these compounds may be of interest to investigators analyzing plasma as a less invasive means to study the lung. This would be particularly important in emphysema and/or CS-exposure studies, where BALF is difficult to obtain. Sphingolipids were dysregulated in both biofluids: SM(d18:0/16:1) was up-regulated in both BALF and plasma, while C16 sphingosine and N,N,N-trimethyl-sphingosine were down-regulated in both. Sphingolipids are messenger molecules involved in cellular homeostasis, oxidative stress, and apoptosis. We have previously shown a role for sphingolipids in association with CS exposure, COPD, and emphysema 8,38,42 . These compounds, as well as the strongly positively correlated metabolites, serve a dual purpose: (1) those present in both BALF and plasma offer a non-invasive clinical alternative, allowing plasma to be collected instead of BALF in humans; and (2) those present in both mice and humans are important in translational studies, such as drug trials, or in preliminary mouse studies intended to be followed by studies in humans.
There were 18 compounds with opposing directions of regulation in BALF versus plasma. Linoleyl carnitine was up-regulated in plasma and down-regulated in BALF. Carnitines have not been widely reported in association with lung and airway disease or cigarette smoke exposure. However, orally administered L-carnitine has been shown to improve symptoms in children with moderate persistent asthma 43 . Dietary L-carnitine supplementation has also been shown to reverse renal oxidative stress and mitochondrial dysfunction in female BALB/c mice exposed to cigarette smoke 44 . In a recent study, L-carnitine decreased with emphysema progression in mice, and L-carnitine supplementation improved lung function and reduced apoptosis 13 . Two vitamin D3 metabolites were down-regulated in BALF but up-regulated in plasma. Vitamin D deficiency has been reported in response to CS exposure and in patients with asthma and COPD 45,46 . One possible explanation for the decrease in BALF and increase in plasma is active transport of these metabolites across the lung/blood barrier.
We acknowledge that our study has limitations. Metabolite annotations were based on exact mass and isotope ratios; only a subset of annotations was confirmed using MS/MS, because obtaining authentic standards and MS/MS spectra was not feasible for thousands of metabolites. However, because identical conditions were used, including sample preparation and chromatography, annotations are consistent and comparable across samples. Also, the human cohort was small, so statistical comparisons could not be performed. Future work will focus on addressing these limitations.

Conclusion
Over 50% of metabolites overlap between the plasma and BALF of mice and humans. Metabolites in common between species are good candidates for molecular intervention studies in mouse models. CS exposure studies revealed that although certain metabolites were concordant between BALF and plasma, others changed in opposite directions. This emphasizes the biological complexity of studying whole organisms and the potential of a system to compensate for changes due to external or internal stimuli. Results from the mice suggest that CS-induced changes in the lung may not be fully recapitulated in plasma; further, interrogation of a single biofluid may not be sufficient to inform on health status. Because the sample size was limited, further experiments are required before specific conclusions can be drawn regarding biological perturbations. Overall, however, our findings support the use of mouse models and plasma as proxies for human samples when studying lung disease.

Methods
Ethics statement. All methods were performed in accordance with the relevant guidelines and regulations. Animal studies were approved by the Animal Care and Use Committee of Indiana University. Human subjects were from the Genetic Epidemiology of COPD (COPDGene) cohort, which is a National Institutes of Health-sponsored multicenter study of the genetic epidemiology of COPD 47 . COPDGene was approved by the institutional review board at each participating center; all subjects were enrolled from January 2008 to April 2011 and provided written informed consent. The current analysis was approved by the National Jewish Health Institutional Review Board.
Animal studies. For the metabolite catalogue analysis, matched plasma and BALF were collected from C57BL/6 mice (Jackson Laboratory, Bar Harbor, ME). Three-month-old female mice were exposed to ambient air for one day (n = 5 air control) or to CS for up to nine months (n = 5 smoking). For the statistical comparisons used to determine congruence between BALF and plasma upon acute CS exposure, mice were exposed to ambient air for one day (n = 7 air control) or to cigarette smoke for one day (n = 7 smoking).
Mice in the acute exposure group were exposed for 5 hours on a single day, while chronically exposed mice were exposed for 5 hours per day, 5 days a week, to 11% mainstream and 89% sidestream smoke from reference cigarettes (3R4F; Tobacco Research Institute, Kentucky) using a Teague 10E whole-body exposure apparatus (Teague Enterprise, CA) with monitored suspended particulates (average 90 mg/m³) and carbon monoxide (average 350 ppm). At the end of the experiments, the mice were euthanized. The pathophysiologic features of the air control and smoking mice in this CS model have been published previously 7,[48][49][50] .
Scientific Reports | 7: 5108 | DOI:10.1038/s41598-017-05374-1
Blood was collected via venipuncture of the right ventricle into tubes containing 1X Complete EDTA-free protease inhibitors (Roche). Plasma was isolated, snap frozen, and stored at −80 °C until analysis. BALF was collected using a total of 1.0 mL PBS divided into three washes. The first wash was centrifuged, and the supernatant (acellular BALF) was used for analysis.
Human studies. Human subjects were from the Genetic Epidemiology of COPD (COPDGene) cohort 47 .
Plasma was collected using a P100 tube (BD) as described previously 51 . BALF was obtained as described previously 52 . Briefly, BALF was collected from the right middle lobe and lingula by instilling two aliquots of 40 mL and one aliquot of 50 mL of sterile saline per lobe (i.e., 130 mL per lobe; total volume = 260 mL per subject), which was withdrawn by gentle manual suction and immediately placed on ice. Samples were sub-aliquoted into vials for a variety of studies. Aliquots for metabolomics analysis were frozen at −80 °C and stored until sample preparation and MS analysis.
Sample preparation for BALF and plasma. BALF and plasma samples were stored at −80 °C prior to sample preparation. Protein precipitation using methanol and liquid-liquid extraction using methyl tert-butyl ether (MTBE) were performed on 100 µL of BALF and plasma as previously described 14,53 , yielding an aqueous fraction and a lipid fraction. Plasma and BALF lipids were reconstituted in 200 µL of methanol; plasma aqueous metabolites were reconstituted in 100 µL of 95:5 water:acetonitrile. Because of the low concentrations of aqueous metabolites, 200 µL of BALF was used, and the aqueous fraction was dried down in a SpeedVac at 45 °C and reconstituted in 50 µL of 95:5 water:acetonitrile.
Liquid chromatography. Lipid fractions of extracted BALF and plasma samples were resolved by reversed-phase chromatography on an Agilent Zorbax Rapid Resolution HD (RRHD) SB-C18, 1.8 micron (2.1 × 100 mm) analytical column with an Agilent Zorbax SB-C18, 1.8 micron (2.1 × 5 mm) guard column, using an Agilent 1290 series high performance liquid chromatography (HPLC) pump. Injection volumes were adjusted for sample dilution effects in BALF (our preliminary sample extraction studies showed that human BALF was at least four times more dilute than mouse BALF).
These dilution differences were adjusted as follows: 4 µL of mouse or human plasma, 4 µL of mouse BALF, and 15 µL of human BALF were injected. The HPLC flow rate was 0.7 mL/min with the following mobile phases: mobile phase A was water with 0.1% formic acid, and mobile phase B was 60:36:4 isopropyl alcohol:acetonitrile:water with 0.1% formic acid. The gradient for positive mode was as follows: 0-0.5 minutes, 30-70% B; 0.5-7.42 minutes, 70-100% B; 7.42-9.9 minutes, 100% B; 9.9-10.0 minutes, 100-30% B; 10-14.6 minutes, 30% B. The autosampler tray temperature was set to 4 °C and the column temperature to 60 °C. The gradient was as follows for negative mode:
Reversed-phase chromatography was used to analyze the aqueous fraction of the mouse and human BALF samples on an Agilent 1200 series pump using an Agilent Zorbax Narrow Bore RRHT SB-AQ (1.8 micron, 2.1 × 100 mm, 80 Å) analytical column and an Agilent Zorbax SB-AQ (5 micron, 2.1 × 12.5 mm) guard column with a 10 µL injection volume. The flow rate was 0.3 mL/min with the following mobile phases: mobile phase A was water with 0.1% formic acid, and mobile phase B was 90:10 acetonitrile:water with 0.1% formic acid.
Quality control (QC). To limit variation in metabolite abundances, sensitivity, and batch effects, all samples were prepared on the same day. Samples were also analyzed in a single LC/MS run to avoid batch effects arising from day-to-day variation, HPLC column changes, or instrument drift. Total ion chromatograms (TICs) were evaluated for retention time reproducibility using spiked internal standards and endogenous compounds. The largest retention time variations were 0.58% and 1.54% for the spiked standards and endogenous compounds, respectively, representing a variation of <0.25 minutes, which is well within acceptable limits. The signal intensity of the TICs was also evaluated.
The largest variation was less than 10% CV in the largest range of the TIC, and HPLC pressure curves were less than 5% CV. Instrument QC samples, injected after every five samples, were analyzed to ensure that peak areas of 9 spiked internal standards were reproducible (<10% CV) throughout the analysis. The %CVs for the internal standards were less than 10% in the aqueous plasma and aqueous BALF analyses and less than 5% for the BALF and plasma samples in the lipid analysis. The %CVs, retention times, and peak areas for the internal standards and selected endogenous compounds are presented in Supplemental Table S1. These standards were used for quality control purposes rather than for normalization.
Data processing. Spectral data were extracted in MassHunter software (Agilent Technologies) using the Find by Molecular Feature algorithm with the following parameters: single charge; proton, sodium, potassium, and ammonium adducts in positive ionization mode. Data were imported into Mass Profiler Professional software (MPP, Agilent Technologies) for mass (15 ppm) and retention time (0.2 minutes) alignment, and data were filtered by selecting features that were present in at least 50% of the samples in each sample group. Data from sample preparation blanks and instrument blanks were background subtracted to eliminate noise from contaminants. Because LC-MS data can contain missing values 54 , data were further processed using the 'Find by Formula' algorithm (+H, +Na, +K, and +NH4 adducts for positive ionization mode; charge states limited to 2; absolute height >3000 counts). The 'Find by Formula' algorithm merged multiple features, such as ions, adducts, and dimers, into single compounds, resulting in 7,654 total compounds across all sample types and both species (BALF lipid+, BALF lipid−, BALF aqueous, plasma lipid+, plasma lipid−, plasma aqueous). The final data set was then re-imported into MPP for differential and statistical analysis.
Compounds were compared using several strategies across the samples (human BALF, human plasma, mouse BALF, mouse plasma), fractions (lipid versus aqueous), and ionization mode (positive and negative). The metabolites and their associated signal values were exported to GraphPad Prism v6.04 and Excel Professional Plus 2010 (Microsoft Corporation, Redmond, WA) for visualization purposes.
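Two of the QC and filtering rules above reduce to short table operations: the percent coefficient of variation (%CV = 100 × standard deviation / mean) of each spiked internal standard across QC injections, compared with the 10% limit, and the detection-frequency filter. The sketch assumes pandas tables with injections or samples as rows, reads "present in at least 50% of the samples in each sample group" literally (a feature must pass in every group), and uses hypothetical names and values; it illustrates the rules and is not the MPP implementation.

```python
import numpy as np
import pandas as pd


def percent_cv(peak_areas: pd.DataFrame) -> pd.Series:
    """%CV of each internal standard (column) across QC injections (rows)."""
    return 100.0 * peak_areas.std(ddof=1) / peak_areas.mean()


def frequency_filter(abund: pd.DataFrame, groups: pd.Series, min_frac: float = 0.5) -> pd.DataFrame:
    """Keep features detected in at least `min_frac` of the samples of every group."""
    detected = (abund.notna() & (abund > 0)).astype(float)
    frac = detected.groupby(groups).mean()  # rows: groups, columns: features
    return abund.loc[:, (frac >= min_frac).all(axis=0)]


# Hypothetical QC peak areas for two spiked internal standards.
qc = pd.DataFrame({"IS_1": [100.0, 105.0, 95.0], "IS_2": [50.0, 80.0, 20.0]})
cv = percent_cv(qc)
print(cv.round(1).to_dict(), (cv < 10).to_dict())

# Hypothetical feature table: f2 is detected in only one sample of group A.
abund = pd.DataFrame(
    {"f1": [5.0, 6.0, 7.0, 8.0], "f2": [5.0, np.nan, np.nan, np.nan], "f3": [5.0, np.nan, 7.0, np.nan]},
    index=["s1", "s2", "s3", "s4"],
)
groups = pd.Series(["A", "A", "B", "B"], index=abund.index)
print(list(frequency_filter(abund, groups).columns))  # ['f1', 'f3']
```

If the intended rule is instead "at least 50% of the samples in at least one group", replacing `.all(axis=0)` with `.any(axis=0)` gives that variant.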

Chemicals, standards and reagents.
The total volume (the number of compounds and the peak area of each) in each individual sample was calculated using MassHunter Profinder (Agilent). BALF data were normalized to total volume using an external scalar. This external scalar normalization technique used total volume to reduce the variance in biological measurements due to dilution effects in BALF from sample collection 55 . Variability was evaluated using the coefficient of variation (CV) 56 . The number of metabolites with <10% CV increased from 209 without normalization to 1,192 after normalization in the control mice, and from 219 to 1,191 in the smoking mice.
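Total-volume normalization can be sketched as dividing each sample's peak areas by that sample's total signal and multiplying by one shared scalar, so that a uniformly diluted BALF sample is brought to the same scale as a less dilute one. In the sketch below the shared scalar is assumed to be the median total across samples; the scalar actually used, and exactly how MassHunter Profinder computes total volume, are not specified in this section. The CV count mirrors the reported increase in metabolites with <10% CV; function names and values are hypothetical.

```python
import pandas as pd


def total_volume_normalize(abund: pd.DataFrame) -> pd.DataFrame:
    """Scale each sample (row) so that its summed peak area equals a shared scalar."""
    totals = abund.sum(axis=1)  # per-sample total signal ("total volume")
    scalar = totals.median()    # assumed choice of external scalar
    return abund.div(totals, axis=0) * scalar


def n_low_cv(abund: pd.DataFrame, threshold: float = 10.0) -> int:
    """Number of metabolites (columns) whose %CV across samples is below `threshold`."""
    cv = 100.0 * abund.std(ddof=1) / abund.mean()
    return int((cv < threshold).sum())


# Two hypothetical BALF samples: s2 is s1 diluted two-fold.
raw = pd.DataFrame({"m1": [10.0, 5.0], "m2": [20.0, 10.0], "m3": [30.0, 15.0]},
                   index=["s1", "s2"])
norm = total_volume_normalize(raw)
print(n_low_cv(raw), n_low_cv(norm))  # 0 3
```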
Metabolite annotation. ID Browser within Mass Profiler Professional (MPP) software v13.1 (Agilent) was used to tentatively annotate metabolites. This software uses an in-house database comprising data from METabolite LINk (METLIN), the Human Metabolome Database (HMDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and Lipid Maps; MPP uses isotope ratios, accurate mass, chemical formulas, and database scores (scale of 0 to 100) to annotate compounds by database ID, molecular formula, or compound number. A database score >70 out of a possible 100 was considered acceptable for annotation confidence; results were manually confirmed. Molecular formula generation included the elements C, H, N, O, S, and P. An error window of <10 ppm was used with a neutral mass range up to 2000 Da. Database identifications were limited to the top 10 matches by score, and charge state was limited to a maximum of 2. Tandem MS was used to improve confidence in identifications based on fragmentation information. Fragments were matched to reference standards from the METLIN and NIST14 MS/MS spectral libraries 57 . All identifications are Metabolomics Standards Initiative (MSI) level 2 according to the minimum reporting standards proposed by Sumner et al. 58 .
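The mass-accuracy window is expressed in parts per million: ppm error = (observed mass − theoretical mass) / theoretical mass × 10^6, so 10 ppm at 500 Da corresponds to 0.005 Da. A minimal sketch that applies the thresholds stated in this paragraph (score >70, |error| <10 ppm, neutral mass up to 2000 Da) to a hypothetical candidate; the function names are illustrative and do not correspond to an MPP API.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def passes_annotation(score: float, observed: float, theoretical: float,
                      min_score: float = 70.0, max_ppm: float = 10.0,
                      max_mass: float = 2000.0) -> bool:
    """Apply score > min_score, |ppm error| < max_ppm and neutral mass <= max_mass."""
    return (score > min_score
            and abs(ppm_error(observed, theoretical)) < max_ppm
            and theoretical <= max_mass)


# Hypothetical candidate: observed neutral mass 5 ppm above a 500 Da match.
print(round(ppm_error(500.0025, 500.0), 3))    # 5.0
print(passes_annotation(85, 500.0025, 500.0))  # True
print(passes_annotation(70, 500.0025, 500.0))  # False (score must exceed 70)
```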
Annotated metabolites were grouped into classes using the HMDB and Lipid Maps classification systems. Compound classes with four or fewer detected metabolites in at least one of the four groups (human BALF, human plasma, mouse BALF, mouse plasma) were excluded for at least two of the following reasons: (1) the annotation was most likely false, (2) the metabolites were below the detection limit of the instrumentation, or (3) there were too many classes to display given space limitations.
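The class exclusion rule amounts to keeping a class only if every one of the four groups has at least five detected metabolites. A minimal sketch, with hypothetical class names and made-up counts:

```python
GROUPS = ("human BALF", "human plasma", "mouse BALF", "mouse plasma")


def retain_classes(class_counts, max_excluded=4):
    """Keep classes with more than `max_excluded` detected metabolites in every group.

    `class_counts` maps class name -> {group: detected count}; a group missing
    from the inner dict is treated as zero detections.
    """
    return {
        cls: counts
        for cls, counts in class_counts.items()
        if all(counts.get(g, 0) > max_excluded for g in GROUPS)
    }


# Hypothetical counts; class_Y has only 3 detections in mouse BALF.
counts = {
    "class_X": {"human BALF": 120, "human plasma": 150, "mouse BALF": 90, "mouse plasma": 140},
    "class_Y": {"human BALF": 12, "human plasma": 20, "mouse BALF": 3, "mouse plasma": 15},
}
print(sorted(retain_classes(counts)))  # ['class_X']
```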
MS/MS analysis. The HILIC, C18, and SB-AQ chromatographic methods were replicated for LC-MS/MS analysis using collision energies of 10, 20, and 40 eV on a 6520 Q-TOF (Agilent) with a 500 ms/spectrum acquisition time, a 4 m/z isolation width, and a 1 min delta retention time.
Fragmentation data were exported to the freely available NIST MS Search v.2.2g GUI program 59 (NIST, Gaithersburg, MD, USA) and matched against spectra in the NIST 14 Mass Spectral Library. This library contains 193,119 spectra representing 43,912 precursor ions and 8,351 compounds; a detailed description of the library is available 60 . Automated library searching was performed using the 'Identity' spectrum search type, the "MS/MS" search option, and default program settings. The m/z search tolerance was ±0.4 for both precursor and product ions, and the precursor ion was not ignored. The MS Search program output a list of matched compounds with several measures of spectral similarity 61 . The Match Factor (MF) is the normalized dot product, with square-root intensity scaling, of the experimental and library mass spectra, computed using all elements of the experimental spectrum. The Reverse Match Factor (RMF) is computed in the same way, except that elements of the experimental spectrum that are absent from the library spectrum are excluded.
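The difference between MF and RMF can be illustrated with a simplified cosine-style score. This sketch assumes spectra are lists of (m/z, intensity) pairs, aligns peaks greedily within the ±0.4 m/z tolerance, and returns scores on a 0 to 1 scale; it does not reproduce NIST MS Search's exact peak matching, weighting, or reporting scale, and all function names are hypothetical.

```python
from math import sqrt


def align_peaks(experimental, library, tol=0.4):
    """Pair experimental and library peaks whose m/z differ by at most `tol`.

    Spectra are lists of (m/z, intensity). Returns (exp_intensity,
    lib_intensity) pairs; unmatched peaks are paired with 0.0.
    """
    pairs, used = [], set()
    for mz, exp_int in experimental:
        best = None
        for j, (lib_mz, _) in enumerate(library):
            if j in used or abs(lib_mz - mz) > tol:
                continue
            if best is None or abs(lib_mz - mz) < abs(library[best][0] - mz):
                best = j  # closest unused library peak so far
        if best is None:
            pairs.append((exp_int, 0.0))
        else:
            used.add(best)
            pairs.append((exp_int, library[best][1]))
    # Library peaks with no experimental counterpart.
    pairs.extend((0.0, inten) for j, (_, inten) in enumerate(library) if j not in used)
    return pairs


def sqrt_scaled_dot(pairs):
    """Normalized dot product of square-root-scaled intensities (0 to 1)."""
    x = [sqrt(a) for a, _ in pairs]
    y = [sqrt(b) for _, b in pairs]
    denom = sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    return sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0


def match_factor(experimental, library, tol=0.4):
    return sqrt_scaled_dot(align_peaks(experimental, library, tol))


def reverse_match_factor(experimental, library, tol=0.4):
    # Drop experimental peaks that have no counterpart in the library spectrum.
    pairs = [p for p in align_peaks(experimental, library, tol) if p[1] > 0]
    return sqrt_scaled_dot(pairs)


# Made-up spectra: the experimental spectrum carries one extra peak.
exp_spec = [(100.0, 50.0), (150.0, 100.0), (175.0, 25.0)]
lib_spec = [(100.1, 50.0), (150.2, 100.0)]
print(round(match_factor(exp_spec, lib_spec), 3))          # 0.926
print(round(reverse_match_factor(exp_spec, lib_spec), 3))  # 1.0
```

The extra experimental peak lowers MF but not RMF, which is why RMF is more tolerant of contaminating ions in the experimental spectrum.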
Selected matches are presented in Supplemental Tables S2 and S3.
Statistical analysis. Metabolite class testing. Sixty metabolite classes were analyzed in R using a test of proportions 62 to determine whether the proportion of detected metabolites (out of all metabolites defined for that class by Lipid Maps and HMDB) differed among the groups (p < 0.05). For each significant class, subsequent analysis determined which groups differed.
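A k-sample test of equal proportions compares each group's detected and undetected counts with those expected under a shared detection rate. The sketch below assumes an uncorrected chi-squared test, which is what R's `prop.test()` computes for more than two groups (whether that exact function was used is an assumption). The chi-squared tail probability is written out for integer degrees of freedom to keep the example dependency-free, and the counts are hypothetical.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: closed-form finite series.
        return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(df // 2))
    # Odd df: normal tail term plus a finite correction series.
    total = erfc(sqrt(x / 2))
    term = sqrt(2 * x / pi) * exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def prop_test(detected, totals):
    """k-sample chi-squared test of equal proportions, no continuity correction.

    detected[i] metabolites of a class were detected out of totals[i] defined
    for that class in group i. Returns (statistic, p-value).
    """
    pooled = sum(detected) / sum(totals)
    if pooled in (0.0, 1.0):
        return 0.0, 1.0  # all groups at 0% or 100%: proportions are equal
    stat = 0.0
    for d, n in zip(detected, totals):
        expected_hit, expected_miss = n * pooled, n * (1 - pooled)
        stat += (d - expected_hit) ** 2 / expected_hit
        stat += ((n - d) - expected_miss) ** 2 / expected_miss
    return stat, chi2_sf(stat, len(detected) - 1)


# Hypothetical class with 50 defined metabolites, four sample groups.
stat, p = prop_test([40, 38, 12, 35], [50, 50, 50, 50])
print(round(stat, 2), p < 0.05)  # 43.24 True
```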
Correlation analysis. Spearman's rank correlation coefficient was used for correlation calculations, and each coefficient was tested in R for a significant difference from 0 (p < 0.05).
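Spearman's coefficient is the Pearson correlation of the ranked values, with tied values assigned their average rank. A minimal sketch of the coefficient alone; the significance test is not reproduced here (in R it is available through `cor.test(x, y, method = "spearman")`), and the inputs are made up.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    denom = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    if denom == 0:
        raise ValueError("rho is undefined when all values in x or y are tied")
    return cov / denom


print(average_ranks([10, 20, 20, 30]))            # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0 (monotone, not linear)
```

Because only ranks enter the calculation, rho captures monotone agreement between, for example, metabolite intensities in mouse and human BALF, even when the two scales are not linearly related.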
Differential analysis across mouse BALF and plasma. Statistical analysis of the matched mouse BALF and plasma samples was performed using MPP v13.1 (Agilent). An unpaired t-test was used to compare matching BALF and plasma for day 1 air controls (n = 7) and day 1 CS-exposed mice (n = 7). Reported metabolites were present in at least 50% of samples in each group, had a fold change of at least 1.5 in either direction, and passed Storey with Bootstrapping multiple testing correction at q ≤ 0.1. Because the human BALF and plasma sample size was small (n = 5), statistical comparison between smoking and non-smoking humans was not possible for this dataset. Excel Professional Plus 2010 (Microsoft Corporation, Redmond, WA) was used to create graphics.
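The reporting filters (presence, fold change, q-value) can be sketched as follows. This is a minimal illustration, not MPP's implementation: p-values from the unpaired t-test (for example from `scipy.stats.ttest_ind`) are assumed to be computed beforehand, Storey's pi0 is estimated at a single fixed lambda instead of by bootstrapping, and fold change is assumed to be the ratio of group means over detected values. The data are made up.

```python
from statistics import mean


def storey_qvalues(pvalues, lam=0.5):
    """Storey q-values with pi0 estimated at one fixed lambda.

    MPP's 'Storey with Bootstrapping' selects lambda by bootstrap; using a
    fixed lambda here is a simplifying assumption.
    """
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pi0 * m * pvalues[i] / rank)
        q[i] = running
    return q


def passes_filters(air, cs, q, min_presence=0.5, min_fc=1.5, max_q=0.1):
    """Presence, fold-change and q-value filters for one metabolite.

    `air` and `cs` hold abundances per mouse, with None for 'not detected'.
    Fold change is the ratio of group means over detected values (assumed).
    """
    def presence(vals):
        return sum(v is not None for v in vals) / len(vals)

    if presence(air) < min_presence or presence(cs) < min_presence:
        return False
    fc = mean(v for v in cs if v is not None) / mean(v for v in air if v is not None)
    return (fc >= min_fc or fc <= 1 / min_fc) and q <= max_q


q = storey_qvalues([0.001, 0.01, 0.02, 0.8, 0.9, 0.6, 0.7, 0.95])
air = [100, 110, None, 90, 100, None, 100]  # made-up values, n = 7
cs = [200, 210, 190, None, 200, 200, None]
print(round(q[0], 3), passes_filters(air, cs, q[0]))  # 0.008 True
```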