Metabolomics and transcriptomics pathway approach reveals outcome-specific perturbations in COPD

Chronic obstructive pulmonary disease (COPD) comprises multiple phenotypes such as airflow obstruction, emphysema, and frequent episodes of acute worsening of respiratory symptoms, known as exacerbations. The goal of this pilot study was to test the usefulness of unbiased metabolomics and transcriptomics approaches to delineate biological pathways associated with COPD phenotypes and outcomes. Blood was collected from 149 current or former smokers with or without COPD and separated into peripheral blood mononuclear cells (PBMC) and plasma. PBMCs and plasma were analyzed using microarray and liquid chromatography mass spectrometry, respectively. Statistically significant transcripts and compounds were mapped to pathways using IMPaLA. Results showed that glycerophospholipid metabolism was associated with worse airflow obstruction and more COPD exacerbations. Sphingolipid metabolism was associated with worse lung function outcomes and exacerbation severity requiring hospitalizations. The strongest associations between a pathway and a certain COPD outcome were: fat digestion and absorption and T cell receptor signaling with lung function outcomes; antigen processing with exacerbation frequency; arginine and proline metabolism with exacerbation severity; and oxidative phosphorylation with emphysema. Overlaying transcriptomic and metabolomics datasets across pathways enabled outcome and phenotypic differences to be determined. Findings are relevant for identifying molecular targets for animal intervention studies and early intervention markers in human cohorts.

platforms. The latter includes challenges in integrating clinical metadata with multiple datasets from functional genomics, proteomics, and/or metabolomics sources; i.e. a true "systems approach". Some progress has been made in the fields of functional genomics and metabolomics; however, no discovery-based approach has yet resulted in validated clinical biomarkers. For example, genetic studies previously identified alpha-1 antitrypsin in 1964 as a cause of COPD 8 ; however alpha-1 antitrypsin deficiency accounts for only 1% of COPD cases 9 . Most recently, proteomic approaches have identified new biomarkers such as plasma sRAGE for the presence and progression of emphysema 10,11 . Metabolomics approaches have identified potential markers of disease severity or therapeutic candidates such as purines 12 , sphingolipids 13 , glycerophospholipids 14 , and amino acids which differentiate patients with or without emphysema and/or cachexia 15 . While these omic-centric studies have added to the knowledge base, only limited information is obtained. A systemsor pathways-based approach could further pinpoint potential mechanisms by integrating datasets and thereby increasing statistical power. This approach has been used in plant and bacterial studies and has recently been applied to human disease. For example, integrating functional genomics and metabolomics identified markers associated with poor outcome in human neuroendocrine cancers 16 , determined predictors of asthma control 17 , and determined novel therapeutic targets in lung cancer 18 .
While single 'omic studies have helped to address the knowledge gap, to date no study has integrated transcriptomics and metabolomics from a COPD cohort to determine if specific pathways are associated with individual COPD outcomes. This could be a useful strategy since functional genomics measures the association between genes, transcripts and clinical phenotype 19 and metabolomics provides a readout of the physiological state of the human body 20 . Therefore, we used metabolomics and transcriptomics to identify markers associated with six COPD health outcomes, then mapped the significant results to pathways for each outcome. The overall goal is to identify dysregulated pathways that distinguish outcomes.
The novelty and strength of our study lies in the uniqueness of the highly phenotyped and matched samples across several stages of COPD severity. Using transcriptomics and metabolomics datasets provides information at a molecular level that is not available when only a single profiling technology is used. Determining how the molecular basis of disease phenotypes vary will improve treatment options and clinical outcomes since unique therapies can be adapted to each patient based on their individual phenotype.

Results
Cohort description. Clinical characteristics are provided in Table 1. Exacerbation frequency is the total number of exacerbations (0-6) in the year prior to enrollment. Exacerbation severity is the total number of severe exacerbations (0-3) in the year prior to enrollment. Age was uniform across Global Initiative for Chronic Obstructive Lung Disease (GOLD) Stages. Gender was balanced across males (n = 79) and females (n = 70). Clinical phenotypes were associated with GOLD stage, smoking pack-years, asthma, and current smoking status. Approximately 10-15% of the current and former smoker population falls into the Preserved Ratio, Intact Spirometry (PRISm) 21 category. Studies in COPDGene have found that these subjects are not as healthy as control subjects 21 and it is not clear where these patients fit in the spectrum of COPD. We therefore included PRISm subjects in the transcriptomics analyses since patients within that category presented with outcomes such as exacerbations or emphysema. None of the PRISm subjects overlapped with the metabolomics dataset and were therefore not included in the metabolomics analysis.
Individual outcome associations. We evaluated 12,531 transcripts and 2,999 metabolites for associations with COPD outcomes (Fig. 1). The highest number of significant transcript differences was found in FEV 1 % predicted (1,652) while no transcript differences were associated with exacerbation severity. FEV 1 /FVC had the most significant metabolites (269) while emphysema and BDR had none. There was minimal overlap in the significant transcripts ( Fig. 2A) and metabolites (Fig. 2B) associated with each outcome.

Figure 1.
Flow chart summarizing the omics methods and analysis. Peripheral blood mononuclear cells (PBMC) (n = 136 human subjects) and matched plasma (n = 131) were prepared and analyzed using functional genomics and metabolomics, respectively. Metabolomics samples were analyzed randomly in triplicate. Regression model fitting in R resulted in a list of statistically significant transcript probes and metabolites associated with each outcome. These significant transcript probes and metabolites were mapped to pathways using IMPaLA. Pathway p-values for metabolites and transcripts were combined by a meta-analysis using Fisher's method to obtain a single p-value and FDR for each pathway. In the outcome tables, '# sig' refers to the number of statistically significant transcripts probes and metabolites. BMI: body mass index, FEV 1 : forced expiratory volume in 1 second, FVC: forced vital capacity. The top gene transcripts for each outcome based on smallest p-value and FDR are listed in Table 2. ACTG1, a cytoplasmic protein found in all cell types 22 , was associated with seven transcripts, all with increased expression with increasing exacerbation frequency. For emphysema, the top gene transcripts were all associated with a decrease with worsening emphysema. These genes are associated with apoptosis (BCL10, SUMO2), cancer (CCDC186, RECQL), regulation of dietary iron absorption (CYBRD1), lipid binding (OSBPL8), protein transport (TMCO3), protein binding (TOPORS), and the mitochondria (SLC25A43, TFAM) 22 . The gene interactions among these significant associations were investigated, with FEV 1 /FVC having the most interactions (Supplementary Document S1).
For the metabolomics analysis, only significant compounds that were confidently identified are shown in Table 3. A large number of amino acids were associated with exacerbation frequency while both amino acids and carbohydrates associate with exacerbation severity. Predominantly glycerophospholipids are associated with FEV 1 /FVC, and several lipid classes associate with FEV 1 % predicted. Carnitine(C14:2) was increased with increasing exacerbation frequency, acetylcarnitine* was decreased with increasing exacerbation severity, and octanoylcarnitine** was decreased with COPD for FEV 1 /FVC. The metabolite interactions for the significant associations are shown in Supplementary Document S1. Interactions were present for exacerbations and absent for lung function.    Pathway associations. Fifteen pathways were significantly associated with FEV 1 /FVC, 22 pathways with FEV 1 % predicted, 7 with exacerbation frequency, and 15 with exacerbation severity ( Table 4). The transcript expressions for the significant pathways for BDR were all increased (p ≤ 0.0013, FDR ≤ 0.0954) post-BDR treatment; for phagosome these were ATP6V0B, ATP6V1F, FCGR2A, CYBB, HLA-DMA, RAC1, SCARB1, TUBA1B, TUBA1C, and for lysosome these were ATP6V0B, NPC2, NAGA, GAA, GUSB, HEXA, PSAP. Three pathways were unique to FEV 1 /FVC (Fig. 2C), with fat digestion and absorption being the most significant. Nine pathways were unique to FEV 1 % predicted, with T cell receptor signaling as the most significant. There was one unique pathway associated with exacerbation frequency (antigen processing and presentation). There were eight unique pathways related to exacerbation severity, with arginine and proline metabolism as the most significant. Sphingolipid metabolism was significant in exacerbation severity with ganglioside GD3 (d18:0/23:0), lactosylceramide (d18:1/14:0), and the glycosphingolipid Fucα1-4GlcNAcβ1-3Galβ1-3GlcNAcβ1-3Galβ1-4Glcβ-Cer(d18:1/22:0) positively associated (FDR < 0.0001) with disease severity. Sphingolipid metabolism was also significant in the lung function outcomes (FDR < 0.0001), with two glycosphingolipids positively associated with FEV 1 % predicted (lung size), and three glycosphingolipids and four ceramides positively associated with FEV 1 /FVC (airflow obstruction). Emphysema was related to two unique pathways (oxidative phosphorylation and mRNA surveillance pathway). There were no unique pathways associated with BDR. Individual pathway entities are available in Supplementary Document S2.
Transcriptome and metabolome pathways. Oxidative phosphorylation was uniquely associated with emphysema and therefore examined more closely. All gene transcripts were decreased (Fig. 3A) and are located in complex I, II, and IV. Likewise, antigen processing and presentation (Fig. 3B) was uniquely associated with exacerbation frequency. This pathway comprises two sub-pathways (MHC I and MHC II), however only MHC I was significantly perturbed with an overall decrease. Eight unique metabolome pathways were associated with exacerbation frequency, two of which are amino acid-related metabolic pathways; arginine and proline metabolism was decreased (Fig. 4A) and glycine, serine and threonine metabolism was increased (Fig. 4B).
Energy and cellular degradation pathways. Pathways associated with exacerbation severity and exacerbation frequency differed to some extent; therefore, HumanCyc 23 was used to determine what was responsible for the differences. Exacerbation severity comprised eight degradation pathways and three energy pathways, while exacerbation frequency comprised three degradation pathways and no perturbed energy pathways (Supplementary Document S3).
Integrated transcriptome/metabolome analysis. Three pathways were examined to determine association between gene transcripts and metabolites. Glycerophospholipid metabolism was based on FEV 1 % predicted (Fig. 5A) Fat digestion and absorption (Fig. 5B) was uniquely significant in FEV 1 /FVC with an associated increase with disease of triglycerides, cholestrol esters, phospholipids, and bile acids, and a decrease in monoglycerides, diglycerides, and docosahexaenoic acid (DHA) with disease. The SCARB1 gene located in the brush border-membrane, and ABCA1 located in the basolateral membrane, were associated with an increase in expression in COPD subjects compared to healthy controls. The protein encoded by SCARB1 is a plasma membrane receptor for high density lipoprotein cholesterol (HDL) 24 . Lastly, glycerolipid metabolism, also uniquely significant in FEV 1 /FVC revealed an increased association with phosphatidic acid and the AGPAT9 gene that encodes it (Supplementary Document S4).
Pathway relationships. To determine whether one or multiple pathways were driving the outcome perturbations, pathways were plotted based on KEGG 25 . An integrated pathway map (Fig. 6) demonstrates that MAPK and PI3K-AKT signaling pathways, although not statistically significant, were each connected to eight pathways. In comparison, lysosome and T cell receptor signaling pathway each had six connections while purine metabolism and glycine, serine and threonine metabolism each had five connections.

Discussion
Overlaying the significant transcripts and compounds using Venn diagrams allowed us to view the differences amongst the various outcomes in a discernable manner. For example, while a larger overlap between FEV 1 /FVC and FEV 1 % predicted might be expected, we observed minimal overlap. This is however not surprising because although these two outcomes are related, especially in a COPD cohort, their measures are different. FEV 1 /FVC ratio is a measure of obstruction and is required in the COPD diagnosis, while FEV 1 % is a measure of lung size. Therefore, a patient can have small lungs (FEV 1 %) and still have no obstruction and occasionally is the reverse (for mild COPD). Also, not surprising is the minimal overlap across all the tested outcomes since COPD is a heterogeneous disease. This approach was important in allowing us to distinguish phenotypes and outcomes. Overall, a metabolomic signature differentiated the COPD outcomes, which was supported in the transcriptomics analysis for corresponding pathways. Both omics datasets demonstrated that specific COPD phenotypes and outcomes are associated with different transcriptome and metabolome pathways. This supports the premise that distinct COPD phenotypes have distinct mechanisms. There were also differences in the types of omics associations. For example, emphysema appears to have a larger number of gene associations compared to exacerbation severity, which has more metabolic associations. Therefore, we chose to focus the discussion on the most significant and/or unique pathways for each outcome, as a means of narrowing down the large amount of data generated from this study.
We first interpreted exacerbations as a group and observed that for exacerbation frequency, there are no significant differences in the energy pathways and very few associations in the degradation pathways. Conversely, for exacerbation severity, the energy pathways are more dysregulated and accompanied by an increase in the number of degradation pathways, particularly carbohydrate degradation, nucleoside degradation, fatty acid degradation, and amino acid degradation. This is supported by the energy association in degradation pathways; carbohydrate breakdown provides a source of cellular energy for other syntheses such as protein transport, nucleosides act as signaling molecules or provide chemical energy to cells, and fatty acids store energy in the cell 26 . We then observed that sphingolipid metabolism was significant in exacerbation severity. In our previous studies, we observed that trihexosylceramides, a subclass of sphingolipids, were positively associated with severe exacerbations 13 .
Interestingly we also observed for exacerbations that while antigen processing and presentation was unique to exacerbation frequency, only the MHC I pathway (leading to CD8 T cell activation) and not the MHC II pathway (leading to CD4 T cell activation) is affected. The activation of CD4 T cells is beneficial to sustaining memory of CD8 T cells following acute infection 27 . Since COPD exacerbations are associated with viral 28,29 and bacterial 30,31 infections, the perturbation of CD8 but not CD4 in our study suggests that a crucial connection is not being made, and this may predispose patients to frequent exacerbations as their immune system is weakened. An alternative explanation is that CD4 T cells were activated and CD8 T cells are now destroying the viral or bacterial pathogen in the more frequent exacerbations. Interestingly, an outcome-pathway shift is observed where the perturbed genes within the MHC I pathway for exacerbation frequency end at the T cell receptor signaling pathway which is uniquely perturbed in FEV 1 % predicted. We then examined the emphysema phenotype. Oxidative phosphorylation was unique to and significant in the emphysema outcome. Cigarette smoking (CS) is a major cause of emphysematous COPD. CS is known to produce reactive oxygen species (ROS). Excessive production of ROS can lead to oxidative damage, leading to cell apoptosis 32 . In a recent study, exposure of human lung fibroblasts to various doses of nicotine and e-cigarette condensate inhibited myofibroblast differentiation and inhibited oxidative phosphorylation complex III 33 . Therefore, in spite of adjusting for smoking status, our results are not unexpected as our cohort is comprised of current and former smokers.
Next, we focused on the BDR outcome. Unlike the other outcomes, BDR was associated with two cytoplasmic pathways, lysosome and phagosome. Lysosomes break down proteins, nucleic acids, carbohydrates and lipids 26 . Phagosomes fuse with lysosomes allowing the ingested material to be digested; some of the material is recycled to the plasma membrane 26 . The perturbed transcripts in these pathways showed increased expression and may serve as markers of airflow reversibility.
We then examined FEV 1 % predicted outcome. Glycerophospholipid metabolism, although significant in many of the other outcomes, was highly significant in this lung function outcome. These lipids play a role in respiratory infections 34 and in asthma and COPD 35 since they constitute lung surfactant [36][37][38] . Surfactant, a complex combination of proteins and ~90% lipids 39 , is secreted by the airway epithelial cells into the airspace to reduce surface tension and provides a barrier to pathogens. In the FEV 1 % predicted outcome, the lipids classes LysoPE and PG, decreased and increased respectively, and have been previously suggested as markers of lung remodeling 40 . Their associated genes PLA2G6 and LPCAT2 were also decreased and increased respectively and could serve as targets in future studies to repair disrupted surfactant composition.
Sphingolipids were also highly associated with this outcome and FEV 1 /FVC. This is congruent with our recent publications that discuss in depth the significance of sphingolipids to COPD pathogenesis, including signaling related to structural cell fate, innate immune responses, and lymphocyte trafficking 41,42 . Other pathways uniquely and significantly associated with the FEV 1 /FVC was involved in fat digestion and absorption, typicaly reflecting perturbations between the intestine and blood. SCARB1 may be driving the cholesterol (CL) and cholesterol ester (CE) transport and regulation while the phospholipids (PL) may be driving ABCA1. Since bile acids and triglycerides were increased with disease, and the omega-3 fatty acid DHA was decreased with disease, this pathway suggests a diet or nutrition component associated with disease. This is supported by a recent publication on a cohort of 34,739 women where long-term consumption of fruits was inversely assocated with COPD incidence 43 . Diet has been suggested to play a role as a risk factor for many chronic diseases including COPD 44,45 , and suggestions have been made on the role of nutritional supplement therapy in the treatment of COPD 46 .
Finally, we attempted to deconstruct the results into a pathway map where relations could be observed among the outcomes and pathways. Based on the overview pathway map, MAPK and PI3K-AKT signaling pathways appeared to be the unifying links among many of the perturbed pathways. In a recent study exposing human bronchial epithelial cells to urban particulate matter, airway inflammation was triggered via activation of MAPK and NF-κB signaling pathways 47 . Although MAPK signaling was not significant in our study, NF-κB signaling was significant in FEV 1 % predicted. MAPK and PI3K-AKT signaling pathways may therefore be the driving force for downstream events. Researchers have already begun exploring these targets [48][49][50][51] .
A limitation of our study is the use of plasma and PBMCs as these may not directly translate to the lung environment. However, our previous studies, both published and unpublished, have observed a high degree of congruency for compounds in lung lavage fluid, lung tissue and plasma in both human and animal studies 52,53 . Also non-fasting samples may affect results as food intake affects the plasma metabolite profile 54 , We therefore used stringent statistics to filter out many of the diet-associated compounds. Lastly, patients in this study were taking medications, which is unavoidable with this type of cohort. However, we did our best to filter those medications from the final dataset prior to analysis. Future work would incorporate lavage fluid and increase sample numbers to gain more power for discriminating metabolite-transcript-phenotype differences. Targeted quantitative analysis of pathway-specific molecules could provide increased power and more quantitative results. Although these results need to be validated in a larger cohort, they point to therapeutic targets to be tested in animal experiments, and the most significant species from each outcome could serve as non-invasive markers of disease outcomes.
Despite these limitations, we were able to determine unique pathways that can be leveraged for treatment targets. For example, oxidative phosphorylation was decreased with worsening emphysema. It is perturbed due to cigarette smoke exposure 33 ; cigarette smoking is a major cause of emphysema and produces reactive oxygen species, leading to oxidative damage. Interventions such as rotenone, antimycin A, or oligomycin can be used to inhibit oxidative phosphorylation 33,55,56 , while other small molecule interventions can be used to activate the pathway. Autophagy was highly significant for FEV 1 /FVC and can be inhibited using, 3-methyladenine 57,58 . Another pathway that can be leveraged is antigen processing and presentation. This pathway, decreased in exacerbation frequency and is dysregulated in a rat model of COPD 59 . Therefore, activating these pathways using either small molecules or gene interventions in animal models could be beneficial in developing phenotype-specific COPD treatments to reduce or halt lung damage or viral load. These pathway interventions in mouse models have the potential to lead to phenotype-specific treatment targets. Other follow up studies could include targeted analysis in COPD or smoking cohorts such as COSYCONET 60 , Birmingham 61 , SPIROMICS 62 , or KORA 63,64 .

Conclusion
Our pilot study has shown that using a pathway enrichment approach to analyze transcriptomics and metabolomics data from matched samples, identified perturbed biological pathways associated with COPD outcomes and identified pathways that distinguish between outcomes. In addition, our results show that the various phenotypes and outcomes have distinct pathways associated with each one, in spite of COPD being a heterogenous disease. Investigators may choose to select the strongest pathways (those with low p-values) for follow up analyses using either mouse intervention studies via knockouts, mouse treatment studies, or using targeted analysis in other COPD or smoking cohorts. The long term clinical applicability of this work would be in the use of blood-based markers where a simple blood panel could identify patient outcomes and indicate patient-targeted treatment. This work could help in the management of COPD.

Methods
Ethics statement. All methods were performed in accordance with the relevant guidelines and regulations.
Human subjects were from the COPDGene cohort 65 which is a National Institute of Health-sponsored multicenter study of the genetic epidemiology of COPD. COPDGene was approved by the institutional review board at each participating center; all subjects were enrolled from January 2008 to April 2011 and provided written informed consent. The current analysis was approved by the National Jewish Health Institutional Review Board.
Study population. The COPDGene study enrolled 10,192 non-Hispanic White and Black individuals, aged 45-80 years old with at least a 10 pack-year history of smoking, who had not had an exacerbation of COPD for at least the previous 30 days. Additional information on the COPDGene study and the collection of clinical data has been previously described 65 . There were ten Preserved Ratio, Intact Spirometry (PRISm) 21 subjects included in the transcriptomics analysis since these subjects presented with exacerbations or emphysema but are considered an unclassified. Patients who were pregnant or with co-morbidities such as a history of other lung disease (except asthma), active cancer under treatment, and other cardiac hospitalizations, were excluded from the study 65 . Non-fasting blood was collected from a random sampling of subjects (n = 149) from a single clinical center (National Jewish Health) that represented the entire cohort and used for omics analysis. Peripheral blood mononuclear cells (PBMC) from 136 subjects were used for functional genomics and fresh frozen plasma from 131 subjects were used for metabolomics, with an overlap of 118 subjects for both omics technologies. The cohort characteristics are shown in Table 1.
Clinical data collection and definitions of COPD outcomes. The clinical data included forced expiratory volume in 1 second/forced vital capacity (FEV 1 /FVC), FEV 1 percent (%) predicted, percent (%) emphysema on CT scan 10 , exacerbation frequency, exacerbation severity, and bronchodilator response (BDR). COPD was defined as post bronchodilator ratio of FEV 1 /FVC < 0.70. Current or former smokers without spirometry evidence of airflow obstruction (FEV 1 /FVC ≥ 0.70) were classified as controls 66 . Subjects with without airflow obstruction but FEV 1 < 80% of predicted where classified as (PRISm) 21 . Emphysema was measured using quantitative high resolution CT as previously described 67 and quantified as the percent of lung attenuation lung voxels (LAA) ≤−950 HU on the inspiratory images for the whole lung. Emphysema was further categorized as none (LAA ≤ 5%), mild (5% < LAA ≤ 10%), moderate (10% < LAA ≤ 20%), or severe (20% < LAA). Exacerbations (flare ups) of COPD are characterized by acutely worse cough, sputum, and dyspnea. Only moderate exacerbations (treated by corticosteroids or antibiotics) or severe exacerbations (causing an admission to the hospital) were evaluated. Patients had no exacerbations for at least 30 days prior to sample collection. BDR determined the reversibility of airway obstruction by measuring FEV 1 before and after the administration of bronchodilator medication. A significant response was defined as an increase in FEV 1 of 12% and 200 mL increase 68 .

Sample collection.
Blood was drawn into a BD Vacutainer Cell Preparation Tube for peripheral blood mononuclear cells (PBMC) and plasma as previously described 65 . PBMCs were used for gene expression profiling using the Affymetrix Human Genome U133 plus 2.0 Gene Array, (GEO accession number GSE42057) 69 . Plasma was used for untargeted liquid-chromatography mass spectrometry (LC-MS)-based metabolomics.
Chemicals, standards and reagents. All  Sample preparation. Blood samples for functional genomics analysis were prepared as previously described 69 . Briefly, blood was collected from subjects, processed within 30 minutes and peripheral blood mononuclear cells (PBMC) were isolated from the supernatant for RNA isolation as previously described 69 . Fresh frozen plasma was isolated using P100 tubes as previously described 70 and stored at −80 °C until sample preparation for untargeted liquid-chromatography mass spectrometry (LC-MS)-based metabolomics. For LC-MS, 100 µL of each sample underwent protein precipitation using methanol, followed by liquid-liquid extraction using methyl-tert butyl ether as previously described 71,72 to obtain an aqueous fraction and a lipid fraction. All samples were prepared randomly to avoid batch effects from phenotype, gender, or GOLD stage.
Liquid chromatography mass spectrometry. The  The samples from the aqueous fraction were analyzed randomly with each sample vial run randomly in triplicate using an Agilent 1200 series pump using a Phenomenex Kinetex HILIC, 2.6 µm, 100 Å (2.1 × 50 mm) analytical column and an Agilent Zorbax Eclipse Plus-C8 5 µm (2.1 × 12.5 mm) narrow bore guard column. The autosampler tray temperature was set at 4 °C, column temperature was set at 20 °C, and the sample injection volume was 1 µL. The flow rate of 0.6 mL/min with the following mobile phases: mobile phase A was 50% ACN with pH 5.8 ammonium acetate, and mobile phase B was 90% ACN with pH 5.8 ammonium acetate. Gradient elution was as follows: 0. Tandem mass spectrometry (MSMS). The HILIC and C18 chromatographic methods discussed above were replicated for LC-MS/MS analysis using 10, 20, and 40 eV collision energies on a 6520 Q-TOF-MS (Agilent) with a 500 ms/spectra acquisition time, 4 m/z isolation width, and 1 minute delta retention time.
Metabolomics quality control. Samples were randomly selected from 2000 subjects from the COPDGene cohort comprising both healthy subjects and COPD patients to create aliquots of pooled QC plasma samples. At least 20 of the 131 subjects in this study were included in the pooled sample. Samples were prepared as described above and pooled for use as quality control (QC) samples to monitor instrument reproducibility across multiple days. These pooled QC samples were injected after every five samples, followed by a solvent blank. Total ion chromatograms of all samples were evaluated for retention time reproducibility and intensity overlap. Instrument QC samples were analyzed to ensure that peak areas of spiked internal standards in the plasma samples were reproducible with coefficient of variations ≤10%.
Metabolomics spectral peaks extraction. A final list of compounds was obtained by collapsing features (e.g. ions, adducts) into compounds as follows: Spectral peaks were extracted using the following parameters in MassHunter software B.07 (Agilent Technologies): Find by Molecular Feature algorithm, single charge, proton, sodium, potassium, ammonium adducts in positive ionization mode. Data were imported into Mass Profiler Professional software 14.5 (MPP, Agilent Technologies) for mass (15 ppm) and retention time alignment (0.2 minutes). Data from sample preparation blanks and instrument blanks were background subtracted to eliminate noise from contaminants. Because LC-MS data can result in missing values, data was further processed using the 'Find by Formula' algorithm parameters (+H, +Na, +K, +NH 4 adducts for positive ionization mode, charge states limited to 2, and absolute height >3000 counts). The 'Find by Formula' algorithm merged multiple features such as ions, adducts and dimers into a single compound. The final data set was then re-imported into MPP for metabolite annotation.
Metabolite annotation and identification. For the first round of identification, ID Browser within the Mass Profiler Professional software v14.5 was used to annotate metabolites with putative identifications. This software utilizes an in-house database comprising data from METabolite LINk (METLIN), Human Metabolome Database (HMDB), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Lipid Maps and applies isotope ratios, accurate mass, chemical formulas, and scores to provide preliminary identifications. A database score ≥70 out of a possible 100 was considered acceptable for annotation confidence. Chosen elements for molecular formula generation were: C, H, N, O, S, and P. An error window of ≤10 ppm was used with a neutral mass range up to 2500 Da and positive ions selected as H, Na, K, and NH 4 . The database identifications were limited to the top 10 best matches based on score, and charge state was limited to a maximum of 2. There were 1,633 out of 2,999 compounds that were putatively matched to a name in a database.
For improved confidence, only the statistically significant compounds were fragmented as described in the Tandem MS section above, and searched using either an in-house mass, retention time, and MSMS library built from purchased standards, or their MSMS fragments were matched to reference standards from the NIST14 and NIST17 MSMS spectral library 73 . For compounds where their fragments were not present in a reference spectral library, MetFrag 74,75 was used for in silico fragmentation. A list of the confirmed and putative IDs is available in Supplementary Documents S5 and S6.
Reporting level of confidence. Within the manuscript and in Table 3, the symbol * indicates that the compound name was confirmed by accurate mass (<10 ppm) and matching MSMS fragments to the NIST spectral library. The symbol **indicates that the compound name was confirmed by accurate mass (<10 ppm), retention time, and MSMS match to purchased standards. These are MSI 1. All other compound names are MSI 2 and are based on accurate mass (<10 ppm) and isotopic abundance and distribution. Data processing. Metabolomics. Processing of metabolomics data was performed using the R package MSPrep 76 . Only compounds identified in at least 2 of 3 sample vial injection triplicates were summarized to obtain one measurement per compound per subject, all others were filtered out. A maximum coefficient of variation of 0.5 was set as a threshold for utilizing the mean of the triplicates or the median of the triplicates. This threshold was selected to account for when the extraction software may have missed a peak, thus resulting in one replicate having low to no signal and therefore causing a larger CV across replicates. The data was then filtered for compounds that were found in at least 80 percent of subjects. Missing data was then imputed using Bayesian Principal Component Analysis (BPCA), which uses linear combinations of principal axis vectors to estimate the missing value and is not sensitive to the level of missing data 77 . The data was then normalized using the Cross-Contribution Compensating Multiple Standard Normalization (CRMN) method to remove unwanted batch variation 78 . Following filtering, imputation, and normalization the numbers of metabolites decreased from 1,813 to 662 and 6,183 to 2,337 compounds in the aqueous and lipid fractions respectively, for a total of 2,999 compounds. Many of the compounds that were removed from further analysis were attributed to instrument noise, software extraction artefacts, and single hits from individual subjects.
Functional Genomics. Transcript expression was measured using Affymetrix Human Genome U133 plus 2.0 Gene Array (GEO accession number GSE42057; Affymetrix, Santa Clara, CA). Quality control, batch correction, normalization, filtering, and log transformation was performed as previously described 69 . Following filtering, transcripts were reduced from 54,675 to 12,531. Subsequent statistical analysis on the reduced transcript list is described below.

Statistical analysis.
To assess the significance of transcript and metabolite levels in COPD phenotypes, a series of regression models were fit for each outcome and the form of the model depended on the distribution of the outcome. For FEV 1 % predicted, a linear regression was used. For FEV 1 /FVC and % emphysema beta regressions were used. Number of exacerbations was modeled as a negative binomial; since many subjects had no exacerbations, a zero-inflation portion was added. Severity of exacerbations used logistic regression. Since BDR was constrained between 0 (no response) and 1 (positive response), logistic regression was used for BDR. For all models, covariates were included. Covariates for FEV1% predicted were age, BMI, gender, smoking status, parents have COPD, and ATS Pack-years. Covariates for FEV 1 /FVC were age, gender, asthma, and smoking status. Covariates for % emphysema were age, gender, BMI, FEV 1 % predicted, and smoking status. Covariates for exacerbation frequency and exacerbation severity were FEV 1 % predicted, gastro esophageal reflux, St. George's Respiratory Questionnaire (SGRQ) score, and gender. Covariates for BDR were age, gender, smoking status and asthma. Significant differences within the cohort, such as age and smoking, were controlled for in subsequent analyses. Then, to assess the significance of various metabolites and transcripts, the log-transformed metabolite or transcript levels were added to the models as the independent variable, and the significance of the effect of these metabolite or transcript levels were reported. For the transcripts, p ≤ 0.015 was considered significant with FDR ≤ 0.1 adjustment. For the metabolites, p ≤ 0.05 was considered significant with FDR ≤ 0.15 adjustment. All analyses were performed in R 79 .
Pathway enrichment analysis. The significant transcript probes were converted to official gene symbols using DAVID Bioinformatics Database version 6.8 80 ID conversion tool. The significant compounds were converted to KEGG IDs using Chemical Translation Service 81 . The lists of statistically significant transcripts and metabolites within each COPD outcome were searched using IMPaLA 82 to obtain pathways relevant to each outcome. In IMPaLA, the gene transcripts and associated values were uploaded using 'gene symbol' , the metabolites and associated values were uploaded in the 'KEGG' identifier, and pathway over-representation analysis was performed. A meta-analysis using Fisher's method provided a combined p-value and a corrected p-value for the pathways. The results were filtered as follows: KEGG pathways, p ≤ 0.05, FDR ≤ 0.2 and ≥3 pathway hits. A more lenient FDR ≤ 0.2 cut-off was used since the individual samples within each group for each outcome or phenotype were small. Figure 1 highlights the experimental procedure used to generate results from the two datasets.
Visualization. Venn diagrams were created to show the overlap of statistically significant transcript probes, compounds, and pathways for the COPD outcomes using the publicly available jvenn program 83 .
Cellular function relationships for each outcome were visualized using the 'Omics Dashboard' of HumanCyc 23 (https://humancyc.org/dashboard/dashboard-intro.shtml) using ' Analysis' . The gene symbols for the significant transcript probes and their coefficients were uploaded in tab-delimited format for expression analysis. The KEGG IDs for the significant compounds and their coefficients were uploaded in tab-delimited format for metabolomics analysis. The interactions between the top gene transcripts and top metabolites were explored using ConsensusPathDB (http://consensuspathdb.org/) 84 and selecting 'over-representation analysis' . The selected identifiers were 'gene symbol' and 'ChEBI' to determine gene interactions and pathways interactions for the gene transcripts and metabolites, respectively. For the transcriptomics data, protein, genetic, biochemical, and gene regulatory interactions were considered; for protein interactions, only binary protein interactions with high and medium confidence were allowed, while complex interactions were ignored. Intermediate nodes were allowed for all interactions. For the metabolomics data, only KEGG pathways with a minimum overlap of 2 and a p-value cut-off of 0.05 was allowed. Finally, outcome and pathway relationships were visualized using KEGG 85 to plot pathways for each outcome.

Data Availability
The mass spectrometry data from this publication has been deposited to the Metabolomics Workbench database http://www.metabolomicsworkbench.org/ and assigned the identifier PR000438. The microarray data from this publication is available in the Gene Expression Omnibus (GEO) database https://www.ncbi.nlm.nih.gov/geo and assigned the identifier GSE42057 69 .