Cystic fibrosis (CF) is an autosomal recessive disorder caused by mutations within the CF transmembrane conductance regulator (CFTR) gene. CFTR mutations result in abnormal ion transport across the epithelium, leading to the production of thick bronchial mucus (Knowles et al., 1983; Quinton, 1983, 2008; Widdicombe et al., 1985). This mucus provides niche space for microbial colonization, resulting in chronic lung infections that begin in infancy and persist throughout a patient’s lifetime. CF lung infections are polymicrobial, containing viruses (Willner et al., 2009, 2012), fungi (Delhaes et al., 2012; Willger et al., 2014) and bacteria (Rogers et al., 2003; Guss et al., 2011; Blainey et al., 2012; Filkins et al., 2012; Goddard et al., 2012; Lim et al., 2013, 2014; Maughan et al., 2012; Zhao et al., 2012; Price et al., 2013; Salipante et al., 2013; Zemanick et al., 2013; Cuthbertson et al., 2014; Smith et al., 2014) (literature summarized in Supplementary Table S1) co-existing in a complex community. The community is diverse during youth, but as patients age, this diversity decreases and specific pathogens take over (Harrison, 2007; Zhao et al., 2012).

Pseudomonas aeruginosa is traditionally regarded as the principal pathogen of CF disease and is one of the most common bacteria cultured from CF patients (Harrison, 2007). This bacterium is known to produce a myriad of small molecules that damage both host and microbial cells (Allen et al., 2005; Irie et al., 2005; Zulianello et al., 2006; Rada et al., 2008; Heeb et al., 2011). Rhamnolipids (Zulianello et al., 2006), phenazines (Allen et al., 2005) and quinolones (Calfee et al., 2001) have been shown to be important for the pathogenesis of this bacterium in vitro, but their role in CF disease is less well known. These molecules have been detected in CF sputum (Kownatzki et al., 1987; Collier et al., 2002) and the redox active phenazines have been shown to accumulate as lung function declines (Hunter et al., 2012), but beyond these studies little is known about the diversity of these molecules in the lung, how their chemistry differs between the lung environment and the laboratory and how detection of P. aeruginosa metabolites is reflected in other diagnostic methods on the same sample. A better understanding of the chemistry of these molecules in the lung environment is required to realize their importance in the pathology of CF disease.

Irrespective of P. aeruginosa metabolites, there is limited information on the overall molecular constituents of CF lung secretions. Metabolomics studies have investigated the microbial and host molecules in sputum (Jones et al., 2000; Palmer et al., 2007; Bensel et al., 2011; Yang et al., 2012; Twomey et al., 2013), breath gas or breath condensate (Barker et al., 2006; Celio et al., 2006; Newport et al., 2009; Robroeks et al., 2010; Montuschi et al., 2012; Monge et al., 2013) and bronchiolar lavage fluid (Wolak et al., 2009; Eiserich et al., 2012; Yang et al., 2012). These studies demonstrated that core primary metabolites including ethanol, acetate and 2-propanol distinguished CF from non-CF samples (Montuschi et al., 2012) and that the fermentation product 2,3-butanedione was associated with microbial activity (Whiteson et al., 2014). A study investigating host-derived lipid mediators showed a complex network of inflammatory lipid signaling in CF lungs (Yang et al., 2012). These metabolomics studies have identified important molecules in CF disease that are potential markers of host or microbial physiological states (Collier et al., 2002; Price-Whelan et al., 2006; Diggle et al., 2007; Mitchell et al., 2010; Pierson and Pierson, 2010). However, the chemistry of CF lung secretions remains poorly characterized, particularly concerning specialized metabolites that function outside primary metabolism.

There are major challenges when studying the CF metabolome and determining the role of particular molecules in vivo. Principally, the chemical milieu is extremely complex, containing thousands of host and microbial metabolites (Molloy, 2014). Comprehensive and untargeted assessments of the CF airway metabolome are hard to interpret, because annotation of detected molecules is still a laborious process and metabolomics databases are not easily searchable (Dettmer et al., 2007). Molecular networking (Bandeira et al., 2007; Watrous et al., 2012), which compares the fragmentation patterns of individual molecules to show structural relationships, alleviates some of these challenges. This tool enables the investigation of structural relatedness within a metabolome and can be used as a dereplication strategy or for data visualization to investigate small molecule chemistry (Moree et al., 2012; Watrous et al., 2012; Nguyen et al., 2013; Sidebottom et al., 2013; Yang et al., 2013). Molecular networking has been used to study the specialized metabolome of P. aeruginosa, where it revealed chemical details of the antibiotics and quorum-sensing molecules it produces (Moree et al., 2012; Watrous et al., 2012; Nguyen et al., 2013).

In this study, a combination of 16S rRNA gene sequencing and metabolomics was used to analyze sputa from CF patients and non-CF volunteers. A molecular networking approach was used because of its ability to comprehensively assess the chemistry of microbial, host and xenobiotic metabolites in an untargeted manner. Although known microbial metabolites were rare in CF sputum, sphingolipids were highly abundant. P. aeruginosa specialized metabolites, including phenazines, rhamnolipids and quinolones, were detected, but their prevalence did not correspond to 16S rRNA gene sequence profiles or clinical cultures. Furthermore, the metabolites produced by P. aeruginosa in the lung were different from those produced in laboratory cultures. The abundant sphingolipids, particularly sphingomyelin, may be clinically relevant, as this molecule contains the inflammatory lipid ceramide, creating a reservoir of hyperinflammatory responses that may be damaging to the lungs.

Materials and methods

Additional methodological details for each section are available in the Supplementary information.

Sample collection

Seven CF samples and two non-CF samples were collected in October 2012 at the Adult CF Clinic at the University of California, San Diego Medical Center, and four non-CF samples were collected from volunteers at San Diego State University. These samples were brought to a 12-ml total volume in phosphate-buffered saline if the sample was less than 12 ml. An additional 27 sputa were collected during a 6-month period in 2013–2014 at the UCSD adult CF clinic during routine clinical visits; because only a smaller sample volume was needed for targeted MS analysis, these samples were not diluted. All samples were collected with the same clinical sampling procedure. After a mouthwash to minimize oral contamination, samples were expectorated into a sterile sputum cup following hypertonic saline inhalation for 30 min. The initial CF and non-CF sputum samples were frozen on dry ice, whereas the 27 additional sputum samples were stored immediately in liquid nitrogen. All samples were collected in compliance with the University of California Institutional Review Board (HRPP 081500) and San Diego State University Institutional Review Board (SDSU IRB#2121) requirements and written consent was obtained.

Bacterial culturing and metabolite extraction

P. aeruginosa cultures were spot plated onto a single ISP2 plate and grown overnight. Extracts were taken by cutting out the agar around the colony and extracting in 200 μl of ethyl acetate by brief vortexing and incubation for 1 h. The top ethyl acetate layer was removed and dried in a centrifugal evaporator. The remaining sample was extracted in 200 μl of methanol by vortexing and incubation for 1 h. The methanol extract was then spun in a tabletop centrifuge at 10 000 g for 30 s. The methanol extract supernatant was then added to the dried ethyl acetate extract, dried down and frozen at −80 °C until mass spectrometry analysis. The same cultures were also grown in artificial sputum medium in thin glass capillary tubes simultaneously overnight according to the method developed in Quinn et al. (2014), which is meant to mimic the conditions of a CF bronchiole. Metabolites were extracted with the same solvent procedure, but for these samples the liquid media was removed from the capillary tube first with a syringe and then added to the ethyl acetate solvent.

Microbiome sequence profiling

The microbiome sequencing was performed according to Quinn et al. (2014), where the same CF sputum profiles are also published. Briefly, DNA was extracted from 100 μl of sputum obtained from a −80 °C aliquot immediately after thawing by directly pipetting the sample into 200 μl of the Trizol reagent (Life Technologies, Carlsbad, CA, USA). Extraction of DNA was done using the manufacturer's protocol for DNA extraction. Total DNA was sent to the Genomics Core at Michigan State University for 16S rRNA gene sequencing. The V4 region (515F/806R) of the bacterial 16S rRNA gene was amplified by PCR (Caporaso et al., 2011). A standard Illumina MiSeq v2 reagent kit (San Diego, CA, USA) was used to prepare for paired end, 2 × 250 bp sequencing on the Illumina MiSeq platform. Data processing, quality control and operational taxonomic unit clustering in mothur (Schloss et al., 2009) are described in Supplementary methods. Briefly, the Silva database (Quast et al., 2013) was used with Ribosomal Database Project taxonomy (Cole et al., 2005) to identify the taxonomy of operational taxonomic units at a cluster cutoff of 97%. Sequence profiles were rarefied to a minimum of 4000 sequence reads per sample, using the rrarefy command in the ‘vegan’ package in the R statistical software (Oksanen et al., 2015).
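
The rarefaction step can be illustrated with a short sketch. The study used the rrarefy command in the R ‘vegan’ package; the Python function below is only a stand-in that shows the idea of random subsampling without replacement to a fixed depth. The function name, the seed and the read counts are hypothetical and are not taken from the study.

```python
import random

def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads without replacement from OTU counts.

    counts: dict mapping OTU name -> read count (hypothetical example data).
    Returns a dict of subsampled counts that sum to `depth`.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(reads, depth)
    rarefied = {otu: 0 for otu in counts}
    for otu in picked:
        rarefied[otu] += 1
    return rarefied

# Hypothetical sample with 10 000 reads, rarefied to the 4000-read depth named in the text.
sample = {"Pseudomonas": 7500, "Streptococcus": 1500, "Rothia": 1000}
rarefied = rarefy(sample, depth=4000)
```

Subsampling every profile to the same depth keeps diversity estimates from being driven by differences in sequencing effort between samples.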

Sputum metabolite extractions and LC-MS/MS

All sputum samples were defrosted and 20 μl was extracted in 100 μl of LC-MS/MS grade ethyl acetate by brief vortexing and incubation for 1 h. The top ethyl acetate extracted layer was then removed and dried in a centrifugal evaporator. The remaining sample was then mixed with 100 μl of LC-MS/MS grade methanol by vortexing and incubation for 1 h, followed by centrifugation in a tabletop centrifuge at 10 000 g for 30 s. The methanol extract supernatant was then added to the dried ethyl acetate extract, dried down and frozen at −80 °C until tandem mass spectrometry (MS/MS) analysis. Liquid chromatography was performed with a ThermoScientific Dionex UltiMate 3000 system (Sunnyvale, CA, USA). Mass spectrometry was performed using a Bruker Daltonics Maxis qTOF mass spectrometer (Billerica, MA, USA) equipped with a standard electrospray ionization source. The mass spectrometer was tuned by infusion of Tuning Mix ES-TOF (Agilent Technologies, Santa Clara, CA, USA) at a 3 μl min−1 flow rate. For accurate mass measurements, lock mass internal calibration used a wick saturated with hexakis (1H,1H,3H-tetrafluoropropoxy) phosphazene ions (Synquest Laboratories, Alachua, FL, USA, m/z 922.0098) located within the source. High-performance liquid chromatography (HPLC)-MS/MS analysis was performed using a Phenomenex (Torrance, CA, USA) Luna 5 μm C18(2) HPLC column (2.0 mm × 250 mm) on the initial seven CF samples and on the additional 27 samples for targeted analysis. A Phenomenex Kinetex 2.6 μm C18 (30 × 2.10 mm) ultra performance liquid chromatography (UPLC) column was used to obtain metabolomics data from the seven CF samples and six non-CF samples for statistical comparison of molecule abundances. Both analyses utilized a 20-μl injection volume. A linear water–acetonitrile gradient (from 98:2 to 2:98 water:acetonitrile) containing 0.1% formic acid was utilized (HPLC: 54 min gradient; UPLC: 14 min gradient). The flow rate was 0.2 ml min−1 for the HPLC analysis and 0.5 ml min−1 for the UPLC analysis. The mass spectrometer was operated in data-dependent positive ion mode, automatically switching between full-scan MS and MS/MS acquisitions for both the UPLC and HPLC analysis. Full-scan MS spectra (m/z 50–2000) were acquired in the TOF and the top 10 most intense ions in a particular scan were fragmented using collision-induced dissociation at 35 eV for +1 ions and 25 eV for +2 ions in the collision cell.

Statistical analysis

Abundances of molecules were calculated using the MS1-based area under the curve and normalized for the original sample volume. Putative assignments of the source or annotation of a molecular feature were based on MS/MS data and analyzed with molecular networking. For random forests analysis, each area under the curve abundance in the UPLC metabolomes was normalized to the total abundance of all molecules detected to generate an abundance matrix. This matrix was imported to the R-Studio software package v0.97.318 (Boston, MA, USA) for statistical analysis. Shannon–Wiener indices were calculated based on the normalized abundance of each molecular feature or each bacterial genus using the ‘vegan’ package in R (Oksanen et al., 2015). A correlation between these two indices was then tested using Pearson’s r, and statistical differences were tested with the Student's t-test. A supervised random forests analysis (5000 trees) was performed on the abundance matrix using the R ‘randomForest’ package v4.6-7 (Boston, MA, USA). A variable importance plot (VIP) of the random forests was used to detect molecules enriched in CF compared with non-CF samples. Statistical significance between CF and non-CF metabolite abundances was then verified with the Wilcoxon rank-sum test after normalization for original sample volume.
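
As an illustration of the diversity calculation, the sketch below normalizes a vector of abundances to its total and computes the Shannon index with the natural logarithm. The study computed these indices with the R ‘vegan’ package; this Python version is a hedged stand-in, and the function names and abundance values are hypothetical.

```python
import math

def normalize_to_total(abundances):
    """Scale abundances so they sum to 1 (total-abundance normalization)."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return [a / total for a in abundances]

def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping zero entries."""
    proportions = normalize_to_total(abundances)
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical area-under-the-curve values for four molecular features.
even_sample = [25.0, 25.0, 25.0, 25.0]
skewed_sample = [97.0, 1.0, 1.0, 1.0]
h_even = shannon_index(even_sample)      # equals ln(4) for four equal features
h_skewed = shannon_index(skewed_sample)  # lower, because one feature dominates
```

The same function applies to a genus-level read-count vector, so metabolome and microbiome diversity can be computed on a common scale before they are correlated.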

Molecular networking

All LC-MS/MS data collected were converted into the .mzXML format using Bruker DataAnalysis software v4.1 (Billerica, MA, USA). Molecular networking was carried out as described in Bandeira et al. (2007), Watrous et al. (2012) and Yang et al. (2013) using our in-house bioinformatics workflow. To detect P. aeruginosa metabolites, pure cultures of CF and non-CF isolates were grown on ISP2 media, and the media and colonies were extracted with the same solvents as the sputum samples and run with the same LC-MS/MS methods. Highlighting individual nodes by their source in Cytoscape (San Diego, CA, USA) allowed for assessment of the putative source of the metabolites detected. The pure culture data were seeded into the network of the sputa data, and networks were analyzed using the GNPS database and the Cytoscape software (Shannon et al., 2003). P. aeruginosa metabolites and sphingomyelin were identified by matching retention time and MS/MS spectra according to the Metabolomics Standards Initiative level 1 annotation guidelines (Sumner et al., 2007); previously unidentified molecular relatives of these metabolites are considered as level 2 and all other microbial metabolites are unknown (level 4 compounds).


Results

Initially, seven CF sputum samples were collected from patients with a varied clinical culture history to compare clinical culture results with metabolomics and 16S rRNA gene sequencing. Six non-CF sputa (obtained from healthy human individuals without CF disease using the same induced sputum method) were used for a statistical comparison to the CF metabolomic data from these original patients. Sputum samples from 27 additional CF patients were then collected and analyzed with targeted metabolomics and clinical culture for a more in-depth investigation of the relationships between metabolomics and P. aeruginosa culture results. The molecular abundances in this study were determined using area under the curve of the MS1 signal of each unique ion to provide accurate abundance information on molecules detected. Annotation and assignment of molecular sources were based on molecular networking of MS/MS spectra from sputum with known spectra in the GNPS mass spectrometry database.

Global chemistry of CF sputum

An HPLC-MS/MS approach was used for an initial chemical investigation of the seven CF sputa. Metabolomics data were also generated on bacterial CF isolates that were detected in the microbiomes, including a P. aeruginosa strain (PAnmFLR01), methicillin-sensitive Staphylococcus aureus (SaFLR01), methicillin-resistant S. aureus (MRSAFLR01), Stenotrophomonas maltophilia (SmFLR01), Escherichia coli (EcFLR01) and Streptococcus salivarius (SsFLR01). The sputum and microbial metabolomics data were co-networked with stringent parameters using our in-house molecular networking bioinformatics tool GNPS to identify putative microbial metabolites by matching MS/MS spectra between the samples and with our spectral libraries (Figure 1, Supplementary Figure S1). Molecular networking is based on the property that molecules of similar structure fragment similarly, resulting in similar MS/MS spectra. This information is visualized using nodes as a proxy for MS/MS spectra, with the thickness of the edges representing the similarity between the nodes. For visualization of the CF sputum molecular network, the nodes are colored based upon the sample origin, and the ‘V’ shape indicates a match to the GNPS spectral libraries (Figure 1). In the global molecular network, 1352 unique MS/MS spectra were observed. Putatively annotated molecules included host lipids, amino acids, various xenobiotics (particularly drugs given to the patients) and P. aeruginosa specialized metabolites (Figure 1). A total of 2.5% of detected molecules could be annotated through GNPS and 3.8% matched a microbial metabolome; the remaining 96.2% were detected only in CF sputum and were therefore assigned as host or other microbial molecules (Figure 2a). Only 1.0% of molecules detected in the CF sputa matched a P. aeruginosa pure culture. A number of phosphoethanolamines were present in both CF sputum and bacterial cultures, indicating that this molecular family is shared between bacteria and the host (Figure 1).
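
The similarity scoring that underlies molecular networking can be sketched as follows. This is a simplified illustration in Python and not the GNPS implementation: it bins fragment peaks and computes a plain cosine score, whereas the published workflow, as we understand it, uses a modified cosine that also accounts for precursor mass differences. The bin width, the 0.7 edge threshold and all spectra below are assumptions chosen for illustration.

```python
import math

def binned(spectrum, bin_width=0.5):
    """Collapse (m/z, intensity) peaks into fixed-width m/z bins."""
    bins = {}
    for mz, intensity in spectrum:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins

def cosine_similarity(spec_a, spec_b, bin_width=0.5):
    """Plain cosine score between two binned MS/MS spectra (0 to 1)."""
    a, b = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical spectra: a and b share two fragments, c shares none with a.
spec_a = [(184.07, 100.0), (86.10, 20.0), (125.00, 10.0)]
spec_b = [(184.08, 90.0), (86.09, 25.0), (104.11, 5.0)]
spec_c = [(159.07, 100.0), (172.08, 40.0)]

# Nodes are spectra; an edge is drawn when the score passes the assumed threshold.
spectra = {"a": spec_a, "b": spec_b, "c": spec_c}
names = sorted(spectra)
edges = [
    (m, n)
    for i, m in enumerate(names)
    for n in names[i + 1:]
    if cosine_similarity(spectra[m], spectra[n]) >= 0.7
]
```

In this toy network only spectra a and b are connected, which mirrors how structurally related molecules cluster into a molecular family while unrelated spectra remain unlinked.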

Figure 1

Molecular network of CF sputa and bacterial isolate HPLC-MS/MS metabolomic data generated on GNPS. The network was visualized using the Cytoscape software. Each node represents a unique spectrum that was detected at least twice in the data set and is colored by its sample of origin according to the legend. Nodes that were detected in multiple bacteria as well as a sputum sample are considered together regardless of which bacteria were represented. Bacterial-only nodes are colored gray and were not considered for subsequent sputum statistical analysis; ISP2 media blank nodes are colored black and are ignored as background. Nodes that matched the GNPS database based on the molecular networking algorithm (Watrous et al., 2012) are shaped as a ‘V’ and circular nodes are those that are not known in GNPS. Molecular families and identified molecules are highlighted. *Singleton nodes are those that did not have any molecular relatives and are not shown in the network.

Figure 2

(a) Bar chart of the distribution of the detected molecules based on the network mapping. Molecules are assigned to a particular bacterium when a spectrum was detected in sputum and that bacterium only. Shared molecules between multiple bacteria are also shown. (b) Clinical culture results and 16S rRNA microbiome profiles of the seven sequenced CF sputum samples in this study. Microbiome profiles were generated using 16S rRNA gene amplicon sequencing and operational taxonomic unit clustering in mothur; the 19 most abundant are shown here, whereas the rest are grouped into ‘other’ because they were of very low abundance.

Microbiome and metabolome diversity

Clinical culture results were obtained and 16S rRNA gene amplicons were sequenced to generate microbial profiles of the initial seven CF patients in this study. The microbiome profiles differed between all patients and did not coincide with culturing results (Figure 2b, Table 1). Two of the patient microbiomes were dominated by Rothia spp. and Streptococcus spp. (CF10, CF11), two were dominated by Pseudomonas spp. (CF9 and CF13) and the other three were unique (CF1, CF6 and CF12, Figure 2b). Pseudomonas spp. constituted >5% of the 16S rRNA gene sequence reads in only three of the patients (CF1, CF9 and CF13, Table 1). As the initial patients in this study were selected based on varied culture results, we investigated the literature to assess how often P. aeruginosa was the dominant bacterium in microbiome profiles without this selection (Supplementary Table S1). From 18 studies comprising 393 patients, 60% of patients had P. aeruginosa as the most relatively abundant bacterium (Supplementary Table S1). This demonstrated that while P. aeruginosa is most often the dominant pathogen in adult CF patients, other bacteria are dominant in 40% of individuals. To compare microbiome and metabolome diversity, Shannon–Wiener indices were calculated on the abundance matrices from both data types on the same samples. The CF metabolomes were significantly more diverse than non-CF (Shannon–Wiener index, one-tailed Student's t-test; P=0.00003, df=13), but the 16S rRNA gene profiles produced the opposite trend (P=0.0010) (Supplementary Figure S2). There was no significant Pearson's correlation between microbial diversity and the metabolome diversity in either CF or non-CF populations (Supplementary Table S2, r=−0.12, and r=0.43, respectively).

Table 1 Clinical culture history and 16S rRNA gene profile results of CF and non-CF sputa

P. aeruginosa specialized metabolites in sputa

P. aeruginosa metabolites were detected in four of the seven initial CF patients within the global molecular network (Figure 1, Table 2). The specific chemistry of these metabolites was then analyzed by networking individual patients with P. aeruginosa data alone (Figure 3, Table 2, Supplementary Figure S3). Patient CF9 had the largest relative abundance of P. aeruginosa in the microbiome (75.0%, Table 1), and quinolones were detected in this patient’s metabolome; however, the overall abundance of these molecules was only 0.03% of the total metabolome. The most abundant specialized metabolite in CF9 was 2-heptyl-4-quinolone (HHQ), which made up 0.02% of the total counted ions. For comparison, the most abundant host metabolite in CF9 (sphingomyelin) was 1.1% and the most abundant xenobiotic (azithromycin) was 2.1%. This demonstrated that while detectable in sputum, the P. aeruginosa specialized metabolites were at a low relative abundance compared with other molecules, even in a patient dominated by this bacterium. A total of six unique quinolones were detected in patient CF9 (Figure 3, Table 2), including the two base quinolones: 2-heptyl-4-hydroxyquinolone-N-oxide (HQNO, m/z 260.19) and HHQ (m/z 244.17). The remaining six quinolones were structural derivatives of these molecules, differing only in the length and unsaturation of the aliphatic chain (Figure 3). The pseudomonas quinolone signal (PQS, 2-heptyl-3-hydroxy 4-quinolone) was not detected, even though its precursor HHQ was the most abundant quinolone in CF9 (Figure 3). To further investigate the chemistry and prevalence of P. aeruginosa metabolites, a targeted analysis was performed on the additional 27 CF sputa and compared with clinical culture results. Fourteen of these additional samples were positive by P. aeruginosa targeted metabolomics. Six of these samples were positive for P. aeruginosa culture, but negative by metabolite detection, and four contained a P. aeruginosa metabolite, but were negative by culture results (Table 3). Incorporating the initial seven samples, 50% of the sputa that were negative by P. aeruginosa clinical culture (n=12) contained at least one specialized metabolite from this bacterium. HHQ was detected in four samples, 2-nonyl-4-quinolone in eleven and both molecules were found in three (Table 3). Similar to CF9, PQS and C9-PQS were not detected in any of these sputum samples even when HHQ was present.

Table 2 Chemical characteristics and abundances of P. aeruginosa specialized metabolites detected in CF sputum
Figure 3

Molecular clusters of P. aeruginosa quinolones, phenazines and rhamnolipids. The metabolomes of CF9 and CF6 were networked separately with PAnmFLR01 to explore the specific chemistry of these molecules in the samples in which they were detected. The rhamnolipid cluster is from the network in Figure 1, because rhamnolipids were detected in multiple samples. Nodes are colored and shaped by their sample origin as indicated in the legend. The name and chemical structure of each detected P. aeruginosa metabolite in sputum is shown.

Table 3 Extracted ion abundance of P. aeruginosa specialized metabolites in the additional 27 sputum samples and their P. aeruginosa clinical culture results

To test whether the chemical nature of the P. aeruginosa quinolone production in sputum was specific to growth in the CF lung environment, three different CF strains of P. aeruginosa were grown in a CF mucus bronchiole model that mimics lung conditions (WinCF model) (Quinn et al., 2014) and compared with metabolite production when grown on ISP2 aerobic media. The quinolone production in the WinCF model matched that of the sputum, where the isolates produced abundant HHQ, HQNO and NQNO (C9-HQNO), but not PQS or C9-PQS (Supplementary Figure S4). However, PQS and C9-PQS were produced when P. aeruginosa was grown on the aerobic ISP2 media (Supplementary Figure S4).

Phenazines were detected in patient CF6 (Figure 3, Table 2) and in three of the additional twenty-seven samples (Table 3). These known phenazines included 1-hydroxyphenazine (1-HP), phenazine-1-carboxylic acid and a variant of pyocyanin (PYO) (Figure 3, Supplementary Figure S3). Additional putative phenazines were also detected in the raw data including a phenazine with the same exact mass and similar MS/MS fragmentation pattern to 1-HP (m/z 197.07), but a different retention time when compared with P. aeruginosa pure culture (Supplementary Figure S3). This indicated that this metabolite was likely an analog of 1-HP, probably differing in the position of the hydroxyl group. The PYO variant shared the same mass as PYO, but both the fragmentation pattern and retention time differed (Supplementary Figure S3). The similarity of MS/MS fragmentation between the detected phenazine and PYO suggests that their structures were closely related.

Despite the diversity of rhamnolipids produced by P. aeruginosa in culture, only the rhamnolipid Rha-Rha-C10-C10 (m/z 673.37 ([M+Na]+)) was found in CF sputum (Figure 3). This molecule was detected in four of the original CF samples (Figure 3, Supplementary Figure S3, Table 2), two of which were negative by clinical culture (Supplementary Table S3) and three of the additional twenty-seven samples (Table 3). The peak intensity of Rha-Rha-C10-C10 was highest in sputum from patient CF9, who had the highest proportion of P. aeruginosa in the microbiome (Table 1). Even though Rha-Rha-C10-C10 was detected, the microbiome of patients CF6, CF10 and CF11 contained less than 2% P. aeruginosa.

Xenobiotics in CF sputa

A number of exogenous metabolites were identified in the original seven CF sputa, including antibiotics, antifungals and antidepressants (Figure 1, Supplementary Figure S5). Caffeine was detected in four CF samples (CF1, CF6, CF11 and CF13) and one non-CF sample (H1) (Figure 4). The antibiotic azithromycin was detected in prescribed patients (CF1, CF9 and CF13), and aztreonam (Figure 4) was detected only in CF9, although several other patients were prescribed the medication (Supplementary Table S3). In addition, the antibiotics sulfamethoxazole and trimethoprim (Figure 4) were detected in several patients (CF1, CF6, CF9 and CF10) even though these drugs were not known to have been administered recently. The antifungal itraconazole was detected in patient CF9 (Figure 4). The antidepressants citalopram and amitriptyline were detected in CF6 and CF13, respectively (Figure 4). Metabolism and transformations of these xenobiotics were also observed, including loss of a sugar moiety of azithromycin, desulfated aztreonam, acetylated sulfamethoxazole, and demethylation and/or hydroxylation of caffeine, citalopram, amitriptyline and itraconazole (Figure 4). There was a positive correlation between levels of azithromycin in sputum and both the levels of P. aeruginosa reads and the P. aeruginosa rhamnolipid Rha-Rha-C10-C10 (one-tailed test of Pearson’s r=0.72 and r=0.89, respectively, P<0.05).
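
The correlation test reported above can be sketched in a few lines. The Python functions below compute Pearson's r and the corresponding t statistic; they are an illustrative stand-in for the R analysis, and the function names and the example vectors are hypothetical rather than study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical drug abundance and read fraction for five samples.
drug_level = [0.1, 0.4, 0.5, 0.9, 1.2]
read_fraction = [0.02, 0.10, 0.08, 0.30, 0.41]
r = pearson_r(drug_level, read_fraction)

# Assumption: if the reported r=0.72 came from the seven initial CF sputa (n=7).
t_reported = t_statistic(0.72, 7)
```

The text does not state the sample size for this test. Under the assumption of n=7, r=0.72 gives t of about 2.32 on 5 degrees of freedom, which exceeds the one-tailed 5% critical value of roughly 2.015 and is consistent with the reported P<0.05.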

Figure 4

Molecular network highlighting xenobiotic metabolites detected through GNPS library searching in CF samples (green nodes). Successfully annotated xenobiotic molecular families and associated chemical structures are highlighted including nodes representing particular transformations and breakdown products.

Metabolites that distinguish CF from non-CF

After assessment of the overall chemistry and putative source assignments of the molecules detected, a UPLC-MS/MS method was used to generate metabolomic data on the initial seven CF sputa and an additional six non-CF sputa for statistical comparisons between these two groups. A random forest statistical approach was used to identify which molecules best differentiated the CF from non-CF metabolomes based on their MS1 abundances. Although this represents a relatively small sample size, our statistical approach reveals molecules that are drastically different in abundance between CF and non-CF, making the findings more broadly applicable. Thirty differential molecular features were identified in the random forests VIP as significantly different between CF and non-CF (Supplementary Table S4, Supplementary Figure S6). The source of these molecules was then determined by matching their parent mass in the molecular network and determining which group they were from: xenobiotics, CF only, non-CF only, shared or matching a bacterium. None of the distinguishing molecules were known to be bacterial; their source was CF only (18/30), non-CF only (1/30) or both (11/30, Supplementary Table S4). This indicated that the most differential molecular features between CF and non-CF were molecules highly abundant in CF sputa, likely of host origin, as they did not match a microbial culture metabolome. However, it is important to note that different conditions of microbial culture may result in variable metabolite production.

Of the 30 molecular features identified in the VIP, many yielded an MS/MS fragmentation peak of m/z 184.07, which corresponded to phosphocholine (Supplementary Figure S6). This fragmentation pattern indicated that these differential metabolites were phospholipids, particularly the sphingolipids, which are rare in bacteria and fungi (Olsen and Jantzen, 2001; Wieland Brown et al., 2013). One particularly differential molecule (m/z 703.58), highly abundant in CF, was annotated as the lipid sphingomyelin (18:1/16:0) (Supplementary Figure S7). Three additional sphingomyelins were detected in the VIP (m/z 675.54, m/z 689.56 and m/z 701.56). The sphingomyelin molecular cluster in the CF/non-CF molecular network visualized these sphingomyelins and a number of other related molecules not identified in the VIP (Figure 5a). Manual annotation of the MS/MS spectra of these molecules, in conjunction with searches of the LipidMaps database (Sud et al., 2006), putatively annotated them as related sphingomyelins. All putative sphingomyelins detected in the molecular cluster in Figure 5a were significantly elevated in CF vs non-CF sputa according to the Wilcoxon rank-sum test (two-tailed), including d18:1/16:0 (m/z 703.58, P=0.002), d18:1/14:0 (m/z 675.54, P=0.002), d18:1/15:0 (m/z 689.56, P=0.01) and d18:1/16:1 (m/z 701.56, P=0.003, df=13, Figure 5a). The glycosphingolipids tetraglycoceramide (d18:1/16:0, m/z 1227.76) and lactosylceramide (d18:1/16:0, m/z 862.62) were also identified with the random forests VIP and were significantly more abundant in CF sputa (P=0.009 and 0.02, respectively, Supplementary Figure S6, Supplementary Table S4). To determine whether sphingolipid forms were especially abundant as opposed to other phospholipids, the areas under the curve of the C18 lipids ceramide (18:1/16:0), sphingomyelin (18:1/16:0) and diacylglycerophosphocholine (18:1/16:0) were compared. Significant differences in abundance were found only for the C18 sphingolipids ceramide (P=0.002) and sphingomyelin (P=0.004), and not for the C18 diacylglycerophosphocholine (P=0.1, Figure 5b, Wilcoxon rank-sum test). The sphingolipids were not significantly correlated with any particular bacterial genus.
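As an illustration of the two-tailed Wilcoxon rank-sum comparison used above, the following minimal sketch computes an exact p-value by enumerating every way of splitting the pooled samples into two groups. The abundance values are invented for illustration and are not data from this study, and the text does not state whether exact or approximate p-values were reported, so the exact enumeration here is an assumption.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Return the rank sum of x and the exact two-sided p-value."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2  # mean rank sum of x under the null
    deviation = abs(observed - expected)
    total = extreme = 0
    # Enumerate every possible assignment of n pooled ranks to the first group.
    for idx in combinations(range(len(ranks)), n):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical area-under-the-curve values: 7 CF and 6 non-CF sputa.
cf_auc = [9.1, 8.4, 7.7, 6.9, 6.2, 5.8, 5.1]
non_cf_auc = [1.2, 0.9, 2.3, 1.7, 3.0, 0.4]

w, p = rank_sum_exact(cf_auc, non_cf_auc)
print(f"rank sum = {w}, exact two-sided p = {p:.4f}")
```

With group sizes of 7 and 6 there are 1716 possible labelings, so when the two groups are completely separated, as in this invented example, only 2 labelings are as extreme as the observed one and the exact two-sided p-value is 2/1716, or about 0.0012. This is the smallest p-value such a design can produce with an exact test.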

Figure 5

(a) The molecular network cluster of sphingolipids and related molecules in sputum data. Node sizes were scaled according to abundance using the Cytoscape software. Nodes identified in the random forests VIP are shown as squares; round nodes were not identified by random forests. Annotation of sphingomyelin (d18:1/16:0) was verified with a purchased standard and its chemical structure is shown. Putative annotations of other nodes in the cluster are according to the LipidMaps database and guided by MS/MS fragmentation spectra. Abundances of the metabolites in the cluster are shown using boxplots from CF and non-CF area under the curve data with attention to the most abundant molecules. (b) Area under the curve abundances of ceramide (Cer), sphingomyelin (SM) and C18 phosphocholine (PC) in CF and non-CF sputa. Statistically significant differences were determined using a Wilcoxon rank-sum test.


This study investigated the global chemistry of CF sputum comparing findings to clinical culture history and microbial 16S rRNA gene profiles. We found that sputum is a diverse sample containing molecules from microbial, host and xenobiotic sources. What was unexpected, however, was that microbial molecules were not particularly abundant and the microbiome diversity was not correlated with metabolome diversity. The most abundant molecules in CF sputum were sphingolipids and antibiotics. Sphingolipids were significantly more abundant in CF sputum than in non-CF sputum, indicating that they may accumulate in the lungs of CF patients. This discrepancy between microbial and metabolite diversity is likely due to the high abundance and diversity of host metabolites coming from inflammatory cells in sputum (Macher and Klock, 1980; Dechecchi et al., 2011).

Clinical culturing, 16S rRNA gene profiling and metabolite-based detection were not congruent for the diagnosis of a P. aeruginosa infection (Tables 1 and 2). It was found that patients without a known P. aeruginosa infection contained the bacterium’s molecules in their sputum and vice versa. For example, patient CF13 had abundant reads (67.9%) mapped to P. aeruginosa in the sputum microbiome, but was clinically classified as being infected by MRSA, and LC-MS/MS did not detect any P. aeruginosa specialized metabolites. Similarly, CF1 had abundant P. aeruginosa reads in the 16S rRNA microbiome profile and was classified as a Stenotrophomonas maltophilia/P. aeruginosa infection by culture, but P. aeruginosa metabolites were absent. Three patients had low amounts of P. aeruginosa in their microbiome, but LC-MS/MS detected the rhamnolipid Rha-Rha-C10-C10. Patient CF9, however, had signatures of P. aeruginosa using all three methods of detection. In light of these results, we further investigated the presence of P. aeruginosa molecules in another 27 sputum samples and found similar discrepancies with clinical culture. Of the samples tested, 21% were culture positive, but negative for P. aeruginosa metabolites. Thus, clinical culture and microbiome results do not reflect the active production of P. aeruginosa small molecules. Rhamnolipids, for example, may be produced by P. aeruginosa active metabolism despite its overall low abundance or ability to be cultured on selective media. Identification of actively growing bacteria is of particular interest for antibiotic treatment, as antibiotics are most effective against actively growing cells (Hu and Coates, 2012). Clinical decisions should take into account the potential that culture identifications are not necessarily reflective of active growth and metabolite production of a particular bacterium.

The molecules produced by P. aeruginosa grown in culture did not match those produced in CF sputum, demonstrating that the bacterium’s specialized metabolite production is unique in the lung environment. For example, in patient CF9, quinolones were detected, but PQS, the best-studied P. aeruginosa quinolone, was completely absent even though it was readily detected from cultured cells under the same LC-MS/MS conditions. The most abundant quinolone detected was HHQ, the biosynthetic precursor of PQS. This phenomenon was verified in another 27 sputum samples by targeted quinolone analysis. We hypothesize that the absence of PQS in the sputa is due to the high concentration of iron (Stites et al., 1998; Ghio et al., 2012) and low concentration of oxygen (Schertzer et al., 2010), both of which have been shown to inhibit PQS production (Bredenbruch et al., 2006; Schertzer et al., 2010). Growing P. aeruginosa in an environment mimicking the CF lung reproduced the specific quinolone production, indicating that the chemical conditions of a mucus-plugged bronchiole are likely responsible for the specific quinolone chemistry detected in sputum. The presence of HHQ and HQNO suggests that the historical research focus on PQS may not be relevant in the lung environment (Collier et al., 2002; Bredenbruch et al., 2006; Häussler and Becker, 2008); these other quinolones may be more important to the bacterium in vivo. A similar phenomenon has been observed with P. aeruginosa homoserine lactone quorum-sensing molecules, which also implicated the environment of the CF lung as the driver of the unexpected chemistry (Singh et al., 2000). Close molecular relatives of phenazines were detected in CF6, indicating that those that are produced in the lung may include previously unrecognized molecules. Rhamnolipid production also varied: 12 specific rhamnolipids were produced by cultured P. aeruginosa cells, but only 1 was detected in sputum. The disparity between the specific chemistry of P. aeruginosa small molecules in culture and in vivo is likely due to differential abundance of the bacterium, clonal adaptation and hyper-mutation affecting its phenotype, and the effects of the complex lung environment on the bacterium’s physiology (Wilder et al., 2009; Hogardt and Heesemann, 2010; Behrends et al., 2013; Workentine et al., 2013). This study demonstrates the importance of growing the bacterium in an environment better mimicking the CF lung, an approach that has previously revealed important aspects of P. aeruginosa physiology as a CF pathogen (Palmer et al., 2005, 2007; Sriramulu et al., 2005; Fung et al., 2010; Hare et al., 2012; Quinn et al., 2014; Turner et al., 2015).

The most differential metabolites between CF and non-CF sputa were sphingolipids. Sphingolipids are rare in bacteria and fungi and are therefore likely host derived. However, some microbes have been documented to produce them, including a bacterium detected in our microbiome profiles (Prevotella) (Olsen and Jantzen, 2001; Wieland Brown et al., 2013). Molecular networking revealed that there was a diverse complement of sphingolipids highly abundant in CF lungs, including various sphingomyelins, lactosylceramide, tetraglycoceramide, ceramides and sphingosine. Hydrolysis of the hydrophilic head group of sphingolipids produces ceramide, which induces strong inflammatory cascades (Brodlie et al., 2011). Thus, the abundance of sphingomyelins found in this study represents a large reservoir of ceramide-induced inflammatory signaling that could be harmful to the lungs. In support of these findings, elevated levels of ceramide have been measured in the epithelial membrane of CF mice and late-stage CF patients (Teichgräber et al., 2008; Brodlie et al., 2011; Ziobro et al., 2013). Recent studies suggest that CFTR is directly involved in the regulation of ceramide metabolism, because CFTR knockouts accumulate ceramide in the epithelium (Bodas et al., 2011; Ziobro et al., 2013). Our findings support previous studies that have focused on the human enzyme acid sphingomyelinase, which converts sphingomyelin to ceramide and is activated by microbial products and inflammatory cytokines (Sakata et al., 2007). P. aeruginosa also produces a phospholipase that can act as a sphingomyelinase (Truan et al., 2013). These enzymes have been suggested as a target for novel therapies to reduce lung inflammation (Teichgräber et al., 2008; Ziobro et al., 2013), because their mechanistic action can contribute to the harmful hyperinflammation characteristic of CF disease. In addition, sphingolipids have been shown to have a CFTR-dependent role in vasoconstriction through hypoxia signaling (Tabeling et al., 2015), a potentially important link between sphingolipids and the host response to the hypoxic CF lung that drives microbial physiology (Quinn et al., 2014). Sphingolipid accumulation is not specific to CF disease; it has also been observed in other inflammatory lung diseases, such as asthma (Ammit et al., 2001), emphysema (Petrache et al., 2005) and chronic obstructive pulmonary disease (Telenga et al., 2014). Regardless of its disease specificity or source of production, the exceptionally high abundance of these molecules in CF sputum further supports targeting sphingomyelinases as a potentially efficacious treatment for CF.

The molecular networking approach in this study revealed that CF sputum is a heterogeneous milieu of host, microbial and xenobiotic chemical products. Xenobiotics were in high relative abundance in sputa. Common drugs used to treat CF patients were identified, including antibiotics, antifungals and antidepressants. We also observed metabolism of these drugs, including desulfation, methylation, hydroxylation, acetylation and removal of sugars. The ability of molecular networking to visualize the chemistry and metabolism of drugs in the same sample along with microbial products can inform physicians about which microbes are active in a particular sample and their antibiotic resistance mechanisms. This knowledge is important and can influence subsequent treatment regimes. Furthermore, this analysis can be done on a patient-by-patient basis to allow for precision care. Metabolomics studies are often limited by the need to analyze a particular targeted group of compounds. Applying molecular networking to metabolomics greatly increases its comprehensiveness and makes it possible to study the effects of small molecules on host and microbial physiology together with one method.


This study described the chemical makeup of CF sputum and found that the most abundant molecules were sphingolipids. Drugs targeting the conversion of sphingolipids to ceramide, such as amitriptyline, already prescribed for some CF patients (Riethmüller et al., 2009) and detected in this study, may be efficacious in reducing the overall hyperinflammation across individuals. There was a marked discrepancy in 16S rRNA gene sequencing, clinical culture and metabolomics results for the detection of P. aeruginosa in a sputum sample. Clinical decisions for antibiotic use are currently based on results from routine microbiological culture on selective media. This study indicates that treatment decisions based on culture results need to consider that culture-based detection of a bacterium does not imply that it is actively growing and producing potentially damaging metabolites in the patient. Metabolomics methods can detect microbial metabolites unique to particular bacteria or groups of bacteria and could aid clinical decisions by providing evidence that a particular pathogen is active in a clinical sample.