Cystic fibrosis (CF) is characterized by recurrent airway infection, inflammation and progressive decline in lung function. Infection with key organisms such as Pseudomonas aeruginosa and Burkholderia cepacia complex (Bcc) has been associated with the frequent exacerbations of airway dysfunction and progressive functional decline that are the hallmarks of the disease1,2,3.

Over the last decade, cross-sectional and longitudinal studies of bacterial taxa using culture-independent microbial detection methods such as terminal restriction fragment length polymorphism (T-RFLP) and 16S rRNA gene sequencing, have identified polymicrobial communities in the airways of CF patients that exceed the complexity captured by traditional culture4,5,6,7,8,9,10,11,12,13. Significant heterogeneity in bacterial community composition has been demonstrated within groups of clinically similar patients5,6,7. Although the abundance of individual taxa (such as Gemella spp. and the Streptococcus anginosus group), ecological diversity and community stability are potentially associated with disease status8,10,14, no single static or dynamic metric or bacterial taxon has emerged that consistently explains the heterogeneity of disease within otherwise similar hosts.

Culture-based methods have demonstrated the clinical significance of infection with pathogens such as P. aeruginosa. Molecular methods have demonstrated that not only their presence or absence, but also their relative abundance, dominance over other taxa and temporal stability are potentially important determinants of ecological and clinical change in lung disease10,15. While studies of the CF microbiome to date have largely focused on adult patients, we expand this base of knowledge by including 76 pediatric patients less than 18 years old (28% of the total sample). This early time period is particularly interesting from the perspective of CF microbiology since community composition is more dynamic in pediatric patients than in adults16,17. Here, we report a cross-sectional survey of the bacterial community in sputum samples from 269 CF patients spanning a 60-year age range and a wide range of lung function.


Clinical and microbiological characteristics of patients

During the study period, 76 pediatric (<18 years old) and 193 adult CF patients provided at least one expectorated sputum sample for analysis. Complete clinical data was available for 73 pediatric and 191 adult patients. The characteristics of these patients are reported in Table 1. Pancreatic insufficiency was more common in pediatric patients (P < 0.001). A larger proportion of adult patients had intermediate (FEV1 < 70% predicted) or advanced (FEV1 < 40% predicted) disease (68.6% vs 44.3% of pediatric patients, P = 0.001) and the median FEV1 was lower in the adult cohort, reflecting the progression of disease with increasing age (P < 0.001). Adult patients were more likely than pediatric patients to be infected by Bcc or Aspergillus species by sputum culture (P = 0.037, 0.011 respectively), with a trend towards increased infection with P. aeruginosa (P = 0.085).

Table 1 Study participant summary by age category.

Individual sample and genus characteristics

After quality and abundance filtering, 87 genera were identified across all samples. The size of the core microbiota of adult and pediatric samples, arbitrarily defined in our study as genera with a relative abundance of ≥1% of sequence in ≥50% of samples, were similar in both groups (Fig. 1A, Table 2). Streptococcus, Prevotella, Rothia, Veillonella and Actinomyces were core genera in both groups, while Neisseria, Haemophilus and Gemella were unique core genera in pediatric samples and Pseudomonas was unique to the adult core microbiota. In both groups, the genera with the greatest prevalence also had the greatest mean relative abundance (Fig. 1B).

Table 2 Dominance proportion of selected genera across specimens.
Figure 1
figure 1

Genus level summary of bacterial population structure in cystic fibrosis (CF) in pediatric and adult sputum samples by 16S rRNA gene sequencing, after exclusion of operational taxonomic units that account for less than 0.005% sequence in the entire dataset. (A) Number of genera in pediatric (open diamonds) and adult (filled diamonds) specimens plotted by prevalence. Dashed (pediatric) and solid (adult) lines indicate core microbiota (genus present in >50% of samples). (B) Mean relative abundance of genera in pediatric (open diamonds) and adult (filled diamonds) specimens plotted by prevalence (C and D) Cumulative relative abundance of genera within a sputum sample of pediatric (C) and adult (D) CF patients plotted by number of genera. Error bars indicate range of values and vertical dashed lines indicate the number of OTUs accounting for 99% of total sequence.

A small subset of genera comprised most of the bacterial sequences detected in any sample (Fig. 1C and D, Table 3). The median number of genera required to account for half of the sequences in each sample was two in both groups, while a median of 20 and 13 genera accounted for 99% of the sequence in pediatric and adult patients, respectively (Table 3).

Table 3 Relative sequence abundance by number of genera in each patient.

A dominant genus (the most abundant genus with at least twice the abundance of the second most abundant genus) was present in 45% of pediatric samples and 57% of adult samples. Streptococcus, Haemophilus, Pseudomonas, Staphylococcus and Achromobacter were dominant in multiple pediatric samples in descending order of frequency, while Neisseria and Stenotrophomonas were each dominant in one pediatric sample. Pseudomonas, Burkholderia, Streptococcus, Haemophilus and Staphylococcus were the dominant genus in descending order of frequency in adults (Table 2). The median relative abundance of the most abundant genus in each sample was 0.38 in pediatric samples (IQR 0.29-0.50) and 0.46 in adults (IQR 0.32-0.71), the median relative abundance of dominant genera was 0.64 (range 0.26-0.99).

Rothia, Prevotella, Gemella, Actinomyces and Veillonella were prevalent in both pediatric and adult samples, but were seldom or never the dominant genus. Pseudomonas was more prevalent in adults but tended to be dominant in both pediatric and adult samples in which it was present (Table 2). Streptococcus, Burkholderia, Achromobacter, Stenotrophomonas and Haemophilus were commonly the dominant genus in samples in which they were present. Bordetella accounted for 95% of the sequence in one adult specimen from a patient identified by culture as being chronically infected by Bordetella avium, but was not present in any other samples. In addition to having a high dominance proportion (the number of samples in which the genus was dominant divided by the total number of samples in which it was present with >1% relative abundance), Pseudomonas and Burkholderia, when dominant, accounted for a larger proportion of the community than Streptococcus in Streptococcus-dominant samples (72.8%, 78.2% and 47.1% of sequence in adult samples respectively, P < 0.0005). Only five genera (Pseudomonas, Burkholderia, Stenotrophomonas, Haemophilus and Bordetella) comprised more than 90% of any sample and only in samples from adult patients.

Within sample (alpha) diversity

Advanced disease stage was associated with lower diversity in adult patients (P = 0.004, Fig. 2A). This difference was not statistically significant in pediatric patients; however, only 6 pediatric patients had advanced disease. Baseline samples were more diverse than treatment samples for adults (P = 0.002, Fig. 2B). There was a stepwise, statistically significant decrease in diversity associated with increased number of antibiotics prescribed in both cohorts (Fig. 2C). Streptococcus dominance was associated with higher community diversity in both datasets (Fig. 2D).

Figure 2
figure 2

Within-sample (alpha) diversity of sputum specimens from CF patients. Boxplots represent minimum-25th centile-median-75th centile-maximum values and markers indicate individual patients. Open markers indicate pediatric patients and filled markers indicate adult patients. Alpha diversity was summarized using the Shannon diversity index, with diversity increasing on the y axis. Samples were grouped and compared by (A) disease stage (B) clinical status, (C) number of antibiotics being administered at the time of sample collection and (D) dominant genus. P-values represent the results of t-tests between groups, within pediatric and adult cohorts. Stage, clinical status, treatment groups and dominant genus are defined in the Methods section. Abbreviations: Int.= intermediate, Adv.= advanced, Base.= baseline, Exacer. = exacerbation, Recov. = recovery, Pseudo. = Pseudomonas, Haem. = Haemophilus, Strep. = Streptococcus, Burk. = Burkholderia, None = No dominant genus.

Between sample (beta) diversity

We assessed inter-sample variability in community structure (beta diversity) using unsupervised principal coordinates analysis (PCoA) of Bray-Curtis dissimilarity (Fig. 3). The relative abundance of Pseudomonas, Burkholderia, Streptococcus and Haemophilus largely drove ordination (Fig. 3). Samples did not cluster by clinical status at the time of sample acquisition (baseline, exacerbation, treatment, recovery or other). Notably, Bray-Curtis ordination was influenced by disease stage differently across the age groups, with clustering based on disease stage becoming more apparent after age 30 (Fig. 4). This analysis did not reveal any grouping of bacterial community structures by gender, CFTR genotype, pancreatic status, FEV1, FVC or BMI.

Figure 3
figure 3

Between-sample (beta) diversity of sputum specimens from CF patients. Bray-Curtis dissimilarity principal coordinates analysis (PCoA) was used to generate ordination of beta-diversity in two dimensions. Principal coordinates 1 and 2 (PC1 and PC2) explain 23.2% and 9.5% of the variance in Bray-Curtis dissimilarity respectively (x and y axes). Increasing distance in two-dimensional space represents increasingly dissimilar community structures. The marker style indicates samples were obtained from patients at clinical baseline, exacerbation, treatment, recovery or other as defined in the Methods section. Samples are colored according to most abundant genus. Colored arrows in the plot indicate Pearson correlation coefficients between genus relative abundance and each principal coordinate, with the line colors corresponding to the genus color in the legend.

Figure 4
figure 4

PCoA by Bray-Curtis dissimilarity was plotted by age group as indicated in the upper right corner of each panel. Markers represent individual patients and marker style indicates disease stage at the time of sample collection. Principal coordinates 1 and 2 correlate with relative abundance of Pseudomonas and Burkholderia, respectively as in Fig. 3.

Age group and disease status associated bacterial community changes

Patients were stratified into age groups and compared based on lung function, community diversity and dominant genera (Fig. 5). As expected, FEV1 was lower in older age groups, with the lowest median FEV1 in patients aged 30-34 years (Fig. 5A). Bacterial density (log10 ng/μL of 16S rRNA gene by quantitative PCR) was greatest at ages 20-24 (Fig. 5B). Diversity was also lower in older age groups and differences in community diversity paralleled lung function across age groups (diversity nadir 30-34 years, Fig. 5C). In addition, the lower lung function and diversity observed in older age groups coincided with a higher proportion of samples dominated by Pseudomonas or Burkholderia and a decreased proportion of Streptococcus dominance and samples without a dominant genus (Fig. 5D). If dichotomized at age 25, Streptococcus dominance and no dominant genus account for 69% of the samples from younger patients, but only 43% of samples from older patients, whereas Pseudomonas and Burkholderia dominant samples comprise only 14% of samples from patients under 25 years but 49% of cases from patients 25 years or older (Chi-squared of dominant taxa in <25 or ≥25 year old patients, P < 0.0005).

Figure 5
figure 5

(A) Age dependent changes in lung function (FEV1% predicted), (B) bacterial load (ng/μL 16S rRNA), (C) alpha diversity (Shannon diversity index) and (D) dominant genus (defined in Methods) demonstrate concordant changes in lung function, diversity and increased prevalence of Pseudomonas and Burkholderia with increasing age. Chi-squared significance for panel D is P < 0.0005.

Predictors of FEV1 by multivariate linear regression

Given the observed correlation of bacterial diversity, density and dominant taxa for lung function across age groups, we performed multivariate linear regression to ascertain which features of the lung microbiota independently predict lung function while controlling for the influence of host factors. Factors with known significance and potential confounders present in our dataset were entered into the multivariate model prior to sequentially testing diversity (SDI), bacterial density (log10 16s rRNA gene by quantitative PCR) and dominant taxa based on the strategy outlined in the methods section. The final model included age, BMI z-score, pancreatic insufficiency, SDI and Pseudomonas dominance (adjusted R2 = 0.315, df = 5, F = 22.9, P < 0.0005, Table 4). Increasing age, pancreatic insufficiency, increasing number of antibiotics and Pseudomonas dominance negatively correlated with lung function, while increasing BMI and diversity positively correlated with lung function in our model.

Table 4 Multivariate regression model for predicting lung function (FEV1 % predicted).


We describe the bacterial community structure of sputum samples from 269 cystic fibrosis patients across a broad range of age, disease stage, clinical status and treatment using a culture-independent method. This is the largest cohort of CF patients spanning both adult and pediatric populations for whom sputum has been analysed by 16S rRNA gene sequencing yet published and includes a large number of patients younger than 18 years of age. We observed several clear features of bacterial community structure in CF sputum and their relationship with disease states.

Firstly, the airways of CF patients share only a small number of common taxa, but otherwise demonstrate a high degree of inter-individual variability in community structure, composition and diversity, confirming the results of smaller studies5,6,7. In pediatric and adult CF patients, the shared core microbiome was small and consisted largely of genera not considered to be traditional CF pathogens, namely Streptococcus, Rothia, Veillonella and Actinomyces. Pseudomonas, a prototypic CF pathogen, was a member of only the adult core microbiota. A small number of taxa accounted for the bulk of bacterial sequences present in any sample and approximately half of all patients had a bacterial community dominated by a single genus.

Secondly, the observed taxa in the CF lung vary in their relative abundance and dominance of the bacterial community. Only six genera had a dominance proportion >0.10 and a maximum relative abundance >0.70: Pseudomonas, Burkholderia, Stenotrophomonas, Achromobacter, Haemophilus and Bordetella. Of the remaining genera, only Staphylococcus, Streptococcus and Bifidobacterium were observed at a relative abundance of at least 0.50 in one or more samples.

In addition, whereas within sample diversity correlated with several patient, treatment and microbial factors, between patient diversity was largely driven by the age-associated relative abundance of Pseudomonas and Burkholderia. Age, disease stage, treatment and dominant genus were found to significantly correlate with differences in alpha diversity, confirming the observations of previous published reports of smaller datasets8,11,12,13,16,17,18,19,20. The largest differences between patient communities noted in our study were the greater prevalence and relative abundance of Pseudomonas and Burkholderia in older age groups. As patients with CF age, their risk of acquisition of these organisms increases17,21 and we observed a stepwise progression in the prevalence and relative abundance of these two important genera with increasing age. A notable increase in Pseudomonas and Burkholderia dominance was observed in patients aged 25 and older, which coincided with the lowest lung function and sample diversity observed in any age group. Beyond this age, there are few differences in overall community structure, diversity or lung function in surviving adult patients and it is possible that the establishment of these CF associated pathogens is largely complete by age 20-25. These findings corroborate the findings of smaller cross-sectional studies of CF microbiota over a large age range, including the ‘plateau’ of community and functional change at approximately 25 years of age17,22.

This study also demonstrated that sample diversity was positively correlated with lung function after controlling for host factors. Multivariate regression of clinical and bacterial features for prediction of lung function revealed that age, BMI z-score, pancreatic insufficiency, diversity and Pseudomonas dominance independently predict lung function. The observation that diversity positively correlates with lung function agrees with other studies in CF and non-CF bronchiectasis8,9,23. Our and previously published analyses do not address causality, however and it remains uncertain whether the relationship between disease stage and diversity is causal.

This study has important caveats. The prevalence of infection with Pseudomonas in this study was lower than published registry data for both pediatric and adult patients24. Similarly, none of the included specimens were culture-positive for Mycobacterium, despite recently reported prevalence rates of nearly 10%25. However, prevalence of all species by culture in our study was determined for the study sample only, not for historical samples from each patient and thus is unsurprisingly lower than published reports which generally define infection as culture positivity within a time range, rather than at a single timepoint.

Adequacy of sampling in CF microbiome studies is difficult to conclusively ascertain and is a challenge in similar studies. Culture-independent analysis by 16S rRNA gene sequencing is not uniformly sensitive to the detection of all bacterial taxa, such as Staphylococcus, in the absence of specific sample processing procedures26. Our study was not designed to systematically compare within-sample relationships between culture and 16S rRNA sequence data. As such, our analysis of Staphylococcal relative abundance should be interpreted cautiously.

Since our cohort did not include paired specimens from the same patient, this analysis was not designed to identify differences in bacterial community characteristics occurring at times of different disease activity (such as during an exacerbation). Given the highly variable bacterial community structure in CF which we and others5,6,7 have observed, between-patient differences are likely to obscure important within-patient trends occurring during exacerbations.

Furthermore, our analysis was not designed to assess the impact that rare or very low abundance taxa (<1% relative abundance) may have on the biology of the CF airway, but rather to assess trends in community structure over a broad range of ages and disease stage. Very low abundance and rare taxa may be critical determinants of pathological processes in the CF lung. Our observations highlight the heterogeneity between patients and confirm the need for longitudinal analysis in individual patients and groups of patients to identify factors associated with disease exacerbations, disease progression and response to treatment. Our method resolved community composition to the genus level only. Significant within-genus or intra-species variability in community composition may have important effects on community behavior and disease pathogenesis and has the potential to explain cross-sectional variability in disease status that is not apparent at the genus level.

Despite these caveats, our data agree with prior reports of CF microbiota and extend these observations to a broader age range of patients, particularly those less than 18 years of age, a group that is not well represented in previously published reports. Our principal findings agree with and extend those of smaller studies of mostly adult patients showing high inter-individual variability in community structure5,6,7, a common core microbiota9, age associated changes in community composition16,17 and disease stage associated decreases in diversity8,11,12,13,16,17,18,19,20.

Longitudinal studies identifying microbial and host risk factors for infection with patho-adapted taxa such as Pseudomonas and Burkholderia, and mechanistic studies elucidating the interactions of these taxa with other colonizing organisms, as well as exploration of possible causal relationships between community diversity and disease manifestations would be illuminating.


Study enrollment, sample and data collection

This study received approval from the research ethics boards of both St. Michael’s Hospital and the Hospital for Sick Children in Toronto, Canada. Patients provided informed consent and were enrolled prospectively at The Hospital for Sick Children and St. Michael’s Hospital between 2010 and 2013. Height, weight, forced expiratory volume in one second (FEV1), forced vital capacity (FVC), age, sex, disease stage, antimicrobial therapy, clinical status, cystic fibrosis transmembrane conductance regulator (CFTR) mutation, pancreatic status and visit outcome were recorded for the day of sample collection. Expectorated sputa were collected for all patients - no oropharyngeal samples were used. Pediatric samples were split into aliquots, one for culture and one for nucleic acid extraction, while adults provided two samples in parallel at each clinic visit. Samples for sequencing were treated with sputolysin and frozen for subsequent processing. Quantitation of 16S rRNA gene copy number was performed by quantitative PCR based on published protocols27.

Si-Seq 16S rRNA gene analysis and informatics

Sequencing was performed at the Centre for the Analysis of Genome Evolution and Function (CAGEF) at the University of Toronto as previously described using an Illumina GAIIx28. An in-house Java script was used to prepare the sequence for downstream analysis using MACQIIME29. Chimeric reads were removed using reference-based chimera detection with USEARCH30. Operational taxonomic units (OTUs) were picked against a SI-Seq structured reference set using a similarity of 0.87 (corresponding to the genus level with the SI-Seq method), with premature termination turned off. Taxonomy was assigned using an RDP structured dataset using UCLUST30 with --max_accepts set to 0 and similarity set to 0.97, (empirically determined to produce the highest consistency identification at the genus level using Pseudomonas aeruginosa controls)28. Unassigned OTUs were identified with BLASTN31. OTUs representing less than 0.005% of overall sequence were removed. At this stage, the median number of sequences/sample was 67,812 (IQR 42,835-87,773). For calculation of Shannon diversity index (SDI), samples were rarefied to 1500 sequences to account for the effects of sequencing depth on diversity, with a median Good’s coverage of 0.972 (IQR 0.955-0.979) at that sampling depth. Diversity and principal coordinates analyses were performed using MACQIIME with default settings.

Statistical analyses

Statistical analyses were performed using SPSS version 22 (IBM). Means were compared using T-tests or ANOVA with Bonferroni correction as needed. Chi-squared tests were used to test proportions. A multivariate linear regression model predicting FEV1 (% predicted) was built by entering all variables, then using backwards removal to identify variables in the final model.


Disease stage, pancreatic insufficiency, FEV1 and FVC were defined as previously reported32,33. BMI z score was calculated using the method described in Flegal et al.34 using age-specific parameter values from the Centers for Disease Control ( For adults over 20 years old, the age values for BMI z score were calculated based on an age of 2035. Antibiotic number equaled the number of antibacterials of any type being taken on the day of the visit (including inhaled antibiotics). Clinical status was defined as ‘baseline’, ‘exacerbation’, ‘treatment’ or ‘recovery’ using the criteria of Zhao et al.11. Samples not meeting these criteria were categorized as ‘other’ and included patients receiving new prescriptions for steroids, antifungals or antibiotics for an indication other than a pulmonary exacerbation (e.g. an extra-pulmonary infection). ‘Dominant genus’ was defined as any genus whose relative abundance was at least twice that of the next most abundant genus10 or if no such genus was present, ‘no dominant genus’.

All methods were performed in accordance with the approved guidelines.

Additional Information

How to cite this article: Coburn, B. et al. Lung microbiota across age and disease stage in cystic fibrosis. Sci. Rep. 5, 10241; doi: 10.1038/srep10241 (2015).