Introduction

Gastrointestinal (GI) disease in cystic fibrosis (CF) begins in utero, continues throughout childhood and into adulthood1,2. Dysfunction of the cystic fibrosis transmembrane conductance regulator (CFTR) results in an altered intestinal milieu with proposed disease factors including: (i) reduced bicarbonate secretion and low intestinal pH, (ii) thick and inspissated mucus, (iii) a lack of endogenous pancreatic enzymes, (iv) delayed intestinal transit and (v) possibly an impaired innate immunity3,4. These mechanisms in combination with an energy- and fat-dense diet5 and frequent antibiotic usage likely contribute to a dysbiosis of the intestinal bacterial community, which has been observed in CF6,7,8,9. Individuals with CF suffer from maldigestion and malabsorption, which contribute to malnutrition. Nutrition is of paramount importance in CF, as it has prognostic value for growth, lung function, overall well-being and mortality10.

Intestinal inflammation has been reported in several studies, both indirectly using various biomarkers, including calprotectin, rectal nitric oxide and M2-pyruvate kinase (M2-PK), and directly by capsule endoscopy11,12,13,14,15,16. The pathogenesis of intestinal inflammation in CF is poorly understood; however, the altered intestinal environment and microbiota likely play a role. Intestinal inflammation in CF may have significant clinical relevance due to its link with growth and nutrition11,17,18, and to the increased risk of GI cancers in the adult patient population19,20,21.

Recently, Manor et al.8 utilised metagenomic shotgun sequencing of faecal samples and suggested that in children with CF, enteric fat abundance selects for a pro-inflammatory GI microbiota. Debyser et al.7 utilised liquid chromatography-mass spectrometry (LC-MS) on faecal protein extracts from children with CF and their unaffected siblings to highlight the presence of an intestinal dysbiosis and inflammation in CF. Vernocchi et al.9 utilised 16S rRNA sequencing and metabolomics (LC-MS) in children with CF and age-matched healthy controls (HC) to suggest that the gut microbiota enterophenotypes of CF, together with endogenous and bacterial CF biomarkers, reflect the expression of functional alteration at the intestinal level.

These studies all demonstrate an altered functional environment in the CF gut. With established and recent tools for microbial modulation (e.g. prebiotics, probiotics, synbiotics, antibiotics, diet and faecal microbiota transplantation), understanding these changes is essential for developing potential therapeutics. It is therefore important to explore and define the functional differences between children with CF and well-matched HC. Furthermore, the links between intestinal bacteria and markers of inflammation have yet to be characterised in CF.

In this study we investigate composition and function of bacterial communities inhabiting the intestines of children with CF. We hypothesise that the composition and functional capacity of bacterial communities in the intestine of children with CF are different when compared to HC. We also hypothesise that intestinal bacteria and their functional profiles will be associated with biomarkers of intestinal inflammation. Furthermore, we hypothesise that these changes will be associated with growth and lung function in children with CF.

Results

Population characteristics

Initially, 39 children with CF and 39 paired HC (matched for age and gender) were included, whose microbiota was assessed using 16S rRNA gene sequencing. After removal of low-coverage samples (sequencing depth <22,578 sequences per sample, see below), the final study population comprised 27 children with CF and 27 paired HC. The mean (SD) age of CF and HC subjects was 8.2 (5.0) and 7.7 (5.3) years respectively (p = 0.2). Each group consisted of 14 males (51.9%). Of the CF children, 24 were pancreatic insufficient (PI) and 3 pancreatic sufficient (PS). Twenty-two of the CF children were homozygous F508del (81%), whilst the remaining 5 were heterozygotes. Clinical characteristics are presented in Table 1.

Table 1 Clinical characteristics of included participants.

Bacterial community characteristics

A total of 3,179,543 bacterial 16S rRNA sequences were retrieved, covering more than 72% of the bacterial diversity per sample. This is further supported by the rarefaction curves, which show that the majority of bacterial communities in our samples were covered by the sequencing effort undertaken here (Supp. Fig. 1).

Alpha diversity

Children with CF compared with HC had a significantly lower bacterial richness as assessed by the number of zOTUs (mean difference (95% CI) of −168.8 (−219.5 to −118.1), p = 2.9 × 10−07) and a lower diversity as assessed by the Shannon index (mean difference (95% CI) of −0.74 (−1.00 to −0.49), p = 2.2 × 10−06) (Fig. 1A,B). Controlling for age, children with CF had consistently lower richness (estimate (SE) −175.1 (21.6), p = 1.0 × 10−10) and Shannon diversity indices (estimate (SE) −0.78 (0.15), p = 3.2 × 10−06) (Fig. 1C,D).
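As an illustration of the two alpha diversity measures reported above, the following minimal sketch computes richness and the Shannon index from a vector of zOTU read counts. The function names and the example counts are hypothetical and are not study data; the use of the natural logarithm is an assumption, as the text does not state the log base used.

```python
import math


def richness(counts):
    """Number of zOTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over observed zOTUs.

    Assumption: natural logarithm (the log base is not stated in the text).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical zOTU read counts for two samples (not study data).
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1, 0]

print(richness(even_sample), round(shannon_index(even_sample), 3))
print(richness(skewed_sample), round(shannon_index(skewed_sample), 3))
```

Both hypothetical samples have the same richness, but the sample dominated by a single zOTU has a lower Shannon index, which is why the two measures are reported separately.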

Figure 1

Boxplots of sample richness (number of zOTUs) (A) and Shannon index (B) in CF and HC cohorts. Scatterplots of sample richness (number of zOTUs) (C) and Shannon index (D) against age in CF and HC cohorts. Cohort mean and 95% confidence intervals are constructed from generalised linear models and presented as lines and shaded regions, respectively (C,D).

Phylogeny-based beta diversity

Visualisation of phylogeny-based beta diversity (weighted and unweighted UniFrac distances) is presented in Fig. 2. PERMANOVA showed a significant difference in bacterial communities between the CF and HC cohorts (weighted UniFrac distance R2 = 0.139, p = 0.001, unweighted UniFrac distance R2 = 0.121, p = 0.001). PERMANOVA showed no significant difference in bacterial communities between males and females (weighted UniFrac distance R2 = 0.029, p = 0.2, unweighted UniFrac distance R2 = 0.018, p = 0.5), or with age (weighted UniFrac distance R2 = 0.013, p = 0.7, unweighted UniFrac distance R2 = 0.026, p = 0.09).

Figure 2

NMDS plots based on weighted (A) and unweighted (B) UniFrac distances between CF and HC cohorts.

Differences in microbial communities between CF and HC populations

The relative abundances of all bacterial phyla in CF and HC subjects are presented in Supplementary Fig. 2. Relative abundances of the top 10 most abundant bacterial genera in CF and HC are presented in Fig. 3. Bacterial taxa with a significantly different abundance (ANCOM with q < 0.05) between CF and HC populations are presented in Table 2 and Supplementary Fig. 3.

Figure 3

Relative abundance of top 10 most abundant bacterial genera for CF and HC subjects. Samples ordered in increasing age (from left to right).

Table 2 Bacterial taxa with a significantly different abundance (ANCOM with q < 0.05) between CF and HC populations at each taxonomic rank (bold). Arrows indicate if the relative abundance of each taxon is higher (↑) or lower (↓) in CF compared with HC populations. Taxa without an arrow describe phylogeny and were not significantly different in abundance. ANCOM analysis presented in Supplementary Fig. 3.

Predicted functional analysis using Tax4Fun

To predict the functional profiles of intestinal bacteria, we used Tax4Fun. On average per sample, 84.6% of zOTUs could be used by Tax4Fun to predict a functional profile of the community based on KEGG Orthology (KO) terms, demonstrating high predictive power in our analysis. In total, abundances of 275 unique KO pathways (based on 6,286 unique functions) were predicted in CF and HC populations. The abundance of 56 KO pathways was significantly different between CF and HC pairs (p < 0.00019) (Table 3). Exploring the effect of age, linear models of pathways of interest were constructed and presented in Fig. 4.

Table 3 Predicted KEGG pathways with a significantly different abundance between CF and HC pairs (p < 0.00019).

Figure 4

Scatterplots of the relative abundance of predicted KEGG pathways against age in CF and HC cohorts. Predicted functions associated with: (i) short-chain fatty acids (A,B), (ii) fatty acid metabolism (C–F), (iii) antioxidants (G–I), (iv) essential amino acid (J), and (v) vitamins (K,L). Cohort mean and 95% confidence intervals are constructed from generalised linear models and presented as lines and shaded regions, respectively.

Intestinal inflammation and its correlation with the bacterial community

Faecal calprotectin levels were measured in 19/27 pairs (70.4%). Calprotectin levels were significantly elevated in CF compared with paired HC subjects (median (IQR) 92.0 mg/kg (58.7–142.3) vs. 19.5 mg/kg (19.5–19.8), respectively, p = 3.8 × 10–6) (Fig. 5A). Controlling for age, children with CF had consistently higher faecal calprotectin levels than HC (estimate (SE) 110.6 mg/kg (32.9), p = 0.0019) (Fig. 5C).

Figure 5

Boxplots of calprotectin (n = 19 pairs) and M2-PK (n = 18 pairs) in CF and HC cohorts (A,B). Scatterplots of calprotectin and M2-PK against age in CF and HC cohorts (C,D). Cohort mean and 95% confidence intervals are constructed from generalised linear models and presented as lines and shaded regions, respectively (C,D).

Faecal M2-PK levels were measured in 18/27 pairs (66.7%). M2-PK levels were significantly elevated in CF compared with paired HC subjects (median (IQR) 10.5 U/mL (4.7–46.0) vs. 1.0 U/mL (1.0–1.0) respectively, p = 0.00032) (Fig. 5B). Controlling for age, children with CF had consistently higher faecal M2-PK levels than HC (estimate (SE) 36.8 U/mL (13.4), p = 0.0096) (Fig. 5D).

Correlations between bacterial taxa (at each taxonomic rank) and inflammatory markers (calprotectin (n = 27) and M2-PK (n = 26)) within CF subjects are presented in Supplementary Fig. 4. Ten genera (Acidaminococcus (r = 0.8, q = 3.9 × 10−7), Allisonella (r = 0.8, q = 3.5 × 10−7), Eubacterium coprostanoligenes group (r = 0.7, q = 0.0002), Howardella (r = 0.8, q = 1.7 × 10−7), Lachnospiraceae UCG-010 (r = 0.8, q = 6.9 × 10−7), Mogibacterium (r = 0.8, q = 4.0 × 10−7), Olsenella (r = 0.8, q = 1.8 × 10−7), Sutterella (r = 0.8, q = 2.1 × 10−5), uncultured Lachnospiraceae (r = 0.5, q = 0.03) and uncultured Porphyromonadaceae (r = 0.8, q = 6.5 × 10−6)) were positively correlated with faecal calprotectin. Lachnospiraceae AC2044 group positively correlated with faecal M2-PK (r = 0.5, q = 0.04).

Correlations between predicted KEGG pathways and inflammatory markers (calprotectin (n = 27) and M2-PK (n = 26)) within CF subjects are presented in Supplementary Fig. 5. Six predicted pathways (benzoate degradation (KO362) (r = 0.6, q = 0.005), carbon fixation pathways in prokaryotes (KO720) (r = 0.5, q = 0.02), DDT degradation (KO351) (r = 0.6, q = 0.01), phenylalanine metabolism (KO360) (r = 0.6, q = 0.009), styrene degradation (KO643) (r = 0.5, q = 0.048) and tropane, piperidine and pyridine alkaloid biosynthesis (KO960) (r = 0.6, q = 0.007)) were positively correlated with faecal calprotectin. Indole alkaloid biosynthesis (KO901) (r = 0.5, q = 0.01) and Photosynthesis – antenna proteins (KO196) (r = 0.5, q = 0.04) positively correlated with faecal M2-PK.

Associations with clinical measures in children with cystic fibrosis

The mean (SD) weight, height and BMI z-scores in children with CF were −0.0004 (1.1), 0.07 (1.2), and 0.1 (1.0) respectively. No significant correlations between anthropometric z-scores and alpha diversity indices (richness or Shannon index) were identified. Correlations between bacterial genera and anthropometric z-scores identified positive correlations between: (i) Ruminococcaceae UCG 014 and BMI (r = 0.6, q = 0.04), (ii) uncultured VadinBE97 and weight (r = 0.5, q = 0.049), and (iii) uncultured VadinBE97 and BMI (r = 0.6, q = 0.02). No significant correlations between anthropometric z-scores and predicted KEGG pathways were identified.

Lung function was recorded in 19 (70%) children, with a mean (SD) FEV1% of 98.9% (15.4). No significant correlations between FEV1% and alpha diversity indices (richness or Shannon index) were identified. FEV1% positively correlated with five genera: (i) Adlercreutzia (r = 0.6, q = 0.03), (ii) Ruminococcaceae NK4A214 group (r = 0.6, q = 0.04), (iii) Lachnospiraceae NC2004 group (r = 0.6, q = 0.04), (iv) Tyzzerella 3 (r = 0.6, q = 0.04), and (v) Candidatus Soleaferrea (r = 0.5, q = 0.047). No significant correlations between FEV1% and predicted KEGG pathways were identified.

Metabolomics

Untargeted metabolomics data were available for 12 CF and 4 HC children as part of a previous study22. A total of 24,814 and 13,341 hits were detected in positive and negative ion modes, respectively, from all stool samples. Metabolites were identified using the human metabolome database (HMDB)23 and the mean (SD) number of identified metabolites per sample in positive and negative ion modes were 6,409 (690) and 1,845 (150), respectively. These lists of identified metabolites were used for all subsequent analyses. The metabolomes of CF and HC cohorts revealed distinct clustering (Supp. Figs. 7, 8). Principal component analysis (PCA) revealed a strong separation between CF and HC cohorts (Supp. Fig. 8A,C). A sensitivity analysis with 4 CF and 4 age- and gender-matched HC children also revealed a strong separation between CF and HC cohorts (Supp. Fig. 8B,D). Metabolites with a significantly different abundance between CF and HC cohorts (q < 0.05) are presented in a volcano plot (Supp. Fig. 9) and hierarchical clustering is presented in Supplementary Fig. 10. Correlations between metabolites and bacterial genera (significant genera identified in Table 2) in children with CF are presented in Supplementary Fig. 11 and Supplementary Dataset 6. Correlations between metabolites and predicted KEGG pathways (significant pathways identified in Table 3) in children with CF are presented in Supplementary Fig. 12 and Supplementary Dataset 6.

A post hoc analysis of metabolites related to short-chain fatty acids (SCFA) was performed. A total of 25 and 2 hits were putatively identified as metabolites related to propanoate and/or butanoate metabolism23 in positive and negative ion modes, respectively. Two metabolites identified as butyric acid (positive ion mode): (i) m/z 106, RT 10.0 min and (ii) m/z 106, RT 9.8 min were consistently lower in children with CF compared to HC across all ages (normalised abundance estimate (SE) −10,134.4 (2,631.8), p = 0.002 and −6,841.5 (3,011.4), p = 0.04, respectively) (Supp. Fig. 13). One metabolite identified as pantetheine (negative ion mode; m/z 277, RT 11.5 min), an amide of pantothenic acid (vitamin B5), was significantly lower in children with CF than HC (normalised abundance median (IQR) 2,795 (1,343-16,628) vs. 35,418 (24,798-43,056), p = 0.045) (Supp. Fig. 14). Pantetheine is involved in the metabolism of propanoate to propionyl coenzyme A23.

Discussion

We have demonstrated that the intestines of children with CF exhibit a marked taxonomic and inferred functional dysbiosis when compared to well-matched healthy controls. To our knowledge this is the first study to infer the function of intestinal bacterial communities in a paediatric CF population. We predicted an enrichment of genes involved in SCFA, antioxidant and nutrient metabolism, all of which are relevant to growth and nutrition in CF (Table 3; Fig. 4). In a subset of samples, we identified that children with CF have distinct metabolic profiles compared to HC. Reduced levels of butyric acid and pantetheine in the stool of children with CF provide support for the notion of increased SCFA (butanoate and propanoate) metabolism. Furthermore, the notion of a pro-inflammatory GI microbiota in children with CF is supported by positive correlations between intestinal inflammatory markers and both genera and functional pathways. We also demonstrated an association between intestinal genera and both growth z-scores and FEV1%. Thus, this study provides evidence and insights into the links between alterations in the composition of the intestinal bacterial community and (i) predicted functional changes, (ii) intestinal inflammation, and (iii) clinical outcomes (e.g. growth and lung function). However, considering the nature of 16S rRNA-derived gene data, the precise pathogenic mechanisms in CF GI disease need further exploration and validation.

The paediatric CF GI microbiota demonstrated significantly lower α-diversity indices (richness and Shannon index) across all ages (Fig. 1). Increased GI diversity has been repeatedly associated with health, whilst decreased diversity has been associated with several inflammatory, metabolic, immune-mediated and systemic diseases24. These changes are clinically relevant in CF as the GI microbiota of young children has been proposed to be a determinant of respiratory and systemic disease progression25. Intestinal dysbiosis is further supported by the significant difference in bacterial communities between CF and HC cohorts using both weighted and unweighted UniFrac distances (Fig. 2). Similarly, Vernocchi et al.9 demonstrated that children (aged 1 to 6 years) with CF have lower alpha diversity (Chao1) and a distinct beta diversity (unweighted UniFrac distance) compared to HC.

Similar to previous reports, we identified a marked increase in the relative abundance of Proteobacteria (genera Escherichia, Shigella, Enterobacter and Morganella), particularly in the first few years of life (Fig. 3; Table 2 and Supp. Figure 3A,E)6,8,26. In our samples, Escherichia was predominantly composed of Escherichia coli, which has been shown to positively correlate with faecal measures of nutrient malabsorption and inflammation in CF26. The relative abundances of functions related to two intestinal pathogens were increased in CF: shigellosis (p = 3.0 × 10−8) and Vibrio cholerae (pathogenic cycle) (p = 0.0001) (Table 3). This likely reflects the enrichment of Escherichia and Enterobacter, which share several common genes, rather than the presence of the pathogens themselves. The clinical implications of this, however, are unclear.

The increased relative abundance of Veillonella in CF samples (Fig. 3; Table 2 and Supp. Figure 3E) has also previously been observed8,25,27 and Veillonella comprises part of the core respiratory microbiota in CF25,27. Veillonella is known to produce propionate (i.e. propanoate) from lactate fermentation28 and the relative abundance of airway Veillonella has been reported to inversely correlate with airway inflammation29.

Fusobacterium was significantly enriched in children with CF compared to HC, and this genus has been extensively linked to colorectal cancer30. This finding is relevant given the 5–10 times greater risk of colorectal cancer in adults with CF compared to the general population31.

Several members of the Ruminococcaceae family were reduced in CF samples (Fig. 3; Table 2 and Supp. Figure 3D,E) and they are known producers of SCFAs (from fermentation of carbohydrates, including resistant starch)32. Interestingly in CF children, the relative abundance of Ruminococcaceae UCG 014 positively correlated with BMI z-scores (r = 0.6, q = 0.04) and Ruminococcaceae NK4A214 group positively correlated with FEV1% (r = 0.6, q = 0.04). Dietary studies examining resistant starch intake and potential interventional studies may be worth exploring in children with CF.

The relative abundance of Alistipes was decreased in CF samples (Fig. 3; Table 2 and Supp. Figure 3E) and species of this genus are known to produce succinic acid33. In mice, dietary succinate was identified as a substrate for intestinal gluconeogenesis and improved glucose homeostasis34. This may be relevant to the exploration of factors associated with glucose abnormalities in children with CF, an increasingly recognised problem35.

Intestinal inflammation as measured by faecal calprotectin is clinically relevant given its negative correlation with height and weight z-scores in CF children11. Both faecal calprotectin and M2-PK were significantly elevated in our CF cohort compared with HC (Fig. 5). Interestingly, Acidaminococcus showed a strong positive correlation with calprotectin (r = 0.83, q = 3.9 × 10−7) in CF (Supp. Figure 4D). Increased Acidaminococcus relative abundance has been associated with lower future height z scores in twin-cohorts of children from Malawi and Bangladesh36. Acidaminococcus sp. consume glutamate (as their sole source of carbon and energy) which is important to gut amino acid metabolism, nitrogen balance, barrier function and epithelial restitution36.

Children with CF appeared to have distinct metabolic profiles compared to HC (Supp. Figs. 7–10). Interestingly, those children with CF and PS (samples CF33 and CF36) appeared similar to HC rather than PI CF children (Supp. Figs. 7, 8, 10A), providing support for a gradation effect with the degree of CFTR dysfunction. The taxonomic and functional dysbiosis which we have identified, along with the known CFTR dysfunction are plausible factors contributing to the altered metabolic profiles in children with CF (compared to HC). In children with CF, we identified several significant correlations between metabolites and (i) bacterial genera (Supp. Fig. 11), and (ii) predicted KEGG pathways (Supp. Fig. 12), suggesting that intestinal dysbiosis has an influence on metabolic profiles.

In our CF cohort there was a predicted enrichment of genes involved in the metabolism of the SCFAs propanoate and butyrate, and of a precursor, pyruvate (Table 3 and Fig. 4A,B). Short-chain fatty acids have several beneficial effects on GI tract health, including improved intestinal motility, reduced intestinal inflammation, promotion of differentiation of regulatory T cells and regulation of both fatty acid and glucose metabolism37,38. Recently, Vernocchi et al.9 reported butyrate and propionate levels to be lower in children with CF compared to HC. Similarly, in a subset of samples, we identified lower levels of butyric acid and pantetheine in children with CF compared to HC. It is likely that in children with CF, intestinal microbiota have an increased propensity for the metabolism of beneficial SCFAs. The variable trend of propanoate metabolism with advancing age in CF subjects (Fig. 4A) highlights the need to perform a quantitative analysis of dietary-fibre intake and faecal excretion of SCFAs.

In children with CF, we predicted an enrichment of bacterial genes associated with glutathione, taurine and hypotaurine metabolism and ubiquinone (coenzyme Q10) biosynthesis (Table 3 and Fig. 4G–I). Both glutathione and taurine are antioxidants which are protective against oxidative stress and exhibit anti-inflammatory properties39. This is notable in light of a recent randomised controlled trial on oral reduced glutathione demonstrating improved growth z-scores and decreased faecal calprotectin in CF children18. Antioxidants have been explored as a therapeutic intervention in CF lung disease, however evidence is lacking to support their use40. Little is known about the role of oxidative stress in the intestinal metabolism of CF and further exploration into the role of antioxidants for CF GI disease is warranted.

The relative abundance of predicted functions associated with phenylalanine metabolism were increased in CF (p = 0.0002) and also positively correlated with calprotectin (r = 0.6, q < 0.01) (Table 3, Fig. 4J and Supp. Fig. 14). Phenylalanine is an essential and aromatic amino acid which may play several key roles including: (i) attenuating intestinal inflammation through activating calcium-sensing receptors (in piglets)41; (ii) inhibiting TNF-α production42; and (iii) enhancing immune responses42. Quantification of phenylalanine intake may be a useful measure in future CF studies.

In our CF cohort predicted functions related to the metabolism of thiamine (vitamin B1) (p = 1.3 × 10−6) and biosynthesis of pantothenate (vitamin B5) and coenzyme A (CoA) (p = 3.1 × 10−5) were decreased (Table 3 and Fig. 4K,L). These two water-soluble vitamins help metabolise carbohydrates, fats and proteins and their depletion may provide insights into the pathophysiology of malnutrition in CF (despite pancreatic enzyme replacement therapy, high-fat high-calorie diets and vitamin supplementation43).

Some limitations of this current study need to be considered. Firstly, functional pathways were inferred from 16S rRNA gene data and not measured directly. However, given that 84.6% of our 16S rRNA genes could be assigned a functional profile, we believe our analysis still has strong predictive power. Secondly, although our sample size is small, it is still the largest study to date analysing intestinal microbiota function in CF children and HC (age and gender matched) and exceeded the minimum number required (calculation below). Given only three CF children were PS, we were unable to make any meaningful comparisons with PI children. Furthermore, for children with CF, FEV1% was not recorded at the time of stool sampling but within the preceding 12 months. And finally, confounding factors, including antibiotic usage and altered dietary regimes between CF and HC, were not controlled for with this study. A quantitative analysis of dietary intake and direct measures of faecal proteins and metabolites (i.e. targeted metabolomics for measurement of SCFA levels) would be required to validate the findings presented in this study. Although our comparison of metabolomes between CF and HC cohorts was unmatched, we performed a sensitivity analysis with 4 CF and 4 age and gender matched HC which yielded similar results (Supp. Fig. 8B,D).

In conclusion, there exists both a taxonomic and inferred functional dysbiosis, which provides insights into paediatric CF GI disease. Future observational or interventional studies should simultaneously evaluate dietary intake, abdominal symptoms and faeces. Further exploration of potential CF GI therapeutics including antioxidants (e.g. glutathione), SCFAs (e.g. propanoate and butyrate), amino acids (e.g. phenylalanine) and gut microbiota modulators (e.g. prebiotics including resistant starch, probiotics and synbiotics) is warranted.

Methods

Study population

We performed a prospective, cross-sectional, observational study in children with CF and HC. The subjects and samples for this analysis were collected as part of three prior studies evaluating the progression of intestinal microbiota6 and inflammation in children with CF12,14. Children with CF and HC (aged 0 to 18 years) were prospectively recruited from the outpatient clinics (CF and Orthopaedic/Plastics clinics respectively) at Sydney Children’s Hospital Randwick, Australia. In addition, inclusion and exclusion criteria were as follows:

Inclusion criteria:

  • Children with CF diagnosed according to the United States Cystic Fibrosis Foundation consensus criteria44.

  • Exocrine pancreatic insufficiency (PI) or pancreatic sufficiency (PS) defined based on the 72-hour faecal fat and/or faecal elastase-1 tests45,46.

  • Children with CF on oral or inhaled antibiotic therapy were not excluded.

  • Healthy controls included children without CF or any gastrointestinal disease.

Exclusion criteria:

  • Patients requiring antibiotic therapy for a pulmonary exacerbation47 or intravenous antibiotics in the preceding four weeks before sampling.

  • Any child (CF or HC) with gastroenteritis, on oral corticosteroids, probiotics and/or non-steroidal anti-inflammatory drugs in the preceding two weeks before sampling.

Children with CF were matched to healthy controls for gender and age (as closely as possible).

Sample collection and processing

A single stool sample from each subject was collected and stored immediately at −80 °C, or stored at −20 °C (home freezer) until transport to the laboratory for storage at −80 °C. Thawing of the sample during transport did not occur6. At the time of sample collection, demographic and anthropometric (height, weight and body mass index (BMI) z-scores) data was recorded. In CF children older than four years, the forced expiratory volume in one second, percent predicted (FEV1%) from their most recent lung function test was recorded (within the preceding 12 months).

DNA was extracted from stool samples and sequenced as previously described6. The 16S rRNA genes of the gut microbiota were amplified with primers 27F (AGAGTTTGATCMTGGCTCAG) and 519R (GWATTACCGCGGCKGCTG) spanning the V1–V3 regions and sequenced using the Illumina MiSeq platform (v3, 2 × 300 bp).

Processing of 16S rRNA data

Paired-end sequences were quality filtered using Trimmomatic (v0.36)48: low-quality reads were truncated if the quality dropped below 15 in a sliding window of 4 bp. Reads shorter than 36 bp were removed. Remaining paired-end reads were merged and then filtered using USEARCH (v9.2.64)49. Filtering included the removal of reads shorter than 400 bp or longer than 500 bp as well as the removal of low-quality reads (expected error > 1) and reads with more than one ambiguous base. Processed sequences of all samples were concatenated into one file and subsequently dereplicated into unique sequences. These sequences were clustered into zero-distance operational taxonomic units (zOTUs) with the unoise2 algorithm50 implemented in USEARCH. A de novo chimera removal step was included in the unoise step. Subsequently, the uchime algorithm was utilised in reference mode with the SILVA 123 database51 to remove remaining chimeric sequences. Chimera-free sequences were classified by BLASTn alignment against the SILVA database. The concatenated sequences of all samples were mapped onto the final set of zOTUs to calculate the abundance of each zOTU in any given sample. All non-bacterial sequences were removed based on their taxonomic classification. A phylogenetic tree was constructed using MUSCLE (v3.8.31)52.
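The read-level filtering criteria described above (length between 400 and 500 bp, expected error of at most 1, and no more than one ambiguous base) can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the USEARCH code: the function names are hypothetical, the expected error is taken as the sum of per-base error probabilities derived from Phred scores, and only "N" is counted as an ambiguous base.

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of 10^(-Q/10) over all bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def keep_read(seq, phred_scores, min_len=400, max_len=500,
              max_ee=1.0, max_ambiguous=1):
    """Return True if a merged read passes the length, ambiguity and
    expected-error criteria described in the text.

    Assumption: only 'N' is treated as an ambiguous base.
    """
    if not (min_len <= len(seq) <= max_len):
        return False
    if seq.upper().count("N") > max_ambiguous:
        return False
    return expected_errors(phred_scores) <= max_ee


# Hypothetical merged reads (not study data).
good_read = "A" * 450
print(keep_read(good_read, [30] * 450))   # uniform Q30
print(keep_read(good_read, [20] * 450))   # uniform Q20
```

A 450 bp read at uniform Q30 carries an expected error of about 0.45 and is retained, whereas the same read at uniform Q20 carries about 4.5 expected errors and is discarded.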

Samples with <22,578 sequences per sample were removed, with this threshold being based on a trade-off between sequencing depth and the number of samples for paired analysis. Coverage per sample was estimated with a Michaelis-Menten Fit.
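To illustrate how coverage can be estimated from a Michaelis-Menten fit, the sketch below fits S(n) = Smax × n / (K + n) to rarefaction points (sequencing depth n, observed zOTUs S) and reports the observed richness as a fraction of the fitted asymptote Smax. The fitting procedure is an assumption: a simple double-reciprocal (Lineweaver-Burk) least-squares fit is used here for brevity, and the procedure actually used in this study is not specified in the text. All names and numbers are hypothetical.

```python
def fit_michaelis_menten(depths, observed):
    """Fit S(n) = s_max * n / (k + n) via the double-reciprocal linearisation
    1/S = 1/s_max + (k/s_max) * (1/n), using ordinary least squares.

    Assumption: a simplification chosen for illustration; a non-linear
    least-squares fit would usually be preferred for noisy data.
    """
    xs = [1.0 / n for n in depths]
    ys = [1.0 / s for s in observed]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    s_max = 1.0 / intercept
    k = slope * s_max
    return s_max, k


def estimate_coverage(depths, observed):
    """Observed richness at the deepest point as a fraction of fitted s_max."""
    s_max, _ = fit_michaelis_menten(depths, observed)
    return max(observed) / s_max


# Hypothetical rarefaction points (not study data).
depths = [1000, 5000, 10000, 22578]
observed = [150, 330, 400, 450]
print(round(estimate_coverage(depths, observed), 2))
```

The double-reciprocal form gives heavy weight to the shallowest depths, so this sketch is best read as a demonstration of the idea rather than a recommended estimator.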

Functional profiles were inferred from 16S rRNA data using Tax4Fun53. Tax4Fun predicts the functional capabilities of the microbial communities by mapping the 16S rRNA gene sequences with the functional annotation of sequenced prokaryotic genomes in the Kyoto Encyclopedia of Genes and Genomes (KEGG).

Inflammatory biomarkers

Calprotectin was extracted and measured as described in a previous study12 using the PhiCal kit (Calpro, San Diego, CA, US). The lower limit of detection for the assay was 19.5 mg/kg. Calprotectin > 50 mg/kg is considered elevated. Faecal M2-PK was extracted and measured as described in a previous study14 using the ScheBo Tumour M2-PK kit (ScheBo Biotech, Giessen, Germany). The lower limit of detection of the assay was 1 U/mL.

Stool metabolomics

A subset of samples underwent untargeted ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) as previously reported, with the primary aim of exploring inflammatory pathways22. Metabolites were identified using the human metabolome database (HMDB)23 and the normalised abundances of these identified metabolites were uploaded into Perseus v1.6.6.0. The normalised abundances were then log2-transformed. Metabolite rows were filtered for at least 70% valid values in at least one group. Missing values were replaced from a normal distribution (total matrix). Metabolites with a significantly different abundance between CF and HC cohorts were determined using a two-sample Student’s t-test (permutation-based FDR < 0.05). The abundances were then Z-score normalised and used for downstream analysis. A post hoc analysis of metabolites related to propanoate and butanoate metabolism (Supp. Dataset 7)23 was also performed.
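Three of the preprocessing steps described above (log2 transformation, filtering rows for at least 70% valid values in at least one group, and Z-score normalisation) can be sketched as follows. This is an illustrative sketch and not the Perseus implementation: the function names and example values are hypothetical, missing values are represented as None, non-positive abundances are treated as missing, the Z-score is computed per metabolite row using the sample standard deviation, and the imputation step is omitted.

```python
import math
import statistics


def log2_transform(values):
    """Log2-transform abundances; missing or non-positive values become None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def passes_valid_filter(row, groups, min_fraction=0.7):
    """True if at least `min_fraction` of values are present in at least one group."""
    for group in set(groups):
        values = [v for v, label in zip(row, groups) if label == group]
        valid = sum(v is not None for v in values)
        if values and valid / len(values) >= min_fraction:
            return True
    return False


def z_score(values):
    """Z-score the present values of one metabolite row (assumption: row-wise)."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    if sd == 0:
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mean) / sd for v in values]


# Hypothetical metabolite row across five samples (not study data).
groups = ["CF", "CF", "CF", "HC", "HC"]
row = [8.0, 16.0, 32.0, None, 64.0]

if passes_valid_filter(row, groups):
    print(z_score(log2_transform(row)))
```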

Sample size

The minimum required sample size for this study was calculated using GLIMMPSE 2.0.0 (https://glimmpse.samplesizeshop.org/#/) and based on the age-adjusted mean (95% CI) Shannon index reported in our initial gut microbiota study for CF and HC cohorts (2.75 (2.51–2.98) and 3.90 (3.72–4.08), respectively)6. A minimum sample size of 10 subjects was required to reject the null hypothesis that the population means of the CF and HC groups are equal with probability (power) 0.8 and Type I error of 0.05.

Statistical analysis

Statistical analysis was performed in R v3.4.4. Alpha and phylogeny-based beta diversity indices were calculated with a dataset randomly subsampled to 22,578 sequences per sample. Pairwise weighted and unweighted UniFrac distances between samples54 were calculated using the GUniFrac package55 and used to generate non-metric multidimensional scaling (NMDS) plots. Continuous variables were compared using a paired t-test or Wilcoxon signed-rank test for parametric and non-parametric data, respectively (p < 0.05 considered statistically significant). Generalised linear models (glm function; using a gaussian distribution) were constructed to control for age when comparing continuous variables between pairs. Permutational multivariate analysis of variance (PERMANOVA) tests (permutations = 1000) were utilised to test if beta diversity significantly differed between groups (‘CF vs HC’ and ‘male vs female’) and for age using the vegan function adonis56. Graphs were generated using ggplot2 in R57. A significant difference in abundance of taxa or zOTU between CF and HC groups was assessed using the ANCOM package v1.1-3 (Benjamini & Hochberg correction for multiple comparisons, q < 0.05)58. A significant difference in abundance of functional pathways was assessed with multiple Wilcoxon signed-rank tests corrected for multiple testing using a Dunn-Sidak correction (n = 275, p < 0.00019). Correlations between continuous variables were assessed using Pearson or Spearman correlations (Benjamini & Hochberg correction, q < 0.05)59.
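The two multiple-testing corrections named above can be illustrated with a short sketch. The Dunn-Sidak per-test threshold is 1 − (1 − α)^(1/n), which for α = 0.05 and n = 275 pathways gives approximately 0.0001865, consistent with the reported cut-off of p < 0.00019. The Benjamini & Hochberg procedure converts p-values into q-values. The analysis in this study was performed in R; the Python functions below are hypothetical illustrations and the example p-values are not study data.

```python
def sidak_threshold(alpha, n_tests):
    """Per-test significance threshold under the Dunn-Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def benjamini_hochberg(p_values):
    """Benjamini & Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Threshold for 275 predicted pathways at a family-wise alpha of 0.05.
print(sidak_threshold(0.05, 275))

# Hypothetical p-values (not study data).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```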

Patient consent

Written informed consent was obtained from each subject or caregiver(s) and the study was carried out in accordance with the approved guidelines.

Ethics approval

This study was approved by the South Eastern Sydney Area Health Service, Human Research Ethics Committee, Sydney Australia (HREC ref no: 10/240).

Data sharing statement

The datasets generated and analysed during the current study are available upon reasonable request to the corresponding author.