A low-gluten diet induces changes in the intestinal microbiome of healthy Danish adults

Adherence to a low-gluten diet has become increasingly common in parts of the general population. However, the effects of reducing gluten-rich food items including wheat, barley and rye cereals in healthy adults are unclear. Here, we undertook a randomised, controlled, cross-over trial involving 60 middle-aged Danish adults without known disorders with two 8-week interventions comparing a low-gluten diet (2 g gluten per day) and a high-gluten diet (18 g gluten per day), separated by a washout period of at least six weeks with habitual diet (12 g gluten per day). We find that, in comparison with a high-gluten diet, a low-gluten diet induces moderate changes in the intestinal microbiome, reduces fasting and postprandial hydrogen exhalation, and leads to improvements in self-reported bloating. These observations suggest that most of the effects of a low-gluten diet in non-coeliac adults may be driven by qualitative changes in dietary fibres.


SUPPLEMENTARY MATERIALS A low-gluten diet induces changes in the intestinal microbiome of healthy Danish adults
Hansen, Roager, Søndertoft, Gøbel et al.

Product intake (g per week)
Product intake (%) In total 1988 ± 539 100% In total 1812 ± 467 100% Data are given as consumption per week from 51 individuals completing both intervention arms and are presented as mean (SD).

Supplementary Table 4 Absolute plasma alkylresorcinol concentrations and changes hereof during the 8-weeks lowgluten and high-gluten interventions
Low-gluten diet High-gluten diet Week 0 Week 8 Change Week 0 Week 8 Change P value Total alkylresorcinol (nmol L -1 ) 139 (149) 13 (17) -119 (147) 125 (122) 99 (90) -27 (85) <0,001 Data are from 51 individuals with complete data and are presented as median (IQR) (*). P value is derived from linear mixed models adjusted for changes in colonic transit time, serum triglycerides and low-density lipoprotein-cholesterol concentrations and study participant as a random variable (as justified by Lind et al. 2  8.87E-01 Data are from 50 individuals with complete diet data from both interventions arms and is presented as mean (SD) for normally distributed variables whereas median (range) is reported for non-normally distributed variables (*). Mean of difference and corresponding confidence interval and P value are derived from a paired t-test (high-gluten relative to lowgluten). Log transformation was applied to make data conform to normality when variables were non-normally distributed (*). The effect, standard error and P value of low-gluten vs. baseline and high-gluten vs. baseline are estimates of the mean change in the two periods derived from the LMM with an intervention-visit interaction adjusted for age, sex, intestinal transit time and with participant-specific, within-period participant-specific and qPCR plate batch number as random effects. The effect, standard error and P value of low-gluten vs. high-gluten are estimates of differences in change (high-gluten relative to low-gluten) and derived from a similar LMM. 1 Log transformation was applied to make data conform to normality. * Number of successful observations out of a total of 205 observation possible after removing individuals with only one sample and samples missing CTT measure. For outcomes with five measure at visit, the total number of observations possible is 1025 and eight measures is 1640. Overall well-being was measured 9 times after a standardised meal and the total number of possible observations is 1845. CTT and ex vivo cytokine production measures were not corrected for CTT and the total number of possible observations is 212.

Supplementary
** Number of individuals out of a total of 54 individuals possible after removing individuals with only one sample, samples missing intestinal transit time measure and samples taken immediately after antibiotic treatment. The effect, standard error and P value of low-gluten vs. baseline and high-gluten vs. baseline are estimates of the mean change in the two periods derived from the LMM with an intervention-visit interaction adjusted for age, sex, intestinal transit time and with participant-specific and within-period participant-specific specific random effects. The effect, standard error and P value of low-gluten vs. high-gluten are estimates of differences in change (high-gluten relative to low-gluten) and derived from a similar LMM. For outcomes measured at more than one time point the intervention-visit interaction was replaced with a three-way treatment-visit-time interaction and the global effect was evaluated with a chi square test. The effect, standard error and P value are estimates of the correlations between mean changes in two variables using a LMM with customized reference modules as response and breath hydrogen, significant ex vivo cytokine production and β-aminoisobutyric acid (BAIBA) as a continuous variable when correcting for age, gender, intestinal transit time and individual as a random factor. LPS, lipopolysaccharides; SCFA, short-chain fatty acids. The effect, standard error and P value are estimates of the correlations between mean changes in two variables using a LMM with modules as response and urine metabolites and breath hydrogen as a continuous variable when correcting for age, gender, intestinal transit time and individual as a random factor. m/z, mass-to-charge; DHPPA, 3,5-dihydroxyhydrocinnamic acid-glucuronide The effect, standard error and P value of low-gluten vs. baseline and high-gluten vs. baseline are estimates of the mean change in the two periods derived from the LMM with an intervention-visit interaction adjusted for age, sex, intestinal transit time and with participant-specific and within-period participant-specific specific random effects. The effect, standard error and P value of lowgluten vs. high-gluten are estimates of differences in change (high-gluten relative to low-gluten) and derived from a similar LMM. Log transformation was applied to make data conform to normality. The effect, standard error and P value of low-gluten vs. baseline and high-gluten vs. baseline are estimates of the mean change in the two periods derived from the LMM with an intervention-visit interaction adjusted for age, sex, intestinal transit time and with participant-specific and within-period participantspecific specific random effects. The effect, standard error and P value of low-gluten vs. high-gluten are estimates of differences in change (high-gluten relative to low-gluten) and derived from a similar LMM. Log transformation was applied to make data conform to normality. The list of metabolites was constructed from hypotheses based on results from the microbiome, urine metabolome and host response, hence no multiple test correction was applied. The spearman coefficient and P value between two variables using a Spearman correlations with customized reference modules for serotonin and kynurenine production, faecal and serum kynurenine, faecal and serum serotonin and β-aminoisobutyric acid (BAIBA) as a continuous variables. Serotonin/Kynurenine production potential ((KYN008+KYN009)/KYN006). The modules were constructed to specifically target the significant metabolites and tested based on hypotheses, hence no multiple test correction was applied.   Table 18). Modules/metabolites with higher proportional abundances/faecal concentrations following the low-gluten diet are highlighted in blue, those positively associated to the high-gluten diet in yellow. Low-gluten dieting was associated with both higher faecal kynurenine concentrations and a lower proportional abundance of a serotonin production pathway. Together, both observations suggest a shift in faecal tryptophan metabolisation from serotonin (high-gluten diet) to kynurenine (low-gluten diet).

Biochemical analyses of postprandial blood samples
Before plasma GLP-2 measurements, all samples were extracted in a final concentration of 75% ethanol, and at a final concentration of 70% ethanol before plasma GIP and plasma PYY measurements. Intact GLP-2 was measured using an in-house developed radio-immuno assay originally described elsewhere 5  Asp33 -> Tyr33 substitution. Total GIP was measured using a C-terminally directed antiserum (code no. 80867), which reacts fully with intact GIP and N-terminally truncated forms as described elsewhere 6 . The standard was human GIP (Bachem, cat no. H-5645) and the tracer was 125 I-labeled human GIP (Perkin Elmer, cat no. Nex402). Total PYY was measured as described elsewhere 7 using a monoclonal antibody MAB8500 (Abnova, clone RPY-B12), which reacts equally well with PYY1-36 and PYY3-36. Synthetic human PYY3-36 (Bachem, cat no. H-8585) was used as standard and 125 I-labeled PYY (Perkin Elmer, cat no. Nex341) as tracer. Sensitivity for all hormone assays was below 5 pmol L -1 , and intra-assay coefficient of variation below 10%.

Dietary fibre composition of the two diets
A representative meal of each distinct diet was prepared based on daily intake from dietary records of the present study. The samples were freeze dried and milled to a ≤1 mm particle size.
In order to determine the amount of starch, glucose and non-starch glucose, subsamples of the three diets were de-starched prior to hydrolysis. Samples were treated with 0.2% weight per volume Termamyl® SC for 85 min at 70°C in 50 mM phosphate buffer pH 6 according to Thomassen et al. 8 . Carbohydrate composition of the dietary fibre analysis was performed by a modified NREL sulphuric acid hydrolysis 9 . In brief, 150 mg sample was added to 1.5 mL 72% sulfuric acid, followed by incubation at 30°C for 60 min. Acid concentration was diluted to a final 4% sulfuric acid and the sample was autoclaved at 121°C for 60 min. The monosaccharide standards used for quantification were treated under the same conditions in order to take the recovery into account. Quantification was performed by High Pressure Anion-Exchange Chromatography system attached to Pulsed Amperometric Detection (HPAEC-PAD) BioLC system (Dionex, Sunnyvale, CA, USA) equipped with an AS50 autosampler, GS50 gradient pump and ED50 electrochemical detector equipped with a CarboPac PA1 analytical column (4x250 mm) and a guard column (4x50 mm) operated at 1 mL min -1 with isocratic elution in water for 38 min followed by isocratic elution in 500 mM NaOH for 15 min. Post-column addition of 500 mM NaOH at 0.2 mL min -1 was added for detection.

Resistant starch composition of the dietary study products
The percentage of digested starch in the dietary study products was evaluated by an in vitro static method that simulates physio-chemical conditions present in the human digestive system. Prior to the in vitro digestion processing, the food items were prepared according to instructions on the packet and snap-frozen in liquid nitrogen and stored before analysis. Before freezing, study products cooked by boiling were drained of excess water. A standardised static in vitro digestion method 10 was used with slight modification. The simulated digestion fluids (Simulated Salivary Fluid (SSF) contained 15.1 mmol L -1 KCl, 13.6 mmol L -1 NaHCO3, pH 7, Simulated Gastric Fluid (SGF) contained 6.9 mmol L -1 KCl, 47.2 mmol L -1 NaCl, 12.5 mmol L -1 NaHCO3, pH 2) and Simulated Intestinal Fluid (SIF) contained 6.8 mmol L -1 KCl, 38.4 mmol L -1 NaCl, 85 mmol L -1 NaHCO3, pH 7. These solutions were prepared to simulate the physio-chemical conditions present in the human body, but slightly simplified as compared to Minekus et al. 10 . For the oral phase the food was processed to simulate the chewing by using a mincer (Kitchen craft).
Approximately 5 g (dry weight) of minced food substance (25°C) were mixed with SSF solution at the ratio of 1:1 (w:w) including CaCl2 (0.75 mmol L -1 ) and human salivary α-amylase (90 IU mL -1 , A1031 Sigma Aldrich). The bolus was incubated for 5 min at 37°C at 170 rpm using a Stuart orbital incubator SI500. For the gastric phase the oral bolus was mixed with one volume of SGF solution and CaCl2 (0.075 mmol L -1 ), porcine pepsin (1200 IU ml -1 ) and fungal lipase (60 IU mL -1 ) were added. Finally, for the intestinal phase the gastric chime was mixed with one volume of SIF stock solution and CaCl2 (0.3 mmol L -1 ) and pancreatic α-amylase (200 IU mL -1 ) added using porcine pancreatin 8x, (P7545 Sigma Aldricht). The samples were incubated for 16 h at 37°C with 170 rpm mixing. Aliquots were withdrawn in triplicate for further analysis at different time-points to analyse the rapidly available sugars (RAS, oral and 30 min duodenal digestion) during the duodenal digestion, slowly digestible starch (SDS, starch digested between 30 and 120 min of duodenal digestion) and resistant starch (RS, 16 h). The digestion was stopped at the time points by adding one volume of 96% ethanol and increasing the pH to 10. Following the digestion, the samples were centrifuged at 10000 g for 10 min and the pellet was washed twice by using 80% ethanol. The supernatant and pellet were treated separately. Starch content in the fractions was analysed using the AOAC method 996.11/AACC method 76.13. Glucose content was analysed by the PGO enzyme system (P7119 Sigma-Aldrich).

FODMAP composition of the dietary study products
Each dietary study product (10 g) was freeze-dried and finely grinded using a pestle and mortar.
One g from each sample was pooled to generate one gluten-free and one non-gluten-free pool.

Metagenomics analyses
The raw reads output was quality controlled using Cutadapt v. 1.8.1 with Python 2.7.10. 11 Bases with a Phred quality score below 20 were trimmed in both the 5' and 3' end of the sequences and reads shorter than 50 bases were discarded. Adaptor sequence remnants were removed in the 3' and 5' end of the reads. The total number of base pairs (bp) per sample ranged from 3.7 Gbp to 16.8 Gbp. Eight samples were discarded due to low sequence (< 3.7 Gbp) output. Trimmed reads were mapped against the Integrated Gene Catalogue (IGC) 12 comprising 9,879,896 genes derived from 1267 faecal samples, using BWA-MEM 13 . The read alignments were filtered using a 95% sequence similarity cut-off across the whole read and the number of reads mapping to a gene was counted. Samples were rarefied using the R-package GUniFrac v 1.0 to the lowest sequence depth per individual to enable comparison within individuals, but not between individuals. This was done to ensure a minimum loss of information. For inter-individual analyses the gene abundances were rarefied to lowest sample sequence depth: 3.7 Gb.
The canopy clustering method described by Nielsen et al. 14 was applied on the 1267 samples from Li et al. 12 to create co-abundance gene clusters. Clusters containing more than 700 genes were assigned as metagenomic species (MGS) and amounted to 1264 MGSs. Abundances of genes belonging to such a cluster were used to calculate the overall relative abundance of a MGS. At least 5% of all genes belonging to a MGS should be observed before the MGS was detected; hence the 95 th quantile of the abundances of the collection of genes from the same MGS was used as abundance measure. However, a minimum abundance cut-off was set to a 95 th quantile of the read count to genes normalised by gene length (in bp) of less than 1e-4.
To use the full information contained within the genes of an MGS and to avoid potential missing marker genes, the taxonomy of the significant MGSs was determined by aligning all genes using which were downloaded as described by Nordahl et al. 16 We only included Blast hits with an e-value lower than 1e-5 and an alignment length above 80% of the query length with an 80% sequence identity for determining the taxonomy. MGSs were classified as unknown if less than 15% of the genes had a Blast hit. MGSs were classified all the way to family level if more than 15% (min. 140) genes had a Blast hit and they consistently mapped to the same tax-id (minimum drop of 5% point). If more than 70% of the genes showed consistent taxonomic annotation, the MGS was assigned to genus level and if more than 90% of the genes showed consistency, MGSs were mapped to species level.
The genes were annotated to the EggNOG database (v3) and the KEGG database as described by  Table 18) and MetaCyc database 18 . A set of 9 modules (referred to as KYN) was assembled following the KEGG database syntax 19 as described in Vieira-Silva et al 17 . Each KYN module is delimited by its input and output compounds and encompasses all enzymes (ortholog groups) required to perform the reaction steps of all alternate pathways.
When non-orthologous reactions for synthesis/utilization of a certain compound exist in prokaryotes, different KYNs were assembled for each. For each enzyme, the most precise prokaryotic-specific orthologous group containing all taxa that were experimentally proven to perform the function was selected, using the KEGG 19  Reactions conditions were as follows: Initial 95⁰C for 5 minutes, followed by 45 cycles of 95⁰C for 10 seconds, 60⁰C for 15 seconds and 72⁰C for 45 seconds. Finally, a melting curve was generated by gradually increasing the temperature (95⁰C for 5 seconds, 68⁰C for 1 minutes and increasing the temperature to 98⁰C with a rate of 0.11⁰C per second with continuous fluorescence detection). The qPCR was performed in triplicate for each sample/primer combination in 384well plate format on a LightCycler® 480 II instrument (Roche Applied Science) and analysed using the dedicated LightCycler® 480 software for absolute quantification with standard curves.
Samples were randomised between treatment groups and the four time-point, but all samples originating from a specific individual (4 samples x 3 replicates x 2 primer-sets = 24 samples) were always analysed on the same 384-well plate to reduce effects of plate-to-plate variation.
Absolute abundances were calculated as bacterial genome equivalents per g faecal pellet and 16S gene copies per g faecal pellet for Bifidobacterium spp. and total bacterial load respectively. The relative abundance of Bifidobacterium spp. was calculated by dividing the calculated absolute abundance of Bifidobacterium spp. with the calculated total bacterial load.
Exploratory non-targeted urinary metabolic profiling was performed by UPLC-MS with a quality control (QC) sample injected once for every 10 urine samples as previously reported 26  To confirm the identity of sulfonated and glucuronidated metabolites, deconjugation experiments using β-glucuronidase from E. coli K12 (Roche Diagnostics GmbH) and sulfatase from Aerobacter aerogenes (Sigma-Aldrich, Schnelldorf, Germany) were performed. For each deconjugation reaction, urine samples were diluted in sodium phosphate buffer (pH=7.4), incubated with the enzymes at 37°C for 1-2 h, injected onto the UPLC-MS for analysis, and compared to standards when available.

Urine metabolomics by GC-MS
All chemicals used for urine metabolic profiling by GC-MS in the study were analytical grade.
Sodium hydroxide, pyridine, methyl chloroformate (MCF), anhydrous sodium sulfate, sodium bicarbonate, the internal standard 2,3,3,3-d4-DL-alanine and the external C7-C30 saturated alkanes standard were obtained from Sigma-Aldrich (St. Louis, MO, USA). Chloroform, methanol and hexane were obtained from Merck (Darmstadt, Germany). The GC-MS analyses were performed as previously described 11 . Briefly, urine samples were derivatised by MCF and injected on a GC-MS (Agilent 7890A coupled to MSD 5975C) with a mass selective detector (EI) operating at 70 eV. The column used for all analyses was a TG-1701MS GC column (30m x 0.255mm x 0.1μm with 5m guard column, Thermo Fisher, USA). Data processing was done in MSD ChemStation and AMDIS software (NIST, Boulder, CO, USA). The GC-MS data (peak height) were normalised by the internal standard 2,3,3,3-d4-DL-alanine and the measured urine creatinine concentration to account for the technical variability associated with chemical derivatisation and the dilution of urine, respectively. Identification of metabolites was done using our in-house MCF MS library, which contains information of mass fragmentation and retention times of more than 300 authentic standards and by use of the commercially available library from National Institute of Standards and Technology (NIST).

Serum and faeces metabolomics
All serum and faeces samples were randomised prior to extraction and analysis. Extraction of metabolites from serum was carried out essentially as previously described 29 . Briefly, four volumes of ice-cold methanol supplemented with 13

SCFA in plasma and faeces by LC-MS
Analysis of short-chain fatty acids in serum and faecal extracts were carried out as previously described 30 , except that derivatives were separated on a on a Agilent 1290 Ininity HPLC system (Agilent Technologies, Santa Clara, CA) equipped with an Agilent Zorbax Eclipse Plus C18 column (2.1 x 150 mm, 1.8 μm) with a 50 mm guard column, both kept at 40°C and identified on a Agilent 6530 quadrupole time of flight (Q-TOF) mass spectrometer.

Statistical analyses
For testing the host physiology, all factors described above were included in the LMM, except when testing intestinal transit time, where transit time as a factor was excluded. An interaction term of intervention and visit was included in the LMM with seven levels including all combinations of intervention (2 levels) and visit (4 levels: visit 1=baseline 1, visit 2, visit 3=postrandomization baseline 2, visit 4). The baseline at visit 1 was assumed to be the same for the two randomized groups, hence visit 1 was combined to one level. This design allowed for carry-over effects since baseline at visit 3 preceded a specific intervention. The intervention effect was estimated directly by a post-hoc t-test as the average differences of the differences between baseline and intervention, hence between visits 1 and 2 and between visits 3 and 4. The effects of the low-gluten intervention and high-gluten intervention compared to the habitual diets at baseline were also tested.
All physiological measures were assessed for linearity using histograms, residual plot and QQplot of the residuals. TAG, ALAT, ASAT, CRP, IL-6, TNF-α, HOMA-IR, insulin, haemolysis and exhalation of hydrogen were all log-transformed to accommodate linearity. The visual analogue scoring of personal gastrointestinal indicators were zero inflated due to the nature of the questions, why the scorings were dichotomized, where scores below 5 mm were set to zero.
The gastrointestinal indicators were tested using binominal distribution and responders were subsequently tested using the LMM.
For the postprandial analyses of blood biochemistry, exhalation of hydrogen and gut wellbeing questionnaire with multiple time-points for one visit, a three-way interaction-term of visitintervention and time after the standardized meal was tested. Assuming that measures closer in time would correlate more, we implemented a Gaussian covariance structure, which assumes measurements of the same participant to be serially correlated within one visit. For each time point, the intervention effect was evaluated and a χ 2 -test was used.
For the cytokine measures after ex vivo LPS stimulation, triplicates were averaged and tested using the LMM as previously described but including mix of monocytes, eosinophils, basophils and mast cells (MXD) as a factor for TNF-α, IL-6 and IL-1β and IFN-γ was corrected for lymphocytes (LYM) .
Change in plasma levels of alkylresorcinols were analysed using a LMM adjusted for changes in intestinal transit time, serum triglycerides and low-density lipoprotein-cholesterol concentration and participant as a random variable 2 . For comparison of end-points of the two periods, a paired t-test or a Wilcoxon matched pairs test was used depending on whether the data were normally distributed.
All 989 out of a total of 1264 species present in at least 2 individuals were tested using the LMM and adjusted for participant, participant x period term, age, gender and intestinal transit time as described above. The participant was included in the analysis if the MGS was observed in at least one of the four time points. In cases, where all participants included were of the same gender, this factor was excluded from the model. All abundances were naturally log-transformed after adding the lowest abundance to all numbers to avoid infinity measures. Similar to the host physiology analyses, an intervention-visit term was tested. Due to the natural individual variance in the gut microbiome composition, eight levels of the intervention-visit term were included to allow for differences at baseline between the two randomised groups besides including individuals as a factor to allow for a unique intersect at baseline. P values were corrected for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) 4 .
Histograms, residual plot and QQ-plot of residuals were used to assess the compliance of normality of the significant MGSs. Significant MGSs with zero inflation were confirmed using a mixed model with binary distribution followed by a linear mixed model test of all non-zero values.
Concentrations of Bifidobacterium (B. longum DSM 20219 genomes per g faeces) calculated with standard curve, total bacterial load, Bifidobacterium relative abundance calculated with standard curve and Bifidobacterium relative abundance calculated with Ct method asses by qPCR were all log transformed to conform normality and evaluated for intervention effects using the linear mixed model.
All 6739 KOs present in more than two study participants were tested using a LMM as described above with an eight-level intervention-visit interaction-term. All KOs present in less than 10 individuals were excluded and the smallest value was added to all abundances before natural logtranformation of the data. LMMs were applied to test the modules including a three-way term of intervention-visit-KO, where the effect on each KO was tested and globally summarized using a χ 2 -test. The average effect size, P values and adjusted P values were reported. To evaluate the contribution of the significant MGSs to the significance of the modules, a similar analysis was conducted where genes from the significant MGSs were excluded from the dataset and adjusted P values were reported.
Changes in urine metabolites were analysed using the described LMM with eight levels of the intervention-visit term. We adjusted P values using the Benjamini-Hochberg FDR 4 .
To evaluate intervention effects on the targeted metabolites measured in serum and faeces, all data were log transformed to conform normality and tested using the described LMM with eight levels of the intervention-visit term.
To test the association between the significant outcomes of MGSs, exhalation of hydrogen and urine metabolites a pairwise regression analysis was conducted. The LMMs included the outcomes as continuous variables, participant as a random effect and age and gender as fixed effects.
The association between significant modules, hydrogen exhalation and urine metabolites were also evaluated using LMM adjusted for participant as a random effect and age, gender and intestinal transit time as fixed effects. Specifically, LMMs were fitted for each KO within a significant module and slope coefficients for the association of interest were extracted from all model fits and used obtaining a pooled slope coefficient that was interpreted as the module-level measure of the association. P values were adjusted for multiple comparisons using the Benjamini-Hochberg FDR 4 .
The code used to compute abundances of the kynurenine specific modules (KYN) from a KO abundance table is shared on GitHub (https://github.com/raeslab/GMMs), where KYN abundances were computed as the median orthologue group abundance of the maximum alternative pathway.
Renormalization by using ratios between sub-components of the matrix is a way to circumvent the compositionality of metagenomic data 31 . We thus used the ratio between microbial serotonin production potential (KYN008, KYN009) and kynurenine production potential (KYN006). We