Altered intestinal microbiota–host mitochondria crosstalk in new onset Crohn's disease

Intestinal microbial dysbiosis is associated with Crohn's disease (CD). However, the mechanisms leading to the chronic mucosal inflammation that characterizes this disease remain unclear. In this report, we use systems-level approaches to study the interactions between the gut microbiota and host in new-onset paediatric patients to evaluate causality and mechanisms of disease. We report an altered host proteome in CD patients indicative of impaired mitochondrial functions. In particular, mitochondrial proteins implicated in H2S detoxification are downregulated, while the relative abundance of H2S microbial producers is increased. Network correlation analysis reveals that Atopobium parvulum controls the central hub of H2S producers. A. parvulum induces pancolitis in colitis-susceptible interleukin-10-deficient mice and this phenotype requires the presence of the intestinal microbiota. Administrating the H2S scavenger bismuth mitigates A. parvulum-induced colitis in vivo. This study reveals that host–microbiota interactions are disturbed in CD and thus provides mechanistic insights into CD pathogenesis.

I nflammatory bowel disease (IBD) is characterized by chronic and relapsing mucosal inflammation of the gastrointestinal tract and comprises two main subtypes, Crohn's disease (CD) and ulcerative colitis (UC) 1 . A prevailing hypothesis is that IBD development is a consequence of aberrant interplay between an altered intestinal microbiota with the host 2 . The composition of the intestinal microbiota from IBD patients is typically characterized by decreased prevalence of protective microorganisms (that is, Clostridium XIVa and IV groups) and an expansion of detrimental bacteria (that is, Enterobacteriaceae and Fusobacteria) [2][3][4][5][6][7] . However, there is limited information on causality and mechanism of disease. Moreover, a fundamental unanswered question is whether the observed microbial dysbiosis is a cause or a consequence of inflammation.
To address causality and microbial dysbiosis in CD pathogenesis, we conducted a systems-level study of the interaction between the intestinal microbiota and host at the mucosaluminal interface (MLI) in newly diagnosed paediatric patients. New-onset, paediatric subjects constitute an important population to study the role of the intestinal microbiota in IBD as there are no treatment effects on the disease process, co-morbidities are rare and there are fewer confounders as compared with adults. In addition, the inflammatory phase of IBD characteristically predominates in younger subjects and thus, these patients are more likely to reveal the underlying mechanisms that promote early development of disease. To further identify key microbial drivers of inflammation, we characterized the MLI microbial community at the time of diagnostic endoscopy. In contrast to stool microbiota, the MLI microbial community is in direct contact with the site of inflammation and is therefore likely to be directly involved in the initiation and maintenance of the diseased state. To understand the interactions between the intestinal microbiome and the host, we characterized the host proteome and identified key associations between the host proteome and intestinal microbiota. We also validated the causative role of a microbe that predicts disease severity in a mouse model of colitis. The use of these approaches in the context of a phenotypically well-defined patient cohort has enabled us to elucidate a possible mechanism of disease and to identify key microbial drivers of inflammation.

Results
Characteristics of the paediatric IBD gut microbiota. Although others have previously characterized the microbiota composition at the mucosa-luminal interface (MLI) in adult cohorts 8,9 , the MLI of new-onset paediatric IBD patients has not been previously reported. Given the intimate location of the MLI microbiota on the host intestinal epithelium, we hypothesized that studying the microbial composition at this location would provide direct information on how microbes affect inflammation and gut physiological processes. To characterize the composition of the MLI microbiota in IBD, we collected the MLI from the ascending colon of new-onset paediatric IBD patients and controls by flushing sterile water onto the mucosa to dislodge the mucus layer from the mucosal epithelial cells followed by aspiration of these mucosal lavages through the colonoscope (Table 1 and  Supplementary Table 1). We investigated the MLI microbiota using 16S rDNA-based Illumina sequencing on a total of 65 CD, 21 UC patients and 42 control subjects with the remaining samples from our cohort (29 CD, 16 UC and 21 controls) used for independent validation with alternative quantification approaches (that is, Ion Torrent/454 sequencing and quantitative PCR (qPCR)). Microbial diversity was unchanged between controls and the CD/UC patients ( Fig. 1a and Supplementary Figs 1 and 2) in contrast to the current model of decreased diversity in IBD patients 10 , but in agreement with a previous study on new-onset paediatric IBD patients 3 . These results suggest that first-onset paediatric patients have a distinct microbial profile as compared with adults and post-treatment cohorts.
Phylotypes differentially abundant between CD and controls or UC and controls were identified using a linear discriminant analysis effect size method 11 and a zero-inflated Gaussian mixture model (metagenomeSeq) 12 . These two statistical methods identified most of the same key microbes ( Fig. 1b and Supplementary Fig. 3; and Supplementary Data 1 and 2). In addition, to ensure that our use of the V6 hypervariable region as opposed to the V4 region commonly used in microbiota studies did not unduly influence our results, we sequenced the V4 and V6 regions in a subset of our samples using Ion Torrent sequencing. Importantly, while there are obviously unique taxa identified as being differentially abundant using each approach ( Supplementary Fig. 4, Supplementary Data 3), many of the same key taxa are identified using each technique. Moreover, these results are highly concordant with those obtained by Gevers et al. 3 , who characterized the microbiota composition of newonset paediatric CD patients using rectal mucosal biopsies instead of MLI samples. These include increased Clostridiales and Bacteroidales in controls and increased Enterobacteriaceae, Veillonella, Fusobacterium, Neisseria and Haemophilus in CD patients (Supplementary Data 2). The identification of the same key players highlights the usefulness of MLI as biological samples and the high quality of our inception cohort.
Numerous reports have cautioned that 16S rDNA results can be confounded by the presence of sequencing artifacts and/or contamination from the kits used during 16S library construction 13 . To quantify the levels of these artifacts and contaminants in our experimental setup, we purified genomic DNA from pure bacterial cultures and subsequently processed and analysed these samples as done for our MLI aspirates. These experiments revealed that our sequencing approach yields accurate results, with each pure bacterial sample primarily composed of a single dominant operational taxonomic unit (OTU) matching the genus of the sequenced bacteria with a median relative abundance of B99.1% ( Supplementary Fig. 5). Using this data set, we also determined that our median PCR/sequencing error rate was 0.007, which is comparable to those obtained in other studies 14 (Supplementary Fig. 6). In addition, we determined that potential contamination from our sequencing reagents/kits accounted for a median of B0.6% relative abundance. Importantly, the levels of sequencing artifacts and contamination quantified in these control experiments are low enough to be confidently removed during our bioinformatic processing and statistical analyses ('Methods' section).
Key phylotypes correlating with CD inflammation severity. To identify candidate causal microbes of CD pathogenesis, we examined changes in colonic microbiota composition at different stages of the disease from mild to moderate to severe inflammation (as determined by the PCDAI (Pediatric Crohn's Disease Activity Index) score). As taxonomy binning might mask important microbial dynamic behaviour, we identified key phylotypes correlating with CD inflammation severity at the OTU level (defined by sequence clustering at 97% identity as a proxy of species-level resolution). Non-supervised multidimensional scaling analysis failed to separate the microbial communities as a function of disease severity likely due to high interpersonal microbial variation (Supplementary Fig. 7) and instead segregated by microbial diversity as reported by others 3 . Nevertheless, we assessed the association of individual OTUs with disease severity using generalized linear models and controlling for potential confounders (gender and inflammation status of the sampled site). A total of 161 OTUs were found to strongly correlate, (53 OTUs positively and 108 OTUs negatively), with disease severity ( Fig. 1c; Supplementary Data 4). The OTUs negatively correlating with severity include major butyrate producers such as Blautia, Lachnospiraceae, Roseburia, Eubacterium rectale, Ruminococcus, Clostridium and Faecalibacterium. This decrease in butyrate producers was further confirmed using a qPCR assay targeting the butyryl-CoA:acetate CoA-transferase (BCoAT) gene, which encodes a key enzyme of the major pathway for butyrate production in the gut environment. As expected, CD patients had significantly fewer copies of the BCoAT gene as compared with the control subjects indicating a decreased microbial capacity to produce butyrate (Fig. 1d). The OTUs found to positively correlate with severity include members of the order Bacteriodales, the families Enterobacteriaceae and Veillonellaceae, and the genera Atopobium, Fusobacterium, Leptotrichia, Sutterella, Vibrio, Parabacteroides, Prevotella, Peptostreptococcus, Peptococcus and Streptococcus. It is noteworthy that one-fourth of these OTUs (Atopobium, Fusobacterium, Veillonella, Prevotella, Streptoccocus and Leptotrichia) are known to produce H 2 S through the fermentation of sulfur-containing amino-acids 15 . We therefore evaluated whether the relative abundance of these H 2 S producers could predict CD severity using receiver-operating characteristic analysis. These H 2 S producers predicted severe inflammation with an AUC of 0.74 (95% confidence intervals 0.61 to 0.87; Supplementary Fig. 8) indicating that this group of microbes could be used to classify CD patients with severe inflammation. Indeed, the relative abundance of these H 2 S producers correlated positively with the severity of inflammation (Fig. 1e). H 2 S can also be produced by sulfate-reducing bacteria (SRB) but we found no evidence of a link between the SRB and CD patients as shown by a qPCR assay targeting the dissimilatory sulfite reductase gene (Fig. 1f). All together, these results suggest a possible role for the H 2 S-producing bacteria in CD pathogenesis through the fermentation of sulfur-containing compounds and not through dissimilatory sulfate reduction.
Atopobium parvulum is a key network hub. To predict interactions among individual OTUs, we constructed a network of correlations between the abundance of the OTUs found to be associated with disease severity (Fig. 2a). The resulting network included 89 nodes (OTUs) and 341 edges (representing 312 co-occurrence and 29 co-exclusion interactions). Two major co-excluded modules of OTUs appeared in the network analysis separating OTUs on the basis of their relative abundance with respect to inflammation (increased or decreased in CD as compared with control subjects). One module of co-occurring OTUs consisted of Vagoccocus (OTU-570199), Streptococcus (OTU-218959), OTU-535825 (Veillonellaceae) with A. parvulum (OTU-529659) as the central hub of this module. The OTUs from this module were all found to be increased in CD. The A. parvulum-module anti-correlated with the other major module, which is mostly composed of OTUs that are decreased in CD and belong to the class Clostridia. The relative abundance of A. parvulum was validated by qPCR and found to be positively correlated with CD severity (Fig. 2b). The correlation between A. parvulum and CD was also confirmed by 454 pyrosequencing of a subset of samples followed by linear discriminant analysis effect size method analysis ( Supplementary Fig. 9).
Functional alterations in the paediatric CD host proteome. To gain mechanistic insights into the role of microbes in CD severity, we conducted an unbiased, shotgun, quantitative proteomic analysis of mucosal biopsies from controls (n ¼ 10) and CD (n ¼ 21) subjects with various levels of disease severity and tested for associations between these host-proteomes and the intestinal microbes. A total 3,323 proteins were quantified using the superSILAC proteomics approach described. The majority of proteins were quantifiable within a 10-fold tissue/super-SILAC ratio (85.7%: non-normalized L/H ratios, 89.3%: normalized L/H ratios) in all the samples ( Supplementary Fig. 10a). Within the 475 proteins quantified by ratios outside the range of 10-fold in at least one sample (non-normalized L/H ratios), only 91 proteins had a median ratio above 10-fold ( Supplementary Fig. 10b). This is comparable to a recently published superSILAC proteome study 16 that showed that 83.7% of identified proteins were within a 10-fold ratio. There were 39 proteins identified in the light samples that did not have a heavy counterpart ( Supplementary  Fig. 10c). Of these, only one protein (Ribonuclease pancreatic; Uniprot ID: P07998) was detected in more than half of the samples without a corresponding SILAC-labelled peak from super-SILAC standard. Taken together, these results highlight the applicability of the reference proteome for use with the human biopsy sample proteomes.
Three hundred and twenty of the 3,323 quantified proteins were identified as differentially expressed by comparing the CD patients versus control (t-test with qo0.05). PCA of the proteomic profiles grouped the control subjects away from the CD patients with the first two components explaining 61.7% of the variation (Fig. 2c). KEGG pathway analysis, through the DAVID bioinformatics resources, revealed the top 10 enriched pathways (Fisher's exact test Po0.01; Supplementary Fig. 11). Gene ontology analysis identified the mitochondrial proteins as the major discriminant feature of all differentially expressed proteins (Fig. 2d). Most mitochondrial proteins (93.5%; 100/107 mitochondrial proteins) were found to be significantly downregulated in the CD patients as compared with controls (Fisher's exact test Po0.05) with the extent of downregulation rising with increased severity (Supplementary Fig. 12). Interestingly, components of the mitochondrial hydrogen sulfide detoxification  ; an increasing red intensity indicates OTUs whose relative abundance increased, whereas an increasing blue intensity indicates OTUs whose relative abundance decreased in CD patients with severe inflammation as compared with mild (outer circle) or severe as compared with moderate (inner circle); the maximum colour output for this figure was set at a Log 2 value of ±3. (d) The qPCR quantification of butyrate producing bacteria in control and CD microbiota. Butyryl-CoA: acetyl-CoA transferase gene (BCoAT); n ¼ 20 and 34 for control and CD, respectively; statistical comparison by Mann-Whitney two-tailed test; **Po0.01). (e) Log 2 sum of rarefied reads assigned to taxa known to produce H 2 S through amino acid fermentation were plotted for the control and as a function of CD severity (n ¼ 39, 11, 11, 43 for control subjects and mild, moderate and severe CD, respectively). (f) The qPCR quantification of sulfate reducing bacteria in control and CD patient microbiota as a function of disease severity. Dissimilatory sulfide reductase gene (DSR); n ¼ 16, 9, 9 and Supplementary Fig. 13). These findings indicate that the decreased abundance of these H 2 S-detoxification proteins is a hallmark of CD disease activity and possibly UC as well. We next performed 'transkingdom' correlation analysis between the levels of differentially expressed mitochondrial proteins and the OTUs that correlated with CD inflammation severity (Fig. 3). In total, 46 OTUs displayed significant associations with 96 host mitochondrial proteins with a Po0.05 in using both Kendall and Spearman correlations. Interestingly, OTUs that were found to be decreased in CD displayed strong positive correlations with mitochondrial proteins while OTUs found to be increased mainly displayed negative correlations. A. parvulum causes pancolitis in Il10 À / À mice. Altogether these observations suggest that A. parvulum might play a key role   in the development or progression of inflammation severity in the CD patients. To evaluate the colitogenic potential of A. parvulum, we utilized colitis-susceptible Il10 À / À mice 18,19 . Germ-free Il10 À / À ;NF-kB EGFP mice were transferred to specific pathogen free (SPF) housing and after 2 weeks acclimatization, gavaged with A. parvulum (10 8 colony-forming units (CFUs) per mouse). Thereafter the mice were gavaged with A. parvulum weekly for 6 weeks and killed 6 weeks later. Compared with control uninfected Il10 À / À ;NF-kB EGFP mice, A. parvulum-colonized Il10 À / À mice displayed colitis (crypt hyperplasia, ulcers, goblet cell depletion and immune cell infiltration) with histologic inflammation scores significantly higher compared with controls (Mann-Whitney U-test Po0.05; Fig. 4a,b). Colonoscopy imaging revealed mucosal erythema, friability and mucosal ulceration in A. parvulumcolonized Il10 À / À mice compared with the healthy mucosa observed in control mice ( Fig. 4c). At the molecular level, the colon of A. parvulum-colonized Il10 À / À mice showed increased expression of Cxcl1 and Il17 compared with controls (Mann-Whitney U-test Po0.01; Supplementary Fig. 14). To determine whether the colitogenic effect of A. parvulum was dependent on the presence of commensal bacteria, we monoassociated germ-free Il10 À / À mice with A. parvulum for 8 weeks.
Interestingly, A. parvulum mono-associated GF Il10 À / À mice did not cause significant colitis compared with controls suggesting that this bacterium cannot induce disease on its own ( Fig. 4d; Histological score: 1.6 versus 0.67 respectively; not significant). Bismuth (III)-subsalicylate (hereafter denoted as 'bismuth') is a known H 2 S scavenger 20 . To test the potential role of H 2 S in A. parvulum-mediated colitis in Il10 À / À mice, gnotobiotic Il10 À / À mice were transferred to SPF conditions and randomized into four groups; SPF only, SPF plus A. parvulum, SPF plus bismuth and finally, SPF plus A. parvulum and bismuth.
As expected A. parvulum worsened the development of colitis ( Fig. 5a-c). Interestingly, treatment with bismuth significantly improved the colitis scores in A. parvulum-colonized mice ( Fig. 5a-c) and prevented expression of Il1b, Il12p40 and Cxcl1 messenger RNA (mRNA; Supplementary Fig. 15). Although these findings suggest a role for H 2 S production in A. parvuluminduced colitis, the reduced colitis observed with bismuth administration could also be due to bismuth's potential antimicrobial activity on A. parvulum and/or the intestinal microbiota.

Discussion
In this study, we found key alterations in the microbiota-host interplay in newly diagnosed children with IBD providing insight into the mechanism of IBD pathogenesis and potentially opening new avenues for treatment interventions. Our study found that mitochondrial proteins were the primary proteins downregulated in CD indicating a central role for the mitochondria in CD pathogenesis. The significant downregulation of these proteins was associated with a depletion of butyrate producers together with a bloom of pathobionts many of which are known to be potent H 2 S producers. These results suggest a regulatory relationship between the mitochondria and microbiota and a disturbance of this relationship in the CD patients.
Increased intestinal permeability has been reported in active CD further suggesting a role for mitochondrial dysfunction in the pathogenesis of IBD 28,29 .
Butyrate is the main energy source of colonocytes and is produced by the bacterial fermentation of unabsorbed carbohydrate. Here, we demonstrate that the intestinal microbiota of paediatric CD patients is characterized by a significant decrease in the relative abundance of butyrate producers as has been reported previously in the context of paediatric IBD 3 . Our data also revealed a positive correlation between the relative abundance of the butyrate producers and mitochondrial proteins, suggesting a signalling role for butyrate in host mitochondrial gene expression. In support of this observation, butyrate is known to activate the expression of the genes encoding the host mitochondrial H 2 S detoxification components 30 and our proteomic analyses indicate a diminished capacity for H 2 S detoxification by paediatric CD patients. Therefore, we postulate that the depletion of butyrate-producing microbes from the colonic microbiota may dampen host H 2 S defense systems. This 'impaired' host would be highly susceptible to further damage caused by enhanced H 2 S production, resulting in additional metabolic stress and subsequently increased mucosal inflammation. Notably, this decrease in butyrate producers is associated with a significant increase in pathobionts, many of which are known to produce H 2 S. These include Fusobacterium, Veillonella and Atopobium parvulum whose relative abundance correlated negatively with the expression of the mitochondrial proteins. Butyrate exhibits anti-inflammatory properties, whereas H 2 S is an important mediator of many physiological and pathological processes and has been previously associated with UC and colorectal cancer 17,31 . Interestingly, A. parvulum was identified by network analysis as the most prominent microbe associated with mitochondrial dysfunction. Importantly, the increased abundance of A. parvulum as a function of inflammation severity in paediatric CD patients was not observed in UC and therefore is not simply a consequence of inflammation, suggesting a causal effect in CD.
Although causal relationships would be better identified by monitoring changes in microbiota composition before the onset of the disease while following disease establishment, logistical constraints preclude this experiment. Instead we tested the hypothesis that changes in microbial composition between distinct and progressive stages of the disease would unravel causal interactions among the host and specific members of the gut microbes. Indeed, the observation that the relative abundance of A. parvulum correlates positively with inflammation severity and negatively with both the mitochondrial protein expression and the level of butyrate producers prompted us to assess the colitogenic potential of A. parvulum. Interestingly, we found that A. parvulum triggers the development of colitis in Il10 À / À mice under specific pathogen-free conditions but not under mono- association conditions. These results indicate a key role for both A. parvulum and the gut microbiota in colitis development. In support to our findings, a recent report on CD patient response to exclusive enteral nutrition (EEN) therapy has reported that a putative A. parvulum OTU was enriched in CD patients before treatment and was negatively associated with days on EEN 32 . Although a role for H 2 S production by A. parvulum in colitis development remains to be proven directly, excess H 2 S has been recently shown to act as an autocrine T-cell activator 33 , potentially contributing to unwanted T-cell responses against commensal bacteria, consistent with our observation that the intestinal microbiota is required for A. parvulum-induced experimental colitis. Accordingly, potential H 2 S trapping by bismuth may have contributed to the mitigation of Atopobiuminduced colitis development in our mouse model. This finding is in agreement with a prospective mechanistic role for H 2 S in inflammation. A role for H 2 S in colitis development is supported by the recent finding that microbial H 2 S formation contributes to mucus degradation opening the intestinal barrier to toxic compounds and pathobionts 34 .
H 2 S is an important mediator of many biological processes (angiogenesis, cytoprotection, metabolism, inflammation) and can be pro-or anti-inflammatory depending on its concentration and the particular circumstance 35,36 . In light of these opposite effects, a recent study reported that paediatric CD patients undergoing EEN treatment and presenting with reduced colonic inflammation had increased levels of fecal H 2 S 37 . The findings from the latter study challenge the concept of a pro-inflammatory role for H 2 S in CD. However, as EEN feed is enriched in sulfurcontaining amino-acids, the observed increase in H 2 S levels might simply be a consequence of a change in diet. Clearly, complex modes of action of H 2 S need to be further explored to reconcile these apparent controversial observations. Furthermore, the impact of bacterial-derived exogenous H 2 S on the intestine may be quite different than endogenous cell-based H 2 S.
Altogether, our results emphasize the importance of the microbial community and its interaction with the host in the pathogenesis of IBD. We propose a central role for mitochondrial dysfunction in colitis development. Although it is difficult to identify the cause underpinning observed downregulation of mitochondrial proteins, the loss of butyrate producers in paediatric CD patients will lead to decreased butyrate production which in turn will impair mitochondrial functions likely resulting in ROS production, increased epithelial permeability and translocation of commensal microbes across the epithelial barrier. Intriguingly, antibiotic exposure has been shown to reduce the abundance of butyrate producers 3 and also markedly increases the risk of CD in children (odds ratio 2.75, 95% confidence interval 1.72-4.38; ref. 38). We have shown that A. parvulum causes colitis in a susceptible mouse model and that the host microbiota is required for colitis development. Therefore, the increased abundance of A. parvulum in paediatric CD patients with severe inflammation suggests a key role for this microbe in driving mucosal inflammation and disease exacerbation. Our results point toward a mechanism of CD pathogenesis involving a disruption of the mitochondria-microbiota relationship leading to dysfunctional mitochondria in the presence of microbes themselves involved in a complex interactive microenvironment at the MLI required for mucosal inflammation.

Methods
Ethics statement. The collection of samples from paediatric patients was approved by the Research Ethics Board of the Children's Hospital of Eastern Ontario (CHEO). The protocol for the use of Il10 À / À mice was approved by the Institutional Animal Care and Use Committee of University of North Carolina at Chapel Hill and the University of Florida. Written informed consent was obtained from the participants or parents.
Study design and sampling. We conducted a cross-sectional study of all eligible patients under 18 years of age scheduled to undergo colonoscopy for their initial diagnostic work-up for suspected IBD. Those who were not diagnosed with IBD acted as controls. Exclusion criteria relating to known conditions affecting the intestinal microbiota composition included: (1) a body mass index greater than the 95th percentile for age; (2) diabetes mellitus (insulin and non-insulin dependent); (3) infectious gastroenteritis within the preceding 2 months; and (4) use of any antibiotics or probiotics within 4 weeks before colonoscopy, or (5) irritable bowel syndrome. These same exclusion criteria were applied to the non-IBD control group. All IBD cases met the standard diagnostic criteria following thorough clinical, endoscopic, histologic and radiological evaluation 39 . Disease location was based on endoscopic and radiologic appearance and characterized using the Paris modification of the Montreal Classification for IBD 40 . Clinical disease activity of CD was determined using the PCDAI 41 and of UC using the PUCAI (Pediatric Ulcerative Colitis Activity Index) 42 . All the controls had macroscopically and histologically normal mucosa, and did not carry a diagnosis for any known chronic intestinal disorder (for example, celiac disease, eosinophilic enterocolitis, irritable bowel syndrome). Subject clinical data were collected and managed using REDCap (Research Electronic Data Capture) hosted at the CHEO Research Institute. REDCap is a secure, web-based application designed to support data capture for research studies 43 . Mucosal-luminal interface samples were collected from the right (ascending) colon (RC); n ¼ 94, 63 and 37 for CD, control and UC, respectively at the time of diagnostic colonoscopy. Colonoscopy preparation was done as per standard protocol modified to one day 44 . During colonoscopy, once the cecum and proximal ascending colon was reached, any loose fluid and debris was aspirated. Thereafter, sterile water was flushed onto the mucosa to dislodge the mucus layer from mucosal epithelial cells and the mixture of water, mucus and intestinal cells was aspirated into a sterile container through the colonoscope. These samples were immediately placed on ice in the endoscopy suite and immediately transferred to the lab for processing and storage at À 80°C. In addition, mucosal biopsies (n ¼ 21 and 10 for CD and control, respectively) were collected from macroscopically involved and non-involved (if the latter were identified) areas of the mid right colon. The biopsies were flash-frozen on dry ice in the endoscopy suite and immediately stored at À 80°C until further processing.
Metagenomic DNA extraction. Metagenomic DNA was extracted using the Fast DNA Spin Kit (MP Biomedicals, Solon, OH, USA) and a FastPrep-24 instrument (MP Biomedicals) with two mechanical lysis cycles at speed 6.0 m s À 1 for 40 s, with 5 min cooling on ice between the two cycles. Next, the DNA was isolated and purified as per the Fast DNA Spin Kit protocol. The extracted DNA was quantified by Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and stored at À 20°C until use.
16S rDNA-V6 library construction for Illumina sequencing. The V6 region of 16S rDNA was amplified using two successive PCR reactions. The first PCR added the Illumina paired-end sequencing adaptors and the barcode sequences using modified universal 16S rDNA-V6 primers 45 (Supplementary Table 2, Supplementary Fig. 16). Each reaction was prepared in a total volume of 50 ml using 50 ng of the extracted DNA, 0.5 mM of each primer and 1 Â Phusion Flash High-Fidelity PCR Master Mix (Thermo Scientific, Vilnius, Lithuania). The reaction was heated to 98°C for 30 s and then subjected to 10 cycles of 98°C for 5 s, 61°C for 15 s with 1°C drop each cycle and 72°C for 15 s, followed by additional 15 cycles with an annealing temperature of 51°C for 15 s and a final extension at 72°C for 5 min. The second PCR was carried out using 10 ml of the first PCR products in a final volume of 50 ml using the primers PCRFWD1/PCRRVS1 (Supplementary Table 2). The PCRFWD1/PCRRVS1 primers are complementary to the flow cell primers at the 5 0 end and Illumina paired-end sequencing adaptor at their 3 0 end. The second PCR conditions were 1 min at 98°C, 15 cycles of 10 s at 98°C, 30 s at 65°C and 30 s at 72°C followed by a final extension step at 72°C for 5 min. The amplicons of each sample were visualized on 1.5% agarose gel and purified using a Montage PCR 96 Cleanup Kit (Millipore, Billerica, MA, USA). Afterwards, the DNA concentration in each reaction was quantified using the Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. Finally, equimolar quantities of the amplicon from all the samples were pooled, gel purified using a QIAquick Gel Purification Kit (Qiagen, Hilden, Germany) and sent to The Center for Applied Genomics, Toronto, for 100 bp paired-end Illumina HiSeq2500 sequencing. were grown as required under either anaerobic, microaerophilic (10% O 2 , 5% CO 2 , 85% N 2 ) or aerobic conditions at 37°C. DNA for each sample was extracted using the Fast DNA Spin Kit as described above (hereafter referred to as 'pure' samples) and Sanger sequenced to confirm the identity of each bacterial species. In addition, aliquots of the extracted DNA were repurified a second time using the FastDNA kits to assess the impact of potential kit contamination (hereafter referred to as 're-extract' samples). 16S rDNA-V6 libraries for all the samples were constructed, with triplicate libraries constructed for the re-extract samples and sent for Illumina sequencing as described above.
Illumina microbiota sequencing data analysis. Raw data were first analysed for decreasing quality along the read. As expected, we noted decreasing quality near the end of the reads. Given that our paired end reads are expected to overlap by B21-25 nt (Supplementary Fig. 16), we trimmed four nucleotides from the 3 0 end of the sequences using the fastx_trimmer script (http://hannonlab.cshl.edu/) to remove error-prone bases that could result in incorrect read merging. To note, this trimming step reduced the total number of singletons/doubletons in the final rarefied OTU table by 5% (data not shown). The trimmed paired-end sequences were merged using Fast Length Adjustment of Short reads 46 . More than 95% of the reads were successfully merged, while the sequences that failed to merge were discarded. The merged reads were then quality filtered with a minimum quality score of 20 over 90% of the sequence using the fastq_quality_filter command from the Fastx toolkit (http://hannonlab.cshl.edu/). High-quality reads were demultiplexed according to the forward and the reverse barcode sequences and the barcode trimmed using NovoBarCode (Novocraft.com). Afterwards, the reads were converted from fastq to fasta and were fed to QIIME 1.8.0 (ref. 47) to determine the taxonomic and diversity profiles of the samples. First, reads were clustered into OTUs using a closed-reference OTU picking workflow against the Greengenes reference set (release 4 February 2011) based on an average percentage of identity of 97%. Note that the same Greengenes database was used for the Illumina, Ion Torrent and 454 pyrosequencing data sets discussed below. Singletons and doubletons were removed and a table of OTU counts per sample was generated. Next, the OTU table was randomly subsampled to a total number of reads per sample of 200,000. The resulting rarefied OTU table was used to analyse the microbiota structure and diversity using the microbial ecology tools available in the QIIME package and as input for all other downstream analyses. Any further analysis-specific filtering of the input OTU table is described within each individual 'Methods' section. The phylogenetic tree was constructed using the FastTree method of QIIME 1.8.0 and the generated file was exported to the Interactive Tree of Life (iTOL) software 48,49 to view and format the constructed tree.
Illumina pure culture sequencing data analysis. The raw reads obtained for the pure culture sequencing experiments were processed as described above for the Illumina metagenome analysis to generate an OTU table rarefied to 200,000 reads per sample and analysed in phyloseq 50 . Each pure culture consisted of single 'dominate' OTU representing with a median of 499% of the total reads and with a taxonomic classification matching the expected bacteria at the genus level ( Supplementary Fig. 5, Supplementary Data 5). Any OTUs present with a taxonomic classification matching the expected bacteria at the family level were considered to be potential sequencing errors. The reads that were matched against these erroneous OTUs were extracted and re-aligned using BLASTN against the dominant OTU to determine the PCR/sequencing error rate (Supplementary Fig. 6, Supplementary Data 5). OTUs present in 490% of the samples (in either the pure or re-extract sample sets) were considered to be background contamination originating from either the DNA extraction kits or reagents used during PCR amplification ( Supplementary Fig. 5).
Library construction for Ion Torrent sequencing. The V4 or V6 region of 16S rRNA was amplified in a single PCR reaction that incorporated both the Ion Torrent adaptors and an 11 bp barcode sequence using modified universal 16S rDNA-V4 (ref. 51) and V6 (ref. 45 Table 3). Note that the conserved V6 sequence used for the Ion Torrent sequencing is identical to those used in the 16S rDNA-V6 Illumina library construction described above. Each reaction was prepared in a total volume of 50 ml using 50 ng extracted metagenomic DNA, 0.5 mM of each primer, and 1 Â Phusion Flash High-Fidelity PCR Master Mix. For the V6 reactions, the PCR cycling conditions were identical to the first PCR reaction described in the 16S rDNA-V6 Illumina library construction 'Methods' section. For the V4 reactions, the reaction was heated to 98°C for 30 s and then subjected to 34 cycles of 98°C for 30 s, 55°C for 30 s and 72°C for 90 s, followed by a final extension at 72°C for 5 min as previously described 51 . The samples were purified using Purelink PCR purification columns and quantified using a Qubit fluorometer (Life Technologies). PCR products were visualized on a 1.5% agarose gel to ensure successful amplification and that the products were the expected size. Two hundred nanograms of each sample was subsequently pooled together for both the V4 and V6 libraries. The pooled libraries were size-selected using Agencourt AMPure XP DNA purification beads to remove primer dimers. The molar concentrations of each library pool were determined using an Agilent Bioanalyzer and the V4/V6 libraries were combined in an equimolar ratio. This equimolar V4/V6 pool was subjected to emulsion PCR using the Ion OneTouch HiQ Template Kit and sequencing of the templated beads was completed on a single Proton Chip on an Ion Torrent Proton using an Ion Proton HiQ sequencing kit according to the manufacturer's instructions (Life Technologies).

) primers (Supplementary
Ion Torrent sequencing data analysis. Ion Torrent sequencing was done for both the V4/V6 regions from 36 right colon samples (n ¼ 13, 12, 11 for control, CD and UC). The unaligned BAM file generated by the Ion Torrent Proton (note that adaptor sequences/primer dimers are automatically removed by the Proton analysis pipeline) was downloaded, converted to fastq format and reads split according to the expected sizes for the V4/V6 regions plus the barcodes (234-266 and 162-194, respectively, representing a ± 16 bp window), using samtools 52 in conjunction with bioawk (https://github.com/lh3/bioawk). The reads were quality trimmed using the modified Mott trimming algorithm implemented in seqtk (https://github.com/lh3/ seqtk) and reads that were shorter than the expected size for either the V4 or V6 regions (as described above) after trimming were discarded. The V4/V6 reads were then demultiplexed using FastqMultX 53 , converted from fastq to fasta and fed to QIIME 1.8.0 47 to determine the taxonomic profiles of the samples. The reads were clustered into OTUs using a closed-reference OTU picking workflow against the Greengenes reference set (release 4 February 2011) based on an average percentage of identity of 97%. The singletons/doubletons were removed from both the V4/V6 OTU tables and the number of reads/sample was normalized across sample groups (100,000 reads per sample for V4 and 200,000 reads per sample for V6). The rarefied tables were used to calculate alpha/beta diversities in phyloseq 50 and to identify the microbial biomarkers for each tested category using the linear discriminant analysis effect size algorithm 11 with the default parameters. To compare the Ion Torrent sequencing results to the Illumina results, the Illumina V6 data from the samples subjected to Ion Torrent sequencing was extracted from the Illumina rarefied OTU table (note that the Illumina V6 and Ion Torrent V6 samples were rarefied to the same depth) and analysed as described above for the Ion Torrent data.
16S rDNA-V6 454-pyrosequencing. The 454-pyrosequencing sequencing library of the 16S rRNA-V6 region was constructed using a two-step PCR strategy using the primers; 16SF 5 0 -AAACTCAAAKGAATTGACGG-3 0 and 16SR 5 0 -ACGAGCTGACGACARCCATG-3 0 (ref. 45). Ten replicates from each sample were amplified in a 25 ml reaction containing 50 ng DNA template, 1 Â PCR buffer, 2 mM MgCl 2 , 0.2 mM dNTPs mix, 0.2 mM of each primer and 2.5 U Platinum Taq polymerase (Invitrogen, Carlsbad, CA, USA). The reactions were heated to 95°C for 5 min, followed by 15 cycles of 94°C for 40 s, 46°C for 1 min and 72°C for 30 s, and a final extension at 72°C for 5 min. The amplicons from each sample were pooled and purified with Qiagen's MiniElute PCR purification columns (Qiagen, Hilden, Germany) following its standard protocol and eluted in 25 ml molecular biology-grade water. Two microlitres of the pooled purified amplicons were then used as a template for the second PCR following the same PCR conditions except using 30 amplification cycles instead of 15 cycles. The second PCR added the titanium tails to the 16S-V6 amplicons, which are essential for the 454 sequencing procedure. The amplicons from the second PCR were pooled and purified again using Qiagen MiniElute PCR purification columns. Negative control reactions (no DNA template) were included in all the experiments. PCR products were visualized on a 1.5% agarose gel to check the amplification success. The purified amplicons from different samples were normalized to the same concentration (100 ng ml À 1 ), captured to streptavidin coated sepharose beads and exposed to emulsion PCR.
Afterwards, the immobilized double stranded amplicons were denatured into single-stranded DNA, which was then annealed to the sequencing primer by heating at 65°C for 5 min. The single-stranded DNA-containing beads were sequenced on a 454 Genome Sequencer FLX System (Roche Diagnostics GmbH) using GS Titanium chemistry according to the standard amplicon sequencing protocol. Samples covering the three tested phenotypes (two to four samples each) were sequenced on the same run with each being sequenced in a 1/16 section of 70 Â 75 picotiter plate. Note that as the samples were physically separated from each other during the sequencing reaction, there was no need to use barcoded primers.
454-pyrosequencing data analysis. A total of 346,160 reads were generated from the 454 pyrosequencing of 16S rDNA-V6 region from 26 right colon samples. The raw pyrosequencing reads were de-noised and processed to remove low quality and short reads using Quantitative Insights Into Microbial Ecology pipeline release 1.8.0 (QIIME 1.8.0; ref. 47) according to the following parameters: (1) minimum read length of 100 bp, (2) exact matching to the sequencing primers, (3) no ambiguous nucleotides and (4) a minimum average quality score of 20. Next, sequences were de novo clustered using UCLUST 54 based on average percentage of identity of 97%. The most abundant read from each OTU was picked as a representative sequence for that cluster. The representative sequences were then aligned using PyNAST with a minimum alignment length of 100 and a minimum percentage identity of 75%, followed by checking the chimeric OTUs with the blast_fragments approach. Only six representative sequences were identified as chimeric and therefore were removed. Taxonomy assignments were made with BLAST 55 by searching the representative sequences against the Greengenes database (release 4 February 2011) with an e value o1e À 8 and a confidence score of Z0.5. Next, singletons (OTUs that had only one matching sequence) were filtered out from the resulting OTU table. The OTU table was then used to determine the alpha and beta diversity within and between the samples using the QIIME'S default criteria. To identify the microbial biomarkers for each tested category, the relative abundance of different phylogenetic levels computed by QIIME was analysed by linear discriminant analysis effect size algorithm 11 using its default parameters. The 454-pyrosequencing reads assigned as Atopobium by QIIME analysis were retrieved and found to match to A. parvulum following alignment of the reads against the RDP 56 and NCBI databases (the aligned region covered the entire 454 sequence length with 499% sequence identity to A. parvulum).
Statistical analysis of the microbiota data. Several statistical approaches were used to identify taxa significantly associated with disease status and severity. Identification of OTUs exhibiting differential abundance between patients with different disease severity (mild versus moderate versus severe) was performed using the metagenomeSeq R package-a statistical approach that controls for confounding factors and accounts for undersampling 12 . Briefly, the OTU counts were normalized using a cumulative sum scaling approach and OTUs with a normalized count o5 in at least one of the groups were discarded. OTUs exhibiting differential abundance as a function of disease severity were identified using a zero-inflated Gaussian mixture model by incorporating gender and the inflammation status of the ascending colon (inflamed or not inflamed) as confounding covariates. Differentially abundant OTUs were selected according to the following algorithm: (1) OTU present in more than 50% of the samples from at least one group; (2) a fold change in relative abundance Z2 and (3) a statistical significance with an adjusted P value (calculated using the Benjamini-Hochberg method) o0.05. A Kruskal-Wallis test with a post hoc Dunn's test was performed to compare the relative abundance of specific taxa or groups of bacteria as a function of disease status (CD versus UC versus control) and disease severity (mild versus moderate versus severe). The relative abundance of the taxa identified was also analysed by principal coordinate analysis (PCoA). All the statistical analyses were performed using XLSTAT (Addinsoft, NY, USA) and/or GraphPad Prism version 6 (GraphPad, La Jolla, CA, USA).
Validation of V6-16S sequencing. The quantification of A. parvulum and sulfate reducing bacteria relative to total bacteria was determined by conducting quantitative PCR (qPCR) on the extracted metagenomic DNA using an Applied Biosystems 7,300 and primers listed in Supplementary Table 4. Each sample was tested in duplicate in a total volume of 25 ml per reaction. Fifty nanograms of template DNA was added to a reaction mixture containing 0.3 mM of each primer (0.5 mM for 16S rDNA universal primers) and 1 Â QuantiFast SYBR Green PCR master mix (Qiagen, Hilden, Germany). The amplification conditions were 5 min at 95°C followed by 40 cycles of 95°C for 10 s and 66°C for 1 min with data collection at the second step of each cycle. Ct values were then extracted using the Applied Biosystems 7300 software version 1.3.1 and the relative abundance of each taxon was calculated as a DCt value (taxon Ct-universal 16S rRNA Ct). To validate the specificity of Apar-711F and Aparv-881R, fresh PCR amplicons from total DNA extracted from two different mucosal aspirates was cloned using the TOPO TA cloning vector (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. Next, the plasmid containing the 16S rDNA gene fragment was extracted from six different clones by the QIAprep Spin Miniprep kit (Qiagen, Hilden, Germany) followed by capillary sequencing using M13F and M13R primers.
Quantification of butyrate-producers by qPCR. The overall abundance of butyrate-producing bacteria was determined by quantifying the amount of the butyryl CoA:acetate CoA-transferase (BCoAT) gene using the primers BCoATscrF/ BCoATscrR (Supplementary Table 4) as described elsewhere 57,58 . The BCoAT gene was amplified from 50 ng of metagenomic DNA in a 25 ml qPCR reaction containing 1 Â QuantiTect SYBR Green PCR master mix (Qiagen) and 2.5 mM of BCoATscrF/BCoATscrR primers. The amplification conditions were as follows: one cycle of 95°C for 15 min; 40 cycles of 94°C for 15 s, 53°C and 72°C each for 30 s with data acquisition at 72°C. The 16S rDNA gene was used to normalize total bacterial content in each sample and was amplified using adapted universal primers UniF/UniR (Supplementary Table 4) 59 . The assay was done in duplicate for each sample. Delta-Ct (DCt) for each target was calculated by subtracting the Ct of the 16S rDNA from the BCoAT Ct. Then, the DCt values were compared between the CD and control groups using a Mann-Whitney two-tailed test with a Dunn's post hoc test and P values below 0.05 were considered significant.
Microbial correlation network analysis. A correlation network was constructed to identify co-occurrence and mutual exclusion among OTUs from the control subjects and CD patients. To focus on bacteria potentially associated with disease, only the OTUs exhibiting differential abundance as a function of disease severity were chosen for this network analysis. In addition, OTUs that were found to be absent in more than two-third of the samples or with a minimum occurrence o15 across all the samples were removed from the data set, leaving a matrix of 97 OTUs. The network was built as previously described 60 using the CoNet Cytoscape plug-in. The correlation scores were calculated for each OTU pair using a combination of two correlation (Pearson and Spearman) and two dissimilarity measures (Bray-Curtis and Kullback-Leibler). The 400 top-and bottom-ranking edges from each method were retrieved. Edge-and method-specific permutations (100) were used to control for potential false-positive correlations. The resulting distribution was used to calculate statistical significance for each edge using 100 bootstrap iterations. The P value was adjusted for multiple comparisons using a Benjamini-Hochberg correction and a value o0.05 was deemed significant. The P value was obtained for each correlation or dissimilarity measure and only edges supported by at least two significant measures were included. The final network consisted of 89 nodes (OTUs) and 341 edges (correlations). The resulting network was visualized using Cytoscape 3.2.1 (ref. 61).
Stable isotope labelling by amino acids in cell culture. The human hepatic HuH7 cells (HuH-7) and human embryonic kidney 293 cells (HEK-293) were originally acquired from ATCC and the human colorectal cancer 116 cells (HCT-116) were obtained from the Japanese Collection of Research Bioresources Cell Bank. HuH-7, HEK-293 and HCT-116 were individually grown at 37°C in a 5% CO 2 humidified incubator (to note, all three cell lines used in this study were found to be mycoplasma positive). Stable isotope labelling by amino acids in cell culture (SILAC) medium was prepared as follows: DMEM lacking lysine, arginine and methionine was custom prepared by AthenaES (Baltimore, MD, USA) and supplemented with 30 mg l À 1 methionine (Sigma-Aldrich; Oakville, ON, Canada), 10% (v/v) dialysed FBS (GIBCO-Invitrogen; Burlington, ON,Canada), 1 mM sodium pyruvate (Gibco-Invitrogen), 28 g ml À 1 gentamicin (Gibco-Invitrogen), and [ 13 C 6 , 15 N 2 ]-L-lysine, [ 13 C 6 , 15 N 4 ]-L-arginine (heavy isotopic form of amino acids; Heavy Media) from Sigma-Aldrich (Oakville, ON, Canada) at final concentrations of 42 and 146 mg l À 1 for arginine and lysine, respectively. For HCT-116, the concentration of arginine was increased to 84 mg l À 1 . The cells were grown for at least 10 doublings in SILAC media to allow for complete incorporation of the isotopically labelled amino acids into the cells.
Determination of the rate of SILAC amino acids incorporation. The cells were grown to 80% confluency in SILAC medium (5 Â 10 6 cells were plated in 10-cm dish). Next, the cells were washed twice with ice-cold phosphate-buffered saline and lysed by addition of 1 ml of 1X RIPA buffer (50 mM Tris (pH 7.6), 150 mM NaCl, 1% (v/v) NP-40, 0.5% (w/v) deoxycholate, 0.1% (w/v) SDS with protease inhibitor cocktail (Complete Mini Roche; Mississauga, ON,Canada) and phosphatase inhibitor (PhosStop Roche tablet)). The lysates were then transferred to 15 ml conical tubes and the proteins were precipitated by addition of 5 ml ice-cold acetone followed by incubation at À 20°C overnight. The proteins were collected by centrifugation (3,000g, 10 min, 4°C), washed with ice-cold acetone two times and the protein pellets were re-solubilized in 300 ml of a 50 mM NH 4 HCO 3 solution containing 8 M urea. The protein concentrations were determined by the Bradford dye-binding method using Bio-Rad's Protein Assay Kit (Mississauga, ON, Canada). For the general in-solution digestion, 200 mg of protein lysates were reconstituted in 50 mM NH 4 HCO 3 (200 ml) and the proteins were reduced by mixing with 5 ml of 400 mM DTT at 56°C for 15 min. The proteins were then subjected to alkylation by mixing with 20 ml of 400 mM iodoacetamide in darkness (15 min at room temperature) followed by addition of 800 ml of 50 mM NH 4 HCO 3 to reduce the urea concentration to B0.8 M. Next, the proteins were digested with TPCK-trypsin solution (final ratio of 1:20 (w/w, trypsin:protein) at 37°C for 18 h. Finally, the digested peptides were desalted using C18 Sep-Pack cartridges (Waters), dried down in a speed-vac, and reconstituted in 0.5% formic acid before mass spectrometric analysis (as described below) and the determination of labelling efficiency. The incorporation efficiency was calculated according to the following equation: (1-1/Ratio(H/L)); where H and L represents the intensity of heavy and light peptides detected by mass-spectrometry, respectively. Labelling was considered complete when values reached at least 95% for each cell type.
Proteomic analysis of biopsies. Biopsies were lysed in 4% SDS, 50 mM Tris-HCl (pH 8.0) supplemented with proteinase inhibitor cocktail (Roche) and homogenized with a Pellet pestle. The lysates were sonicated three times with 10 s pulses each with at least 30 s on ice between each pulse. The protein concentrations were determined using the Bio-Rad DC Protein Assay. The proteins were processed using the Filter Aided Sample Preparation method as previously described with some modifications 62  Mass-spectrometry analyses. All resulting peptide mixtures were analysed by high-performance liquid chromatography/electrospray ionization tandem mass spectrometry (HPLC-ESI-MS/MS). The HPLC-ESI-MS/MS consisted of an automated ekspert nanoLC 400 system (Eksigent, Dublin, CA, USA) coupled with an LTQ Velos Pro Orbitrap Elite mass spectrometer (ThermoFisher Scientific, San Jose, CA, USA) equipped with a nanoelectrospray interface operated in positive ion mode. Briefly, each peptide mixture was reconstituted in 20 ml of 0.5% (v/v) formic acid and 12 ml was loaded on a 200 mm Â 50 mm fritted fused silica pre-column packed in-house with reverse phase Magic C18AQ resins (5 mm; 200 Å pore size; Dr Maisch GmbH, Ammerbuch, Germany). The separation of peptides was performed on an analytical column (75 mm Â 10 cm) packed with reverse phase beads (3 mm; 120 Å pore size; Dr Maisch GmbH, Ammerbuch, Germany) using a 120 min gradient of 5-30% acetonitrile (v/v) containing 0.1% formic acid (v/v) (JT Baker, Phillipsburg NJ, USA) at an eluent flow rate of 300 nl min À 1 . The spray voltage was set to 2.2 kV and the temperature of heated capillary was 300°C. The instrument method consisted of one full MS scan from 400 to 2,000 m/z followed by data-dependent MS/MS scan of the 20 most intense ions, a dynamic exclusion repeat count of 2 and a repeat duration of 90 s. The full mass was scanned in an Orbitrap analyzer with R ¼ 60,000 (defined at m/z 400), and the subsequent MS/MS analyses were performed in LTQ analyzer. To improve the mass accuracy, all the measurements in the Orbitrap mass analyzer were performed with on-thefly internal recalibration ('Lock Mass'). The charge state rejection function was enabled with charge states 'unassigned' and 'single' states rejected. All the data were recorded with Xcalibur software (ThermoFisher Scientific, San Jose, CA, USA).
Mass-spectrometry database search and bioinformatic analysis. All the raw files were processed and analysed by MaxQuant, Version 1.2.2.5 against the decoy Uniprot-human database (86,749 entries), including commonly observed contaminants. The following parameters were used: cysteine carbamidomethylation was selected as a fixed modification, with methionine oxidation, protein amino (N)-terminal acetylation and heavy proline set as variable modifications. Enzyme specificity was set to trypsin. Up to two missing cleavages of trypsin were allowed. SILAC double labelling (light: K0R0; heavy: K8R10) was set as the search parameter to assess the conversion efficiency. The precursor ion mass tolerances were 7 p.p.m. and the fragment ion mass tolerance was 0.5 Da for MS/MS spectra. The false discovery rate for peptides and proteins was set at 1% and a minimum length of six amino acids was used for peptide identification. The peptides file was imported into Perseus (version 1.2.0.17) to extract the lysine-and argininecontaining peptides, respectively. The protein-group file was imported into Perseus (version 1.3.0.4) for statistical analysis. Differences between control and CD protein expression were determined using a t-test with an adjustment for multiple comparisons (Benjamini-Hochberg method; false discovery rate ¼ 0.05) and q values of less than 0.05 were considered significant. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was achieved using the DAVID Bioinformatics Resources (http://david.abcc.ncifcrf.gov/). DAVID statistical analyses were performed against the whole genome. Proteomics has a tendency to oversample proteins from the cytosol and nucleus while under-sampling membrane-associated proteins. Therefore, the results from DAVID were re-tested against the set of proteins that were not changing in our data set to eliminate any pathway/GO enrichment biases.
Total RNA extraction from intestinal mucosal aspirates. RNA integrity was preserved by adding an equal volume of RNAlater (Ambion) to the mucosal aspirates before freezing at À 80°C. The frozen aliquot (2 ml) was thawed on ice and the total RNA was extracted following the hot phenol protocol described previously 63 . Briefly, 4 ml of each sample in RNAlater were pelleted by centrifugation at 13,000 Â g for 5 min at 4°C. The pellets were washed twice by resuspension in 50% RNAlater/PBS buffer and centrifugation at 13,000g for 5 min at 4°C. Next, the pellets were resuspended and lysed in 2 ml denaturing buffer (4 M guanidium thiocyanate, 25 mM sodium citrate, 0.5% N-laurylsarcosine and 1% N-acetyl cysteine, 0.1 M 2-mercaptoethanol). The lysate was divided into 500 ml aliquots, to which 4 ml of 1 M sodium acetate (pH 5.2) was added. Each aliquot was then incubated with 500 ml of buffer saturated phenol (pH 4.3) at 64°C for 10 min with intermittent mixing. The tube was placed on ice for 10 min followed by centrifugation at maximum speed for 30 min at 4°C. One millilitre of chloroform was added to the aqueous layer and incubated for 15 min on ice, followed by centrifugation at 18,000g for 30 min at 4°C. Afterwards, RNA was precipitated from the aqueous layer by the addition of one-tenth volume 3 M sodium acetate, 500 mM DEPC-treated EDTA and two volumes of cold ethanol followed by overnight incubation at À 80°C. Later, the RNA was pelleted by centrifugation at maximum speed for 30 min at 4°C, aspirated with 80% cold ethanol and resuspended in 100 ml of RNAse free dH 2 O. The extracted RNA was treated twice with Dnase I (Epicentre) followed by PCR amplification using the 16S rDNA universal primers; Bact-8F and 1391-R (ref. 64), to confirm the absence of genomic DNA. The quality and the quantity of the extracted RNA were determined by NanoDrop 2000 spectrophotometer (ThermoScientific) and confirmed by Bio-Rad's Experion StdSens RNA system according to the manufacturer's instructions and stored at À 80°C until use.
Quantification of H 2 S detoxification genes expression level. The quantification of the expression level of TST, SQRDL (Sulfide Quinone Reductase Like) and COX4-1 (Cytochrome C oxidase subunit IV isoform 1) relative to hGAPDH (Glyceraldehyde-3-Phosphate Dehydrogenase) was determined using an Applied Biosystems 7300 and Quantitect SYBR Green RT-PCR kit (Qiagen, Hilden, Germany). The primers used were either designed using the NCBI Primer-BLAST tool 65 or extracted from the literature (Supplementary Table 5). The specificity of the primers was confirmed by TOPO TA cloning and capillary DNA sequencing as previously mentioned in the qPCR validation of V6-16S sequencing methods section. Each reaction contained 100 ng RNA template, 0.5 mM of each primer, 1 Â Quantitect SYBR Green RT-PCR master mix and 0.25 ml Quantitect RT-mix. The one-step qRT-PCR conditions were 50°C for 30 min, 95°C for 15 min followed by 40 cycles of 15 s at 94°C, 30 s at 60°C and 30 s at 72°C with data collection at the third step of each cycle. The amplification specificity was checked by the melting profile of the amplicon and 2% agarose gel electrophoresis. The Ct values were then extracted using the Applied Biosystems 7300 software version 1.3.1. The Ct values of TST, SQRDL or COX4 were normalized to the Ct values of hGAPDH generating DCt. Next, DDCt was calculated by subtracting the average DCt of the control group from the DCt of each sample. The relative quantification was then calculated as 2 À DDCt as described previously 66 .
Microbiota-mitochondrial proteins correlation analysis. Pairwise correlations between the relative abundance of the significant OTUs and mitochondrial proteins identified via MS were calculated using the non-parametric Kendall Tau and Spearman's rank correlation coefficients. To avoid spurious correlations, OTUs and mitochondrial proteins absent in more than half of the samples in each group were excluded from each pairwise comparison. As a result, pairwise correlations were calculated between 62 OTUs and 106 mitochondrial proteins from the 21 patients for which we had both data sets. To further avoid spurious correlations, correlations were considered significant only if the upper-tail probability occurrence was below 0.05 for both the Kendall Tau and the Spearman's rank statistics.
Il10 À / À mouse experiments and tissue processing. Germ-free SvEv129/ C57BL6 Il10 À / À ;NF-kB EGFP male/female mice (8-12 weeks old, n ¼ 12) were transferred to specific pathogen free (SPF) conditions and after 2 weeks mice from a randomly selected cohort (n ¼ 6) were gavaged once weekly with A. parvulum (1 Â 10 8 CFUs) for 6 weeks and sacrificed 6 weeks later (14 weeks total). A. parvulum ATCC 33793 was grown in fastidious anaerobic broth (Lab M, Canada). To investigate the involvement of complex microbiota in the development of colitis, we performed two subsequent experiments using C57BL6 Il10 À / À male/female mice. In the first experimental setting, gnotobiotic Il10 À / À mice (n ¼ 16) were randomized into two groups; 1: GF only (n ¼ 6) and 2: GF þ A. parvulum (1 Â 10 8 CFUs; n ¼ 10). The mice were killed after 8 weeks of monoassociation. The mice were fed normal diet (Teklan Global 18% Protein Rodent Diet). In the second experimental setting, gnotobiotic Il10 À / À male/female mice (n ¼ 31) were transferred to SPF conditions for 2 weeks and then randomized into two groups; 1: SPF only (n ¼ 7), 2: SPF plus A. parvulum (1 Â 10 8 CFUs; n ¼ 8), 3: SPF plus bismuth (III) diet (n ¼ 8) and 4: SPF plus A. parvulum and bismuth (III) diet (n ¼ 8). As described for the first cohort, the mice were gavaged weekly with A. parvulum (1 Â 10 8 CFUs) for 6 weeks and then killed after 6 weeks of colonization (14 weeks total). Bismuth (III) subsalicylate (Sigma-Aldrich, Saint Louis, MO, USA) was incorporated to the chow (Teklan Global 18% Protein Rodent Diet) at a concentration of 7 g kg À 1 (Harlan Laboratories, Madison, WI, USA) and then irradiated for gnotobiotic experiments. The mice were fed with this diet 1 week before the colonization with A. parvulum. The tissue samples from the colon were collected for RNA and histology as previously described 67 . All the animal procedures were approved by the University of North Carolina at Chapel Hill Animal Care and Use Committee. The histological images were acquired using a DFC310 FX (LEICA) coupled to LEICA Application Suite AFv4.6 software.
Intestinal inflammation was blindly scored as previously described 18 . The tissue was divided into four quarters to allow individual scoring of the mid-section, which was affected by A. parvulum more so than in other colitis models. A score was given to each quarter separately and then added to generate a final colitis score on a scale of 0-16.
Statistical analyses of Il10 À / À mice results. Unless specifically noted, the statistical analyses were performed using GraphPad Prism version 6 (GraphPad, La Jolla, CA, USA). The comparisons of mouse studies were made with a nonparametric analysis of variance, and then a Mann-Whitney U-test. The experiments were considered statistically significant if Po0.05.
Mouse endoscopy. Colonoscopy was performed using a 'Coloview System' (Karl Storz Veterinary Endoscopy) as described previously 68 . The mice were anaesthetized using 1.5 to 2% isoflurane and B4 cm of the colon from the anal verge from the splenic flexure was visualized. The procedures were digitally recorded on an AIDA Compaq PC.
qRT-PCR on mouse intestinal samples. Total RNA from intestinal tissues was extracted from the distal part of the colon using TRIzol (Invitrogen, Carlsbad, CA, USA) following the manufacturer's protocol. The complementary DNA was reverse-transcribed using M-MLV (Invitrogen, Carlsbad, CA, USA) and mRNA expression levels were measured using SYBR Green PCR Master mix (Applied Biosystems) on an ABI 7900HT Fast Real-Time PCR System and normalized to b-actin. The primers used are listed in Supplementary Table 5. The PCR reactions were performed for 40 cycles according to the manufacturer's recommendations, and RNA fold changes were calculated using the DDCt method 66 .
Data availability. The demultiplexed reads for all V6-16S experiments that support the findings of this study have been deposited to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra). The pure culture experiments were submitted under accession number SRP075131, the Illumina microbiota reads under accession numbers SRP034595/SRP056939, the 454 microbiota reads under accession number SRP034632 and the Ion Torrent reads under accession number SRP075131. All the raw data files for the mass spectrometry that support the findings of this study have been submitted to ProteomeXchange under accession number PXD002882. The remaining data are available within the Article and Supplementary Information files or available from the author on request.