Ulcerative colitis (UC) is driven by disruptions in host–microbiota homoeostasis, but current treatments exclusively target host inflammatory pathways. To understand how host–microbiota interactions become disrupted in UC, we collected and analysed six faecal- or serum-based omic datasets (metaproteomic, metabolomic, metagenomic, metapeptidomic and amplicon sequencing profiles of faecal samples and proteomic profiles of serum samples) from 40 UC patients at a single inflammatory bowel disease centre, as well as various clinical, endoscopic and histologic measures of disease activity. A validation cohort of 210 samples (73 UC, 117 Crohn’s disease, 20 healthy controls) was collected and analysed separately and independently. Data integration across both cohorts showed that a subset of the clinically active UC patients had an overabundance of proteases that originated from the bacterium Bacteroides vulgatus. To test whether B. vulgatus proteases contribute to UC disease activity, we first profiled B. vulgatus proteases found in patients and bacterial cultures. Use of a broad-spectrum protease inhibitor improved B. vulgatus-induced barrier dysfunction in vitro, and prevented colitis in B. vulgatus monocolonized, IL10-deficient mice. Furthermore, transplantation of faeces from UC patients with a high abundance of B. vulgatus proteases into germfree mice induced colitis dependent on protease activity. These results, stemming from a multi-omics approach, improve understanding of functional microbiota alterations that drive UC and provide a resource for identifying other pathways that could be inhibited as a strategy to treat this disease.
Metabolomic data, proteomic data and additional supplementary files for reanalyzing the data collected here are available online at https://massive.ucsd.edu (Cohort 1 proteomics and metabolomics study ID MSV000082094, Cohort 2 study ID MSV000086509, Cohort 2 metabolomics study ID MSV000084908). Proteomic data and supplementary files for reanalyzing data collected from the faecal transplant study and Bacteroides supernatant are under MassIVE identifiers MSV000086510 and MSV000086511, respectively. Genomic data has been uploaded through EBI https://www.ebi.ac.uk/ena under the study identifiers PRJEB42151 for Cohort 1 and PRJEB42155 for Cohort 2. Comparisons with data generated in this study were also made with proteomics data downloaded from the IBD multi-omics database (https://ibdmdb.org/tunnel/public/HMP2/Proteomics/1633/rawfiles). Databases used in this study include UniRef50 (https://www.uniprot.org/downloads), the human proteome (https://www.uniprot.org/proteomes/UP000005640), mouse proteome (https://www.uniprot.org/proteomes/UP000000589), B. vulgatus proteome (https://www.uniprot.org/proteomes/UP000002861), B. theta proteome (https://www.uniprot.org/proteomes/UP000001414), B. dorei proteome (https://www.uniprot.org/proteomes/UP000005974), a microbial genome database (https://biocore.github.io/wol/) and a human gut microbiome database (https://db.cngb.org/microbiome/genecatalog/genecatalog_human/). Source data are available for in vitro and in vivo experiments. Source data are provided with this paper.
The code used in the analysis and visualization of data is available at https://github.com/knightlab-analyses/uc-severity-multiomics.
P.S.D., R.H.M. and C.S. were supported through a UCSD training grant from the NIH/NIDDK Gastroenterology Training Program (T32 DK007202). P.S.D. was also supported by an American Gastroenterological Association Research Scholar Award. We thank E. Griffis, D. Bindels and the Nikon Imaging Center at UCSD for help with confocal microscopy, and the UCSD Neuroscience Microscopy Shared Facility (NS047101). This study was supported in part by the NIDDK-funded San Diego Digestive Diseases Research Center (P30 DK120515, D.J.G., P.S.D.) and the UCSD Collaborative Center of Multiplexed Proteomics.
R.H.M., P.S.D. and D.J.G. have jointly filed for a patent based on this work (International Application No. PCT/US2020/057784). Over the course of the publication process, R.H.M. started employment at Precidiag Inc., a company that has licensed the patent based on this work. All other authors declare no competing interests.
Peer review information
Nature Microbiology thanks Daniel Figeys and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Paired faecal and serum samples were collected from 40 patients with varying severity of ulcerative colitis. A separately analyzed cohort of 210 faecal samples (73 UC, 117 CD and 20 healthy controls) was also collected. Samples were processed for proteomics using a Tandem Mass Tag multiplexing workflow. Faecal samples were also subjected to both 16S and shotgun metagenomic analyses for microbial composition and gene quantification, respectively. In parallel, a metabolomics workflow was performed on faecal samples, in which the collected MS2 spectra were analyzed for both metabolites and peptides in two separate computational pipelines. A custom database was compiled from the metagenome of faecal samples to enable a comparative analysis between shotgun metagenomic and metaproteomic data sets. This eliminated database-dependent bias, and the shared reference was used for estimating copy number.
Extended Data Fig. 2 A multiplexing approach improves the depth and sparsity of metaproteomics data.
a, Multiplexed metaproteomic methods increase the total number of proteins quantified. The bar graph shows the total number of proteins identified, using identical database methodology, in the 102 UC samples from the IBD multi-omics database, the 40 UC samples from cohort 1 of this study and the 205 samples from cohort 2 of this study. b, Multiplexed metaproteomic methods improve the number of proteins quantified per sample. Displayed are the mean ± SD of the proteins identified per sample from the studies shown in (a). Data derived from n = 102, 40, 205 biologically independent samples as described for (a). One-way ANOVA p-values adjusted for multiple comparisons are shown (P < 0.0001). c, Multiplexed metaproteomic methods decrease the sparsity of metaproteomic studies. The percentage of missing quantification values for proteins in each data set is shown.
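The sparsity measure in panel (c) can be illustrated with a minimal sketch. It assumes a protein-by-sample table in which a missing quantification is stored as `None`; the function name `percent_missing` and the toy values are hypothetical and are not taken from the study's code.

```python
from typing import Optional, Sequence


def percent_missing(table: Sequence[Sequence[Optional[float]]]) -> float:
    """Percentage of protein-by-sample cells that carry no quantification value."""
    total = sum(len(row) for row in table)
    if total == 0:
        raise ValueError("table has no cells")
    missing = sum(1 for row in table for value in row if value is None)
    return 100.0 * missing / total


# Hypothetical toy table: 3 proteins (rows) x 4 samples (columns).
toy_table = [
    [1.2, None, 3.4, 0.8],
    [None, None, 2.1, 1.0],
    [0.5, 0.7, 0.9, 1.1],
]
toy_sparsity = percent_missing(toy_table)  # 3 of 12 cells are missing
```

A lower percentage means fewer proteins lack values in a given sample, which is the sense in which the caption describes multiplexing as decreasing sparsity.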
a, Alpha diversity (using Pielou’s evenness metric) by disease activity as shown in Fig. 1b, but highlighting the classification of samples as uneven when Pielou evenness is below 0.5. Best-fit linear regression lines with 95% confidence intervals are shown, and an R² statistic is reported from an ordinary least-squares regression using the formula (Disease Activity + Diagnosis + Disease Activity:Diagnosis). b, 16S beta-diversity is strongly influenced by community evenness. The weighted UniFrac distance metric was used, and each sample was classified by community evenness, diagnosis and whether the most abundant 16S feature was from the family Enterobacteriaceae. c, Characterizing the most abundant 16S features. Each sample was classified as either “Uneven” (Pielou evenness < 0.5) or “Other” as shown in (a). Abundances of each amplicon sequence variant were summed by their highest-resolution taxonomic annotation, and the most abundant feature of each sample is represented in a donut plot. The inside ring represents the fractional composition of each patient subgroup, and the outside rings represent the number of patients within each subgroup who share the same most abundant feature. Less common features for each patient subgroup are counted as “Other”.
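The "Uneven" versus "Other" split in panels (a) and (c) rests on Pielou's evenness, J = H / ln(S), where H is the Shannon entropy of the feature proportions and S is the number of observed features. The sketch below is a minimal standard-library version for illustration only. The function names are hypothetical, and returning 0.0 for a sample with fewer than two observed features is an assumption made here, not a convention stated in the caption.

```python
import math
from typing import Sequence


def pielou_evenness(counts: Sequence[float]) -> float:
    """Pielou's evenness J = H / ln(S) for one sample's feature counts."""
    present = [c for c in counts if c > 0]
    richness = len(present)
    if richness < 2:
        # Assumption: evenness is undefined here, so report 0.0.
        return 0.0
    total = sum(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return shannon / math.log(richness)


def classify_evenness(counts: Sequence[float], threshold: float = 0.5) -> str:
    """Label a sample 'Uneven' when its evenness falls below the threshold."""
    return "Uneven" if pielou_evenness(counts) < threshold else "Other"


# Hypothetical samples: one dominated by a single feature, one balanced.
dominated = [97, 1, 1, 1]
balanced = [10, 10, 10, 10]
labels = (classify_evenness(dominated), classify_evenness(balanced))
```

With these toy counts the dominated sample falls well below the 0.5 cut-off, while the balanced sample has an evenness of 1.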
Extended Data Fig. 4 Comparison of genera annotations from genes and proteins correlated to disease severity.
The genus composition of genes and proteins correlated to disease activity was compared using different sparsity requirements for a feature to be deemed “correlated”. Stacked bar charts summarize the number of genes or proteins from the 10 most common genus assignments when correlated to either partial Mayo severity in UC cohorts or CDAI in CD patients. Only genes or proteins with |r| > 0.3 from linear regression were included. a, Genus composition of significantly positively and negatively correlated genes from the MG with no sparsity requirement. b, Genus composition of significantly positively and negatively correlated proteins from the MP with no sparsity requirement. c, Genus composition of associated proteins as in Fig. 3a, but without removing host proteins (genus Homo). d, Genes correlated to disease activity from the MG when filtering out genes appearing in fewer than 40% of patients within each category. e, Summary comparing the proportions of positively and negatively correlated genes and proteins from each patient cohort when examining the top 10 genera identified in the MG. This analysis is analogous to Fig. 3b, but displays the top MG genera.
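The two filters named in this caption, |r| > 0.3 and a minimum prevalence, can be sketched as follows. This is an illustrative reading and not the study's pipeline. It assumes missing values are stored as `None` and dropped pairwise, applies the prevalence cut-off across all supplied samples instead of within each patient category, and requires at least three observed values. For a single predictor, r from linear regression equals the Pearson correlation, which is what the helper computes. All function and parameter names are hypothetical.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def is_correlated(
    abundance: Sequence[Optional[float]],
    activity: Sequence[float],
    min_abs_r: float = 0.3,
    min_prevalence: float = 0.0,
) -> bool:
    """True if the feature passes the prevalence filter and |r| > min_abs_r."""
    pairs = [(a, d) for a, d in zip(abundance, activity) if a is not None]
    # Assumption: at least 3 observed values are needed for a meaningful r.
    if len(pairs) < 3 or len(pairs) / len(abundance) < min_prevalence:
        return False
    xs, ys = zip(*pairs)
    return abs(pearson_r(xs, ys)) > min_abs_r


# Hypothetical disease-activity scores and one gene's abundance per patient.
activity_scores = [0, 1, 2, 3, 4, 5]
gene_abundance = [None, 1.0, 2.0, 3.0, None, 5.0]
keep_gene = is_correlated(gene_abundance, activity_scores, min_prevalence=0.4)
```

Setting `min_prevalence=0.0` corresponds to the "no sparsity requirement" panels, and `min_prevalence=0.4` to the 40% filter in panel (d).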
Extended Data Fig. 5 Comparison of genera and functional annotations from genes and proteins correlated to disease severity in CD subtypes.
a, Genus-level bar charts of significantly correlated genes or proteins stratified by CD subtype. The genus composition of genes and proteins from either the MG or MP were correlated to CDAI and shown in stacked bar charts. Only genes or proteins with |r| > 0.3 from linear regression were included, and the top 10 genera are displayed with other genera compiled into an “Others” category. b, CD subtypes genus-level association comparison. The proportions of genes or proteins correlated with disease activity from (a) are plotted as a Log10 comparison between the proportion of positive to negative correlations. Genes correlated to disease activity from the MG when filtering out genes appearing in fewer than 40% of patients within each category. c, CD subtypes functional association comparison. This analysis is analogous to (b) but summarizes the associations to KEGG functional category annotations in the MP.
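One possible reading of the Log10 comparison in panel (b) is sketched below: for each genus, the share of positively correlated features assigned to that genus is divided by its share of negatively correlated features, and the ratio is log10-transformed. The exact normalization used in the study is not given in the caption. The pseudocount that avoids division by zero, the function name and the genus labels are all assumptions made for illustration.

```python
import math
from typing import Dict


def log10_pos_neg_ratio(
    pos_counts: Dict[str, int],
    neg_counts: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log10 of (share among positive correlations / share among negative ones)."""
    genera = sorted(set(pos_counts) | set(neg_counts))
    # Assumption: the pseudocount is added to every genus in both directions.
    pos_total = sum(pos_counts.get(g, 0) + pseudocount for g in genera)
    neg_total = sum(neg_counts.get(g, 0) + pseudocount for g in genera)
    ratios = {}
    for genus in genera:
        pos_share = (pos_counts.get(genus, 0) + pseudocount) / pos_total
        neg_share = (neg_counts.get(genus, 0) + pseudocount) / neg_total
        ratios[genus] = math.log10(pos_share / neg_share)
    return ratios


# Hypothetical counts of correlated proteins per genus.
positive = {"genus_A": 9, "genus_B": 1}
negative = {"genus_A": 1, "genus_B": 9}
ratios = log10_pos_neg_ratio(positive, negative)
```

Under this reading, values above zero mark genera whose features are mostly positively correlated with disease activity, and values below zero mark the opposite.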
Extended Data Fig. 6 Patients with overproduction of Bacteroides vulgatus proteases have increased endoscopic and histological severity.
a, Bacteroides protease production corresponds to increased endoscopic severity. The disease activity of overproducers, underproducers and other patients is individually plotted over boxplots. Two-tailed t-test p-values are displayed above the boxplots. Sample sizes include n = 16, 14 and 71 for overproducers, underproducers and others, respectively. Boxplots are defined by the median, quartiles and 1.5x inter-quartile range. b, Bacteroides protease production corresponds to a patient population with a decreased proportion of patients in histological remission. Each UC patient sample was categorized by Bacteroides vulgatus protease production category, and the percentage of patients in histological remission is shown in a bar graph with the number of samples in each category displayed above each bar. Histological remission is defined here as Geboes Grade 3 = 0.
Extended Data Fig. 7 Peptide fragments are increased in active UC patients and Bacteroides protease enriched patients.
a, Comparison of peptide fragments identified in patients with varying abundance of Bacteroides proteases. Overproducers from UC cohort 1 had increased peptide fragments in comparison to other patients (two-tailed t-test P = 3.5E-2). Data were derived from n = 8, 9, 23 UC cohort 1 samples and n = 6, 6, 49 UC cohort 2 samples from patients classified as underproducer, overproducer and other, respectively. b, Peptide termini indicate unique proteolysis of human and microbial proteins. The frequency of each amino acid within the N and C terminus of human and de novo peptides was compared to either the human proteome or the total amino acid content of de novo peptides. The Y-axis represents the percent difference of each residue and the letter indicates the amino acid associated with the difference. The N and C terminus are shown separately and each residue is colored by chemical property (Green = Polar, Black = Hydrophobic, Red = Acidic, Blue = Basic, Purple = Neutral). c, Peptide fragment identification comparison by disease activity in UC cohort 1. Boxplots with a two-tailed t-test p-value are shown (P = 4.7E-3). Data were derived from n = 18, 12, 10 patient samples with low, moderate or high disease activity, respectively. d, Peptide fragment identification comparison by disease and disease activity state for cohort 2 samples. Boxplots are shown with overlaid two-tailed t-test p-values. Data were derived from n = 19 healthy controls, n = 39, 30, 12 UC samples, and n = 64, 30, 8 CD samples from patients of low, moderate and high activity, respectively. Boxplots in (a,c,d) are defined by the median, quartiles and 1.5x inter-quartile range.
Extended Data Fig. 8 Determining the impact of Bacteroides species on TEER using co-culture, supernatants and protease inhibitors.
a, Bacteroides vulgatus and Bacteroides dorei, but not other Bacteroides species, disrupt Caco-2 epithelial barriers. Barplots show the mean and standard deviation of the change in TEER at different time points. Data were derived from n = 3 independent cultures collected over n = 2 independent experiments. b, Growth curves of Bacteroides vulgatus with protease inhibitors under different growth conditions. OD600 was measured at the indicated time points and a non-linear fit is shown. Data were derived from n = 3 independent cultures collected over n = 1 independent experiment. c, Supernatants from Bacteroides in mid-log phase growth do not significantly impact TEER. B. vulgatus and B. theta were grown to mid-log phase, and their supernatants were concentrated and added to Caco-2 monolayers. TEER was measured at the initial time point and compared to TEER measured after 1, 4 and 8 h of incubation. Plotted are the mean and SEM from n = 3 independent experiments, each representing the mean of n = 3 independent wells/experiment (n = 4 wells/experiment for the B. vulgatus group). No significant differences were found at any time point.
a-f, Barplots showing the mean ± SD of macroscopic organ measurements from fecal transplant of UC patient samples in IL10-/- mice with or without administration of a protease inhibitor. Each dot represents one mouse, with each group representing results from 3 UC patient fecal samples, each sample given to 3 co-housed mice. Measurements include final weight of the mice (a), colon weight (b), ratio of colon weight to length (c), caecum weight (d), fat pad weight (e) and liver weight (f). g-h, Barplots showing the mean ± SEM for the concentration of an intestinal inflammatory marker, fecal lipocalin2 (g), and the amount of 16S rRNA in the spleen of mice as an estimate of the splenic bacterial load (h). Each dot in g-h represents the mean of n = 3 mice transplanted with the same UC fecal sample (with the exception of a mean from n = 2 mice for one patient sample in the Abundant Proteases + Inhibitor Cocktail group) from n = 2 independent experiments. i, Metaproteome genera composition of mice transplanted with UC fecal samples. Fecal samples taken at 8 weeks from mice transplanted with one high-protease-containing sample (H19) and one control patient sample (L3) were analyzed by mass spectrometry-based metaproteomics. Stacked barplots are shown for each mouse displaying the proportion of protein signal derived from the most common genera. j, Molecular function of B. vulgatus proteases identified in mice receiving UC fecal samples. The relative abundance of each B. vulgatus protease is shown in stacked barplots grouped by the Gene Ontology molecular function associated with each protein. k, Top B. vulgatus or B. dorei proteases associated with the fecal samples of mice receiving the H19 sample. Each protein is ranked by pi-score, which combines two-sided t-test P values and the fold-change difference between all H19 and L3 samples. l, Cumulative protease comparisons. A Venn diagram is shown comparing the protein names of B. vulgatus or B. dorei proteases from four independent proteomics experiments performed in this study. A full list of the Bacteroides proteases identified in this analysis can be found in Supplementary Table 4.
The results of our study may indicate that certain species from the genus Bacteroides, particularly those recently reclassified under the genus Phocaeicola (for example, Bacteroides vulgatus and Bacteroides dorei), may be implicated in the transition from remission to active disease in UC. We hypothesize that a stressor in the UC gut, such as nutrient deprivation or cell-to-cell competition, may increase protease production and trigger a switch from carbohydrates to proteins as a nutrient source. Some of these proteases may be involved in the disruption of the epithelial barrier, allowing an influx of innate immune cells that further exacerbate disease.
Supplementary Table 1 and Figs. 1–6.
Supplementary Table 2 Association of UC clinical variables to alpha and beta diversity. For alpha diversity, Kruskal–Wallis tests were performed on each of the listed categorical variables, and linear regression was applied for quantitative variables. P values are reported for all tests, and r values are reported for quantitative variables. When more than two categories were present in categorical variables, the P value between the two largest categories was reported. For beta diversity, categorical variables were tested using PERMANOVA, continuous variables were tested using Adonis. P values followed by pseudo-F values for each category are reported. For continuous variables, R2 values are reported after pseudo-F values. Testing was based on Bray–Curtis distance matrices, unless otherwise specified. Significance level is indicated according to P value (* <0.05, ** <0.01, *** <0.001).
Supplementary Table 3 Features of importance to predicting UC disease activity. Feature importance values from the 100 random forest iterations predicting UC disease activity are reported for each omic data type from both UC cohorts. The top-100 features from each data type are provided, ranked by the summed importance scores from both cohorts. From the combined datasets, the top-100 features and annotation information related to each feature’s data type are also provided. These data correspond to the random forest results shown in Fig. 2c.
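The ranking described for this table (features ordered by importance scores summed over both cohorts) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the feature names and the rule that a feature missing from one cohort contributes zero are all assumptions.

```python
def rank_by_summed_importance(cohort1, cohort2, top_n=100):
    """Rank features by the sum of their importance scores in two cohorts.

    cohort1, cohort2: dicts mapping feature name -> importance score.
    Assumption: a feature absent from one cohort contributes 0 there.
    """
    features = set(cohort1) | set(cohort2)
    summed = {
        name: cohort1.get(name, 0.0) + cohort2.get(name, 0.0)
        for name in features
    }
    # Sort by descending summed score; ties are broken alphabetically.
    ranked = sorted(summed.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical importance scores for three features.
cohort1_scores = {"feature_a": 0.5, "feature_b": 0.25}
cohort2_scores = {"feature_a": 0.25, "feature_c": 0.5}
top_features = rank_by_summed_importance(cohort1_scores, cohort2_scores, top_n=2)
```

Here `feature_a` ranks first because it scores in both cohorts, even though it is not the single highest score in cohort 2.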
Supplementary Table 4 Bacteroides proteases identified in UC patients, bacterial supernatants and faecal material from humanized mice. Protein names of peptidases or proteases identified throughout the multiple proteomic experiments from this study are listed. Lists are provided for proteases from B. vulgatus or B. dorei that were positively associated (r > 0.3) with UC patient disease activity from either cohort. Additionally, proteases or peptidases identified in the supernatant from different species of Bacteroides are listed. Finally, from a metaproteomic analysis of the faecal material from mice humanized by UC patient faecal samples, the B. vulgatus or B. dorei proteases that were increased (π > 1) in mice transplanted with a sample overabundant in proteases, compared with mice transplanted with a sample without overabundant proteases, are listed.
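The π threshold used for this table can be illustrated with a short sketch. The exact formula is not given in this section, so the code assumes a commonly used form of the pi-score, the log2 fold change multiplied by −log10 of the P value; the function names and protein labels are hypothetical.

```python
import math


def pi_score(log2_fold_change, p_value):
    """Combine effect size and significance into one score.

    Assumption: pi = log2(fold change) * -log10(P). The study may use
    a different variant of this definition.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return log2_fold_change * -math.log10(p_value)


def increased_proteins(stats, threshold=1.0):
    """Return (name, pi) pairs with pi above the threshold, highest first.

    stats: dict mapping protein name -> (log2 fold change, P value).
    """
    scored = [(name, pi_score(lfc, p)) for name, (lfc, p) in stats.items()]
    kept = [(name, score) for name, score in scored if score > threshold]
    return sorted(kept, key=lambda item: -item[1])


# Hypothetical proteins: (log2 fold change, two-sided t-test P value).
example = {
    "protease_A": (2.0, 0.01),  # large change, significant
    "protease_B": (0.5, 0.05),  # small change, borderline P value
    "protease_C": (3.0, 0.5),   # large change, not significant
}
selected = increased_proteins(example)
```

Under this assumed formula only `protease_A` passes π > 1, since a protein needs both a sizeable fold change and a small P value to score highly.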
Tables containing de novo peptide identifications and data from experiments of B. vulgatus supernatant protease activity with different inhibitors.
Multiple files related to in vitro and in vivo studies displayed in Fig. 5. This includes the raw data from Caco-2 co-cultures with B. vulgatus and protease inhibitors (Fig. 5b,c), the original image files from Fig. 5d, the quantification of cell morphology (Fig. 5e), original data from the monocolonization experiments (Fig. 5f–i) and the original data from faecal transplant studies (Fig. 5j–n).
Tables containing the underlying data from Extended Data Fig. 2.
Tables containing the underlying data from Extended Data Fig. 8.
About this article
Cite this article
Mills, R.H., Dulai, P.S., Vázquez-Baeza, Y. et al. Multi-omics analyses of the ulcerative colitis gut microbiome link Bacteroides vulgatus proteases with disease severity. Nat Microbiol 7, 262–276 (2022). https://doi.org/10.1038/s41564-021-01050-3