Revealing proteome-level functional redundancy in the human gut microbiome using ultra-deep metaproteomics

Li, Leyuan; Wang, Tong; Ning, Zhibin; Zhang, Xu; Butcher, James; Serrana, Joeselle M.; Simopoulos, Caitlin M. A.; Mayne, Janice; Stintzi, Alain; Mack, David R.; Liu, Yang-Yu; Figeys, Daniel

doi:10.1038/s41467-023-39149-2

Download PDF

Article
Open access
Published: 10 June 2023

Revealing proteome-level functional redundancy in the human gut microbiome using ultra-deep metaproteomics

Nature Communications volume 14, Article number: 3428 (2023) Cite this article

8422 Accesses
6 Citations
38 Altmetric
Metrics details

Subjects

Abstract

Functional redundancy is a key ecosystem property representing the fact that different taxa contribute to an ecosystem in similar ways through the expression of redundant functions. The redundancy of potential functions (or genome-level functional redundancy ${{{{{{\rm{FR}}}}}}}_{g}$) of human microbiomes has been recently quantified using metagenomics data. Yet, the redundancy of expressed functions in the human microbiome has never been quantitatively explored. Here, we present an approach to quantify the proteome-level functional redundancy ${{{{{{\rm{FR}}}}}}}_{p}$ in the human gut microbiome using metaproteomics. Ultra-deep metaproteomics reveals high proteome-level functional redundancy and high nestedness in the human gut proteomic content networks (i.e., the bipartite graphs connecting taxa to functions). We find that the nested topology of proteomic content networks and relatively small functional distances between proteomes of certain pairs of taxa together contribute to high ${{{{{{\rm{FR}}}}}}}_{p}$ in the human gut microbiome. As a metric comprehensively incorporating the factors of presence/absence of each function, protein abundances of each function and biomass of each taxon, ${{{{{{\rm{FR}}}}}}}_{p}$ outcompetes diversity indices in detecting significant microbiome responses to environmental factors, including individuality, biogeography, xenobiotics, and disease. We show that gut inflammation and exposure to specific xenobiotics can significantly diminish the ${{{{{{\rm{FR}}}}}}}_{p}$ with no significant change in taxonomic diversity.

Data-independent acquisition boosts quantitative metaproteomics for deep characterization of gut microbiota

Article Open access 24 January 2023

Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows

Article Open access 15 December 2021

Combining proteogenomics and metaproteomics for deep taxonomic and functional characterization of microbiomes from a non-sequenced host

Article Open access 05 June 2020

Introduction

The human gut microbiome is a complex ecosystem harboring trillions of microorganisms. Its taxonomic composition, functional activity and ecosystem processes have important consequences on human health and disease. It is crucial to study the human gut microbiome in the context of ecological communities¹. The most frequently applied ecosystem measures of the gut microbiome have been taxonomic diversity calculators², which are not sufficiently informative in assessing the functional states of the gut microbial ecosystem. Moreover, diversity indices have widely varied between studies with some studies reporting lower diversity indices in disease microbiomes than in healthy controls³, while other studies reported higher diversity indices in disease versus controls^4,5. Numerous examples including those two are raising the concern of correlating diversity values to health and disease status⁶. Indeed, in a microbial ecosystem, diversity is just one of the properties which does not necessarily correlate with its functionality. Much more valuable and informative insight should be gained by considering the microbiome from a functional perspective.

Functional redundancy (FR) is a key property of ecosystems⁷. FR describes the ability of multiple taxonomically distinct organisms to contribute in similar ways to an ecosystem through having redundant functional traits^8,9,10. A high level of FR implies that members in a community may be substitutable with little impact on the overall ecosystem functionality⁸. In the human gut microbiome, it has been found that FR is a common event. For example, dietary carbohydrates can be processed by either Prevotella (from the phylum Bacteroidetes) or Ruminococcus (from the phylum Firmicutes), and short-chain fatty acids can be produced by multiple predominant genera: Phascolarctobacterium, Roseburia, Bacteroides, Blautia, Faecalibacterium, Clostridium, Subdoligranulum, Ruminococcus and Coprococcus. Moreover, large-scale, consortium-driven metagenomic projects such as the Human Microbiome Project (HMP) have found that, regardless of the body site, within a healthy population the carriage of microbial taxa varies tremendously, while the gene compositions or functional profiles remain remarkably stable². This is a strong signal of FR in the human microbiomes.

We emphasize that most of the previous studies on FR of human microbiomes are very conceptual, rather than quantitative. Recently, a computational pipeline was developed to quantify the genome-level FR of microbiome samples from whole-metagenome shotgun sequencing data¹¹. This pipeline was based on the genomic content network (GCN), a bipartite graph in which taxa are connected to the genes in their genomes. Importantly, this GCN-based FR calculation was derived without any regard for whether these genes are expressed. In other words, the FR calculated from the GCN only represents the redundancy of potential functions of a microbiome sample (i.e., the genome-level FR), rather than the redundancy of expressed functions (e.g., the proteome-level FR).

In this work, we present a pipeline to quantify the proteome-level FR of microbiome samples from deep metaproteomics data. Metaproteomics is a powerful tool that measures expressed proteins in a microbiome based on liquid chromatography-tandem mass spectrometry (LC-MS/MS) techniques^12,13,14. Our pipeline is based on the construction of the proteomic content network (PCN) for each microbiome sample by linking the taxa to their proteins. For each PCN, its proteome-level FR (${{{{{{\rm{FR}}}}}}}_{p}$) is defined as the part of its taxonomic diversity that cannot be explained by its functional diversity. ${{{{{{\rm{FR}}}}}}}_{p}$ is an informative metric carrying contributions from the presence/absence of the functions, protein abundances of the functions and biomasses of the proteomes. We demonstrate that ${{{{{{\rm{FR}}}}}}}_{p}$ sensitively responds to environmental factors affecting the microbiome.

Results

Construction of proteomic content networks from an ultra-deep metaproteomic approach

We define the proteomic content network (PCN) of a microbiome sample as a bipartite graph connecting each microbial taxon to all expressed functions from the taxon’s proteome. In order to gain the deepest possible understanding of sample-specific PCN of the human gut microbiome, we first developed an ultra-deep metaproteomics approach based on high-pH reversed-phase fractionation¹⁵ and high-resolution LC-MS/MS analysis (Fig. 1a–e and Methods). Briefly, aliquots from gut microbiome samples were subjected to protein extraction and digestion, followed by being separated into 48 fractions at a 1 min interval and pooled at an interval of 12 wells. Using this workflow, we analyzed samples from four individuals’ ascending colons (Supplementary Table S1). Database searches of the resulting 12 LC-MS/MS RAW files of each individual microbiome sample were performed using the MetaPro-IQ approach¹⁶. We first used the Integrated Gene Catalog (IGC) database of the human gut microbiome¹⁷ to perform the database search. On average, 69,280 unique peptides were identified, and 30,686 protein groups were quantified per microbiome sample. Using a “protein-peptide bridge” method (Fig. 1f and Supplementary Note S1), functions that were annotated by protein groups and taxonomy that were identified by unique peptides were linked to constructing the sample-specific PCN (Fig. 1g, h). Since some proteins or peptide sequences are shared between two or more organisms in complex microbial communities, as a trade-off between taxonomic resolution and protein coverage, we computed the PCN on the genus level. Protein-level biomass contributions of each genus is represented by its summed unique peptide abundances (Fig. 1i). In terms of functional annotations, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database has been widely used in functional metrics such as the genomic-level functional redundancy¹¹. However, it is common in metaproteomic studies that a certain proportion of proteins does not have a KEGG annotation. Indeed, in this dataset, there were a total of 50,216 protein groups identified, among which, 46,095 (91.7%) were successfully annotated with clusters of orthologous groups (COGs), while only 37,795 (75.3%) were annotated with KEGG KOs. Therefore, we complemented the KEGG annotations with COG to achieve better coverage (denoted as KEGG-COG annotation; see Supplementary Note S2 for more comparisons). This annotation will be applicable to metaproteomic-based functional redundancy computations without the need for the samples’ metagenomes.

**Fig. 1: Computation of within-sample proteomic-content network and functional, taxonomic profiles using an ultra-deep metaproteomics approach.**

In addition, to facilitate direct comparisons of redundancy or network metrics between the GCN and the PCN, we further used the samples’ paired metagenomes to generate Prodigal-predicted protein sequences as the database to perform another metaproteomic database search. An average of 65,541 unique peptides and 29,392 protein groups per sample were obtained from the search. The Prodigal sequences were blasted against the UHGP database for taxonomic matches. In addition, KEGG-COG annotations were performed. GCNs or PCNs were then computed by summing read counts or protein intensities at each taxon-function incidence (see Methods).

Computation of within-sample proteome-level functional redundancy

We define proteome-level functional redundancy (${{{{{{\rm{FR}}}}}}}_{p}$) of a microbiome sample as the part of the alpha diversity of taxonomic protein biomass contributions (TD_p) that cannot be explained by the alpha functional diversity (FD_p) (Supplementary Fig. S1a):

$${{{{{{\rm{FR}}}}}}}_{p}\equiv {{{{{{\rm{TD}}}}}}}_{p}-{{{{{{\rm{FD}}}}}}}_{p}=\mathop{\sum }\limits_{i=1}^{S}\mathop{\sum }\limits_{j\ne i}^{S}\left(1-{d}_{{ij}}\right){p}_{i}{p}_{j}$$

(1)

where $S$ is the number of taxa in the sample, and ${p}_{i}$ is the proportion of protein-level biomass of taxon $i$ within the sample. The protein-level biomass in the PCN can be approximated by summing up the intensities of unique peptides in each taxon, which has been previously shown to be a good representation of microbiome diversity¹⁸. ${d}_{{ij}}$ denotes the functional distance between taxa $i$ and $j$ measured by the weighted Jaccard distance between their proteomic contents (see Methods), ${{{{{{\rm{TD}}}}}}}_{p}=\mathop{\sum }\nolimits_{i=1}^{S}\mathop{\sum }\nolimits_{j\ne i}^{S}{p}_{i}{p}_{j}=1-{\sum }_{i=1}^{S}{p}_{i}^{2}$ is the alpha taxonomic diversity measured by the Gini-Simpson index, and ${{{{{{\rm{FD}}}}}}}_{p}=\mathop{\sum }\nolimits_{i=1}^{S}\mathop{\sum }\nolimits_{j\ne i}^{S}{d}_{{ij}}{p}_{i}{p}_{j}$ is the alpha functional diversity measured by Rao’s quadratic entropy. Notably, in contrast to computing a genome-level functional redundancy for which the ${d}_{{ij}}$ values can be computed from a reference GCN of all samples in the dataset, for PCN, since protein expression levels can sensitively respond to environmental changes, within-sample proteomes should be used. We illustrate why it is impossible to construct and use a reference PCN in Supplementary Fig. S2.

By normalizing the ${{{{{{\rm{FR}}}}}}}_{p}$ with the microbiome’s alpha taxonomic diversity, we have:

$${{{{{{\rm{nFR}}}}}}}_{p}=\frac{{{{{{{\rm{FR}}}}}}}_{p}}{{{{{{{\rm{TD}}}}}}}_{p}}$$

(2)

We highlight that ${{{{{{\rm{FR}}}}}}}_{p}$ and ${{{{{{\rm{nFR}}}}}}}_{p}$ are complex metrics that are derived from several key components and properties, i.e., functional abundances, taxonomic proteome-level biomass, network topology, and functional distances between different taxa. Using simple conceptual communities, we showed that when members of a community express the same list and abundances of functions, ${{{{{{\rm{nFR}}}}}}}_{p}$ equals to 1 (Supplementary Fig. S1b); while when members of a community have totally different list of functions, ${{{{{{\rm{nFR}}}}}}}_{p}$ equals to 0 (Supplementary Fig. S1c). Other than the two extreme cases, the value ${{{{{{\rm{nFR}}}}}}}_{p}$ falls between 0 and 1 (Supplementary Fig. S1d). We emphasize that ${{{{{{\rm{nFR}}}}}}}_{p}$ of a microbiome sample can be interpreted as the average functional similarity (or overlap) of two randomly chosen members in the sample.

We demonstrate the sensitivity of ${{{{{{\rm{FR}}}}}}}_{p}$ and ${{{{{{\rm{nFR}}}}}}}_{p}$ using in silico communities generated with genomes and proteomes of single bacterial strains (i.e., Phocaeicola (Bacteroides) vulgatus ATCC 8482, Bacteroides ovatus ATCC 8483, Bacteroides uniformis ATCC 8492, Blautia hydrogenotrophica DSM 10507, Escherichia coli DSM 101114). Proteomes of these strains cultured in four different media (basal medium with or without added glucose, sucrose or kestose) were obtained from our previous study¹⁹ (Supplementary Fig. S3a). We first used the genomes and proteomes (in basal media) to generate different three-member communities in silico (Supplementary Fig. S3b and S3c). When all three members belong to Bacteroides or Phocaeicola genera, the community’s genome- and proteome-level functional redundancy were both higher compared with the other combinations. The redundancies decreased as the community becomes more diverse on the genus level. In Supplementary Fig. S3b and S3c, despite genome-level functional redundancy may seem predictive of the proteome-level functional redundancy, we emphasize that, in principle, genome-level functional redundancy only responds to the change of microbial abundances. When we further replaced the proteomes of strains with the ones cultured in the presence of different sugars (and maintained microbial abundances), the levels of ${{{{{{\rm{FR}}}}}}}_{p}$ and ${{{{{{\rm{nFR}}}}}}}_{p}$ showed fluctuations (Supplementary Fig. S3d). This suggests that ${{{{{{\rm{FR}}}}}}}_{p}$ and ${{{{{{\rm{nFR}}}}}}}_{p}$ are sensitive to a community’s functional responses, even induced solely by proteome alterations while microbial abundances are unchanged.

Exploration of PCN topology and its ecological implication

We investigated the topological properties of these PCNs, and compared them with their corresponding GCNs obtained from shotgun metagenomics. The PCNs of metagenome and IGC databases-based search yielded similar depth, both achieved reasonable depths compared with each individual’s respective GCNs (Supplementary Fig. S4). Figure 2a shows a tripartite plot connecting microbial phyla and functional categories annotated according to genes and proteins from one individual microbiome (HM454, IGC database-based search with COG annotations). This demonstrated that while some functional categories (e.g., energy production and conversion (C), carbohydrate metabolism and transport (G) etc.) showed expression from predicted functions in most phyla, some genes linked to functions such as RNA processing and modification (A) and mobilome (X) that were rarely expressed from the genes. Similar results were found for other samples (Supplementary Figs. S5–7). We wondered what is the ecological implication of such selective functional expression from the GCN to the PCN. To investigate this, we explored the community assembly results in silico through a model framework²⁰ that incorporates cross-feeding interactions into the classical MacArthur’s Consumer-Resource Model²¹ (see Methods and Fig. 2b–d). We hypothesized that the topological property of the GCN and PCN such as connectance (i.e., the number of links divided by the number of maximal possible links) strongly influence the assembly results. Therefore, we modeled system dynamics under variable environmental conditions and investigated how the difference in the connectance between the GCN (0.22 ± 0.02; Mean ± SD, N = 4) and the PCN (0.05 ± 0.02; Mean ± SD, N = 4; metagenome database-based search) affects the richness of assembled communities. Simulations of system dynamics show that the model’s resource consumption matrix C following the PCN’s connectance always maintains higher microbiome richness and survival of more diverse species than following the GCN’s connectance, despite variations in dilution rates (Fig. 2e–g), byproduct fractions (Supplementary Fig. S8a–c), and externally supplied nutrient diversity (Supplementary Fig. S8d–f), as well as different ratios between initial species abundances and initial metabolite/resource concentrations (Supplementary Fig. S9).

**Fig. 2: Topology of proteomic content networks (PCNs) implies high richness.**

The intuitive relationship between biodiversity and functional redundancy in a community has been difficult to be quantified²². We sought to explore this relationship by examining the network topology. By visualizing incidence matrices of these PCNs, we observe highly nested structures (Fig. 3a and Supplementary Fig. S10) and found that the Nestedness metric based on Overlap and Decreasing Fill (NODF) were high in the PCNs (NODF = 0.28 ± 0.01; Mean ± SD, N = 4, metagenome database-based search), which are close to those of the respective GCNs (NODF = 0.36 ± 0.05; Mean ± SD, N = 4). Similarly, the PCNs based on IGC database-based search also resulted in high NODF values (0.34 ± 0.01; Mean ± SD, N = 4). We then calculated the degree distributions of genera and functions in the PCNs and the GCNs, respectively. On the functional dimension, similar to previous observations in GCNs¹¹, the degree distributions of functions in both the GCN and PCN have fat tails, represented by some functions being associated with a high number of taxa (Fig. 3b and Supplementary Fig. S10). Similar nested topology and functional degree distributions can be observed in the PCNs generated with the IGC-based search (Supplementary Fig. S11). The high nestedness and fat-tail degree distribution of the PCN together suggests that specialist taxa are playing functional roles that are a subset of active functions from generalist taxa, which further indicates the existence of high redundancy of expressed functions in the human gut microbiome.

**Fig. 3: Topology of proteomic content networks (PCNs) implies high functional redundancy.**

Human gut microbiome has high protein-level FR

Since a potential function of any member in the GCN of the microbiome sample may or may not be expressed under a certain environmental condition, we anticipate that the proteome-level FR (i.e., ${{{{{{\rm{FR}}}}}}}_{p}$ or ${{{{{{\rm{nFR}}}}}}}_{p}$) of any microbiome sample should be no greater than its genome-level FR (i.e., ${{{{{{\rm{FR}}}}}}}_{g}$ or ${{{{{{\rm{nFR}}}}}}}_{g}$). Indeed, as shown in Fig. 4a, b, for the four individual microbiomes, we found that ${{{{{{\rm{FR}}}}}}}_{p}$(or ${{{{{{\rm{nFR}}}}}}}_{p}$) preserved a high level of the ${{{{{{\rm{FR}}}}}}}_{g}$ (or ${{{{{{\rm{nFR}}}}}}}_{g}$; no significant difference by paired t test) as a result of maintenance of the difference between the corresponding TD and FD values (Fig. 4c, d).

**Fig. 4: Naturally occurring proteomic-content networks contribute to high functional redundancy in human gut microbiomes.**

How does the topology of the PCN contribute to the observed functional redundancy? To answer this question, we created four null networks by randomizing natural PCNs to change the network’s inherent structural properties such as degree and nestedness. Considering the fact that PCNs are expressed from the GCNs, we randomized the PCNs under the background of each corresponding GCN (i.e., all edges on a randomized PCN are only selected from the edges of the sample’s GCN). For the four null PCNs, Null-1 PCN is a completely randomized network under the GCN background, Null-2 PCN preserves the taxon degrees, Null-3 PCN preserves the function degrees, and Null-4 PCN preserves both taxon and function degrees (Fig. 4e). Each of the four null networks was randomized 10 times to validate the reproducibility of the results. We show that all four null networks had significantly lower ${{{{{{\rm{FR}}}}}}}_{p}$ and ${{{{{{\rm{nFR}}}}}}}_{p}$ compared to those of the natural PCNs (Fig. 4f). Figure 4f also suggests that both taxon and function degrees contributed to FR, as for each individual, there’s an increasing pattern of FR and nFR values from Null-1 to Null-4 PCNs. However, both taxon and function degrees are the same as the natural PCN in Null-4 PCN. Notably, there was a 18% decrease of nestedness in Null-4 PCNs (NODF_Null-4 = 0.23 ± 0.03; Mean ± SD, N = 40) in comparison to the natural PCN; in contrast, the corresponding decrease of ${{{{{{\rm{FR}}}}}}}_{p}$ was 55% ± 11% (Mean ± SD, N = 4), indicating that the network topology component of ${{{{{{\rm{FR}}}}}}}_{p}$ does not fully explain the high functional redundancy. The functional relationships between specific pairs of taxa are also an important factor that shapes the high ${{{{{{\rm{FR}}}}}}}_{p}$ in human gut microbiomes. We observed that all four null PCNs had altered distribution of d_ij values from the natural PCNs (Fig. 4g).

Protein-level FR outcompetes diversity indices in detecting microbiome responses to environmental factors

We next examined whether different metaproteomic approaches could affect the network properties of gut microbiomes’ PCNs and values of ${{{{{{\rm{FR}}}}}}}_{p}$. Routine metaproteomic analyses are often performed without fractionation. In addition, samples are analyzed with different analytical protocols, different parameters and using different models of LC-MS/MS platforms, etc. Therefore, we evaluated four previously published datasets, briefly referred to as SISPROT²³, RapidAIM²⁴, Berberine²⁵ and IBD²⁶ datasets, which vary considerably in the experimental protocol/platform and in the types of environmental factors (xenobiotics, biogeography, diseases status etc.) being interrogated (see details in Supplementary Tables S1–S3). It was notable that identification depths of these four datasets vary markedly, from 5612 protein groups and 4345 peptides per sample (Berberine) to 20,558 protein groups and 44,955 unique peptides per sample (SISPROT) (Supplementary Table S4). We found that PCNs in all the four datasets displayed highly similar topological structures with our new deep metaproteomics dataset, i.e., highly nested structure, and heterogeneous degree distributions of both taxa and functions (Supplementary Figs. S11 versus S12).

Given that topological structures of PCN appeared to be universal across platforms, we believe that ${{{{{{\rm{FR}}}}}}}_{p}$ is applicable to different metaproteomic datasets. Indeed, we found that ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ of the four datasets were comparable to our four ultra-deep metaproteomics samples (Fig. 5a, b, f–h versus Fig. 4b). In patients diagnosed with inflammatory bowel disease (IBD), ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ levels were significantly lower than that of the non-IBD control individuals. There was no significant difference between the two different IBD subtypes Crohn’s disease (CD) and ulcerative colitis (UC) (Fig. 5a). A significant decrease in ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ was found in inflamed regions from the terminal ileum (Fig. 5b), the diagnosis factor, as well as the inflammation factor which was nested in the biogeography factor significantly contributed to variance in ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ (Supplementary Table S5). We next selected to compare the TI region between CD and the control microbiomes, as inflammation is likely to develop in the terminal ileum of CD patients²⁷. ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ significantly differentiated between inflamed and uninflamed regions (Fig. 5c, Wilcoxon rank-sum test, p < 0.0001), while ${{{{{{\rm{TD}}}}}}}_{p}$ showed no significant difference between the two classes (Fig. 5d), and ${{{{{{\rm{FD}}}}}}}_{p}$ showed significant difference but at the p < 0.01 level (Fig. 5e).

**Fig. 5: Functional redundancy, taxonomic and functional diversity comparisons in different metaproteomics datasets.**

Inter-individual differences in ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ levels were observed in the SISPROT, RapidAIM and Berberine datasets (Fig. 5f–h). Similarly, we detected significant differences in ${{{{{{\rm{TD}}}}}}}_{p}$ and ${{{{{{\rm{FD}}}}}}}_{p}$ between individual microbiomes (Wilcoxon rank-sum test; Fig. 5i, j). In the RapidAIM and Berberine datasets, a few xenobiotic compounds reduced ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ levels (Fig. 5k and Supplementary Fig. S13a). Among these, the antibiotic rifaximin showed the most impact on the individual microbiomes with ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ values decreased by 21.2 ± 10.7% (Mean ± SD, N = 5). Two-way ANOVA suggested that both the inter-individual variation and effect of compounds significantly contributed to ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ variance (Supplementary Tables S6–S7). In contrast, ${{{{{{\rm{TD}}}}}}}_{p}$ and ${{{{{{\rm{FD}}}}}}}_{p}$ values were not as sensitive in detecting significant responses to drug treatments (Fig. 5l,m and Supplementary Fig. S13b,c). These together suggest that ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ outcompetes diversity indices ${{{{{{\rm{TD}}}}}}}_{p}$ and ${{{{{{\rm{FD}}}}}}}_{p}$ in sensitively detecting microbiome responses to environmental factors.

Similar to the ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ results, significant differences in NODF values were observed (Supplementary Fig. S14) in agreement with the within-sample ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$, which further suggests that a nested topological structure contributes to the values of ${{{{{{\rm{nFR}}}}}}}_{p}$ in a microbiome sample.

Alteration of between-proteome functional distances in disease and compound-treated microbiomes

To further find out the contribution of functional distances behind the response of within-sample ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$ to environmental factors, we examined the functional distance d_ij between proteomes in each individual microbiome sample of the datasets. For the IBD dataset, PERMANOVA test showed that d_ij values differed significantly between diagnosed patients (especially inflamed regions) and the non-IBD controls (Supplementary Table S8). Overall, the d_ij distributions in both UC and CD samples showed a rightward shift from the control samples (Fig. 6a). Moreover, there was a rightward shift of the d_ij distribution from healthy to inflamed gut regions (Fig. 6b). The volcano plot further showed that most of the d_ij values were increased in the presence of inflammation (Fig. 6c), suggesting a significant contribution of between-proteome functional distance increase to the overall decrease of ${{{{{{\rm{nFR}}}}}}}_{p}$ in IBD samples. We visualized the d_ij values of genus pairs that significantly (at the p < 0.001 level by Wilcoxon rank-sum test) altered in inflammation using a heatmap coupled with hierarchical clustering (Supplementary Fig. S15a). Cluster 1 samples, which were mostly from inflamed regions of patients diagnosed with UC or CD, showed an overall increase of d_ij values between proteome pairs. We showed that Cluster 1 was significantly higher in ${{{{{\rm{n}}}}}}{{{{{{\rm{FR}}}}}}}_{p}$, despite both its TD and FD were significantly lower than those in Cluster 2 (Supplementary Fig. S15b).

**Fig. 6: Between-proteome functional distances in IBD and drug-treated microbiomes.**

In contrast to the disease-induced alteration of d_ij distributions, we also visualized the response of d_ij values to xenobiotics and discovered that overall the d_ij values have strong individual signatures and were significantly affected by the xenobiotic factor (Fig. 6d and Supplementary Table S9). We quantified the Jensen-Shannon (J-S) divergence of d_ij distributions between drug treatments and the DMSO control (Fig. 6e). These results showed that ciprofloxacin, berberine, rifaximin, FOS, metronidazole, isoniazid, diclofenac and flucytosine significantly increased J-S divergence with the DMSO when compared to most other compounds (Fig. 6f). This was in agreement with our previous findings that these compounds (except flucytosine) resulted in global alterations in individual microbiome functionality²⁴. Interestingly, consistent rightward shifts of d_ij values were observed in response to a subset of compounds (Supplementary Fig. S16), which was similar to the pattern observed in the subset of IBD samples. Similar results were observed in the Berberine dataset (Supplementary Fig. S17 and Supplementary Table S10).

Discussion

A systems-oriented approach to understanding microbial ecosystems can be employed by constructing networks^11,28,29. Nevertheless, there has been a substantial gap between interpretating networks constructed from metagenomics and understanding microbiomes’ active functionalities. Recently, metaproteomics has experienced exponential growth in its identification coverage²³, providing invaluable deep insights into the expressed functions of microbiomes. In this study, we demonstrated a method to use metaproteomics dataset to gain a system-level understanding of microbiomes’ functionality by computing the functional redundancy of the proteomic content networks.

Using an ultra-deep metaproteomics approach, we showed that the human gut microbiome’s taxon-function network on the proteome level is highly nested. In a microbiome PCN, the high nestedness of the network implies that specialist taxa tend to play functional roles that are a subset of active functions from generalist taxa^30,31 Such functional network structures have been frequently found in macro-ecosystem networks of mutualistic interactions (food-webs)³². We found that the within-sample FR profiles differed markedly between expressed protein functions and genomic functional capacities. Using an ecological model for microbial communities, we demonstrated that such a selective functional expression and the resulting PCN topology favor high richness during community assembly. In the meantime, PCN still maintains high nestedness which contributes to high functional redundancy.

We found that PCNs built by different metaproteomics platforms showed universal properties: datasets generated by shallower analysis approaches still capture the highly nested topology of PCN. This allowed us to make use of routinely generated metaproteomics datasets to observe the effects of multiple environmental factors, such as inter-individual variation, xenobiotics, disease, and biogeography on the ${{{{{{\rm{FR}}}}}}}_{p}$ of the gut microbiome. We first showed that compounds with pharmacological activity can affect the redundancy of expressed functions in individual microbiomes. Overall distributions of functional distances between genera pairs were changed in response to some compounds, which was related to changes in a subset of between-genera functional distances. This suggests that xenobiotic compounds may affect ${{{{{{\rm{FR}}}}}}}_{p}$ by partially modifying the functional interrelationship (d_ij) between proteomes.

Interestingly, in contrast to a partial modification of d_ij between proteomes, there was a global shift of ${{{{{{\rm{FR}}}}}}}_{p}$ and d_ij in a subset of the IBD microbiomes. This finding may support the hypothesis of alternative stable states (bi-stability or multi-stability) in the gut ecosystem^33,34. One frequently discussed mechanism behind these alternative states has been the continuous exposure of the microbiome to an altered environmental parameter³⁵. An inflamed area in the gut will have a reduced mucus layer³⁶ and elevated host defense responses²⁶. The host mucus layer is a nutritional source of cross-feeding in the gut microbiome^37,38,39. Loss of this layer decreases the number of available nutrients in the gut environment. We show with our population dynamics model that in agreement with classical ecology⁴⁰, the reduction in the number of available environmental nutrients decreases the richness of the microbiome. This could be a factor that contributes to the alteration of functional states. In addition, host defense responses attenuate microbial oxidative stress responses (Pacheco, Osborne and Segrè, 2021), which have been associated with microbiome dysfunction⁴¹. A decrease in within-sample FR has been associated with impaired microbiome stability and resilience⁴². Resilient microbiota resists external pressures (e.g., antibiotics/dietary shifts) and returns to their original state. Being non-resilient, a microbiome is likely to shift its composition permanently and stay in an altered new state instead of restoring to its original state of equilibrium^43,44. Collectively, we disassembled the ${{{{{{\rm{FR}}}}}}}_{p}$ into one-to-one comparisons of between-taxa functional activities, and found that a global shift in functional roles of microbes toward a more heterogeneous pattern was a factor driving the decrease of ${{{{{{\rm{FR}}}}}}}_{p}$ and alteration of states in inflamed areas in IBD patients.

A current limitation of our work is the use of genus-level proteomes. This is because many identified protein groups are shared between different species due to insufficient coverage of species-unique peptide sequences. Future development in the increase of sequence coverage and bioinformatics are warranted. Techniques such as matrix/tensor decomposition⁴⁵ or machine learning⁴⁶ may support species-level proteome-resolved metaproteomics for a deeper insight into microbiome ${{{{{{\rm{FR}}}}}}}_{p}$. Nevertheless, based on the current resolution, we argue that our approach still provides invaluable information on how selective functional expression among taxa shapes the redundancy of expressed functions in a human gut microbiome, and how this redundancy can be affected by the environment. Future work can focus on revealing the relationship between ${{{{{{\rm{FR}}}}}}}_{p}$ and the resilience of a microbiome, so as to gain deeper mechanistic insight for the development of ecological recovery strategies for human gut microbiomes to combat diseases.

Methods

Protein extraction and digestion

The sample collection was approved by the Research Ethics Board of the Children’s Hospital of Eastern Ontario (CHEO), Ottawa, ON, Canada. Written informed consent form was obtained from their parents. Aspirates of the proximal ascending colon were obtained by colonoscopy, with more details described by Zhang et al.¹⁶. Protein extraction and digestion of the individual gut aspirate samples were performed as described previously⁴⁷, with minor modifications. Frozen aliquots of aspirate samples HM454, HM455, HM466 and HM503 were thawed and subjected to differential centrifugation for microbial cell purification: the samples were first centrifuged at 300 g, 4 °C for 5 min to remove debris; the resulting supernatant was centrifuged at 14,000 g for 20 min to pellet microbial cells; the pellets were then washed three times by resuspending in cold phosphate-buffered saline (PBS) and centrifuging at 14,000 g, 4 °C for 20 min. Next, the washed microbial cell pellets were resuspended in a cell lysis buffer containing 4% sodium dodecyl sulfate (w/v), 8 M urea, 50 mM Tris-HCl (pH = 8.0), and one Roche cOmplete™ mini tablet per 10 mL buffer, followed by ultra-sonication (30 s on, 1 min off, amplitude of 25%, two rounds) using a Q125 Sonicator (Qsonica, LLC). Cell debris was then removed by a centrifugation at 16,000 g, 4 °C for 10 min.

Each of the protein extract was then precipitated in five times its volume of precipitation solution (acetone: ethanol: acetic acid = 49.5: 49.5: 1, v-v:v) at −20 °C overnight. The precipitated proteins were pelleted by centrifuging at 16,000 g, 4 °C for 20 min, followed by being washed with ice-cold acetone for three times to remove excess SDS that may affect trypsin activity. Next, the washed proteins were resuspended in a buffer containing 6 M urea and 1 M Tris-HCl (pH = 8.0). Protein concentration was determined by the DC™ assay (Bio-Rad Laboratories, Canada) following the manufacturer’s manual.

Finally, proteins were subjected to an in-solution tryptic digestion. The samples were reduced in 10 mM dithiothreitol (DTT) at 56 °C for 30 min, then were alkylated by 20 mM iodoacetamide (IAA) at room temperature in dark for 40 min. The samples were then diluted 10 times with 1 M Tris-HCl buffer (pH = 8.0), followed by trypsin digestion (at a concentration of 1 μg trypsin per 50 μg proteins) at 37 °C for 24 h. The digests were then acidified to pH = 3 using 10% formic acid, followed by a desalting step using Sep-Pak C18 Cartridge (Waters, Milford, MA, USA). The cartridges were first activated using 100% acetonitrile, and then equilibrated using 0.1% formic acid (v/v) before passing samples through the columns for three times. Samples bonded to the cartridges were then washed using 0.1% formic acid (v/v), and finally the samples were eluted from the cartridges using the elution solution containing 80% acetonitrile and 0.1% formic acid (v/v).

High-pH reversed-phase fractionation

Eluted samples were evaporated in a SAVANT SPD1010 SpeedVac Concentrator (Thermo Fisher Scientific, USA), and resuspended in 0.1% formic acid (v/v) to a concentration of 1 μg/μL for high-pH reversed-phase fractionation following a previous workflow¹⁵, with minor adaptations: 30 μL sample were loaded to a ZORBAX Bonus-RP column (with 3.5 µm C18 resins, ID 2.1 mm, length 50 mm; Agilent Technologies, USA), and fractionated with an Agilent 1200 series HPLC System (Agilent Technologies, Germany). A 60 min gradient consisting of 5–35% buffer B (v/v) in 1–42 min, and 35–50% buffer B in 42–45 min at a flow rate of 100 μL/min was used for the fractionation. Here, 10 mM ammonium formate was used as buffer A, and 10 mM ammonium formate with 90% acetonitrile (v/v) was used as buffer B. Ammonium hydroxide was used to adjust the pH of both buffers A and B to 10. Sample fractions were continuously collected into 96 well plates by an Agilent 1100 Series Micro-FC G1364D micro fraction collector (Agilent Technologies, Germany). For each sample, 48 fractions were collected into different wells at 1 min intervals over the first 48 min. The samples were then pooled by combining four fractions at an interval of 12 wells, resulting in 12 fractionated samples per individual microbiome (Fig. 1a).

HPLC-ESI-MS/MS analysis

After evaporation and resuspension in 0.1% formic acid, each fraction was analyzed by HPLC-ESI-MS/MS consisting of an UltiMate 3000 RSLCnano system (Thermo Fisher Scientific, USA) and an Orbitrap Exploris 480 mass spectrometer (Thermo Fisher Scientific, USA). A 60 min gradient of 5– 35% (v/v) buffer B at a 300 μL/min flow rate was used to separate the peptides on a tip column (75 μm inner diameter × 10 cm) packed with reverse phase beads (3 μm/120 Å ReproSil-Pur C18 resin, Dr. Maisch GmbH, Ammerbuch, Germany). Here, 0.1% formic acid (v/v) was used as buffer A, and 0.1% formic acid with 80% acetonitrile (v/v) was used as buffer B. The MS full scan ranging from 350 to 1400 m/z was recorded in profile mode with the resolution of 60,000. Data-dependent MS/MS scan was performed with the 12 most intense ions with the resolution of 15,000. Dynamic exclusion was enabled for a duration of 30 s with a repeat count of one. Raw data was collected using Thermo Fisher Scientific Xcalibur™ Software (version 4.4).

Database search

Database search for the fractionated metaproteomics samples was performed based on the MetaPro-IQ workflow¹⁶. Briefly, a two-step database search was first performed using X!Tandem (version 2015.12.15.2). All sample fractions were searched against the IGC database of human gut microbiome (http://meta.genomics.cn/)¹⁷ to generate a reduced database, then a classical target-decoy database search was performed using the reduced database to generate confidently identified peptide and protein lists based on a strict filtering criteria of FDR = 0.01. The protein lists for all sample fractions were then combined, and duplicated proteins were removed to generate a combined non-redundant FASTA database using an in-house PERL script. Next, MaxQuant (version 1.5.2.8) was used to generate quantified protein groups and peptides in each sample using the combined non-redundant FASTA database. Carbamidomethylation of cysteine was set as a fixed modification, oxidation of methionine and N-terminal acetylation were set as potential modifications. The maximum missed cleavages of trypsin was set as two. The resulting peptide and protein group lists from MaxQuant were then inputted to MetaLab (version 1.2.0) for taxonomic analysis and functional annotation⁴⁸. For the taxonomic analysis, identified peptides were mapped to taxonomic lineages based on a built-in pep2tax database in MetaLab. Functional annotation to COG⁴⁹ was performed using Diamond (version0.8.35). In addition, KEGG KOs were annotated using GhostKOALA⁵⁰. The dataset was deposited to the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD027297.

For the metagenomics analysis, matched metagenome data of samples HM454, HM455, HM466 and HM503 were obtained from the previous MetaPro-IQ study¹⁶, accessible from the NCBI sequence read archive (SRA) under the accession of SRP068619. To enable the comparison between GCN and PCN, we used the metagenomics data to generate a protein database for another database search. Briefly, the raw reads were processed using MOCAT for trimming and quality filtering, and for human reads removal as previously described¹⁶. Next, reads were assembled by MEGAHIT (https://github.com/voutcn/megahit) into contigs. Gene prediction from the contigs were performed using Prodigal (https://github.com/hyattpd/Prodigal) to generate FASTA files. Combined FASTA files were then used for metaproteomic database search following the MetaPro-IQ pipeline.

Metaproteomic and metagenomic content networks

For the generation of proteomic content networks (PCNs), we developed a ‘peptide-protein bridge’ approach for the generation of IGC-based PCNs (see details in Supplementary Notes 1).

For the ‘peptide-protein bridge’ approach, we matched functions to taxa based on four database search output files, i.e., peptides, protein groups, taxonomy and function. The protein groups table (generated by MaxQuant) contains information on the identified proteins, and identifiers of peptide sequence associated to each protein group. The taxonomy table generated by MetaLab contains peptide sequences and their corresponding lowest common ancestor (LCA) taxa. The function table contains identified proteins and their corresponding functional annotations. Proteins were first matched to KOs, then for those that missed KO annotations, COG information were supplemented. Therefore, at first, we matched the protein groups to taxa through the peptides. Next, functions of the proteins were combined to the list to generate a taxon-to-function table that was bridged by the peptide-protein identification relationship. Protein group intensity was used as the quantification information in PCNs.

To generate GCNs from the metagenomics result, the same functional annotation method was used to annotate identified proteins to KEGG-COG. The taxonomic information of proteins was obtained by blasting the Prodigal-predicted sequences against the UHGP V2.0 database⁵¹. The count of raw reads corresponding to each protein was used as the quantification information in GCNs. Metagenomics-database search based PCNs were generated accordingly using the taxonomic and functional annotation strategy as the GCN.

Next, a PCN of S taxa and F functions can then be represented by an S × F incidence matrix ${{{{{\bf{P}}}}}}=[{{{{{{\rm{P}}}}}}}_{{{{{{\rm{ia}}}}}}}]$, where ${{{{{{\rm{P}}}}}}}_{{{{{{\rm{ia}}}}}}}\ge 0$ is the total intensity of proteins of function-a in taxon-i normalized by the total intensities of functional proteins in taxon-i. Similarly, the GCN can then be represented as ${{{{{\bf{G}}}}}}=[{{{{{{\rm{G}}}}}}}_{{{{{{\rm{ia}}}}}}}]$, where ${{{{{{\rm{G}}}}}}}_{{{{{{\rm{ia}}}}}}}\ge 0$ is the raw read counts of proteins of function-a in taxon-i normalized by the total counts of raw reads in taxon-i.

Calculation of functional distance and functional redundancy

Weighted Jaccard distance d_ij between metagenomic (or metaproteomic) contents of taxon-i and j can then be calculated with the GCN and PCN profiles ${{{{{\bf{G}}}}}}$ and ${{{{{\bf{P}}}}}}$, respectively, as described previously (Tian et al.¹¹,). For GCN, we have

$${d}_{{ij}}=1-\frac{{\sum }_{a}{{\min }}({G}_{{ia}},{G}_{{ja}})}{{\sum }_{a}{{\max }}({G}_{{ia}},{G}_{{ja}})},$$

(3)

and for PCN, we have

$${d}_{{ij}}=1-\frac{{\sum }_{a}{{\min }}({P}_{{ia}},{P}_{{ja}})}{{\sum }_{a}{{\max }}({P}_{{ia}},{P}_{{ja}})}.$$

(4)

The relative abundance of taxon-i in each community was denoted as ${p}_{i}$. In each metagenomics sample, ${p}_{i}$ was quantified using MetaPhlAn3 with default settings. In each metaproteomics sample, ${p}_{i}$ was quantified using the total abundances of unique peptides corresponding to taxon-i.

With the ${d}_{{ij}}$ and ${p}_{i}$ values, within-sample FR of the metagenomic and the metaproteomic profiles, denoted as ${{{{{{\rm{FR}}}}}}}_{g}$ and ${{{{{{\rm{FR}}}}}}}_{p}$, respectively, were then calculated according to Eq. (1) given in the Results section.

Consumer-resource model with cross-feeding interactions

Consider a microbiome with a resource pool of M nutrients and a pool of S taxa to start the community assembly, each taxon-$i$ consumes resource-$\alpha$ with a consumption flux of ${J}_{i\alpha }^{{{{{{\rm{in}}}}}}}={C}_{i\alpha }{R}_{\alpha }$ (Fig. 2b), where ${C}_{i\alpha }$ is the resource consumption rate of taxon-$i$ on resource-α (Fig. 2c), and ${R}_{\alpha }$ represents the concentration of resource-$\alpha$. Internal metabolic process of taxon-$i$ converts resource-$\alpha$ into two fractions, i.e., a fraction $l$ of consumed resources for byproduct generation, and the remaining fraction $(1-l)$ contributes to the biomass increase of taxon-$i$ (Fig. 2b). Once resource-α is consumed, multiple byproducts (which are also resources) are generated and this is encoded in the byproduct generation matrix $P$, which describes whether resource-α can be converted to a byproduct or resource-$\beta$ (Fig. 2d). To conserve the total amount of resources, we assumed that $\mathop{\sum}\nolimits_{\beta }{P}_{\alpha \beta }=1$. Overall, the production flux of resource-$\alpha$ by taxon-$i$ is ${J}_{i{\alpha }}^{{{{{{\rm{out}}}}}}}=l\mathop{\sum}\nolimits_{\beta }{P}_{\alpha \beta }\,{J}_{i{\beta }}^{{{{{{\rm{in}}}}}}}$ which sums over all byproduct generation activities from all consumed resources. Resource-$\alpha$ is externally supplied with a supply rate ${h}_{\alpha }$. All resources and microbes are diluted by the same dilution rate $D$. Overall, the dynamics of the resource concentrations (${R}_{\alpha }$, $\alpha=1,\ldots,M$) and taxa abundances (${N}_{i}$, $i=1,\ldots,S$) are given by a set of coupled ordinary differential equations:

$$\frac{d{R}_{\alpha }}{dt}={h}_{\alpha }-D{R}_{\alpha }+\mathop{\sum}\limits_{i}{N}_{i}({J}_{i\alpha }^{{{{{{\rm{out}}}}}}}-{J}_{i\alpha }^{{{{{{\rm{in}}}}}}}),$$

(5)

$$\frac{d{N}_{i}}{dt}={N}_{i}\left[Y(1-l)\mathop{\sum}\limits_{\alpha }{J}_{i\alpha }^{{{{{{\rm{in}}}}}}}-D\right].$$

(6)

$Y$ is the resource yield. For simplicity, we assume $Y$ is the same for all resources across taxa. Note that the model in Marsland et al.²⁰ considers two separate dilution rates for resources and microbes. Here, under the assumption of the fed-batch culture, dilution rate $D$ is the same for both resources and microbes.

Simulations of the Consumer-Resource Model with cross-feeding interactions were performed by considering dynamics of 100 microbial taxa and 100 resources. To simulate functionalities represented by the GCN and PCN, the consumption matrices that derive from the GCN (${C}^{{GCN}}$) or the PCN (${C}^{{PCN}}$) are assumed to have the same connectance as the GCN and the PCN respectively. More specifically, we used a binomial distribution with the probability equal to the connectance of the GCN to determine the number of consumable resources for each species and thus formed the incidence matrix of ${C}^{{GCN}}$. The consumption matrix ${C}^{{PCN}}$ is generated via a subsampling of the ${C}^{{GCN}}$ with the subsampling probability to keep the connectance of ${C}^{{PCN}}$ the same as the connectance of the PCN. The consumption rates are drawn from the uniform distribution ${{{{{\mathscr{U}}}}}}[{{{{\mathrm{0,1}}}}}]$. A universal byproduct generation from consumed nutrients is assumed and its connectance is assumed to be 50%. The entries of byproduct generation matrix are drawn from the uniform distribution between 0 and 1 and even eventually normalized for each consumed nutrient (i.e., $\mathop{\sum}\nolimits_{\beta }{P}_{\alpha \beta }=1$). Simulations were performed 100 times for each comparison of one pair in the violin plots. For one pair of simulations linked by a thin gray line in the violin plots, two simulations are performed using the same parameters (such as the same byproduct generation matrix) except for the consumption matrix (i.e., ${C}^{{GCN}}$ and ${C}^{{PCN}}$). The richness of the assembled communities is counted as the number of microbial taxa that have positive abundances at the end of the simulation.

Statistical analysis and visualization

The statistical details of analysis can be found in the figure legends and in the main texts, including the statistical tests used and significance criteria. Computation of GCN, PCN and functional redundancy were performed using in-house Python codes. NODF values were computed using the R package RInSp. Jensen-Shannon divergence was calculated using the R package textmineR. Two-way ANOVA was performed using R function aov(). PERMANOVA tests were performed using R packages “vegan” and “BiodiversityR”. Kruskal–Wallis and Wilcoxon rank-sum tests were performed using R functions kruskal.test() and wilcox.test(), respectively. Network incidence matrices, degree distributions, bar plots, box plots, and violin plots were visualized using the R package ggplot2. Unipartite networks were visualized using the R package igraph. Tripartite networks were visualized using the R package networkD3. Heatmaps were visualized using the R package pheatmap. Volcano plot was analyzed by MetaboAnalyst (version 4.0) under non-parametric test setting. The interactive webpage (https://shiny2.imetalab.ca/shiny/rstudio/PCN_visualizer/) for visualization PCNs of metaproteomic datasets analyzed in this paper was created using the R packages shiny and shinydashboard. The Consumer-Resource Model model was implemented in Python 3.8 and Python packages Pandas, Numpy, Seaborn, and matplotlib.pyplot were used.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The ultra-deep metaproteomics datasets were deposited to the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD027297. Database search outputs from the SISPROT²³, RapidAIM²⁴, Berberine²⁵ and IBD²⁶ studies have been previously deposited to the ProteomeXchange Consortium with the dataset identifiers PXD005619, PXD012724, PXD015934 and PXD007819, respectively. Proteomics dataset of the cultured singles strain samples (Wang et al.¹⁹,) has been previously deposited to the ProteomeXchange Consortium with the dataset identifier PXD037923. The four metagenomic datasets matching the ultra-deep metaproteomics datasets were obtained from the previous MetaPro-IQ study¹⁶, accessible from the NCBI sequence read archive (SRA) under the accession of SRP068619. Source data are provided with this paper.

Code availability

Custom codes for the construction of PCN and calculation of ${{{{{{\rm{FR}}}}}}}_{p}$ are available at GitHub: https://github.com/yvonnelee1988/Metaproteome_FRp.

References

Gilbert, J. A. & Lynch, S. V. Community ecology as a framework for human microbiome research. Nat. Med. 25, 884–889 (2019).
Article CAS PubMed PubMed Central Google Scholar
The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Article ADS PubMed Central Google Scholar
Clooney, A. G. et al. Ranking microbiome variance in inflammatory bowel disease: a large longitudinal intercontinental study. Gut 70, 499–510 (2021).
Article CAS PubMed Google Scholar
Wang, Z. et al. Altered diversity and composition of the gut microbiome in patients with cervical cancer. AMB Express 9, 40 (2019).
Article PubMed PubMed Central Google Scholar
Ren, Z. et al. Gut microbiome analysis as a tool towards targeted non-invasive biomarkers for early hepatocellular carcinoma. Gut 68, 1014–1023 (2019).
Article CAS PubMed Google Scholar
Johnson, K. V.-A. & Burnet, P. W. J. Microbiome: Should we diversify from diversity? Gut Microbes 7, 455–458 (2016).
Article CAS PubMed PubMed Central Google Scholar
Loreau, M. Does functional redundancy exist? Oikos 104, 606–611 (2004).
Article Google Scholar
Root, R. B. The niche exploitation pattern of the blue-gray gnatcatcher. Ecol. Monogr. 37, 317–350 (1967).
Article Google Scholar
Louca, S. et al. Function and functional redundancy in microbial systems. Nat. Ecol. Evol. 2, 936–943 (2018).
Article PubMed Google Scholar
Wu, G., Zhao, N., Zhang, C., Lam, Y. Y. & Zhao, L. Guild-based analysis for understanding gut microbiome in human health and diseases. Genome Med. 13, 22 (2021).
Article PubMed PubMed Central Google Scholar
Tian, L. et al. Deciphering functional redundancy in the human microbiome. Nat. Commun. 11, 6217 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Kleiner, M. Metaproteomics: much more than measuring gene expression in microbial communities. mSystems 4, e00115–19 (2019).
Article CAS PubMed PubMed Central Google Scholar
Li, L. & Figeys, D. Proteomics and metaproteomics add functional, taxonomic and biomass dimensions to modeling the ecosystem at the mucosal-luminal interface. Mol. Cell. Proteom. 19, 1409–1417 (2020).
Article CAS Google Scholar
Salvato, F., Hettich, R. L. & Kleiner, M. Five key aspects of metaproteomics as a tool to understand functional interactions in host-associated microbiomes. PLOS Pathog. 17, e1009245 (2021).
Article CAS PubMed PubMed Central Google Scholar
Batth, T. S., Francavilla, C. & Olsen, J. V. Off-line high-ph reversed-phase fractionation for in-depth phosphoproteomics. J. Proteome Res. 13, 6176–6186 (2014).
Article CAS PubMed Google Scholar
Zhang, X. et al. MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota. Microbiome 4, 31 (2016).
Article CAS PubMed PubMed Central Google Scholar
Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841 (2014).
Article CAS PubMed Google Scholar
Kleiner, M. et al. Assessing species biomass contributions in microbial communities via metaproteomics. Nat. Commun. 8, 1558 (2017).
Article ADS PubMed PubMed Central Google Scholar
Wang, T., Li, L., Figeys, D. & Liu, Y.-Y. Pairing metagenomics and metaproteomics to pinpoint ecological niches and metabolic essentiality of microbial communities. bioRxiv 2022.11.04.515228 https://doi.org/10.1101/2022.11.04.515228 (2022).
Marsland, R. III et al. Available energy fluxes drive a transition in the diversity, stability, and functional structure of microbial communities. PLOS Comput. Biol. 15, e1006793 (2019).
Article CAS PubMed PubMed Central Google Scholar
MacArthur, R. Species packing and competitive equilibrium for many species. Theor. Popul Biol. 1, 1–11 (1970).
Article CAS PubMed Google Scholar
Ulanowicz, R. E. Biodiversity, functional redundancy and system stability: subtle connections. J. R. Soc. Interfac. 15, 20180367 (2018).
Article Google Scholar
Zhang, X. et al. Deep metaproteomics approach for the study of human microbiomes. Anal. Chem. 89, 9407–9415 (2017).
Article CAS PubMed Google Scholar
Li, L. et al. RapidAIM: a culture- and metaproteomics-based Rapid Assay of Individual Microbiome responses to drugs. Microbiome 8, 33 (2020b).
Article CAS PubMed PubMed Central Google Scholar
Li, L. et al. Berberine and its structural analogs have differing effects on functional profiles of individual gut microbiomes. Gut Microbes 11, 1348–1361 (2020a).
Article CAS PubMed PubMed Central Google Scholar
Zhang, X. et al. Metaproteomics reveals associations between microbiome and intestinal extracellular vesicle proteins in pediatric inflammatory bowel disease. Nat. Commun. 9, 2873 (2018).
Article ADS PubMed PubMed Central Google Scholar
Park, I. et al. Characterization of terminal-ileal and colonic Crohn’s disease in treatment-naïve paediatric patients based on transcriptomic profile using logistic regression. J. Transl. Med. 19, 250 (2021).
Article PubMed PubMed Central Google Scholar
Xiao, Y. et al. Mapping the ecological networks of microbial communities. Nat. Commun. 8, 2042 (2017).
Article ADS PubMed PubMed Central Google Scholar
Angulo, M. T., Moog, C. H. & Liu, Y.-Y. A theoretical framework for controlling complex microbial communities. Nat. Commun. 10, 1045 (2019).
Article ADS PubMed PubMed Central Google Scholar
Bascompte, J., Jordano, P., Melián, C. J. & Olesen, J. M. The nested assembly of plant–animal mutualistic networks. Proc. Natl. Acad. Sci. U.S.A. 100, 9383–9387 (2003).
Article ADS CAS PubMed PubMed Central Google Scholar
Bascompte, J. & Jordano, P. Plant-animal mutualistic networks: the architecture of biodiversity. Annu. Rev. Ecol. Evol. Syst. 38, 567–593 (2007).
Article MATH Google Scholar
Kondoh, M., Kato, S. & Sakato, Y. Food webs are built up with nested subwebs. Ecology 91, 3123–3130 (2010).
Article PubMed Google Scholar
Gonze, D., Lahti, L., Raes, J. & Faust, K. Multi-stability and the origin of microbial community types. ISME J. 11, 2159–2166 (2017).
Article PubMed PubMed Central Google Scholar
Van de Guchte, M. et al. Alternative stable states in the intestinal ecosystem: proof of concept in a rat model and a perspective of therapeutic implications. Microbiome 8, 153 (2020).
Article PubMed PubMed Central Google Scholar
Stein, R. R. et al. Ecological modeling from time-series inference: insight into dynamics and stability of intestinal microbiota. PLOS Comput. Biol. 9, e1003388 (2013).
Article PubMed PubMed Central Google Scholar
van der Post, S. et al. Structural weakening of the colonic mucus barrier is an early event in ulcerative colitis pathogenesis. Gut 68, 2142 (2019).
Article PubMed Google Scholar
Bunesova, V., Lacroix, C. & Schwab, C. Mucin cross-feeding of infant Bifidobacteria and Eubacterium hallii. Microb. Ecol. 75, 228–238 (2018).
Article CAS PubMed Google Scholar
Schroeder, B. O. Fight them or feed them: how the intestinal mucus layer manages the gut microbiota. Gastroenterol. Rep. 7, 3–12 (2019).
Article Google Scholar
Kosciow, K. & Deppenmeier, U. Characterization of three novel β-galactosidases from Akkermansia muciniphila involved in mucin degradation. Int. J. Biol. Macromol. 149, 331–340 (2020).
Article CAS PubMed Google Scholar
Tilman, D. Resource competition and community structure. vol. 17 (1982).
Luca, M., Di Mauro, M., Di Mauro, M. & Luca, A. Gut microbiota in Alzheimer’s disease, depression, and type 2 diabetes mellitus: the role of oxidative stress. Oxid. Med. Cell. Longevity 2019, 4730539 (2019).
Google Scholar
Moya, A. & Ferrer, M. Functional redundancy-induced stability of gut microbiota subjected to disturbance. Trends Microbiol. 24, 402–413 (2016).
Article CAS PubMed Google Scholar
Folke, C. et al. Regime Shifts, Resilience, and Biodiversity in Ecosystem Management. Annu. Rev. Ecol. Evol. Syst. 35, 557–581 (2004).
Article Google Scholar
Dogra, S., Doré, J. & Damak, S. Gut Microbiota Resilience: Definition, Link to Health and Strategies for Intervention. Front. Microbiol. 11, 572921 (2020).
Article PubMed PubMed Central Google Scholar
Hu, H. et al. StrainPanDA: Linked reconstruction of strain composition and gene content profiles via pangenome-based decomposition of metagenomic data. iMeta 1, e41 (2022).
Article Google Scholar
Liang, Q., Bible, P. W., Liu, Y., Zou, B. & Wei, L. DeepMicrobes: taxonomic classification for metagenomics with deep learning. NAR Genom. Bioinf. 2, lqaa009 (2020).
Article Google Scholar
Zhang, X. et al. Assessing the impact of protein extraction methods for human gut metaproteomics. J. Proteom. 180, 120–127 (2018).
Article CAS Google Scholar
Cheng, K. et al. MetaLab: an automated pipeline for metaproteomic data analysis. Microbiome 5, 157 (2017).
Article PubMed PubMed Central Google Scholar
Galperin, M. Y. et al. COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res. 49, D274–D281 (2021).
Article CAS PubMed Google Scholar
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This work was partially funded by the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-114 and OGI-149) and the Ontario Ministry of Economic Development and Innovation (Project 13440). Y.-Y.L. acknowledges grants from National Institutes of Health (R01AI141529, R01HD093761, RF1AG067744, UH3OD023268, U19AI095219 and U01HL089856). The shotgun metagenomic analysis presented here was enabled in part by WestGrid () and Compute Canada (www.computecanada.ca). D.R.M. is partially supported through University of Ottawa Faculty of Medicine Distinguished Clinical Research Chair in Pediatric Inflammatory Bowel Disease. C.M.A.S. was funded by a stipend from the NSERC CREATE in Technologies for Microbiome Science and Engineering (TECHNOMISE) Program. The authors acknowledge Ruth Singleton (Clinical Research Coordinator) for participant recruitment and data collection.

Author information

These authors contributed equally: Leyuan Li, Tong Wang.

Authors and Affiliations

State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 102206, Beijing, China
Leyuan Li
School of Pharmaceutical Sciences and Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
Leyuan Li, Zhibin Ning, Xu Zhang, Joeselle M. Serrana, Caitlin M. A. Simopoulos, Janice Mayne & Daniel Figeys
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, 02115, USA
Tong Wang & Yang-Yu Liu
Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
James Butcher & Alain Stintzi
Department of Paediatrics, Faculty of Medicine, University of Ottawa and Children’s Hospital of Eastern Ontario Inflammatory Bowel Disease Centre and Research Institute, Ottawa, ON, K1H 8L1, Canada
David R. Mack
Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Yang-Yu Liu

Authors

Leyuan Li
View author publications
You can also search for this author in PubMed Google Scholar
Tong Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zhibin Ning
View author publications
You can also search for this author in PubMed Google Scholar
Xu Zhang
View author publications
You can also search for this author in PubMed Google Scholar
James Butcher
View author publications
You can also search for this author in PubMed Google Scholar
Joeselle M. Serrana
View author publications
You can also search for this author in PubMed Google Scholar
Caitlin M. A. Simopoulos
View author publications
You can also search for this author in PubMed Google Scholar
Janice Mayne
View author publications
You can also search for this author in PubMed Google Scholar
Alain Stintzi
View author publications
You can also search for this author in PubMed Google Scholar
David R. Mack
View author publications
You can also search for this author in PubMed Google Scholar
Yang-Yu Liu
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Figeys
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization, Y.-Y.L, D.F., L.L., and T.W.; Methodology, D.F., Y.-Y.L, L.L., Z.N., and T.W.; Formal Analysis: L.L.; Investigation L.L., T.W., and Z.N.; Resources, D.R.M., A.S., J.B., and J.M.; Data Curation: L.L., J.B., X.Z., Z.N., J.S., and C.M.A.S.; Writing–Original Draft, L.L., Y.-Y.L, and D.F.; Writing–Review & Editing, J.B., J.M., A.S., C.M.A.S., D.R.M., X.Z., T.W., and Z.N.; Visualization: L.L., T.W., and Z.N.; Supervision: D.F. and Y.-Y.L. The authors declare that they follow principles of inclusion & ethics in global research.

Corresponding authors

Correspondence to Yang-Yu Liu or Daniel Figeys.

Ethics declarations

Competing interests

D.F., A.S., and D.R.M. have co-founded MedBiome, a clinical microbiomics company. All other authors declare no potential competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Reporting Summary

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Li, L., Wang, T., Ning, Z. et al. Revealing proteome-level functional redundancy in the human gut microbiome using ultra-deep metaproteomics. Nat Commun 14, 3428 (2023). https://doi.org/10.1038/s41467-023-39149-2

Download citation

Received: 05 November 2022
Accepted: 26 May 2023
Published: 10 June 2023
DOI: https://doi.org/10.1038/s41467-023-39149-2

This article is cited by

Identifying keystone species in microbial communities using deep learning
- Xu-Wen Wang
- Zheng Sun
- Yang-Yu Liu
Nature Ecology & Evolution (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.