High-throughput sequencing and its applications have produced new insights into the human gut microbiome’s structural diversity [1] and functional potential [2]. In health and disease, the gut microbiome confers essential functionalities [3] by interfacing directly with human metabolism [4] as well as by ensuring intestinal homeostasis and immune system stimulation [3], among others [2]. Microbiome-secreted molecules, including nucleic acids, (poly)peptides, enzymes, and metabolites, play key roles in microbiome-host signaling [5] and are released into the human gastrointestinal tract via secretory systems and/or outer membrane vesicles (OMVs) [5]. Substantial differences exist between predicted functionalities based on metagenomic analyses and actual microbial phenotypes in the gut [2]. The immunogenic potential of commensals and pathobionts therefore remains largely unexplored, especially as the emergent properties of the microbiome in relation to host interactions remain to be comprehensively characterized and understood. Moreover, genes encoding proteins of unknown function constitute between 40 and 70% of genes, and such proteins constitute half of those that are identifiable in metaproteomic data from fecal protein extracts [2]. Compounding these unknowns, the majority of gut microbiome-derived small molecules (>90%) have no reference in public databases despite their immediate relevance to host physiology [6]. Finally, RNA transcripts reflect microbial viability and affect antibody responses [7], but microbiome-derived extracellular small and large RNAs in the gastrointestinal tract remain largely uncharacterized [8]. Collectively, the diversity of microbiome-secreted biomolecules involved in host-microbiome interactions is vast and comprises an extensive array of so far unexplored material.

To obtain an overview of this diversity, we developed a framework to systematically characterize the extracellular complement of microbiome-derived molecules including DNA (ex-DNA), small and large RNA (ex-sRNA and ex-lRNA), (poly)peptides (ex-Prot), and metabolites [polar metabolites, short-chain fatty acids (SCFAs), and bile acids (BAs)] from the human gut by integrated multi-omics (Supplementary Materials and Methods). The present work represents a systematic and extensive expansion of the previous methodological workflow by Roume et al. [9], which focused on the intracellular biomolecular complements. Moreover, we analyze and contextualize the resulting extracellular high-resolution multi-omics data. Briefly, using our new method, snap-frozen stool samples from four healthy individuals are homogenized and subjected to an optimized biomolecular isolation method [9] (Fig. 1A). Isolation and purification of the intracellular molecules are performed after cell lysis on the resuspended pellet using silica-column-based techniques. For the extracellular fractions, fecal water is recovered using low-speed centrifugation and low-flow filtration to avoid microbial cell lysis [9]. All obtained nucleic acid fractions are subjected to high-throughput sequencing. Peptides are isolated after precipitation using trichloroacetic acid and sodium deoxycholate to ensure recovery of low-abundance (poly)peptides. Ex-Prot are subjected to SDS-PAGE followed by liquid chromatography with tandem mass spectrometry (LC-MS/MS). Metabolites are extracted by adding the respective internal standards, followed by recovery of the phase of interest. Metabolite fractions are analyzed using combinations of gas chromatography-mass spectrometry (GC-MS) and liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS).
To allow integrated taxonomic and functional analyses, reference metagenome-assembled genomes (MAGs), against which the extracellular nucleic acid fractions are mapped and which are used for protein identifications, are obtained by co-assembling the intracellular nucleic acid data using the Integrated Meta-omics Pipeline (IMP) [10]. Subsequently, based on the resulting genomic foundation, the metaproteomic data are further integrated by matching the mass spectra against the contig-derived databases for protein identification. In addition, the identified metabolites may be integrated via their annotation to reactions and the corresponding enzymes derived from the above integrated analyses. An example of an integrated analysis view is shown in Supplementary Figs. S1 and S2.
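To make the mapping-based comparison of fractions concrete, the following minimal Python sketch shows one way read counts mapped to taxonomic bins could be turned into per-fraction relative abundances. It is an illustration only: the function name `relative_abundance`, the dictionary layout, and all counts are hypothetical and are not taken from the study or from IMP.

```python
def relative_abundance(read_counts):
    """Convert mapped-read counts per taxon into percentages of the fraction total."""
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads: report zero for every taxon instead of dividing by zero.
        return {taxon: 0.0 for taxon in read_counts}
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


# Hypothetical counts of reads mapped to genus-level bins (illustrative values only).
mapped_reads = {
    "int-DNA": {"Bacteroides": 600, "Roseburia": 300, "Blautia": 100},
    "ex-lRNA": {"Bacteroides": 150, "Roseburia": 50, "Blautia": 300},
}

# One relative-abundance profile per biomolecular fraction.
profiles = {fraction: relative_abundance(counts)
            for fraction, counts in mapped_reads.items()}

for fraction, profile in profiles.items():
    print(fraction, {taxon: round(pct, 1) for taxon, pct in profile.items()})
```

Normalizing within each fraction allows the fractions to be compared despite differing sequencing depths, under the assumption that mapped-read counts are an acceptable proxy for abundance.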

Fig. 1: Overview of the methodological workflow and characteristics of the obtained biomolecular fractions.

A Flowchart of the experimental and bioinformatic analyses. Flash-frozen stool samples are divided into aliquots for subsequent biomolecular extractions. Int-DNA is obtained by eluting the lysate bound to an AllPrep DNA spin column; the flow-through is loaded onto an RNeasy spin column for int-RNA isolation. To obtain the extracellular fractions, the supernatant is first filtered through a polyethersulfone (PES) membrane. Nucleic acid fractions are isolated using specific columns (NucleoSpin miRNA Plasma kit for ex-DNA and ex-sRNA, NucleoSpin RNA Blood kit for ex-lRNA). Ex-DNA is subjected to an additional concentration step. All nucleic acid fractions are subjected to high-throughput sequencing. Ex-Prot are obtained from the resulting pellet after protein precipitation and analyzed by SDS-PAGE followed by LC-MS/MS. The sequencing information from the intracellular fractions allows for genome reconstruction by a DNA-RNA co-assembly using IMP [10]. This MG-MT reference allows further mapping and annotation of the extracellular fractions. Polar metabolites, SCFAs, and BAs are extracted from their respective aliquots by addition of specific internal standards (IS) and further processing of the supernatant (Supplementary Materials and Methods). The extracts are then analyzed by GC-MS, GC-MS, and LC-HRMS, respectively. B Masses of biomolecules extracted per mg of original stool sample (logarithmic scale). Error bars represent the standard deviation of four independent samples. ex-DNA extracellular DNA, ex-sRNA extracellular small RNA, ex-lRNA extracellular large RNA, ex-Prot extracellular proteins, SCFAs short-chain fatty acids, BAs bile acids.
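The per-sample normalization and error bars described for panel B can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `yield_summary`, the units (ng of biomolecule, mg of stool), and the input values are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed because the caption does not specify the estimator.

```python
import math
import statistics


def yield_summary(masses_ng, stool_mg):
    """Return mean, standard deviation, and log10(mean) of yield per mg of stool."""
    # Normalize each extracted mass by the stool mass of the same sample.
    per_mg = [mass / stool for mass, stool in zip(masses_ng, stool_mg)]
    mean = statistics.mean(per_mg)
    # Sample standard deviation (n - 1 denominator); an assumption of this sketch.
    sd = statistics.stdev(per_mg)
    # The logarithm is taken only for display on a logarithmic axis.
    return mean, sd, math.log10(mean)


# Hypothetical extracted masses (ng) and stool input (mg) for four samples.
mean, sd, log_mean = yield_summary([100, 200, 300, 400], [100, 100, 100, 100])
print(f"mean = {mean:.2f} ng/mg, SD = {sd:.2f} ng/mg, log10(mean) = {log_mean:.2f}")
```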

The individual extracellular complements were effectively extracted using our methodology (Fig. 1B, Supplementary Figs. S3–S5, and Supplementary Tables S1–S3). Interestingly, proteins were over-represented and nucleic acids under-represented when compared to the average intracellular composition of a bacterium [11]. We compared the intracellular composition of Escherichia coli as defined by Neidhardt et al. [11] to the extracellular fractions we obtained (Supplementary Fig. S6). Our observations, including the over-representation of proteins in the extracellular fractions, are expected as most of the macromolecular export machinery within a microbial cell is selective for protein export. Examples include all proteins tagged with signal peptides and those exported via bacterial secretion systems such as the Sec, Tat, and Type-1 to Type-9 secretion systems [12]. On the other hand, nucleic acid export is known to occur primarily via conjugation or transduction, and it takes place between cells rather than into the extracellular compartment. The exception to this is the export of nucleic acids via extracellular vesicles (EVs). Our protocol is also designed to capture the EVs in the extracellular fraction, whereby the centrifugation speed is set to separate cells from the entire extracellular content. Taxonomic assignment based on the MAGs as well as the functional annotations demonstrated the uniqueness of the different biomolecular fractions: the int-DNA, which is the sole basis of a typical metagenomic analysis, did not allow inferences regarding the composition of the extracellular complements (Fig. 2A). For example, dominant gut microbiome taxa and organisms of interest, e.g., Roseburia spp., were differentially represented in the different fractions (Supplementary Fig. S7). We also found that Blautia spp. were significantly differentially represented between the various fractions (Supplementary Table S4).
In addition, the overall taxonomic composition showed higher variation between fractions and individuals than the corresponding functional representations (Fig. 2, Supplementary Fig. S8, and Supplementary Tables S5 and S6). We also observed differences at the functional level between the int-DNA and the other fractions with respect to genes encoding tRNAs and other functions (Supplementary Table S7). Since typical metagenomic studies rely solely on int-DNA, we assessed the overlap between int-DNA and the extracellular fractions. The differences were apparent in the overlap between the assessed fractions at the nucleotide (Supplementary Fig. S9), taxonomic (Supplementary Fig. S10), and functional levels (Supplementary Fig. S11), thereby underlining the necessity for the systematic characterization of the individual fractions. Importantly, the resolved inter- and intra-individual variations are in line with our previous work focused on the intracellular fractions [2], reinforcing the notion that the individual is the largest contributor to the observed variation within the microbiome-derived biomolecular fractions.
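One simple way to quantify such an overlap between two fractions is the Jaccard index of the sets of features (for example, taxa or functional annotations) detected in each. The Python sketch below is an illustration only: the Jaccard index is chosen here for simplicity and is not stated to be the measure used in the Supplementary Figures, and the feature identifiers are hypothetical placeholders.

```python
def jaccard(features_a, features_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(features_a), set(features_b)
    union = a | b
    # Two empty sets share nothing by convention here; avoid division by zero.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical feature sets detected in two fractions (placeholder identifiers).
int_dna_features = {"feature_1", "feature_2", "feature_3"}
ex_lrna_features = {"feature_2", "feature_3", "feature_4"}

overlap = jaccard(int_dna_features, ex_lrna_features)
print(f"Jaccard overlap int-DNA vs ex-lRNA: {overlap:.2f}")
```

A value near 1 would indicate that the int-DNA profile is a good proxy for the other fraction, whereas lower values would indicate fraction-specific content.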

Fig. 2: Composition of the extracted biomolecular fractions from gut microbiome samples of four healthy human individuals.

A Relative abundance (%) of the taxonomic annotations at the genus level based on the co-assembled contigs using Kraken2. Differences in composition are observed between the different fractions as well as between the individuals (Ind). B Relative abundance (%) of the functional classification on the co-assembled contigs according to clusters of orthologous groups (COGs) and non-coding RNA types. Abbreviations of the functional categories: A: RNA processing and modification; B: chromatin structure and dynamics; C: energy production and conversion; D: cell cycle control: cell division: chromosome partitioning; E: amino acid transport and metabolism; F: nucleotide transport and metabolism; G: carbohydrate transport and metabolism; H: coenzyme transport and metabolism; I: lipid transport and metabolism; J: translation: ribosomal structure and biogenesis; K: transcription; L: replication: recombination and repair; M: cell wall/membrane/envelope biogenesis; N: cell motility; O: post-translational modification: protein turnover and chaperones; P: inorganic ion transport and metabolism; Q: secondary metabolites biosynthesis: transport and catabolism; S: function unknown; T: signal transduction mechanisms; U: intracellular trafficking: secretion and vesicular transport; V: defense mechanisms; Z: cytoskeleton. C Heatmap of the short-chain fatty acid (SCFA) and bile acid (BA) concentrations (µg/L; logarithmic scale), measured by GC-MS and LC-HRMS, respectively, for each individual. Lower concentrations are indicated in blue and range from 0 to 98,029.2 µg/L; higher concentrations are shown in red and range from 98,029.2 to 196,058.4 µg/L. SCFAs are originally measured in µmol/L in a dynamic range from 10 to 4000 µmol/L; BAs are measured in ng/mL in a range from 50 to 4000 ng/mL. BAs bile acids, SCFAs short-chain fatty acids, int-DNA intracellular DNA, ex-DNA extracellular DNA, ex-sRNA extracellular small RNA, ex-lRNA extracellular large RNA, ex-Prot extracellular proteins.
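Panel C reports concentrations on a common µg/L scale although SCFAs are measured in µmol/L and BAs in ng/mL. The unit arithmetic can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the molar masses are rounded values for the free acids (acetic, propionic, and butyric acid); whether the study used free-acid or anion masses is not stated.

```python
# Rounded molar masses (g/mol) of the free acids; an assumption of this sketch.
MOLAR_MASS_G_PER_MOL = {
    "acetate": 60.05,     # acetic acid
    "propionate": 74.08,  # propionic acid
    "butyrate": 88.11,    # butyric acid
}


def scfa_to_ug_per_l(conc_umol_per_l, name):
    """Convert µmol/L to µg/L: (µmol/L) x (g/mol) = µg/L."""
    return conc_umol_per_l * MOLAR_MASS_G_PER_MOL[name]


def ba_to_ug_per_l(conc_ng_per_ml):
    """Convert ng/mL to µg/L: 1 ng/mL equals 1 µg/L, so the value is unchanged."""
    return conc_ng_per_ml


# Hypothetical measurements (illustrative values only).
acetate_ug_per_l = scfa_to_ug_per_l(1000.0, "acetate")
bile_acid_ug_per_l = ba_to_ug_per_l(50.0)
print(acetate_ug_per_l, bile_acid_ug_per_l)
```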

With respect to host-microbiome interactions, especially in relation to immunostimulation, the ex-DNA along with the ex-lRNA contained genes from pathobionts, e.g., Staphylococcus spp., known to alter IL-8 expression via recognition of CpG motifs by TLR9 [13]. The ex-lRNA fraction was enriched in RNAs derived from specific bacterial taxa, e.g., Faecalibacterium spp. (comprising up to 22% of reads; Fig. 2A and Supplementary Table S8), and RNA viruses, e.g., tobacco mosaic virus (up to 8%). Furthermore, we observed a general enrichment in non-coding RNAs (ncRNAs; up to 57%; Fig. 2B and Supplementary Table S9). Interestingly, human gut-associated archaea such as Methanobrevibacter smithii represented up to 5% in Individual 3 (Supplementary Fig. S12 and Supplementary Table S8). M. smithii’s RNA is known to trigger TLR8-dependent NLRP3 inflammasome activation [14]. The ex-sRNA fractions were enriched in sequences from different members of the Clostridiales (up to 43%; Fig. 2A and Supplementary Table S8), which were mainly transfer RNAs (tRNAs; 91–97%), ribosomal RNAs (rRNAs; 0.2–3%), or other ncRNAs (1–4%; Fig. 2B and Supplementary Table S9).

We captured specific molecules that are typically enriched in bacterial OMVs, including several 50S ribosomal proteins encoded by the rplE, rplL, rplM, and rplY genes [15], mainly originating from the Bacteroidales (Supplementary Table S10). Overall, the nucleic acid fractions contained genes coding for various vesicle-associated proteins that were also present among the ex-Prot. Examples include chaperone protein HtpG [15] and the outer membrane proteins OmpA, OmpF, FepA, and BamA [16] (Supplementary Table S11). The majority were derived from Bacteroidales and Gammaproteobacteria (Supplementary Table S10). Furthermore, we detected multiple enzymes known to be enriched in OMVs, such as glutamine synthetase (glnA), protein recombinase A (recA) [14], and formate acetyltransferase 1 (pflB) [16] (Supplementary Table S11). These were specifically encoded by different members of the Bacteroidales (Supplementary Table S10). This indicates the ability of our newly developed protocol to resolve vesicle-associated biomolecules along with soluble molecules. The functional repertoires of the ex-Prot were mainly involved in transport and metabolism of components (60–63%; Fig. 2B and Supplementary Table S9), indicating distinct export mechanisms and specific enrichments in the extracellular space.

The metabolome contained microbiota-secreted molecules such as SCFAs, secondary BAs (Fig. 2C and Supplementary Table S12), and derivatives (Supplementary Fig. S13 and Supplementary Table S13), known to play crucial roles in host metabolism as well as in immune and inflammatory pathways [4]. For example, lithocholic acid derivatives inhibit Th17 cell differentiation and stimulate Treg differentiation [17]. Furthermore, formate provides a substrate for Enterobacteriaceae expansion in the gut, which intensifies inflammation-associated dysbiosis [18]. Acetate, butyrate, and propionate contribute to the anti/pro-inflammatory equilibrium, and their imbalance has been linked to chronic inflammation, eventually leading to various autoimmune diseases [19].

It is challenging to distinguish host-derived from gut microbiome-derived biomolecules, especially those that cannot be immediately linked back to genomic information, as is the case for metabolites. With respect to DNA, host DNA can be identified in silico during the assembly step (see Methods), allowing bacterial and host-derived DNA to be distinguished. In addition, mammalian mRNA may be distinguished from microbial transcripts based on the presence of a polyA tail in the former. Exceptions, however, include commensal eukaryotes such as fungi and Blastocystis, some sRNAs, and non-polyadenylated molecules [20]. For the majority of the proteins, based on the genomic foundation, we have previously described that systematic omic measurements in tight coupling with experimental approaches allow for the inference of causal relationships via coherent data integration [2, 21]. This approach, in addition to organismal affiliation of metabolites, may be fruitful in the context of organismal assignments of non-ribosomal peptides. Furthermore, in the context of metabolites, a top-down approach has recently been demonstrated by Zimmermann et al. [22], whereby specific microbiota-derived metabolites, especially in the context of drug metabolism, were differentiated from those of the host. More broadly, metabolites may also be attributable to organisms via metabolic reconstructions, either at the community level [23, 24] or at the taxon level [25], in a complementary bottom-up approach. In the context of molecule-to-organism linkages, the generation of systematic high-resolution data along with appropriate data analytical methods can establish relevant associations, which then need to be further validated experimentally [2].
In this context, the expanded biomolecular isolation methodology presented here provides the foundation for identifying such relationships through precise, multi-dimensional analyses of the same original sample, which is critical for coherent multi-omics data integration [26]. This is particularly relevant when working on heterogeneous microbiome samples such as stool. We note that the biomolecular extraction methodology described here should be generally applicable to other sample types such as saliva, skin, or vaginal samples. The main limitation in this context is the yield of the extractions, i.e., the mentioned sample types yield lower cell numbers compared to fecal samples. If this bottleneck is carefully considered and related adjustments are made, our method, as it is based inter alia on indiscriminate cryogenic lysis of cells [9], should be generally applicable to extractions from other sample types and to subsequent meaningful omic measurements. Several chronic diseases are thought to involve a constitutively (pro)-inflammatory state, potentially underlying disease etiology [27]. Therefore, given the distinctiveness of the extracellular biomolecular fractions and their involvement in modulating immune and inflammatory pathways, deciphering this molecular complex and its effect on the human host represents one of the many challenges to be faced in the coming years. Our results support the notion that the integration of additional omics data beyond metagenomics (based on int-DNA) adds essential dimensions in terms of taxonomic and functional information, not least in relation to likely effector biomolecules. Our methodology thus represents the foundation for the systematic study of the gut microbiome’s extracellular molecular complex in the context of human health and disease.