The adult human gastrointestinal tract is home to almost 1014 of bacteria, archaea, and eukarya cells, what is commonly referred to as the gut microbiome; thus it seems appropriate to view ourselves as a composite of many species. Our affiliated microbial partners contain 100 times the number of genes embedded in our Homo sapiens genome, providing us with a genetic landscape and functional features we do not have to evolve on our own (Backhed et al., 2004). These tens of trillions of microbes that live in our gastrointestinal tract represent an anaerobic bioreactor programmed to synthesize several molecules that direct the human immune system locally and systematically (Hooper, 2004). The biosynthetic variability of our gut microbiota, which remains largely unexplored, could shape the human physiological phenotypes and is viewed as a modifier of the host epigenome (Li et al., 2008). As a matter of fact, the indigenous metabolic processes coded in the host metagenome are a source of pharmacologically active secondary metabolites involved in host metabolic regulation (Nicholson, 2006; Sonnenburg et al., 2006). Although we know that the symbiotic gut microbiome has evolved to exert control over a number of important mammalian metabolic regulatory functions, little is understood of the systems level interactions between the chemical signatures that characterize the organism-level metabolic status and host cellular pathways.

The first step in improving our understanding on how epigenetic or genetic changes that uncouple the shared evolutionary fate of humans and their symbiotic bacteria can result in disease, is to describe the composition of the metabolic network of microbial communities with respect to the healthy state. Turnbaugh et al. (2009) have shown that the gut microbial community structures of adult monozygotic twin pairs had a degree of similarity that was comparable to that of dizygotic twin pairs, and only slightly more similar compared with their mothers, whereas Palmer et al. (2007) revealed that gut community assembly during the first year of life followed a more similar pattern in a pair of dizygotic twins compared with unrelated infants. This interpersonal variation in the composition of the human microbiota implies that studying the role of gut bacteria in the development of pathophysiology and as complimentary metabolic machinery for drugs and diets across a small set of individuals may inappropriately treat these diverse phenomena as a single, albeit noisy phenomenon. In contrast, averaging the effects of a disturbance across large cohorts of individuals with distinct disease phenotypes can reveal the links between bacterial dynamics and host physiology and pathology.

These disturbances in a healthy adult microbiota can be the result of several factors, such as urbanization, diet or hygiene; however, there is a major concern that medical therapies may alter the composition of the human microbiota. Although antibiotic treatment is typically followed by a decrease in the diversity of the microbiota (Jernberg et al., 2007), it is unknown how other medical treatments might change the microbiota and deregulate the host immune homeostasis. Recent studies on drug polypharmacology (Chang et al., 2010; Ainsworth, 2011; Wang et al, 2011; Yabuuchi et al., 2011) indicate that many of the known drugs have more than one target and highlight the need to undertake a systems level approach to understand drug efficacy and adverse drug reactions considering the effect of a molecule in a global physiological environment, including the gut microbiota.

To get a deeper understanding of the microbial populations in the human digestive tract and their interactions with the host, we described the genetically defined metabotypes from fecal samples of 124 European adults (≈3.3 million genes). The cohort is divided into healthy and obese individual human adults, as well as inflammatory bowel disease (IBD) patients (Qin et al., 2010). Even though several studies have suggested that gut microflora composition is associated with obesity and IBD (Kau et al., 2011; Maloy and Powrie, 2011) the elucidation of the molecular mechanisms underlying microbe–host reciprocal metabolic interactions is far from complete. The goals of the present study were to (i) evaluate the variability in the metabolic potential of the gut microbiome in healthy individuals based on variations in the metabolic gene content of the bacterial population; (ii) apply a systems chemical biology approach to link the putative meta-metabolome space with the human protein and disease space and (iii) explore how medical treatments disturb the bacterial communities, how gut microbes affect drug metabolism and draw connections between side effects of drugs and alterations in human microbiota (Figure 1a).

Figure 1
figure 1

The genetically defined metabotypes of the bacterial communities in the gut of healthy individuals. (a) The present study focuses on the evaluation of: (1) the interactions between small molecules (metabolites) produced by the microbiome and human proteins, and (2) the interactions between drug molecules and bacterial proteins. (b) The frequency of each of the 1490 metabolic reactions (red: essential reactions, blue: non-essential) that were assigned to the microbiome of the healthy individuals. The frequencies are given as both: (1) total number of reactions for a given frequency, represented by the bar width, and (2) density for essential and non-essential reactions for a given frequency. (c) Visualization of the most common (present in >70% of the samples) metabolic reactions (59 reactions) of the group of healthy individuals. Glycolysis, pyruvate metabolism, fatty acid metabolism, amino acid and nitrogen metabolism were the most conserved parts of the metabolic network. (d) The three distinct groups formed by clustering the healthy individuals based on a binary representation of their microbiome metabolic profile. The three clusters are characterized by high (736–1395), medium (254–758) and low (34–195) number of metabolic reactions. (e) Distribution of reaction counts in healthy samples dominated by the Bacteroides, Prevotella, and Ruminococcus genera (Arumugam et al., 2011).

Materials and methods

Metabolic proteins, reactions and metabolites

Enzymes, reactions and metabolite data were acquired from MetaCyc 14.6, a database that contains information about experimentally demonstrated small-molecule metabolic reactions and pathways, with links to sequences of the responsible enzymes (Caspi et al., 2010). MetaCyc contains a total of 9299 unique reactions, of which protein sequences could be retrieved for 6890. MetaCyc also contains 8869 metabolites, 8468 of which with known structure.

Human metagenome data

The metagenome data of the human microbiome are part of the Metagenomics of the Human Intestinal Tract (MetaHit) project (Qin et al., 2010). The data consist of gene sequences, distribution of genes among individuals (healthy samples=53, obese samples=46, IBD samples=10) and metadata for a subset of the samples, describing other non-genomic characteristics. The raw Illumina read data of all 124 samples have been deposited in EBI, under the accession ERA000116. The contigs and gene set are available to download from EMBL ( and BGI ( websites.

Linking MetaHit with MetaCyc database

All nucleotide sequences from the MetaHit FASTA-files were given in their respective reading frame and could thus be translated directly to the amino acid sequences. The link between the MetaHit data and the MetaCyc database was initially established by finding homolog proteins using the BLAST-Like Alignment Tool (BLAT) (Kent, 2002). Two proteins were considered homologs if there was at least 80% coverage of the MetaHit protein sequence with 85% identity (amino acid level). A set of metabolic proteins and the respective metabolic network were assigned to each sample. For the metabolic proteins that were found in >70% of the samples, the KEGG IDs of the respective metabolic reactions were retrieved and visualized using iPATH (Yamada et al., 2011). The abundance per sample of each metabolic gene is not depicted in the iPATH visualization.

Species diversity

Taxonomical information regarding each MetaHit sequence aligned to a MetaCyc sequence was acquired by aligning all MetaHit sequences to sequences of known origin, using 90% identity threshold (nucleotide level), describing classes ranging from super kingdom to strain (Qin et al., 2010). Of the 3.3 million sequences of the non-redundant database, almost 500 000 were not directly associated to species.

Clustering and statistical analysis

The distance between samples was measured from their metabolic reaction profile using a binary matrix (‘1’: the reaction is present in an individual, ‘0’: the reaction is absent). The similarity between two samples was measured by the Tanimoto coefficient (Tc). The Tc gives a similarity value based on the overlap and size of the bit strings of the two samples under examination (Jonsdottir et al., 2005). Then the distance between samples is simply calculated as 1 minus Tc. The clustering of the samples was performed using hierarchical clustering with Ward’s method (Ward, 1963).

For each reaction, we calculated the mean abundance over all samples, where the reaction was present (for example, mean for healthy, obese and IBD). Subsequently, we calculated the log2-fold change between healthy and obese, and between healthy and IBD, respectively. We kept reactions with a log2-FC <−1 or >+1, and we ended up with 64 reactions for the obese samples and 128 for the IBD samples. We then performed a Wilcoxon rank test (data is sparse, and therefore t-tests are not suitable) on these reactions and P-values were adjusted using the false discovery rate (Benjamini–Hochberg).

Disease genes and complexes

Data on disease-relevant protein complexes were retrieved from our in-house database (Lage et al., 2008) (, last accessed on 31 August 2011), where complexes are linked to tissues through RNA expression data derived from 73 healthy tissues from the Novartis Research Foundation Gene Expression Database ( A positive z-score indicates that the genes participating in a complex are expressed in the particular tissue at a higher level than the average gene of this tissue. The database consists of a list of 1524 protein complexes, assembled by 5202 unique proteins, connected through 45 662 unique interactions. The complexes are mapped to 1054 OMIM diseases. For 346 complexes the z-score for each of the 73 tissues is available. More information for the methodology on tissue specificity of disease complexes could be found in Lage et al. (2008).

Human metabolic network

In order to construct a subset of MetaCyc containing only non-human metabolites, human metabolomics data were acquired from the Homo sapiens Recon 1, a comprehensive literature-based genome-scale metabolic reconstruction of the global human metabolic map (Duarte et al., 2007). H. sapiens Recon 1 is part of the BIGG database (Schellenberger et al., 2010). All structures from MetaCyc and H. sapiens Recon 1 were first converted to Canonical SMILES using Open Babel (O’Boyle et al., 2011). SMILES (Simplified Molecular Input Line System) is a line notation for representing structures of small molecules. The generated SMILES is not unique for a given structure, thus, canonicalization involves the adoption of certain rules, which produce one generic SMILES among all valid possibilities ( Metabolites with identical Canonical SMILES, which were present in both MetaCyc and H. sapiens Recon 1, were subsequently removed from the former and the remaining ones consisted the list of non-human metabolites.

Drugs and drug-target data

Small-molecule drug and target data were acquired from DrugBank (DB) v.3 (Knox et al., 2011) and the Therapeutic Target Database (TTD) 4.3.02 (Chen et al., 2009). DB contains information for 6708 drugs (including 1437 approved, 5086 experimental, 83 nutraceuticals and 134 protein/peptide drugs) linked to 4229 non-redundant targets. Small-molecule structures, as SMILES strings, were obtained for 6628 small molecules, while the sequences of the drug targets were retrieved as FASTA files (available for 4050 targets). The TTD contains information for 17 816 drugs (including approved, clinical, experimental and antisense drugs) linked to 2025 targets (including successful, clinical, discontinued and research targets). Structures were downloaded for 14 783 small molecules as MOL-files and were converted to SMILES strings using Open Babel. FASTA-files for 1502 protein targets were also available at the TTD website. After merging of the two datasets and removal of duplicates, the final list consisted of 19 917 unique small molecule structures and 4595 unique drug targets.

The similarity between the small molecule drugs and bacterial metabolites was accessed by the Tanimoto coefficient. Two molecules were considered similar when they fulfilled both the following criteria: (i) Tc0.85, and (ii) at least one common protein interaction partner in ChEMBLdb.


Metabolic individuality in the healthy human microbiome

To evaluate the metabolome profile of the gut of healthy individuals we defined the metabolic capacity of the bacterial populations by linking the MetaHIT protein sequences to the MetaCyc database (Caspi et al., 2010) for each individual. We could, in this manner, retrieve all the MetaHit genes (from now on called ‘metabolic genes’) coding for proteins that are part of a metabolic pathway (from now on called ‘metabolic proteins’). Using strict criteria, two proteins were considered to be homologs when they shared 85% identity and at least 80% coverage, and a total of 3775 MetaHIT sequences accounting for 1490 metabolic reactions were retrieved (Supplementary Table S1). We should emphasize here that the metabolome profile assigned to each individual is solely based on genomic data that indicate the metabolic potential of the bacterial population with no further evidence that the detected genes are actually expressed to produce the metabolites. A binary presence/absence (1/0) matrix was constructed using these 1490 metabolic reactions for the healthy samples. This matrix revealed a large variation in the total number of metabolic reactions that were assigned in the microbiome of each healthy individual. To investigate further this metabolic diversity in connection to the composition of the microbiome, we visualized the frequency of each metabolic reaction per individual taking into account its essentiality. Reactions were divided into essential and non-essential (Figure 1b) based on the Database of Essential Genes (Zhang and Lin, 2009) (DEG) 6.5 ( By aligning the sequences from 18 prokaryotes in DEG to the MetaHIT sequences we retrieved 310 protein hits responsible for 425 essential metabolic reactions. Assigning a metabolic reaction as essential in our metagenome data should be always interpreted with caution so as not to neglect the symbiotic relationships in the bacterial population. Nevertheless from Figure 1b it is apparent that non-essential metabolic reactions have lower frequency among individuals compared with essential reactions (P-value <2.2 × 10−16 using Welch t-test). To capture a snapshot of the highly conserved among healthy individuals metabolic network, we visualized the metabolic reactions with frequency >70% using iPath2.0 ( The parts of the microbiome metabolism that show a higher conservation among individuals are glycolysis and pyruvate metabolism, fatty acid metabolism, amino acid and nitrogen metabolism (Figure 1c).

Subsequently, the genetically defined metabotypes were visualized in a heat map (Figure 1d) where the distance between individuals is measured from their metabolic reaction profile using the binary matrix described above. Interestingly, three distinct clusters of healthy individuals with high, medium and low metabolic potential were observed. The larger group was the one with high metabolic potential, harboring between 736–1395 metabolic reactions. The number of reactions for the medium and low metabolic potential groups were 254–758 and 34–195, respectively. We have further confirmed that the clusters of metabolic potential are not the outcome of correlations with the general richness of the sample, or with sequencing depth (Supplementary File S1).

To illustrate these results in the context of bacterial population, we studied in what degree the variability in the metabolic potential of individuals is correlated with the abundance levels of the Bacteroides, Prevotella and Ruminococcus genera in each sample (Arumugam et al., 2011). If we attempt to draw direct connections between the abundance of the three genera and the richness of the metabolic network based on Figure 1e, a high correlation is observed only for the Prevotella genera; Prevotella shows strong presence in samples with low metabolic potential and it becomes virtually absent in samples with high number of metabolic reactions. However, we should keep in mind that our correlations between the three genera and metabolic potential are based solely on genomic data.

Generating connections of ‘non-human’ metabolites, protein complexes and diseases

To uncover potentially produced meta-metabolites that may have a significant contribution to health maintenance, we sought interactions between human proteins and bacterial metabolites that are not part of the human metabolic network. To create this set of metabolites we overlaid the putative gut microbiome metabolism with the Homo sapiens Recon 1 (Duarte et al., 2007) metabolic model. We obtained a set of 482 metabolites with known structure, which can be potentially synthesized or degraded only by the gut bacteria. This study revealed at the same time a significant overlap between the human and the bacterial metabolic capacity (376 shared metabolites), comparable to the number of human-specific metabolites (472 metabolites). The set of 482 metabolites that cannot be produced or metabolized by humans (‘non-human’ metabolites) was the base for the subsequent analysis. ChEMBLdb (Gaulton et al., 2011), a manually curated database of bioactive molecules, was used for generating an interactome map of the ‘non-human’ metabolites and the human proteome space. From the 62 metabolites that were linked to the human proteome (Supplementary Table S2), and after filtering for small organic molecules, 33 were found to be associated with a disease-relevant protein complex, while 18 of them were present in DrugBank (Knox et al., 2011) as experimental drugs. In the metabolic reactions of the metagenome, in which the ‘non-human’ metabolites are involved, five appear as substrates (S), eight appear as products (P), and the rest appear as both a substrate and a product (S and P) (Figure 2b). The taxonomy of genes encoding the bacterial enzymes involved in these reactions revealed Escherichia coli, Klebsiella pneumonia, Enterobacter cloacae (and E. cancerogenus), Shigella sp. and Citrobacter rodentium as the most efficient producers/consumers (Figure 2a). However, we should keep in mind that linking sequences with species was possible for only one-third of the enzymes.

Figure 2
figure 2

The interactome space of the 33 ‘non-human’ metabolites with the human proteome. (a) A total of 195 MetaHit sequences (non-redundant) are involved in reactions in which these 33 metabolites participate. (1) Taxonomy distribution (top 20 species) for the MetaHit sequences (1091 non-redundant) involved in all metabolic reactions, (2) available taxonomy distribution for 56 of the 195 MetaHit sequences. (b) Heat map showing the disease space (given as OMIM IDs) targeted by the 33 metabolites that can be synthesized or degraded only by the gut microbiome. Information in parenthesis shows whether a metabolite participates in an irreversible reaction (as substrate (S) or a product (P)) or in a reversible reaction (S and P). When available, the Anatomical Therapeutically Classification code was retrieved from DrugBank. (c) The tissue specificity (given as a z-value) of the disease protein complexes that interact with metabolites with high binding affinity (Ki1 μM). Hierarchical clustering was performed using Euclidean distances according to Ward’s method. Detailed information for the protein complex IDs is provided in Supplementary Table S4.

The interactions of the ‘non-human’ metabolites with disease complexes were established using an in-house database, which includes 2227 unique proteins that have been associated with a disease OMIM (Online Mendelian Inheritance in Man) ID through data mining (Lage et al., 2008; Figure 2b). Disease protein complexes involved in squamous cell carcinoma, Li-Fraumeni syndrome, lung cancer and bladder cancer were associated with >10 ‘non-human’ metabolites each. Drug metabolism disorders were also ranked highly in the list, as several metabolites interact with complexes that include enzymes of the cytochrome P450 superfamily involved in the metabolism of drugs and other xenobiotics (Supplementary Table S3). Metabolites quercetin and menadione that were found to interact with disease protein complexes associated with a large number of OMIM IDs, seem to have a significant role in the host-bacterial mutualism. Along with these two ‘non-human’ metabolites, hydroquinone and S-carboxylmethyl-D-cystein formed a cluster sharing many of the disease IDs. Quercetin, menadione and S-carboxymethyl-D-cysteine participate in reversible reactions offering a higher degree of flexibility for regulating their concentrations in the gut, whereas hydroquinone appears as product of an irreversible reaction, according to MetaCyc.

Figure 2c illustrates the tissue specificity (given as a z-value) for the disease complexes, where at least one member protein is targeted with high affinity by one of the ‘non-human’ metabolites (detailed information for the disease complex IDs is given in Supplementary Table S4). Five metabolites, namely quercetin, erythronate-4-phosphate, D-glucono-1,5-lactone, 5′,5″-diadenosine triphosphate and hydroquinone, were associated with disease complexes, because of experimental binding activity to a member of the complex of Ki1 μM, which is a common threshold when considering efficient, small compound ligands (Overington et al., 2005). A cluster on the lower left part of this figure caught our attention, with 13 disease complexes related with different types of cancer that are highly expressed in cells and tissues involved in the signaling and shaping of the adaptive immune system (such as NK cells, B cells, dendritic cells, thymus, monocytes). Quercetin shows high affinity interactions with 11 of the complexes, whereas hydroquinone and D-glucono-1,5-lactone interact with one complex each. The protein complex targeted by hydroquinone is involved in DNA repair (GO:0006281, P-value5.10e−15) supporting evidence that this metabolite acts as an inducer of cytogenetic changes and DNA hypomethylation (Ji et al., 2010), whereas the protein complex targeted by D-glucono-1,5-lactone is involved in transcription regulation (GO:0045449, P-value5.29e−22).

Modulation of obesity and IBD by the gut microbiota

The MetaHIT cohort also contains samples from patients with IBD as well as individual fat percentage values for some samples, which can be used to group people as obese, allowing us to test if variations in the gut metabotypes have a role in the progression of these diseases. Following the same approach as the one used for the healthy individuals, we defined the metabolic network of the bacterial community for obese (1487 reactions) and IBD patients (1480 reactions). When clustering the samples for obese and IBD patients, based on binary data, we observed a similar trend as for the healthy individuals with three clusters of high, medium and low metabolic potential (data not shown). As the binary data could not reveal any significant differences in the metabolic potential between healthy and disease samples, we evaluated the abundance of the metabolic reactions per individual in each group (healthy, obese and IBD). The term abundance corresponds to the frequency of the sequences coding for each metabolic protein within the complete set of metabolic genes of the individual. In contrast to the binary data, attempting to cluster the samples of each group based on the abundance values resulted in no distinct clusters. Supplementary Figure S1 shows the most abundant metabolic reactions for healthy, obese and IBD samples. N-succinyl ornithine carbamoyl transferase and GDP-mannose 4,6 dehydratase were the most abundant metabolic genes not only in healthy but also in obese and IBD samples. When comparing the top-20 most abundant metabolic genes between the different groups very few differences were observed. In the comparison of the healthy against the other two groups, 3-aminobutyryl-CoA ammonia lyase, N-acetylhexosamine 1-kinase and formate C-acetyltransferase were the only reactions that did not appear in the top-20 of obese individuals, while pyruvate synthase, dimethylmaleate hydratase, acetyl-CoA-acetyl transferase, pyridoxal 5′-phosphate synthase (glutamine hydrolyzing) and hydrogensulfite reductase are the ones absent in the IBD top-20 most abundant metabolic genes.

In order to find statistically significant differences in the metabolic potential between healthy, obese and IBD samples, we applied a Wilcoxon signed-rank test, and used false discovery rate correction for multiple testing. The abundance of only two metabolic genes was found to be significantly different (P-value<0.05, both lower abundant in IBD individuals). The first reaction (E.C. is catalyzed by fumarate reductase, whereas the second one is the oxidation of menadiol by quinol monooxygenase yielding menadiole, superoxide and H+. As there is an increasing focus on oxidative stress as potential etiological factor and/or triggering factor in Crohn's disease (Iborra et al., 2011), it was interesting to find a reaction containing superoxide significantly different between healthy and IBD patients. While in the literature there are metabolomic-based studies performed in humans or model systems that have reported differences in the concentration of specific metabolites between healthy and non-healthy (obese, IBD) individuals (Waldram et al., 2009; Turnbaugh et al., 2010; Le Gall et al., 2011; Ponnusami et al., 2011), our analysis points out that these differences are not the result of genomic variability in the metabolic potential of the gut microflora. The analysis of transcripts would be the next natural step of the present study, in order to investigate whether there is a uniform functional pattern in healthy, obese and IBD samples or the differences in the metabolome level observed in other studies are the result of differences at the mRNA level.

Drug effects on host-microbiome mutualism

To identify drug treatments that are affected the most by the gut microbiome and vice versa, we followed two approaches: (i) we studied the chemical similarity between all drug molecules present in DrugBank and the metabolites of the metabolic network of healthy individuals. A drug and a compound were considered similar if they had a Tanimoto coefficient0.85 and shared at least one known protein interactor in ChEMBLdb. (ii) We evaluated the sequence similarity (90% similarity and 80% coverage) of the drug targets with the metabolic and non-metabolic proteins of the metagenome of healthy individuals. In total, 603 experimental and approved drugs (among them 42 antibiotics) were predicted to perturb 515 unique metabolic reactions (64 drugs linked to metabolic reactions based on drug-metabolite chemical similarity and 556 drugs based on target homology). From these metabolic reactions, 161 are essential reactions (DEG 6.5) (Figure 3a). Interestingly, 16 experimental drugs and 5 approved drugs are identical to metabolites that act as substrates in the microbiome metabolic network, indicating that their therapeutic activity is potentially affected by bacterial metabolism. Looking into the non-metabolic targets of drugs, based on the sequence similarity of the drug target and the MetaHit sequences, we found 334 drugs (among them 60 antibiotics) that could potentially modulate the activity of 495 proteins (Figure 3b) of the bacterial community (170 essential proteins). From the above drug–microbiome interactions there were several notable cases. Tobramycin interacts with two non-metabolic proteins, which are both assigned as essential. Tobramycin is an antibiotic used to treat various types of bacterial infections and shows diarrhea and abdominal pain as highly common side effects ( Olanzapine and aripirpazol were shown to interact each with one non-metabolic protein of the gut metagenome. These two antipsychotic drugs show diarrhea, dyspepsia, constipation and gain weight (only olanzapine) as common side effects. Cytarabine, an anticancer drug, targets five metabolic proteins (one of them essential) of the bacterial metabolic network. According to SIDER constipation is a common side effect associated with this treatment. In Table 1, we present the 18 drugs and their respective Anatomical Therapeutic Chemical Classification codes with the most relevant, to the gut microbiome, side effects (diarrhea, flatulence and so on).

Figure 3
figure 3

Drug–metagenome interactions. (a) A network of interactions between drug molecules (circles) and metabolic reactions (squares) in healthy individuals. A drug is connected with a metabolic reaction if (i) it is chemically similar with a metabolite that appears as reactant and (ii) the drug and the metabolite share at least one protein interactor in ChEMBLdb. Red font indicates essential reactions and green font indicates antibiotics. (b) A network of interactions between drug molecules (circles) and non-metabolic proteins (squares) in healthy individuals. A drug is connected with a protein if the drug target and the metagenome protein are homologs (90% similarity and 80% coverage). Red font indicates essential reactions and green font indicates antibiotics.

Table 1 The 18 drugs that were found to potentially interact with metabolic reactions and non-metabolic proteins of the gut bacteria (based either on chemical similarity between the drug and the metabolite or sequence similarity of the drug target and a bacterial protein) and for which information on relevant side effects was retrieved from SIDER (in parenthesis is the number of essential reactions)


The role of the human intestinal microbiome in metabolizing dietary constituents and protecting the host against pathogens is crucial to human health (Ley et al., 2008). Analysis of genomic and metagenomic sequences has produced gene catalogs with protein families involved in the predominant functions of the gut bacteria (Qin et al., 2010; Turnbaugh et al., 2010). However, currently lacking is a large-scale systematic analysis of the complex chemical space generated by these bacteria and the interaction with the human proteome. By defining the genetically defined metabotypes of >120 individuals, we concluded that independent of healthy or diseased samples, the individuals are clustered as high, medium or low metabolic potential. Further to this, Prevotella genera appear to be highly sensitive to the complexity of the metabolic chemical space.

A comparison of the metagenome metabolic network with the human metabolic network revealed a large number of metabolites that appear as reactants only in the bacterial metabolism. The finding that from this set of metabolites, 18 are present in DrugBank provides evidence that we carry a natural pharmacy in our guts. In addition, 15 other ‘non-human’ metabolites were found to interact with a set of human protein complexes associated with a wide spectrum of diseases. For example quercetin, a flavonoid compound that is widely distributed in nature, and menadione, a vitamin precursor of K2, were found to interact with disease protein complexes associated with a large number of OMIM IDs. Although there are no official health claims granted to quercetin, several studies have been conducted that report reduction of systolic blood pressure and plasma oxidized low-density lipoprotein concentrations in overweight subjects with a high cardiovascular disease risk phenotype (Egert et al., 2009), potential anti-obesity effects (Yang et al., 2008) and protection from colon and prostate cancer (Senthilkumar et al., 2011; Bae et al., 2012). Using here compound-protein interaction data we associated quercetin with disease protein complexes involved in both obesity and IBD. Menadione is a clinically important chemotherapeutic agent used in the treatment of leukemia and other types of cancer (Laux and Nel, 2001; Matzno et al., 2008). Due to adverse effects associated with it, menadione is used as nutritional supplement only in few developing countries, but is still used in many countries as inexpensive micronutrient for livestock. Understanding the regulation of the metabolic pathways in which the above two and the rest of the 33 metabolites are involved, and monitoring their actual concentration in healthy and disease individuals, could guide us to rationally design diets that promote host homeostasis. However, as genomic data provide only a blueprint or inventory of genes encoding enzymes without any proof of functionality, meta-transcriptomic or meta-metabolomic analysis is necessary before we attempt to shape the bacterial metabolic network activity to benefit the host. Especially, the power of meta-metabolomic analysis for establishing direct links between alterations in the gut microbiome and human host responses has been demonstrated in several studies. Martin et al. (2010) applied microbial profiling of fecal contents in four groups of animals to study dietary-specific modulation of microbial populations and a different profile of amino acids and SCFAs was observed upon modification of the dietary pattern (glucose–lactose mix) in conventional and non-conventional animals. In another notable study Yap et al. (2008) were the first to show that major perturbation of microbiome metabolites manifests with changes to the host’s systemic metabolic phenotype.

Antibiotics have been used effectively for over half a century to treat bacterial infections; however, it is widely accepted that they can also markedly affect the composition of the microbiota. Although the effect of antibiotics is short term, as typically most families and genera of gut microorganisms return to typical levels within weeks of exposure, there is no information on how the bacterial communities are affected by other classes of drugs, especially drugs with long-term use from the human host. From the computational analysis applied in our study there are indications that antibiotics are not the only therapeutic treatment that can result in a dysregulation of the symbiosis between the host and its microbiota. Antipsychotic and anticancer drugs were also found to potentially interact with bacterial proteins, and these interactions might explain observed side effects, such as diarrhea and constipation; however, experimental evidence is required to validate these observations. Zheng et al. (2011) have determined the global metabolic effects and dynamic changes of 223 fecal metabolites in antibiotic-treated rats and revealed alterations in metabolic systems not previously correlated with antibiotics. In parallel, Jakobsson et al. (2010) analyzed the 16S rRNA gene using 454-based pyrosequencing and terminal-restriction fragment length polymorphism of bacterial populations in human fecal samples after antibiotic administration and revealed marked ecological disturbances that naturally influence the abundance of specific metabolites. These are just a few of the experimental approaches that could be used for studying experimentally drug–microbiome interactions. In contrast to antibiotics, antipsychotic and anticancer drugs are long-duration treatments; therefore a mechanistic understanding of their side effects is of utmost importance.