Genome-scale network reconstructions have helped uncover the molecular basis of metabolism. Here we present Recon3D, a computational resource that includes three-dimensional (3D) metabolite and protein structure data and enables integrated analyses of metabolic functions in humans. We use Recon3D to functionally characterize mutations associated with disease, and identify metabolic response signatures that are caused by exposure to certain drugs. Recon3D represents the most comprehensive human metabolic network model to date, accounting for 3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures. These data provide a unique resource for investigating molecular mechanisms of human metabolism. Recon3D is available at http://vmh.life.
Bui, A.A.T. & Van Horn, J.D. Envisioning the future of 'big data' biomedicine. J. Biomed. Inform. 69, 115–117 (2017).
O'Brien, E.J., Monk, J.M. & Palsson, B.O. Using genome-scale models to predict biological capabilities. Cell 161, 971–987 (2015).
Thiele, I. et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31, 419–425 (2013).
Duarte, N.C. et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. USA 104, 1777–1782 (2007).
Swainston, N. et al. Recon 2.2: from reconstruction to model of human metabolism. Metabolomics 12, 109 (2016).
Pornputtapong, N., Nookaew, I. & Nielsen, J. Human metabolic atlas: an online resource for human metabolism. Database 2015, bav068 (2015).
Argmann, C.A., Houten, S.M., Zhu, J. & Schadt, E.E. A next generation multiscale view of inborn errors of metabolism. Cell Metab. 23, 13–26 (2016).
Gatto, F. & Nielsen, J. Pan-cancer analysis of the metabolic reaction network. Preprint at bioRxiv https://www.biorxiv.org/content/early/2016/05/17/050187 (2016).
Ji, B. & Nielsen, J. New insight into the gut microbiome through metagenomics. Adv. Genomics Genet. 5, 77–91 (2015).
Heinken, A. & Thiele, I. Systems biology of host-microbe metabolomics. Wiley Interdiscip. Rev. Syst. Biol. Med. 7, 195–219 (2015).
Thiele, I. & Palsson, B.Ø. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc. 5, 93–121 (2010).
Chang, M.T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016).
Miller, M.L. et al. Pan-cancer analysis of mutation hotspots in protein domains. Cell Syst. 1, 197–209 (2015).
Laskowski, R.A. et al. Integrating population variation and protein structural analysis to improve clinical interpretation of missense variation: application to the WD40 domain. Hum. Mol. Genet. 25, 927–935 (2016).
Niu, B. et al. Protein-structure-guided discovery of functional mutations across 19 cancer types. Nat. Genet. 48, 827–837 (2016).
Zhao, Z., Xie, L., Xie, L. & Bourne, P.E. Delineation of polypharmacology across the human structural kinome using a functional site interaction fingerprint approach. J. Med. Chem. 59, 4326–4341 (2016).
Porta-Pardo, E. & Godzik, A. Mutation drivers of immunological responses to cancer. Cancer Immunol. Res. 4, 789–798 (2016).
Rost, B. Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94 (1999).
Ebrahim, A. et al. Multi-omic data integration enables discovery of hidden biological regularities. Nat. Commun. (2016).
Mih, N., Brunk, E., Bordbar, A. & Palsson, B.O. A multi-scale computational platform to mechanistically assess the effect of genetic variation on drug responses in human erythrocyte metabolism. PLOS Comput. Biol. 12, e1005039 (2016).
Mardinoglu, A. et al. Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat. Commun. 5, 3083 (2014).
Sahoo, S., Haraldsdóttir, H.S., Fleming, R.M.T. & Thiele, I. Modeling the effects of commonly used drugs on human metabolism. FEBS J. 282, 297–317 (2015).
Sahoo, S., Aurich, M.K., Jonsson, J.J. & Thiele, I. Membrane transporters in a human genome-scale metabolic knowledgebase and their implications for disease. Front. Physiol. 5, 91 (2014).
Famiglietti, M.L. et al. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum. Mutat. 35, 927–935 (2014).
Nilsson, A., Mardinoglu, A. & Nielsen, J. Predicting growth of the healthy infant using a genome scale metabolic model. NPJ Syst. Biol. Appl. 3, 3 (2017).
Brunk, E. et al. Systems biology of the structural proteome. BMC Syst. Biol. 10, 26 (2016).
Berman, J.H.M. et al. The protein data bank. Nucleic Acids Res. 106, 16972–16977 (2000).
Preciat Gonzalez, G.A. et al. Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon3D. J. Cheminform. 9, 39 (2017).
Noronha, A. et al. ReconMap: an interactive visualization of human metabolism. Bioinformatics 33, 605–607 (2017).
Sherry, S.T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
Whirl-Carrillo, M. et al. Pharmacogenomics knowledge for personalized medicine. Clin. Pharmacol. Ther. 92, 414–417 (2012).
Ye, Y. & Godzik, A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 19 (Suppl. 2), ii246–ii255 (2003).
Kris, M.G. et al. Efficacy of gefitinib, an inhibitor of the epidermal growth factor receptor tyrosine kinase, in symptomatic patients with non-small cell lung cancer: a randomized trial. J. Am. Med. Assoc. 290, 2149–2158 (2003).
von Bülow, R. et al. Defective oligomerization of arylsulfatase a as a cause of its instability in lysosomes and metachromatic leukodystrophy. J. Biol. Chem. 277, 9455–9461 (2002).
Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
Villa, G.R. et al. An LXR-cholesterol axis creates a metabolic co-dependency for brain cancers. Cancer Cell 30, 683–693 (2016).
Geng, F. et al. Inhibition of SOAT1 suppresses glioblastoma growth via blocking SREBP-1-mediated lipogenesis. Clin. Cancer Res. 22, 5337–5348 (2016).
Adzhubei, I., Jordan, D.M. & Sunyaev, S.R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 7, 7.20 (2013).
Zielinski, D.C. et al. Pharmacogenomic and clinical data link non-pharmacokinetic metabolic dysregulation to drug side effect pathogenesis. Nat. Commun. 6, 7101 (2015).
Orth, J.D., Thiele, I. & Palsson, B.Ø. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
Lamb, J. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935 (2006).
Kuhn, M., Campillos, M., Letunic, I., Jensen, L.J. & Bork, P. A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol. 6, 343 (2010).
Fischer, A., Sananbenesi, F., Mungenast, A. & Tsai, L.-H. Targeting the correct HDAC(s) to treat cognitive disorders. Trends Pharmacol. Sci. 31, 605–617 (2010).
Xie, L., Xie, L., Kinnings, S.L. & Bourne, P.E. Novel computational approaches to polypharmacology as a means to define responses to individual drugs. Annu. Rev. Pharmacol. Toxicol. 52, 361–379 (2012).
Hopkins, A.L. Network pharmacology. Nat. Biotechnol. 25, 1110–1111 (2007).
Brunk, E. & Rothlisberger, U. Mixed quantum mechanical/molecular mechanical molecular dynamics simulations of biological systems in ground and electronically excited states. Chem. Rev. 115, 6217–6263 (2015).
Bordbar, A. et al. Personalized whole-cell kinetic models of metabolism for discovery in genomics and pharmacodynamics. Cell Syst. 1, 283–292 (2015).
King, Z.A. et al. BiGG Models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44 D1, D515–D522 (2016).
Hastings, J. et al. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 41, D456–D463 (2013).
Brennan, C.W. et al. The somatic genomic landscape of glioblastoma. Cell 155, 462–477 (2013).
Noor, E., Haraldsdóttir, H.S., Milo, R. & Fleming, R.M.T. Consistent estimation of Gibbs energy using component contributions. PLOS Comput. Biol. 9, e1003098 (2013).
Quek, L.-E. et al. Reducing Recon 2 for steady-state flux analysis of HEK cell culture. J. Biotechnol. 184, 172–178 (2014).
Heirendt, L. et al. Creation and analysis of biochemical constraint-based models: the COBRA Toolbox v3.0. Preprint at https://arxiv.org/abs/1710.04038 (2017).
Dawson, P.A., Lan, T. & Rao, A. Bile acid transporters. J. Lipid Res. 50, 2340–2357 (2009).
Xu, D. & Zhang, Y. Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment. Sci. Rep. 3, 1895 (2013).
Zhou, H., Gao, M., Kumar, N. & Skolnick, J. SUNPRO: Structure and function predictions of proteins from representative organisms http://cssb.biology.gatech.edu/sunpro/index.html (2012).
Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–738 (2010).
Kim, S. et al. PubChem substance and compound databases. Nucleic Acids Res. 44 D1, D1202–D1213 (2016).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44 D1, D457–D462 (2016).
Kinsella, R.J. et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database 2011, bar030 (2011).
Rahman, S.A. et al. Reaction Decoder Tool (RDT): extracting features from chemical reactions. Bioinformatics 32, 2065–2066 (2016).
First, E.L., Gounaris, C.E. & Floudas, C.A. Stereochemically consistent reaction mapping and identification of multiple reaction mechanisms through integer linear optimization. J. Chem. Inf. Model. 52, 84–92 (2012).
Kumar, A. & Maranas, C.D. CLCA: maximum common molecular substructure queries within the MetRxn database. J. Chem. Inf. Model. 54, 3417–3438 (2014).
Gatto, F., Miess, H., Schulze, A. & Nielsen, J. Flux balance analysis predicts essential genes in clear cell renal cell carcinoma metabolism. Sci. Rep. 5, 10738 (2015).
Rose, A.S. & Hildebrand, P.W. NGL Viewer: a web application for molecular visualization. Nucleic Acids Res. 43 W1, W576–W579 (2015).
The results here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. This work was funded by the Novo Nordisk Foundation Center for Biosustainability and the Technical University of Denmark (grant number NNF10CC1016517), the National Institutes of Health (grant GM057089 to B.O.P.) and by the Luxembourg National Research Fund (FNR) through the National Centre of Excellence in Research (NCER) on Parkinson's disease and the ATTRACT programme (FNR/A12/01), by the European Union's Horizon 2020 research and innovation programme under grant agreement No 668738, by the Institutional Strategy of the University of Tübingen (German Research Foundation DFG, ZUK 63), and by Google Inc. (Summer of Code 2016). RCSB PDB is funded by the National Science Foundation (NSF DBI-1338415 to S.K.B.), the Department of Energy, and the National Institutes of Health (NIGMS and NCI). This research used resources of the National Energy Research Scientific Computing Center. The authors gratefully acknowledge P. Mischel and W. Zheng for experimental help and discussions on GBM, N. Lewis, A. McCammon, J. Mesirov, J.M. Thornton, J. Monk, and J. Lerman for scientific discussions and Z. King for help with Escher integration in RCSB PDB, M. Abrams for manuscript editing, V. Kohler and A.E. Kärcher-Dräger for drawing the platelet and RBC map in Escher, and F. Monteiro and M.A.P. Oliveira for help in reconstructing the dopamine subsystem.
The authors declare no competing financial interests.
Integrated supplementary information
The published version of Recon 2 (ref. 36) was significantly expanded by the addition of new reactions, metabolites, and genes. Simultaneously, the network content was refined with respect to its gene-protein-reaction associations, thermodynamically infeasible reactions, and reaction directionality, leading to Recon 3. The model was subjected to an expanded set of metabolic objective tests to ensure broader coverage of biochemical functions. This iterative model-building method greatly improved the content of all the models involved.
A. The reaction content of Recon 3 was categorized by major metabolic category. For the corresponding metabolic subsystems, see Figure S3. B. The new reactions that led to the assembly of Recon 3 are shown with their corresponding major metabolic category. C. The newly added metabolites are shown with their corresponding major metabolic category. D. The newly added genes in Recon 3 are shown with their corresponding major metabolic category. Interestingly, lipid metabolism was the top scorer in each category.
Shown is the gain in the reaction content per metabolic subsystem. Additionally, ten new subsystems were introduced to include metabolic pathways of aminoacyl-tRNA biosynthesis, hippurate metabolism, leukotriene metabolism, N-glycan metabolism, nucleotide metabolism, peptide metabolism, protein assembly, protein degradation, protein modification, and vitamin K metabolism. On the other hand, cysteine metabolism of Recon 2 was merged with ‘Methionine and cysteine metabolism’ in Recon 3.
The growth curves simulated using Recon3D and HMR 2.00 agree well with each other and with the growth standards from the World Health Organization (WHO). The predicted growth curves show the cumulative weight gain from 180 growth simulations with age-dependent nutrient intake, biomass composition, and activity level. The discrepancy observed at 3–6 months is due to differences in the kcal-to-ATP conversion factor between fat and glucose.
Supplementary Figure 5 GEM-PRO workflow for mapping gene identifiers to the UniProt, RefSeq, and Ensembl databases when considering isoforms.
The example shown here is for Entrez gene ID 314, with two isoforms, 314.1 and 314.2. Taking the gene ID without the Recon 3 isoform IDs, we are able to map it directly to the UniProt database which contains 2 annotated isoforms, and then map them back to the Entrez gene IDs. A separate workflow maps the gene ID (without isoform ID) to the RefSeq database, and transcript names are utilized to assign isoforms. Once a UniProt identifier has been found, we query the PDB database for all corresponding protein structures.
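The first step of this workflow, separating the Entrez gene ID from the Recon 3 isoform suffix, can be sketched as follows. This is a minimal illustration rather than the GEM-PRO implementation: the function names are hypothetical, and the subsequent UniProt, RefSeq, and PDB queries are deliberately left out.

```python
from typing import Dict, List, Optional, Tuple


def split_isoform_id(recon_gene_id: str) -> Tuple[str, Optional[str]]:
    """Split a Recon-style gene ID such as '314.1' into (gene ID, isoform ID).

    Hypothetical helper: returns None for the isoform when no suffix is present.
    """
    base, sep, isoform = recon_gene_id.partition(".")
    return base, (isoform if sep else None)


def group_isoforms(recon_gene_ids: List[str]) -> Dict[str, List[Optional[str]]]:
    """Group isoform suffixes under their shared gene ID (without isoform ID)."""
    groups: Dict[str, List[Optional[str]]] = {}
    for gene_id in recon_gene_ids:
        base, isoform = split_isoform_id(gene_id)
        groups.setdefault(base, []).append(isoform)
    return groups


# The example from the caption: Entrez gene 314 with isoforms 314.1 and 314.2.
grouped = group_isoforms(["314.1", "314.2"])
```

The bare gene ID (here `"314"`) is what would then be submitted to the external databases, with the stored suffixes used to reattach isoform assignments afterwards.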
Supplementary Figure 6 Distribution of total energy-related (PSQS) scores for all 3D protein structures.
In (a), distribution of X-ray structure resolution for all PDBs mapped to genes in Recon 3. In light green are all PDB IDs, while in dark green are the selected structures that are best representative of each gene after the QC/QA steps. In (b), total PSQS scores for all homology models in Recon 3. A lower PSQS score indicates higher quality.
Supplementary Figure 7 Predictive accuracy of algorithmically derived atom mappings versus manually curated atom mappings.
A reaction is accurately predicted if each substrate atom is mapped to the correct product atom. Metabolic reactions can be classified by the enzyme that catalyses them, using four-digit identifiers known as EC numbers. The top-level EC number indicates the type of reaction that an enzyme carries out. Therefore, to test the accuracy of the Reaction Decoder Tool, DREAM, and CLCA algorithms for different reaction types, we illustrate the predictive accuracy for the 512 curated reactions according to reaction type, as defined by their top-level EC number.
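The accuracy criterion above can be expressed as a short sketch. It assumes each atom mapping is represented as a dictionary from substrate atom index to product atom index, and it treats a reaction as correct only when the predicted and curated dictionaries are identical; equivalent mappings that arise from molecular symmetry are ignored here, which is a simplification. The function names and toy data are illustrative and are not taken from the study.

```python
from collections import defaultdict


def top_level_ec(ec_number):
    """Return the first digit group of an EC number, e.g. '2.7.1.1' -> 2."""
    return int(ec_number.split(".")[0])


def accuracy_by_ec_class(reactions):
    """Fraction of correctly mapped reactions per top-level EC class.

    reactions: iterable of (ec_number, predicted_mapping, curated_mapping),
    where each mapping is a dict {substrate_atom: product_atom}.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ec_number, predicted, curated in reactions:
        ec_class = top_level_ec(ec_number)
        totals[ec_class] += 1
        # A reaction counts as correct only if every substrate atom agrees.
        if predicted == curated:
            correct[ec_class] += 1
    return {ec_class: correct[ec_class] / totals[ec_class] for ec_class in totals}


# Toy data (invented for illustration only).
toy_reactions = [
    ("1.1.1.1", {1: 1, 2: 2}, {1: 1, 2: 2}),  # correct
    ("1.2.3.4", {1: 2, 2: 1}, {1: 1, 2: 2}),  # two atoms swapped
    ("2.7.1.1", {1: 1}, {1: 1}),              # correct
]
accuracy = accuracy_by_ec_class(toy_reactions)
```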
SBGN-PD map view of the SBML Level 3 Version 1 file with Layout and Render extension that was generated from the manually drawn CellDesigner file of Recon 2.01. This map can be downloaded from https://vmh.life/#downloadview.
Escher map of the human red blood cell, redrawn from iAB-RBC-283 (refs. 43, 127).
Escher map of the human platelet cell, redrawn from iAT-PLT-636 (ref. 127).
(a) (Left) All metabolic subsystems that map to damaging or potentially damaging variants in Recon 3D.
(Right) The landscape of protein motifs or domain types after filtering missense mutations using a 3D protein domain hotspot analysis. (b) (Left) We found that 13 of the 26 genes with 3D localized mutation hotspots catalyze metabolic reactions in Recon 3D. Using information from the metabolic network, protein structural domains, and disease associations, we laid out a subset of genes in a disease connectivity network. Visualization of this network reveals the diversity in metabolic roles and biological assemblies among this set of genes. Red outlined ovals indicate the number of 3D localized mutations in a given domain, the green squares indicate the number of representative domains linked to a given gene, the beige rectangles indicate the metabolic subsystems associated with a given gene, and the red triangles represent disease-associated SNPs that overlap with a given missense cancer mutation. In several cases, mutation positions without known effects are found to be associated with other diseases (e.g., Cowden syndrome 1 and Bannayan-Riley-Ruvalcaba syndrome in PTEN). (Right) Striking similarities were found in protein structure within the same subset of genes using a structure-based connectivity network, where links represent the degree of structural overlap (a significant score is typically > 0.4, determined from structural alignment; ref. 98). The protein chain is indicated by a green square and whether mutations have known effects (K) or unknown effects (U) is annotated in the red outlined oval. These findings enable future studies that compare mutations with known effects to those with unknown effects in structurally similar (but not identical) regions.
Extension of the 3D hotspot analysis was performed on the metabolic SNP database (Table S19; Supplementary Data S3.xlsx). This dataset contains 1,385 unique genes with 3,649 SNPs; of these, 604 SNPs map to protein structures and are considered “deleterious” and 270 map to structures and are found to be “tolerated” (based on their SIFT predictors; ref. 145). We found that deleterious mutations were much more likely to have co-occurring deleterious mutations in 5 and 10 Angstrom spheres than mutations that are tolerated (using a two-tailed t-test, p < 0.5 and p > 0.1 for deleterious and tolerated mutations, respectively).
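The sphere-based co-occurrence count described above can be sketched as follows. The sketch assumes each mutation is reduced to a single 3D coordinate in Angstroms (for example, a representative atom of the mutated residue) plus a "deleterious" or "tolerated" label; the function name and toy coordinates are hypothetical, and the t-test comparing the two groups is not reproduced here.

```python
import math


def deleterious_neighbor_counts(mutations, radius):
    """For each mutation, count other deleterious mutations within `radius`.

    mutations: list of ((x, y, z), label) with label 'deleterious' or 'tolerated'.
    radius: sphere radius in Angstroms.
    """
    counts = []
    for i, (position_i, _) in enumerate(mutations):
        n_close = sum(
            1
            for j, (position_j, label_j) in enumerate(mutations)
            if j != i
            and label_j == "deleterious"
            and math.dist(position_i, position_j) <= radius
        )
        counts.append(n_close)
    return counts


# Toy coordinates (invented): three clustered deleterious sites, one distant tolerated site.
toy_mutations = [
    ((0.0, 0.0, 0.0), "deleterious"),
    ((3.0, 0.0, 0.0), "deleterious"),
    ((9.0, 0.0, 0.0), "deleterious"),
    ((30.0, 0.0, 0.0), "tolerated"),
]
counts_5 = deleterious_neighbor_counts(toy_mutations, radius=5.0)
counts_10 = deleterious_neighbor_counts(toy_mutations, radius=10.0)
```

Comparing the per-mutation counts between the deleterious and tolerated groups at each radius is then a matter of applying a standard two-sample test to the two lists.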
Supplementary Figure 15 Basic and detailed workflows for identifying drug-induced perturbed pathways and linking them to their indication.
(a) Basic workflow for identifying drug-induced perturbed pathways and linking them to their indication. (b) Detailed workflow for the genetic algorithm:
1) Inputs to the algorithm are a set of response variables for each gene expression set (either MetCHANGE scores or gene expression changes), a binary presence/absence vector for whether each sample was treated with a drug that has the side effect or indication, and the desired maximum number of predictor variables. The latter was set based on the number of treated gene expression sets in order to minimize the potential for overfitting.
2) At initiation, the genetic algorithm generates a ‘population’ of random guesses at the predictor variables, termed ‘individuals’, and assigns each predictor variable a value of -1, 0, or 1. For each individual, all gene expression samples are scored as the response variables (MetCHANGE or gene expression changes) multiplied by the candidate signature.
3) The gene expression samples are then ranked by score, a receiver operating characteristic (ROC) curve is generated, and the area under the curve (AUC) is calculated using the input presence/absence vector for the side effect or indication. The sample AUCs are the maximization objective of the genetic algorithm.
4) The genetic algorithm subroutines are then used to generate a new population, biasing towards higher AUCs. Best solutions are maintained without modification, and lower scoring individuals are combined (‘crossed over’) and modified (‘mutated’) to search the solution space in a heuristic fashion. The termination criterion is typically a number of generations without improvement; however, we applied a simple maximum-time termination criterion, as obtaining a global optimum was not deemed essential to gain biological insight.
5) The signature yielding the highest prediction AUC is considered the best predictor set. In the example case, the resultant AUC is 1.0, a perfect predictor for the sample set.
6) To assess overfitting and hence the predictive potential of the metabolic signature, 10-fold cross validation is performed by generating 10 partitions of 90% of the data to train signatures and predict the remaining 10 partitions of 10% of the data. To find signatures that have consistent predictive power, the cross validation signatures were summed, and high scoring metabolites were considered the conserved metabolic response signature for the side effect or indication.
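The fitness evaluation in steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each sample is scored as the sum of its response variables multiplied by the candidate signature, and the AUC is computed with the rank-based (Mann-Whitney) formulation, in which ties count as one half. The function names and toy numbers are hypothetical, and the crossover, mutation, and cross-validation steps are omitted.

```python
def score_samples(responses, signature):
    """Score each sample as the sum of response * signature weight.

    responses: list of per-sample lists of response variables.
    signature: list of weights, each -1, 0, or 1, one per response variable.
    """
    return [
        sum(value * weight for value, weight in zip(sample, signature))
        for sample in responses
    ]


def roc_auc(scores, labels):
    """Rank-based AUC: fraction of (treated, untreated) pairs ordered correctly."""
    positives = [s for s, present in zip(scores, labels) if present]
    negatives = [s for s, present in zip(scores, labels) if not present]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (invented): four samples, three response variables each.
toy_responses = [
    [2.0, -1.0, 0.5],
    [1.5, -0.5, 0.0],
    [-1.0, 1.0, 0.2],
    [-0.5, 2.0, -0.3],
]
toy_labels = [1, 1, 0, 0]  # 1 = treated with a drug that has the indication

auc_good = roc_auc(score_samples(toy_responses, [1, -1, 0]), toy_labels)
auc_bad = roc_auc(score_samples(toy_responses, [-1, 1, 0]), toy_labels)
```

A genetic algorithm would call a function like `roc_auc` once per individual in each generation and keep the signatures with the highest values.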
Supplementary Figures 1–15 (PDF 2987 kb)
Supplementary Tables 1–9 and Supplementary Notes 1–6 (PDF 1987 kb)
Reconstruction; Recon3D (XLSX 2106 kb)
File contains all GEM-PRO related content for Recon3D. Contains Supplementary Data Files 11-14. (XLSX 3195 kb)
File contains all mappings to variant disease SNPs/somatic mutations, FATCAT representative domain annotations and drug indication analyses. Contains Supplementary Data Files 15-26. (XLSX 2972 kb)
Recon 3D GEM-PRO has been consolidated into a shareable JSON file, which can be used to start structural analyses. (ZIP 444 kb)
IndiFinder.m (ZIP 3 kb)
About this article
Cite this article
Brunk, E., Sahoo, S., Zielinski, D. et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol 36, 272–281 (2018). https://doi.org/10.1038/nbt.4072