Recon3D enables a three-dimensional view of gene variation in human metabolism


Genome-scale network reconstructions have helped uncover the molecular basis of metabolism. Here we present Recon3D, a computational resource that includes three-dimensional (3D) metabolite and protein structure data and enables integrated analyses of metabolic functions in humans. We use Recon3D to functionally characterize mutations associated with disease, and identify metabolic response signatures that are caused by exposure to certain drugs. Recon3D represents the most comprehensive human metabolic network model to date, accounting for 3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures. These data provide a unique resource for investigating molecular mechanisms of human metabolism. Recon3D is available at


Figure 1: The properties and content of the Recon3D knowledge-base.
Figure 2: Linking the human metabolic network to protein structural databases, cheminformatics platforms, and the Protein Data Bank.
Figure 3: Linking the human metabolic network to gene variation and cancer knowledge-bases.
Figure 4: An example of bridging systems biology and structural biology through Recon3D.
Figure 5: Protein structure-guided discovery of mutation hotspots across structurally related genes.
Figure 6: Identification of metabolic signatures linked to drug indications.




The results here are in whole or part based upon data generated by the TCGA Research Network. This work was funded by the Novo Nordisk Foundation Center for Biosustainability and the Technical University of Denmark (grant number NNF10CC1016517), the National Institutes of Health (grant GM057089 to B.O.P.) and by the Luxembourg National Research Fund (FNR) through the National Centre of Excellence in Research (NCER) on Parkinson's disease and the ATTRACT programme (FNR/A12/01), by the European Union's Horizon 2020 research and innovation programme under grant agreement No 668738, by the Institutional Strategy of the University of Tübingen (German Research Foundation DFG, ZUK 63), and by Google Inc. (Summer of Code 2016). RCSB PDB is funded by the National Science Foundation (NSF DBI-1338415 to S.K.B.), the Department of Energy, and the National Institutes of Health (NIGMS and NCI). This research used resources of the National Energy Research Scientific Computing Center. The authors gratefully acknowledge P. Mischel and W. Zheng for experimental help and discussions on GBM; N. Lewis, A. McCammon, J. Mesirov, J.M. Thornton, J. Monk, and J. Lerman for scientific discussions; Z. King for help with Escher integration in RCSB PDB; M. Abrams for manuscript editing; V. Kohler and A.E. Kärcher-Dräger for drawing the platelet and RBC maps in Escher; and F. Monteiro and M.A.P. Oliveira for help in reconstructing the dopamine subsystem.

Author information


Conceptualization: E.B., I.T., and D.C.Z.; methodology, reconstruction of metabolic network: S.S., I.T., R.M.T.F., A.D.D., A.H., and M.K.A.; reconstruction of GEM-PRO: E.B., N.M., and A.S.; 3D-hotspot analysis: E.B., A.P., A.S., and P.W.R.; machine learning: D.C.Z.; PDB visualization: A.A., A.P., A.D., R.M.T.F., and S.K.B.; atom–atom mapping: G.A.P.G. and R.M.T.F.; model testing and validation: I.T., R.M.T.F., S.S., M.K.A., D.C.Z., A.N., and F.G.; cell-specific and infant model simulations: M.K.A., A.N., and F.G.; investigation: E.B., D.C.Z., and G.A.P.G.; writing, original draft: E.B. and B.O.P.; writing, review and editing: all authors; funding acquisition: I.T., R.M.T.F., S.K.B., J.N., and B.O.P.; resources: I.T., R.M.T.F., S.K.B., J.N., and B.O.P.; supervision: I.T., R.M.T.F., S.K.B., and B.O.P.

Corresponding authors

Correspondence to Ines Thiele or Bernhard O Palsson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Iterative model building of Recon 3

The published version of Recon 2 (ref. 36) was significantly expanded by the addition of new reactions, metabolites, and genes. Simultaneously, the network content was refined with respect to its gene-protein-reaction associations, thermodynamically infeasible reactions, and reaction directionality, leading to Recon 3. The model was subjected to an expanded set of metabolic objective tests to ensure broader coverage of biochemical functions. This iterative model-building method greatly improved the content of all the models involved.

Supplementary Figure 2 Statistics of Recon 3.

A. The reaction content of Recon 3, categorized by major metabolic category. For the corresponding metabolic subsystems, see Supplementary Figure 3. B. The new reactions added to assemble Recon 3, shown by major metabolic category. C. The newly added metabolites, shown by major metabolic category. D. The newly added genes in Recon 3, shown by major metabolic category. Lipid metabolism ranked highest in each category.

Supplementary Figure 3 Subsystem comparison between Recon 2 and Recon 3.

Shown is the gain in the reaction content per metabolic subsystem. Additionally, ten new subsystems were introduced to include metabolic pathways of aminoacyl-tRNA biosynthesis, hippurate metabolism, leukotriene metabolism, N-glycan metabolism, nucleotide metabolism, peptide metabolism, protein assembly, protein degradation, protein modification, and vitamin K metabolism. On the other hand, cysteine metabolism of Recon 2 was merged with ‘Methionine and cysteine metabolism’ in Recon 3.

Supplementary Figure 4 Simulations of infant growth on human breast milk.

The growth curves simulated using Recon3D and HMR 2.00 agree well with the growth standards from the World Health Organization (WHO). The predicted growth curves show the cumulative weight gain from 180 growth simulations with age-dependent nutrient intake, biomass composition, and activity level. The discrepancy observed at 3–6 months is due to differences in the kcal-to-ATP conversion factor between fat and glucose.

Supplementary Figure 5 GEM-PRO workflow for mapping gene identifiers to the UniProt, RefSeq, and Ensembl databases when considering isoforms.

The example shown here is for Entrez gene ID 314, with two isoforms, 314.1 and 314.2. Taking the gene ID without the Recon 3 isoform IDs, we are able to map it directly to the UniProt database which contains 2 annotated isoforms, and then map them back to the Entrez gene IDs. A separate workflow maps the gene ID (without isoform ID) to the RefSeq database, and transcript names are utilized to assign isoforms. Once a UniProt identifier has been found, we query the PDB database for all corresponding protein structures.
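The first step of this workflow, separating the gene ID from its isoform suffix, can be illustrated with a minimal Python sketch. It assumes only the `<Entrez ID>.<isoform>` identifier format shown in the example above; the function names are hypothetical, and the database lookups (UniProt, RefSeq, PDB) are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def split_isoform_id(model_gene_id: str) -> Tuple[str, Optional[str]]:
    """Split an ID such as '314.1' into ('314', '1').

    IDs without a '.' are returned with isoform None.
    """
    base, sep, isoform = model_gene_id.partition(".")
    return base, (isoform if sep else None)


def group_isoforms(model_gene_ids: List[str]) -> Dict[str, List[str]]:
    """Group model gene IDs by their isoform-free Entrez gene ID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for gene_id in model_gene_ids:
        base, _ = split_isoform_id(gene_id)
        groups[base].append(gene_id)
    return dict(groups)


if __name__ == "__main__":
    # The isoform-free key ('314') is what would be sent to an external
    # database query; the grouped IDs are kept for mapping results back.
    print(group_isoforms(["314.1", "314.2"]))
```

Keeping the original isoform IDs grouped under the stripped gene ID makes it straightforward to map database hits back onto the model identifiers afterwards.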

Supplementary Figure 6 Distribution of total energy-related (PSQS) scores for all 3D protein structures.

(a) Distribution of X-ray structure resolution for all PDB entries mapped to genes in Recon 3. All PDB IDs are shown in light green; the structures selected as the best representative of each gene after the QC/QA steps are shown in dark green. (b) Total PSQS scores for all homology models in Recon 3. A lower PSQS score indicates higher quality.

Supplementary Figure 7 Predictive accuracy of algorithmically derived atom mappings versus manually curated atom mappings.

A reaction is accurately predicted if each substrate atom is mapped to the correct product atom. Metabolic reactions can be classified by the enzyme that catalyses them, using four-digit identifiers known as EC numbers. The top-level EC number indicates the type of reaction that an enzyme carries out. Therefore, to test the accuracy of the Reaction Decoder Tool, DREAM, and CLCA algorithms for different reaction types, we illustrate the predictive accuracy for the 512 curated reactions according to reaction type, as defined by their top-level EC number.
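The accuracy criterion and the grouping by top-level EC number can be sketched as follows. This is an illustrative Python sketch, not the evaluation code used for the figure: the record layout (EC number, predicted mapping, curated mapping) and the representation of a mapping as a dictionary from substrate atom index to product atom index are assumptions, and chemically equivalent atoms are not treated specially.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

AtomMapping = Dict[int, int]  # substrate atom index -> product atom index


def accuracy_by_ec_class(
    records: Iterable[Tuple[str, AtomMapping, AtomMapping]]
) -> Dict[str, float]:
    """Return the fraction of correctly mapped reactions per top-level EC class.

    A reaction counts as correct only if every substrate atom is mapped
    to the same product atom as in the curated mapping.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ec_number, predicted, curated in records:
        top_level = ec_number.split(".")[0]  # e.g. '1' for '1.1.1.1'
        total[top_level] += 1
        if predicted == curated:
            correct[top_level] += 1
    return {ec: correct[ec] / total[ec] for ec in total}


if __name__ == "__main__":
    toy_records = [
        ("1.1.1.1", {1: 1, 2: 2}, {1: 1, 2: 2}),  # correct
        ("1.2.3.4", {1: 2, 2: 1}, {1: 1, 2: 2}),  # one atom swapped
        ("2.7.1.1", {1: 1}, {1: 1}),              # correct
    ]
    print(accuracy_by_ec_class(toy_records))
```

The all-or-nothing comparison mirrors the definition in the caption: a single mismatched atom makes the whole reaction count as inaccurately predicted.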

Supplementary Figure 8 SBGN-PD map

SBGN-PD map view of the SBML Level 3 Version 1 file with Layout and Render extension that was generated from the manually drawn CellDesigner file of Recon 2.01. This map can be downloaded from

Supplementary Figure 9 Escher map view of Recon 2.01.

This map can be downloaded from

Supplementary Figure 10 Escher map of the human red blood cell

Escher map of the human red blood cell, redrawn from iAB-RBC-283 (refs. 43, 127).

Supplementary Figure 11 Escher map of the human platelet cell

Escher map of the human platelet cell, redrawn from iAT-PLT-636 (ref. 127).

Supplementary Figure 12 Disease Networks in Recon3D

(a) (Left) All metabolic subsystems that map to damaging or potentially damaging variants in Recon3D. (Right) The landscape of protein motifs or domain types after filtering missense mutations using a 3D protein domain hotspot analysis.

(b) (Left) We found that 13 of the 26 genes with 3D localized mutation hotspots catalyze metabolic reactions in Recon3D. Using information from the metabolic network, protein structural domains, and disease associations, we laid out a subset of genes in a disease connectivity network. Visualization of this network reveals the diversity in metabolic roles and biological assemblies among this set of genes. Red outlined ovals indicate the number of 3D localized mutations in a given domain, the green squares indicate the number of representative domains linked to a given gene, the beige rectangles indicate the metabolic subsystems associated with a given gene, and the red triangles represent disease-associated SNPs that overlap with a given missense cancer mutation. In several cases, mutation positions without known effects are found to be associated with other diseases (e.g., Cowden syndrome 1 and Bannayan-Riley-Ruvalcaba syndrome in PTEN). (Right) Striking similarities were found in protein structure within the same subset of genes using a structure-based connectivity network, where links represent the degree of structural overlap (a significant score is typically > 0.4, determined from structural alignment, ref. 98). The protein chain is indicated by a green square and whether mutations have known effects (K) or unknown effects (U) is annotated in the red outlined oval. These findings enable future studies that compare mutations with known effects to those with unknown effects in structurally similar (but not identical) regions.

Supplementary Figure 13 Single gene deletion simulations for GBM-specific cell line models.

Supplementary Figure 14 3D hotspot analysis statistics

The 3D hotspot analysis was extended to the metabolic SNP database (Table S19; Supplementary Data S3.xlsx). This dataset contains 1,385 unique genes with 3,649 SNPs; of these, 604 SNPs map to protein structures and are considered “deleterious” and 270 map to structures and are found to be “tolerated” (based on their SIFT predictions, ref. 145). We found that deleterious mutations were much more likely than tolerated mutations to have co-occurring deleterious mutations within 5 and 10 Angstrom spheres (using a two-tailed t-test, p < 0.5 and p > 0.1 for deleterious and tolerated mutations, respectively).
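The sphere-based co-occurrence count behind this analysis can be illustrated with a small Python sketch. It assumes each mutated residue is represented by a single 3D coordinate in Angstroms (for example, a C-alpha position); that representation, the function name, and the toy coordinates are illustrative assumptions, not details taken from the study.

```python
import math
from typing import List, Sequence

Coordinate = Sequence[float]  # (x, y, z) in Angstroms


def count_neighbors(coords: List[Coordinate], radius: float) -> List[int]:
    """For each mutated position, count the other mutated positions
    whose coordinates lie within `radius` Angstroms of it."""
    counts = []
    for i, center in enumerate(coords):
        n = sum(
            1
            for j, other in enumerate(coords)
            if i != j and math.dist(center, other) <= radius
        )
        counts.append(n)
    return counts


if __name__ == "__main__":
    # Toy coordinates for four mutated residues on one structure.
    mutated_positions = [(0, 0, 0), (3, 0, 0), (8, 0, 0), (30, 0, 0)]
    for radius in (5.0, 10.0):
        print(radius, count_neighbors(mutated_positions, radius))
```

Per-residue counts such as these, computed separately for deleterious and tolerated variants, are the kind of quantity that a two-sample test could then compare.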

Supplementary Figure 15 Basic and detailed workflows for identifying drug-induced perturbed pathways and linking them to their indication.

(a) Basic workflow for identifying drug-induced perturbed pathways and linking them to their indication. (b) Detailed workflow for the genetic algorithm:
1) Inputs to the algorithm are a set of response variables for each gene expression set (either MetCHANGE scores or gene expression changes), a binary presence/absence vector indicating whether each sample was treated with a drug that has the side effect or indication, and the desired maximum number of predictor variables. The latter was set based on the number of treated gene expression sets in order to minimize the potential for overfitting.
2) At initiation, the genetic algorithm generates a ‘population’ of random guesses at the predictor variables, termed ‘individuals’, and assigns each variable a value of -1, 0, or 1. For each individual, all gene expression samples are scored as the response variables (MetCHANGE or gene expression changes) multiplied by the candidate signature.
3) The gene expression samples are then ranked by score, a receiver operating characteristic (ROC) curve is generated, and the area under the curve (AUC) is calculated using the input presence/absence vector for the side effect or indication. The sample AUCs are the maximization objective of the genetic algorithm.
4) The genetic algorithm subroutines are then used to generate a new population, biasing towards higher AUCs. The best solutions are maintained without modification, and lower-scoring individuals are combined (‘crossed over’) and modified (‘mutated’) to search the solution space in a heuristic fashion. The termination criterion is typically a number of generations without improvement; however, we applied a simple maximum-time termination criterion, as obtaining a global optimum was not deemed essential to gain biological insight.
5) The signature yielding the highest prediction AUC is considered the best predictor set. In the example case, the resultant AUC is 1.0, a perfect predictor for the sample set.
6) To assess overfitting and hence the predictive potential of the metabolic signature, 10-fold cross validation is performed by generating 10 partitions of 90% of the data to train signatures and predicting the remaining 10 partitions of 10% of the data. To find signatures that have consistent predictive power, the cross-validation signatures were summed, and high-scoring metabolites were considered the conserved metabolic response signature for the side effect or indication.
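Steps 2, 3, and 6 of this workflow can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation (the supplementary software listed below is a MATLAB file): the function names are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formulation with ties counted as one half, and the genetic search itself (population, crossover, mutation) is omitted.

```python
from typing import List, Sequence


def signature_scores(
    responses: Sequence[Sequence[float]], signature: Sequence[int]
) -> List[float]:
    """Score each sample as the sum of its response variables multiplied
    by the candidate signature (entries of -1, 0, or 1)."""
    return [sum(r * s for r, s in zip(row, signature)) for row in responses]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a treated sample
    (label 1) outscores an untreated one (label 0); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def conserved_signature(fold_signatures: Sequence[Sequence[int]]) -> List[int]:
    """Sum the signatures trained on each cross-validation fold; entries
    with large absolute sums are consistently selected across folds."""
    return [sum(column) for column in zip(*fold_signatures)]


if __name__ == "__main__":
    # Toy data: 4 samples x 3 response variables; first two samples treated.
    responses = [[2.0, -1.0, 0.5], [1.5, -0.5, 0.0],
                 [-1.0, 1.0, 0.2], [-0.5, 2.0, 0.1]]
    labels = [1, 1, 0, 0]
    candidate = [1, -1, 0]
    print(roc_auc(signature_scores(responses, candidate), labels))
    print(conserved_signature([[1, 0, -1], [1, 1, -1], [1, 0, 0]]))
```

In a full genetic algorithm, `roc_auc(signature_scores(...), labels)` would serve as the fitness of each individual, and the per-fold best individuals would be passed to `conserved_signature`.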

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 2987 kb)

Life Sciences Reporting Summary (PDF 129 kb)

Supplementary Tables and Supplementary Notes

Supplementary Tables 1–9 and Supplementary Notes 1–6 (PDF 1987 kb)

Supplementary Datafiles 1-10

Reconstruction; Recon3D (XLSX 2106 kb)

Supplementary Datafiles 11-14

File contains all GEM-PRO related content for Recon3D. Contains Supplementary Data Files 11-14. (XLSX 3195 kb)

Supplementary Datafiles 15-26

File contains all mappings to variant disease SNPs/somatic mutations, FATCAT representative domain annotations and drug indication analyses. Contains Supplementary Data Files 15-26. (XLSX 2972 kb)

Supplementary Datafile 27

Recon3D GEM-PRO has been consolidated into a shareable JSON file, which can be used to start structural analyses. (ZIP 444 kb)

Supplementary Software

IndiFinder.m (ZIP 3 kb)

About this article

Cite this article

Brunk, E., Sahoo, S., Zielinski, D. et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol 36, 272–281 (2018).
