Main

An understanding of metabolism is fundamental to comprehending the phenotypic behavior of all living organisms, including humans, where metabolism is integral to health and is involved in much of human disease. High quality, genome-scale 'metabolic reconstructions' are at the heart of bottom-up systems biology analyses and represent the entire network of metabolic reactions that a given organism is known to exhibit1. The metabolic-network reconstruction procedure is now well-established2 and has been applied to a growing number of model organisms3. Metabolic reconstructions allow for the conversion of biological knowledge into a mathematical format and the subsequent computation of physiological states1,4,5 to address a variety of scientific and applied questions3,6. Reconstructions enable network-wide mechanistic investigations of the genotype-phenotype relationship. A high-quality reconstruction of the metabolic network is thus of interest to the community of researchers focused on the systems biology of metabolism of a target organism.

Of the reconstructions of human metabolism that have appeared to date, perhaps the most widely used is Recon 1 (ref. 7), which represents a knowledgebase and has also been converted into many predictive models. These models have been used for various biomedical applications, including the prediction of biomarkers for inborn errors of metabolism (IEMs)8, cancer drug targets9,10 and off-target drug effects11. Moreover, they have been used to evaluate missing metabolic functions systematically12,13 and to model host-microbe interactions14,15. These studies demonstrated the potential of metabolic modeling to advance understanding of human metabolism in health and disease.

Various reconstructions of the human metabolic network exist, with only partially overlapping content (ref. 16 and Supplementary Note 1). In addition to Recon 1, a global human metabolic reconstruction, EHMN (Edinburgh Human Metabolic Network)17, has also been published. Manually curated cell type–specific reconstructions are also available, including the comprehensive reconstruction of human hepatocytes, HepatoNet1 (ref. 18), a small intestinal enterocyte reconstruction19 and other metabolic models for macrophages14, hepatocytes20 and kidney cells11. Moreover, a module for Recon 1 that models acylcarnitine and fatty-acid oxidation (Ac-FAO) has recently been published8, which includes the metabolic surroundings of many biomarkers measured in the worldwide newborn screening program21. Recon 1 has also been used for the automated generation of cell-specific and tissue-specific models using various 'omics' data sets22,23, and to generate metabolic reconstructions semiautomatically for other mammals, such as the mouse24.

It is clear that 'competing' (that is, different) reconstructions and reconstruction approaches coexist, but all have the common goal of providing an up-to-date, comprehensive and high-quality reconstruction, either at the global or cell-specific scale. Rather than continuing to duplicate efforts, a substantial fraction of the community has pooled resources to generate a consensus human metabolic reconstruction from many of the sources cited above.

Here we describe Recon 2, a community-driven expansion of the global human metabolic reconstruction, Recon 1. Much of this expansion was performed at reconstruction 'jamboree' meetings25, focused events at which domain experts apply their knowledge to refine and consolidate biochemical knowledge from existing reconstructions and published literature. Members of the Saccharomyces cerevisiae26,27 and Salmonella typhimurium LT2 (ref. 28) communities have used such a jamboree approach. The jamboree events provided the opportunity to establish common standards (and suitable links to other databases) for the consensus reconstruction and the format of its content, to simplify its reuse and extension, and to increase its transparency and commitment to its development via participation of many stakeholders in the community. Here we also demonstrate the improved predictive capability of Recon 2 over that of its predecessor. We mapped exometabolomic data29 onto Recon 2 and used proteomics data to generate cell type–specific metabolic models, which we then investigated for their functional properties.

Results

Reconstruction approach

To assemble Recon 2, we added metabolic information present in four different resources (EHMN17, HepatoNet1 (ref. 18), Ac-FAO module8 and the human small intestinal enterocyte reconstruction19) to the content of Recon 1 following a step-wise process (Fig. 1). We added more than 370 transport and exchange reactions, based on a review of literature. We applied unambiguous third-party identifiers for cellular compartments, metabolites, enzymes and reactions. We mapped the content from the DrugBank30 database, which lists experimental and US Food and Drug Administration−approved drugs, to individual enzymes and reactions. Ninety-five percent of metabolic reactions were mass-balanced and charge-balanced5,31 (Supplementary Table 1), except for those containing metabolites that have either no defined chemical formula or a generic formula. We tested Recon 2 for self-consistency2, a process that included gap analysis2 and leak tests32.

Figure 1: Overview of the community-driven reconstruction approach to assemble Recon 2.

Comp. EHMN, compartmentalized EHMN; and hs_sIEC611, small intestinal enterocyte reconstruction.

Benchmarking Recon 2 against Recon 1

Recon 2 accounts for 1,789 enzyme-encoding genes, 7,440 reactions and 2,626 unique metabolites distributed over eight cellular compartments, which is a large increase in comprehensiveness relative to Recon 1 (Table 1). Such an increase in scope does not necessarily constitute an improvement in utility over the previous version: expanding the reconstruction to resolve existing gaps and dead-end metabolites may introduce additional gaps and dead ends elsewhere. To demonstrate an improvement of the network, both coverage and functional improvements must be considered.

Table 1 Comparison of major features of Recon 2 and its predecessor Recon 1

To quantify the overall improvements achieved through the community-driven expansion and refinement in the global human metabolic reconstruction, we compared the information coverage, topological and functional properties of Recon 2 with those of Recon 1 (Table 1). The reaction content was almost doubled, much of which belonged to one of the nine new pathways (Fig. 2). Moreover, 62% (61/99) of the existing pathways have been expanded in Recon 2, and reaction coverage in 29 pathways, accounting for 16.5% (1,231/7,440) of the reactions, remained unchanged. A total of 307 dead-end metabolites (metabolites that are either only produced or only consumed in the reconstruction) from Recon 1 were resolved in Recon 2, whereas 32 remained as participants in only one reaction. As a result of the expansion, 1,144 new dead-end metabolites, mostly from EHMN, were introduced. These will need to be connected to the rest of the network in subsequent efforts. Blocked reactions cannot carry a nonzero flux in any steady-state condition because they contain one or more dead-end metabolites or are in a linear pathway with such reactions. The expanded coverage of metabolic information resolved 827 blocked reactions present in Recon 1, and 443 blocked reactions remained (Table 1). The number of remaining and new blocked reactions and dead-end metabolites highlights that this current update is not intended to be the final compendium of human metabolism, but it is a major advance over Recon 1 and represents our current, continually evolving knowledge.
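The definition of a dead-end metabolite used above (a metabolite that is only produced or only consumed) can be illustrated on a toy network. The following is a minimal sketch, not the procedure used for Recon 2: the reaction identifiers, the dictionary layout and the function name `find_dead_ends` are assumptions made for illustration, and reversible reactions are treated as able to both produce and consume their participants.

```python
# Hypothetical toy network: reaction id -> (stoichiometry, reversible flag).
# Negative coefficients mark substrates, positive coefficients mark products.
TOY_REACTIONS = {
    "EX_a": ({"A": 1}, True),           # exchange reaction: A can enter or leave
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),   # C is produced but never consumed
    "R3": ({"D": -1, "B": 1}, False),   # D is consumed but never produced
}


def find_dead_ends(reactions):
    """Return metabolites that are only produced or only consumed."""
    produced, consumed = set(), set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            # A reversible reaction can run in either direction, so it
            # counts as both a producer and a consumer of its metabolites.
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    both = produced & consumed
    return sorted((produced | consumed) - both)


if __name__ == "__main__":
    print(find_dead_ends(TOY_REACTIONS))
```

In this toy case C and D are reported as dead ends, and every reaction that touches only such metabolites would be blocked at steady state.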

Figure 2: Pathway coverage in Recon 1 and Recon 2.

Only pathways with added reaction content in Recon 2 are shown. The 29 pathways that have identical reaction content in both reconstructions are not shown.

A metabolic task is defined as a nonzero flux through a reaction or through a pathway leading to the production of a metabolite B from a metabolite A. Examples of such tasks include the synthesis of all known precursors to produce a cell (biomass reaction; Supplementary Note 2) and the generation of energy via oxidative phosphorylation or fermentation (Supplementary Table 2). A total of 354 metabolic tasks were defined. Although a particular cell type is not capable of fulfilling all these metabolic tasks, Recon 2 should be able to fulfill these tasks because it is a global metabolic reconstruction. Recon 2 carried a nonzero flux for all tasks, compared with Recon 1, which achieved this functionality for only 83% of the tasks (Table 1).

To benchmark the models derived from both reconstructions against an independent data set, we used a manually assembled compendium of IEMs8 as a gold standard. This compendium accounts for 330 IEMs, such as phenylketonuria and orotic aciduria, along with their known metabolite biomarkers. As Recon 2 captured more metabolic genes, more IEMs could be mapped (Table 1). In Recon 2, almost all of the mapped IEMs affected the reaction activity, as no complementary isoenzymes are known for the absent enzymes (consistent with their occurrence as IEMs). We compared the predictive potential of Recon 2 and Recon 1 for associated biomarkers for the mapped IEMs (Fig. 3), in a process analogous to gene-deletion studies in microbial modeling. Recon 2 predicted 54 reported biomarkers for 49 different IEMs, with an accuracy of 77%. The coverage of predicted biomarkers and the accuracy were much lower for Recon 1, with 31 reported biomarkers for 29 IEMs and an accuracy of 63% (Fig. 3). This comparison demonstrates that the increased scope of Recon 2 led to a higher coverage of IEM-related biomarkers mapped and to an increase in predictive power.
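The analogy to gene-deletion studies can be made concrete with a small sketch of how a gene defect propagates to reactions through gene-reaction associations. This is a simplified illustration under stated assumptions, not the mapping procedure of ref. 8: the rule format (a list of alternative enzymes, each a set of required genes), the gene and reaction names, and the function `affected_reactions` are all hypothetical.

```python
# Hypothetical gene-reaction associations. Each reaction lists its
# alternative enzymes (isoenzymes); each enzyme is the set of genes it needs.
GENE_RULES = {
    "RXN_single": [{"geneA"}],             # one enzyme encoded by one gene
    "RXN_iso": [{"geneA"}, {"geneB"}],     # two isoenzymes
    "RXN_complex": [{"geneA", "geneC"}],   # complex that needs both genes
    "RXN_other": [{"geneD"}],
}


def affected_reactions(gene_rules, deleted_genes):
    """Return reactions that lose every enzyme when the genes are deleted."""
    deleted = set(deleted_genes)
    affected = []
    for rxn, enzymes in gene_rules.items():
        # A reaction stays active if at least one enzyme keeps all its genes.
        still_active = any(not (enzyme & deleted) for enzyme in enzymes)
        if not still_active:
            affected.append(rxn)
    return sorted(affected)


if __name__ == "__main__":
    print(affected_reactions(GENE_RULES, {"geneA"}))
```

Deleting `geneA` alone leaves `RXN_iso` active because the isoenzyme encoded by `geneB` remains, which mirrors the observation that an IEM affects reaction activity only when no complementary isoenzyme exists.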

Figure 3: Predicted biomarkers for IEMs.

(a) Comparison of the prediction accuracy of Recon 1 and Recon 2 against the gold standard8. (b) Correct and incorrect predictions. IEMs and biomarkers are sorted by subsystem. Bright yellow, amino-acid metabolism; green, central metabolism; blue, hormones; yellow, lipid metabolism; pink, nucleotide metabolism; lilac, vitamin and cofactor metabolism. Blue and red shading corresponds to predicted increase and decrease in biomarker, respectively. Blue and red lines represent reported increase and decrease of the biomarker in plasma, respectively. 34dhphe, 3,4-dihydroxy-L-phenylalanine; 3mlda, 3-methylimidazoleacetic acid; 5htrp, 5-hydroxy-L-tryptophan; bhb, (R)-3-hydroxybutanoate; tetdec2crn, tetradecadienoyl carnitine; tetdece1crn, tetradecenoyl carnitine. (See Supplementary Table 5 for complete names of the IEMs.)

Based on the accurate predictive capability of Recon 2 for biomarkers and for the metabolic tasks, the benchmarking demonstrated an increase in both scope and predictive accuracy of Recon 2 relative to its predecessor.

Recon 2 captures the majority of known exometabolites

Recon 2 accounts for 642 extracellular metabolites, which should be found in cell culture medium and in biofluids such as plasma and urine. When comparing these extracellular metabolites with a reported subset of a cancer exometabolome33 consisting of 140 metabolites, the majority of the metabolites were indeed present in the extracellular compartment (Fig. 4a and Supplementary Table 3). Using a flux variability analysis34, we predicted in silico the uptake and release profile of Recon 2. Recon 2 had high sensitivity in predicting correctly the uptake and release for 91 metabolites (sensitivity values of 0.92 and 0.94, respectively; Supplementary Table 3). However, the capability of Recon 2 to predict true negatives (that is, metabolites that cannot be taken up or released) was very low. For instance, 14% (13/91) of the metabolites could not be released by any of the tested cancer cell lines, whereas all but one of the metabolites were released in silico. As Recon 2 represents the combined metabolic capability of all cells in the human body, not just of neoplastic cells, the low specificity is expected. The mismatches could be used to guide the assembly of cancer-specific metabolic models or to refine an existing cancer metabolic model9. It is noteworthy that Recon 2 contains many more extracellular metabolites (551), which were not seen in the experimental exometabolome, presumably in part because of the targeted liquid chromatography–tandem mass spectrometry method used33, as more than 1,000 metabolite peaks can easily be observed in human serum35.
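The distinction between predicting true positives well and true negatives poorly can be sketched as follows. The observed and predicted release calls below are hypothetical toy values chosen for illustration and are not taken from Supplementary Table 3; the function names are likewise assumptions.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for paired boolean calls."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            counts["tp"] += 1
        elif obs and not pred:
            counts["fn"] += 1
        elif not obs and pred:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


def sensitivity(counts):
    """Fraction of experimentally observed events that the model predicts."""
    return counts["tp"] / (counts["tp"] + counts["fn"])


def specificity(counts):
    """Fraction of experimentally absent events that the model also excludes."""
    return counts["tn"] / (counts["tn"] + counts["fp"])


# Hypothetical toy data: release observed in culture vs. release possible in silico.
OBSERVED = [True] * 8 + [False] * 2
PREDICTED = [True] * 7 + [False] + [True, True]

if __name__ == "__main__":
    c = confusion_counts(OBSERVED, PREDICTED)
    print(c, sensitivity(c), specificity(c))
```

A global reconstruction that can release nearly every metabolite will tend to score high on the first measure and low on the second, as in this toy case.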

Figure 4: Comparison of metabolomic data with the extracellular metabolites present in Recon 2.

(a) Comparison with the cancer exometabolome33 (see Supplementary Table 3 for a breakdown of the 18 cancer exometabolites that are absent from Recon 2, the 31 for which no exchange reaction is present, and the 91 for which uptake or release profile comparisons could be performed). (b) Comparison with the HMDB36 (see Supplementary Fig. 6).

We also compared the Recon 2 exometabolome and the metabolites reported in the Human Metabolome Database (HMDB)36 as being detectable in biofluids (Fig. 4b). Biofluid information could be found for about half of the metabolites in Recon 2 identified in the HMDB. About 44% of these metabolites were also present in the extracellular compartment, indicating that there are still some transport and metabolic routes missing in Recon 2.

In summary, Recon 2 captured many of the reported exometabolites and biofluid metabolites, illustrating its comprehensiveness and the value of annotating with multiple metabolite identifiers, permitting the integration of data from many sources.

Generation of draft, cell type–specific models

A metabolic reconstruction is unique to a genome, and thus to an organism, but condition-specific constraints can be applied to create a condition-specific model from the reconstruction. Thus, one reconstruction can give rise to many models. We mapped expression data from the Human Protein Atlas37 for 65 cell types, capturing information for 25% (451/1,789) of the unique gene products in Recon 2 (Supplementary Fig. 1). These data were used together with a published algorithm5,38 to generate 65 draft cell type–specific metabolic models consisting of 2,426 ± 467 reactions (± s.d.) and 1,262 ± 204 transcripts (Fig. 5). Of the 593 core reactions present in all cell-type models, more than half of them were transport reactions (Supplementary Table 4). Furthermore, 33% (2,463/7,440) of the Recon 2 reactions appeared in none of the cell-type models, with almost 40% (968/2,463) of those belonging to the subsystem 'lipids' (Supplementary Fig. 2). Based on the cell type–specific models, 26% (457/1,789) of the genes in Recon 2 represent the core genes (that is, they were found in all cell type–specific models), and 58% of the genes were part of two or more models. Only 32 genes were specific to a particular cell type–specific model. The remaining 14% (245/1,789) of the genes were not captured by any cell type–specific model. The glandular-cell models were highly correlated based on presence and absence of genes and reactions, whereas epithelial-cell models were less correlated (Supplementary Figs. 3,4). Overall, we observed higher correlation on the subsystems level with the 'transport' subsystem being the most correlated one, highlighting the importance of nutrient uptake and secretion for cellular function (Supplementary Figs. 3,5).
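The grouping of genes into core genes, genes shared by several models, genes specific to one model and genes absent from all models can be sketched with simple set counting. This is an illustrative sketch only; the gene identifiers, model names and the function `classify_genes` are hypothetical, and "shared" is taken here to mean present in at least two but not all models.

```python
from collections import Counter

# Hypothetical toy data: three cell-type models and five genes.
ALL_GENES = ["g1", "g2", "g3", "g4", "g5"]
MODELS = {
    "hepatocyte": {"g1", "g2", "g3"},
    "macrophage": {"g1", "g2"},
    "enterocyte": {"g1", "g4"},
}


def classify_genes(all_genes, models):
    """Group genes by the number of models in which they occur."""
    counts = Counter(g for genes in models.values() for g in set(genes))
    n_models = len(models)
    return {
        "core": sorted(g for g in all_genes if counts[g] == n_models),
        "shared": sorted(g for g in all_genes if 2 <= counts[g] < n_models),
        "unique": sorted(g for g in all_genes if counts[g] == 1),
        "absent": sorted(g for g in all_genes if counts[g] == 0),
    }


if __name__ == "__main__":
    print(classify_genes(ALL_GENES, MODELS))
```

The same counting applies to reactions, which is how a core reaction set shared by all models could be derived.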

Figure 5: Summary properties of the 65 draft cell type–specific models.

(a) Distribution of reactions (left) and genes (right) across the models. (b) The protein expression data captured proteins from all subsystems almost evenly. In contrast, most of the reactions in the core reaction set, consisting of 593 reactions, belonged to the transport subsystem. The second largest subsystem in Recon 2, lipid metabolism, represented only a small fraction of the core reaction set.

The automatically generated hepatocyte model captured 61% (1,109/1,823) of the reactions present in a published hepatocyte model20, whereas 64% (1,932/3,041) of this model's reactions were not in the published model. When comparing our draft hepatocyte model with the published hepatocyte model, HepatoNet1 (ref. 18), 1,098 reactions were present in both, whereas 64% (1,943/3,041) were unique to our model and 57% (1,441/2,539) of reactions were unique to HepatoNet1. Such discrepancies between our draft model and the published cell-type models may serve as an indication of areas of attention for subsequent manual curation and assessment.
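The overlap percentages quoted above follow directly from the shared and total reaction counts. The short sketch below recomputes them from the numbers in the text; the function name `overlap_summary` and its return layout are assumptions made for illustration.

```python
def overlap_summary(shared, size_draft, size_published):
    """Summarize the overlap between a draft model and a published model."""
    unique_draft = size_draft - shared
    unique_published = size_published - shared
    return {
        "published_captured": shared / size_published,
        "unique_to_draft": unique_draft,
        "unique_to_draft_fraction": unique_draft / size_draft,
        "unique_to_published": unique_published,
        "unique_to_published_fraction": unique_published / size_published,
    }


if __name__ == "__main__":
    # Counts taken from the text: draft hepatocyte model (3,041 reactions)
    # versus the published hepatocyte model of ref. 20 (1,823 reactions).
    print(overlap_summary(1109, 3041, 1823))
    # Draft hepatocyte model versus HepatoNet1 (2,539 reactions).
    print(overlap_summary(1098, 3041, 2539))
```

Working from explicit counts makes it easy to see that the draft model is substantially larger than either published model, so a large unique fraction on the draft side is expected.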

We investigated the metabolic tasks that each cell type–specific model can perform. On average, the cell type–specific models had nonzero flux for 174 ± 32 of 354 metabolic tasks. Thirty-one metabolic tasks could be carried out by all such models, many of which tested different aspects of amino-acid metabolism, and 44 metabolic tasks had a zero flux for all models. Of all of these models, only the hepatocyte model could generate urea, via the urea cycle, which is consistent with current knowledge. Small intestinal enterocytes are also capable of urea synthesis but are currently not captured by the Human Protein Atlas37. Notably, 25% (16/65) of the models had a nonzero flux through the biomass reaction, which means that these models contain all necessary reactions either to take up or synthesize all the precursors defined in the biomass reaction.

Recon 2 includes mappings of drug actions to enzymes

Recon 2 includes 2,657 metabolic enzymes and 1,052 enzymatic complexes, many of which are known drug targets. We queried DrugBank30, a comprehensive resource that includes drug-to-enzyme mappings for >6,000 small-molecule and peptide or protein drugs, to allow drugs (and their actions) to be mapped to the enzymes of Recon 2. We found that 1,290 drugs were mapped to 308 enzymes and enzymatic complexes. This equated to 3,168 drug-enzyme or drug-complex interactions, of which 841 were specified as inhibitory. These mappings are included in both the global reconstruction and the cell type–specific models, providing a starting point for the simulation of drug actions with both constraint-based and kinetic modeling1,39,40.

Discussion

Our results illustrate that (i) Recon 2 is a comprehensive metabolic resource and serves as an effective predictive model; (ii) mapping of exometabolomic data onto Recon 2 provides a starting point for further iterative expansion and refinement; (iii) protein expression data and Recon 2 can be used to generate draft cell type–specific metabolic models; and (iv) comparative analysis of cell type–specific models provides insight into alternate metabolic strategies.

The metabolic reconstruction process is inherently iterative, as increasing biochemical and genomic knowledge is generated about the target organism over time. This calls for periodic updates and expansion in the coverage and content of a reconstruction2; thus we adopted the 'Recon 1' and 'Recon 2' naming convention in analogy with the 'build X' convention for the assembly of the human genome sequence and similar conventions used in the naming of iteratively released metabolic reconstructions in yeast27. The published resources for human metabolism differ in syntax and content; for instance, a comparison of five different resources revealed that only a small number of overlapping reactions were present in all resources (ref. 16 and Supplementary Note 1). These discrepancies make it difficult to compare and combine metabolic reconstructions. We overcame this issue by manually curating overlapping entries. The presented consensus reconstruction of human metabolism is fully semantically annotated41 with references to persistent and publicly available chemical and gene databases, unambiguously identifying its components and increasing its applicability for third-party users (including automated processing by software). Moreover, the work expanded beyond combining existing resources, by adding transport and absorption reactions known to occur in epithelial cells of the gastrointestinal tract and the renal tubules. Drug information was mapped from DrugBank30, providing a comprehensive starting point to investigate off-target effects of drugs11 or to obtain information on known drugs for drug target prediction studies9.

We improved the predictive potential of Recon 2 through the addition of new content. The number of IEMs that Recon 2 captured and the accurate prediction of known biomarkers (Fig. 3) demonstrate the substantial functional advancement achieved through the expansion and refinement. We illustrated the potential of the constraint-based modeling approach by generating a global model, not tailored to a particular application, which can nonetheless predict accurately many distinct biomarkers for many IEMs. In addition, Recon 2 also predicted some previously undescribed biomarkers, which have not yet been measured8. With the increasing sophistication of targeted metabolomic approaches42, biomarker predictions may help to guide such analyses. Additionally, large cohort studies have started to connect genotype with metabotype43. As Recon 2 captures most of the measured metabolites, it provides a resource for investigating the connection between genotype and metabotype and ultimately phenotypes based on these data.

We mapped two large-scale exometabolome resources to Recon 2, showing that Recon 2 captures most of the exometabolites and biofluid metabolites reported therein. Although this illustrated the comprehensiveness of the reconstruction, it also highlighted that the information content in the reconstruction is still not complete. Many algorithms exist that may assist in proposing missing metabolic and transport reactions in Recon 2 and thus aid subsequent manual curation (Supplementary Note 3)12,13,44. It is also clear that reported biofluid metabolites36 might be of dietary and/or microbial origin, as the mammalian gut microbiota has been found to affect blood composition45 and metabolism46 substantially. To determine the origin of the biofluid metabolites, a comprehensive computational model accounting for microbial metabolism, human metabolism and dietary composition is needed.

Recon 1 has been used together with omics data to generate comprehensive sets of draft cell type–specific metabolic reconstructions23,38. We mapped data from the Human Protein Atlas37 onto Recon 2 to generate 65 cell type–specific metabolic models automatically. Although protein expression data were available for only one-quarter of the Recon 2 gene products, to our surprise the draft cell type–specific models contained many reactions and genes (Fig. 5). The size of many published cell type–specific models, which have been at least in part generated automatically, is comparable23,37. In a comparison of our automatically generated hepatocyte model with two published models18,20, we found reasonable overlap, even though these models were assembled using different methods and information. Our automatically generated hepatocyte model contained alternate reactions, which reflect the proteomic input data and the algorithm used. For instance, the protein expression data were limited to those that had a high confidence level, but by including lower-confidence expression data, combined with a weighting scheme, one may obtain alternate draft models. Moreover, the algorithmic approach currently does not consider transcriptional regulation, thermodynamic constraints and the synthesis cost for each enzyme and its building blocks. Ultimately, manual inspection and curation will be necessary to obtain more versatile, comprehensive, predictive cell type–specific models.

To assess the functionality of the automatically generated cell type–specific models, we tested 354 metabolic tasks. One-quarter of the models could produce all defined biomass precursors, thus enabling them to simulate cell growth. A review of literature revealed that some of these cell types are indeed known to divide in vivo upon injury or induction by specific growth factors. The protein expression data are generated in many cases on tissue biopsies. The growth capabilities could not have been determined from the omics data alone, and this highlights the importance of computational modeling to analyze experimental data. Moreover, the metabolic activity profile, combined with the set of active exchange reactions, could be compared with cell type–specific literature to evaluate the correctness of the metabolic content of each model and to refine the models by enabling or disabling additional tasks, exchange reactions and metabolic content.

We anticipate that, as a result of the improvements over its predecessor, Recon 2 will be widely used and will enable the exploration of new frontiers in research in human metabolism and its role in health and disease. The global model is available via a database at http://humanmetabolism.org/, in SBML format at Biomodels (http://identifiers.org/biomodels.db/MODEL1109130000) and as Supplementary Data 1.

Methods

Reconstruction approach.

The reconstruction of the expanded and refined global human metabolic network, Recon 2, was performed in multiple stages. Intermediate versions of Recon 2 are referred to as 'Recon 1.x'.

Jamboree work.

The jamboree meetings were used to discuss strategies for combining the content of the two reconstructions and the quality control required for the finished consensus. The strategy was to start with the compartmentalized Recon 1 reconstruction and incrementally add reactions from the EHMN reconstruction.

Initially, automated approaches31 were applied to Recon 1 (ref. 7) and the EHMN17 to solve the problem of inconsistent naming of the components (compartments, metabolites, genes and reactions). The remaining components that could not be automatically matched to existing database entries were manually annotated during the jamboree meetings. Cellular compartments were annotated with Gene Ontology (GO) terms. Metabolites were annotated with terms from the resources Chemical Entities of Biological Interest (ChEBI), Kyoto Encyclopedia of Genes and Genomes (KEGG) Compound, PubChem Compound and HMDB36, and also IUPAC International Chemical Identifier (InChI) terms where possible. Where metabolites were not present in any existing data resources, many were submitted as new entries to ChEBI. Enzymes were annotated with Enzyme Classification (EC) terms, US National Center for Biotechnology Information (NCBI) Gene identifiers and UniProt terms.

The jamboree meetings also focused on manual curation of the reconstruction content. Reactions were curator-validated and annotated with PubMed literature references, standardized GO evidence codes (Supplementary Table 6) and a confidence scoring system ranging from 0 (no evidence) to 4 (biochemical evidence)2. Metabolites and enzymes were assigned to appropriate cellular compartments. Metabolic reactions were checked to ensure correct stoichiometry, irreversibility, correct assignment of gene association and enzyme rules, and mass and charge balancing, and appropriate transport reactions were added.

Post-jamboree work.

For reactions that occurred in both Recon 1 and EHMN, the co-occurrence was considered to be evidence for their inclusion in Recon 1.x. The jamboree teams had manually evaluated reactions from Recon 1, and reactions unique to the compartmentalized EHMN were added to Recon 1.x only if they were mass- and charge-balanced. The charge state of metabolites was calculated at an assumed pH of 7.2. Reactions were mass- and charge-balanced where possible, as defined in Recon 1. Where additional metabolites and reactions were added from sources such as EHMN, the SuBliMinaL Toolbox31 and COBRA Toolbox5 were used to apply mass and charge balancing. In cases where reactions contained 'generic' compounds (containing –R and –X groups), which were used to represent a set of specific compounds to simplify the reconstruction, mass and charge balancing was not possible. An investigation into thermodynamic consistency of Recon 1 reactions47 proposed changes to a small subset of reactions, which were also incorporated. We also updated the gene list based on changes introduced from build 35 to build 37, which includes the removal or replacement of obsolete gene entries.
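A mass and charge balance check of the kind described here can be sketched in a few lines. This is a minimal illustration and not the implementation in the SuBliMinaL Toolbox or the COBRA Toolbox: the metabolite identifiers, the formula parser and the function `imbalance` are assumptions, and the formulas and charges shown for the hexokinase reaction are the commonly used charged forms near physiological pH. Generic formulas containing R or X groups are outside the scope of this sketch.

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C6", "H12", "P".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

# Illustrative metabolites: id -> (chemical formula, charge).
METABOLITES = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}

# Hexokinase: glc + atp -> g6p + adp + h (substrates negative, products positive).
HEX1 = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}


def parse_formula(formula):
    """Convert a formula string such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoich, metabolites):
    """Return the net element and charge surplus; an empty dict means balanced."""
    totals = Counter()
    for met, coeff in stoich.items():
        formula, charge = metabolites[met]
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charge
    return {key: value for key, value in totals.items() if value != 0}


if __name__ == "__main__":
    print(imbalance(HEX1, METABOLITES))          # balanced reaction
    no_proton = {m: c for m, c in HEX1.items() if m != "h"}
    print(imbalance(no_proton, METABOLITES))     # proton omitted
```

Omitting the proton leaves the reaction one hydrogen and one unit of charge short, which is the typical kind of error that such a check flags.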

Subsequently, the Recon 1.x content was extended to account for the content in the published HepatoNet1 (ref. 18) reconstruction. The same procedure for converting HepatoNet1 to the syntax of Recon 1.x was used as described above. As HepatoNet1 is a pruned and cell-specific stoichiometric model, some of its conventions differed from those in Recon 1.x. Most importantly, the implementation of lipids as specific pools with fixed fatty-acid distributions was incompatible with the flexible and less specified R-group system in Recon 1.x, and therefore the lipid reactions were excluded from the merging process. HepatoNet1 accounts for multiple extracellular compartments. For simplicity, all of the HepatoNet1 extracellular metabolites and reactions were assumed to be present in a single common extracellular compartment, following the convention of Recon 1.

Content from both the recently published acylcarnitine–fatty acid oxidation module8 and the metabolic reconstruction for human small intestinal enterocytes19 was also added to Recon 1.x. The latter accounts for two extracellular compartments (luminal and blood side). Again, a single extracellular compartment was assumed.

As a final step of expanding the content of the global human metabolic reconstruction, thorough literature research was performed to identify missing transport and absorption reactions in Recon 1.x. The reconstruction of this module was performed as described previously2.

The reconstruction assembly and conversion to a mathematical model were done using rBioNet48. The uniqueness of reactions in Recon 1.x was determined, and multiple rounds of self-consistency testing and functional assessment of the model were performed using established procedures2, which included gap analysis2, leak tests32 and a functional analysis.

The resulting reconstruction was finalized and named Recon 2.

Leak analysis.

A model of Recon 2 was used for the leak analysis, in which all unbalanced reactions were considered inactive by constraining their lower and upper flux bounds to zero. The test of mass leaks was performed with a simulation procedure defined in the tutorial of FASIMU32, by looking for steady-state flux distributions that either (i) consume no substrates and generate an output or (ii) consume substrates but do not generate products. The flux distributions representing a leak were analyzed manually to identify causative unbalanced reactions. After balancing or deleting the corresponding reaction(s), the test was repeated until no additional leaks were observed. In all simulations with Recon 1 and Recon 2, all metabolites with defined exchange reactions could be taken up and secreted. Reaction constraints applied to the leak-free Recon 2 model that was used for all computations in this study are listed in Supplementary Table 7.

Topological analysis.

Flux variability analysis34, performed while permitting constrained uptake and secretion of all metabolites for which exchange reactions have been defined, allowed the identification of blocked reactions in Recon 1 and Recon 2, that is, reactions that could not carry any nonzero flux under this simulation condition. Dead-end metabolites were identified as described previously2. Compound participation was calculated as described previously1 (Supplementary Note 4).
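For reference, the blocked-reaction criterion can be written as a pair of linear programs per reaction. The notation below is the standard constraint-based formulation and is an assumption on our part rather than a quotation from ref. 34: S denotes the stoichiometric matrix, v the flux vector, and l and u the lower and upper flux bounds.

```latex
\[
v_j^{\min} = \min_{v}\; v_j
\qquad\text{and}\qquad
v_j^{\max} = \max_{v}\; v_j
\qquad\text{subject to}\qquad
S\,v = 0, \quad l \le v \le u .
\]
\[
\text{Reaction } j \text{ is blocked}
\iff
v_j^{\min} = v_j^{\max} = 0 .
\]
```

In numerical practice the comparison with zero is made against a small solver tolerance, the value of which is not specified here.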

Functional characterization of Recon 1 and Recon 2.

The metabolic capacity of the network was demonstrated by testing for nonzero flux values for 354 metabolic tasks, which were based on Recon 1 and ref. 19. For each of the simulations a steady-state flux distribution was calculated. Each metabolic task was optimized individually by choosing the corresponding reaction in either Recon 1 or Recon 2, if present, as the objective function and maximizing the flux through that reaction (see Supplementary Table 2).
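Each task test therefore amounts to a single linear program. In the usual constraint-based notation (assumed here: S is the stoichiometric matrix, v the flux vector, l and u the flux bounds, and t the index of the reaction that represents the task), the test can be sketched as:

```latex
\[
z_t^{*} = \max_{v}\; v_t
\qquad\text{subject to}\qquad
S\,v = 0, \quad l \le v \le u ,
\]
\[
\text{task } t \text{ is fulfilled} \iff z_t^{*} > 0 .
\]
```

A task whose reaction is absent from a model cannot be chosen as an objective and is counted as not fulfilled for that model.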

Mapping the compendium of inborn errors of metabolism and predicting biomarkers.

The compendium of IEMs8 contains information about causative genes and known biomarkers for 330 distinct IEMs. Using gene-reaction associations, the IEMs were mapped onto Recon 1 and Recon 2. A gene-associated reaction was considered affected if no isoenzyme existed that was not known to cause an IEM. To predict biomarkers for each IEM affecting one or more reactions, the method reported in ref. 49 was followed. Changes in biomarkers resulting from an IEM were compared with reported biomarkers for each IEM. A Fisher's exact test was applied to compute the hypergeometric P value. This analysis was performed for Recon 1 and Recon 2. Mapped IEMs are listed in Supplementary Table 5.
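The hypergeometric P value of a one-sided Fisher's exact test can be computed directly from a 2 × 2 contingency table. The sketch below is illustrative only: the layout of the table (for example, predicted versus reported direction of biomarker change) is an assumption, as the exact table used in the analysis is not spelled out here, and the counts in the example are toy values.

```python
from math import comb


def hypergeom_tail(k, successes, draws, population):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Tests whether the top-left cell is larger than expected by chance.
    """
    return hypergeom_tail(a, a + b, a + c, a + b + c + d)


if __name__ == "__main__":
    # Toy table: rows could be predicted increase/decrease,
    # columns reported increase/decrease (hypothetical counts).
    print(fisher_one_sided(3, 1, 1, 3))
```

For the toy table the tail probability is 17/70, that is, about 0.24, so agreement at this level would not be distinguishable from chance.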

Cancer exometabolome mapping.

Information related to a cancer exometabolome33, which reported consumed and secreted compounds in the culture medium, was manually mapped to the metabolites of Recon 2 (Supplementary Table 3). Flux variability analysis was performed on the exchange reactions, and the uptake-secretion capability was compared with reported consumption and secretion capability of the cancer cells.

Biofluid metabolome mapping.

Information about the biofluid location of metabolites was obtained from the HMDB (version August 2012)36. Mapping was performed through HMDB identifiers, which were recorded during the reconstruction process.

Protein expression data mapping.

Protein expression data were downloaded from the Human Protein Atlas37 in May 2012 and Ensembl identifiers were mapped onto Recon 2 gene products. Gene products with moderate/medium and strong/high levels of expression were assumed to be present, and all others were assumed to be absent. Gene products without data in a cell type were assumed to be absent.
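The presence and absence rule described above can be sketched as a simple thresholding step. This is an illustration under stated assumptions: the level labels for present gene products are those named in the text, whereas the identifiers, the example label "weak" and the function names are hypothetical.

```python
# Expression levels treated as "present", as stated in the text.
PRESENT_LEVELS = {"moderate", "medium", "strong", "high"}


def gene_product_present(level):
    """Return True only for moderate/medium or strong/high expression.

    A missing value (None) means no data for this cell type and is
    treated as absent.
    """
    if level is None:
        return False
    return level.strip().lower() in PRESENT_LEVELS


def presence_calls(gene_products, cell_type_levels):
    """Map every model gene product to a presence/absence call."""
    return {
        gene: gene_product_present(cell_type_levels.get(gene))
        for gene in gene_products
    }


if __name__ == "__main__":
    # Hypothetical identifiers and levels for one cell type.
    model_gene_products = ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"]
    levels = {"ENSG_A": "Strong", "ENSG_B": "weak", "ENSG_C": "Moderate"}
    print(presence_calls(model_gene_products, levels))
```

Treating missing data as absence is a conservative choice; it tends to shrink the resulting models, which is one reason why including lower-confidence data could yield alternate drafts.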

Generation of a draft cell type–specific model compendium.

For each cell type, the presence and absence information from the protein expression data was used as input for the MinMax algorithm38, which was implemented in the COBRA Toolbox5. The biomass reaction and a reaction representing ATP hydrolysis were subsequently added to each model. The IEM compendium was mapped onto each cell type–specific draft model as described above. For each model, its capability to perform the defined metabolic tasks was tested.

SBML models.

Recon 2 is available in the Systems Biology Markup Language format (SBML)50, which is compliant with the Minimum Information Required In the Annotation of Models (MIRIAM) standard41 at http://humanmetabolism.org/, Biomodels (http://identifiers.org/biomodels.db/MODEL1109130000) and as Supplementary Data 1 (which also contains all derived cell type–specific versions). The programming library libAnnotationSBML51 was used to apply unified cross-references in the form of MIRIAM identifiers to most components in the models. Systems Biology Ontology (SBO) terms were applied to specify metabolites, polypeptides and protein complexes, and to make the distinction between biochemical and transport reactions.

DrugBank data mapping.

DrugBank30 was queried via its XML file download, and enzymes were mapped to their equivalent in Recon 2 by consideration of both UniProt identifiers and specified intracellular compartmentalization. Drugs (and their actions) were mapped to enzymes and included in the distributed SBML50 versions of the Recon 2 models as annotations on enzyme species.
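The matching step, in which a drug target is linked to a model enzyme only when both the UniProt identifier and the compartment agree, can be sketched as a keyed lookup. The records below are hypothetical placeholders and do not reflect the DrugBank XML schema; parsing of the download is deliberately left out, and all identifiers and the function `map_drugs` are assumptions.

```python
from collections import defaultdict

# Hypothetical model enzymes keyed by (UniProt accession, compartment).
MODEL_ENZYMES = {
    ("P_TOY1", "c"): "enzyme_1_c",
    ("P_TOY1", "m"): "enzyme_1_m",
    ("P_TOY2", "c"): "enzyme_2_c",
}

# Hypothetical pre-extracted records: (drug, UniProt accession, compartment, action).
DRUG_TARGETS = [
    ("drug_X", "P_TOY1", "m", "inhibitor"),
    ("drug_Y", "P_TOY2", "c", "activator"),
    ("drug_Z", "P_TOY9", "c", "inhibitor"),   # target not in the model
]


def map_drugs(drug_targets, model_enzymes):
    """Attach (drug, action) pairs to matching model enzymes."""
    annotations = defaultdict(list)
    for drug, uniprot, compartment, action in drug_targets:
        enzyme = model_enzymes.get((uniprot, compartment))
        if enzyme is not None:
            annotations[enzyme].append((drug, action))
    return dict(annotations)


if __name__ == "__main__":
    mapped = map_drugs(DRUG_TARGETS, MODEL_ENZYMES)
    inhibitory = sum(
        1 for pairs in mapped.values() for _, action in pairs if action == "inhibitor"
    )
    print(mapped, inhibitory)
```

Keying on the compartment as well as the accession keeps a drug from being attached to every compartment-specific copy of the same protein.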

Software.

All computations were carried out in the Matlab programming environment (MathWorks, Inc.) using the COBRA Toolbox5 and Tomlab cplex as the linear programming solver (TomOpt, Inc.).