Integrating high-throughput and computational data elucidates bacterial networks


The flood of high-throughput biological data has led to the expectation that computational (or in silico) models can be used to direct biological discovery, enabling biologists to reconcile heterogeneous data types, find inconsistencies and systematically generate hypotheses (refs 1–3). Such a process is fundamentally iterative: each iteration involves making model predictions, obtaining experimental data, reconciling the predicted outcomes with experimental ones, and using discrepancies to update the in silico model. Here we have reconstructed, on the basis of information derived from literature and databases, the first integrated genome-scale computational model of a transcriptional regulatory and metabolic network. The model accounts for 1,010 genes in Escherichia coli, including 104 regulatory genes whose products together with other stimuli regulate the expression of 479 of the 906 genes in the reconstructed metabolic network. This model is able not only to predict the outcomes of high-throughput growth phenotyping and gene expression experiments, but also to indicate knowledge gaps and identify previously unknown components and interactions in the regulatory and metabolic networks. We find that a systems biology approach that combines genome-scale experimentation and computation can systematically generate hypotheses on the basis of disparate data sources.
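The predict, measure, reconcile and update cycle described above can be illustrated with a deliberately tiny toy model. The sketch below is not the published model: the gene names, Boolean rules, routes to growth and observations are all hypothetical and were made up for illustration, and growth is predicted by a simple set check rather than by the constraint-based simulation used for genome-scale models. It only shows how a disagreement between prediction and experiment can be surfaced as a candidate knowledge gap.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Environment = Dict[str, bool]  # e.g. {"oxygen": True, "glucose": True}
Observation = Tuple[Environment, FrozenSet[str], bool]  # env, knockouts, grew?

# Hypothetical Boolean regulatory rules: gene -> is it expressed in this environment?
RULES: Dict[str, Callable[[Environment], bool]] = {
    "glucose_transporter": lambda env: env["glucose"],
    "aerobic_enzyme": lambda env: env["oxygen"],
    "anaerobic_enzyme": lambda env: not env["oxygen"],
}

# Hypothetical routes to biomass: growth needs every gene on at least one route.
ROUTES: List[FrozenSet[str]] = [
    frozenset({"glucose_transporter", "aerobic_enzyme"}),
    frozenset({"glucose_transporter", "anaerobic_enzyme"}),
]


def predict_growth(env: Environment, knockouts: FrozenSet[str] = frozenset()) -> bool:
    """Predict growth: apply the regulatory rules, then look for a complete route."""
    active = {g for g, rule in RULES.items() if g not in knockouts and rule(env)}
    return any(route <= active for route in ROUTES)


def reconcile(observations: List[Observation]) -> List[Tuple[Environment, FrozenSet[str], bool, bool]]:
    """Return every case where the model prediction and the observation disagree."""
    discrepancies = []
    for env, knockouts, observed in observations:
        predicted = predict_growth(env, knockouts)
        if predicted != observed:
            discrepancies.append((env, knockouts, predicted, observed))
    return discrepancies


# Made-up observations, for illustration only.
AEROBIC = {"oxygen": True, "glucose": True}
ANAEROBIC = {"oxygen": False, "glucose": True}
OBSERVATIONS: List[Observation] = [
    (AEROBIC, frozenset(), True),
    (ANAEROBIC, frozenset(), True),
    (AEROBIC, frozenset({"aerobic_enzyme"}), True),  # model will disagree here
]

for env, knockouts, predicted, observed in reconcile(OBSERVATIONS):
    print(f"env={env} knockouts={sorted(knockouts)} "
          f"predicted={predicted} observed={observed}")
```

In this toy, the one disagreement (a knockout strain that grows although the model predicts it should not) is the kind of discrepancy that would prompt a hypothesis, for example that a rule is too strict or that an alternative route is missing, followed by an updated model and another iteration.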


Figure 1: Growth phenotype study.
Figure 2: Characterization of the regulatory network related to the aerobic–anaerobic shift.
Figure 3: Biological network elucidation by a model-centric approach.


  1. Ideker, T., Galitski, T. & Hood, L. A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343–372 (2001)

  2. Palsson, B. O. The challenges of in silico biology. Nature Biotechnol. 18, 1147–1150 (2000)

  3. Kitano, H. Systems biology: a brief overview. Science 295, 1662–1664 (2002)

  4. Reed, J. L., Vo, T. D., Schilling, C. H. & Palsson, B. O. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 4, R54.1–R54.12 (2003)

  5. Bochner, B. R. New technologies to assess genotype–phenotype relationships. Nature Rev. Genet. 4, 309–314 (2003)

  6. Glasner, J. D. et al. ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res. 31, 147–151 (2003)

  7. Forster, J., Famili, I., Palsson, B. O. & Nielsen, J. Large-scale evaluation of in silico gene knockouts in Saccharomyces cerevisiae. Omics 7, 193–202 (2003)

  8. Edwards, J. S. & Palsson, B. O. The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc. Natl Acad. Sci. USA 97, 5528–5533 (2000)

  9. Covert, M. W. & Palsson, B. O. Transcriptional regulation in constraints-based metabolic models of Escherichia coli. J. Biol. Chem. 277, 28058–28064 (2002)

  10. Herrgard, M. J., Covert, M. W. & Palsson, B. O. Reconciling gene expression data with known genome-scale regulatory network structures. Genome Res. 13, 2423–2434 (2003)

  11. Salmon, K. et al. Global gene expression profiling in Escherichia coli K12. The effects of oxygen availability and FNR. J. Biol. Chem. 278, 29837–29855 (2003)

  12. Ideker, T. et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934 (2001)

  13. Compan, I. & Touati, D. Anaerobic activation of arcA transcription in Escherichia coli: roles of Fnr and ArcA. Mol. Microbiol. 11, 955–964 (1994)

  14. Cotter, P. A., Melville, S. B., Albrecht, J. A. & Gunsalus, R. P. Aerobic regulation of cytochrome d oxidase (cydAB) operon expression in Escherichia coli: roles of Fnr and ArcA in repression and activation. Mol. Microbiol. 25, 605–615 (1997)

  15. Griffin, T. J. et al. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell Proteomics 1, 323–333 (2002)

  16. Reed, J. L. & Palsson, B. O. Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 185, 2692–2699 (2003)

  17. Edwards, J. S., Ibarra, R. U. & Palsson, B. O. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nature Biotechnol. 19, 125–130 (2001)

  18. Price, N. D., Papin, J. A., Schilling, C. H. & Palsson, B. O. Genome-scale microbial in silico models: the constraints-based approach. Trends Biotechnol. 21, 162–169 (2003)

  19. Segre, D., Vitkup, D. & Church, G. M. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl Acad. Sci. USA 99, 15112–15117 (2002)

  20. Burgard, A. P. & Maranas, C. D. Probing the performance limits of the Escherichia coli metabolic network subject to gene additions or deletions. Biotechnol. Bioeng. 74, 364–375 (2001)

  21. Burgard, A. P., Vaidyaraman, S. & Maranas, C. D. Minimal reaction sets for Escherichia coli metabolism under different growth requirements and uptake environments. Biotechnol. Prog. 17, 791–797 (2001)

  22. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabasi, A. L. The large-scale organization of metabolic networks. Nature 407, 651–654 (2000)

  23. Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genet. 31, 64–68 (2002)

  24. Gutierrez-Rios, R. M. et al. Regulatory network of Escherichia coli: consistency between literature knowledge and microarray profiles. Genome Res. 13, 2435–2443 (2003)

  25. Bar-Joseph, Z. et al. Computational discovery of gene modules and regulatory networks. Nature Biotechnol. 21, 1337–1342 (2003)

  26. Covert, M. W., Schilling, C. H. & Palsson, B. Regulation of gene expression in flux balance models of metabolism. J. Theor. Biol. 213, 73–88 (2001)

  27. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474 (1997)

  28. Datsenko, K. A. & Wanner, B. L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. USA 97, 6640–6645 (2000)

  29. Li, C. & Wong, W. H. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA 98, 31–36 (2001)

  30. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995)



We thank K. Stadsklev and A. Fleming for assistance with computation; Z. Zhang and A. Raghunathan for experimental assistance; the Perna and Blattner laboratories for access to the high-throughput phenotyping data in the ASAP database; and the NIH for funding and support. M.W.C. and B.O.P. designed the project and were involved in all phases of the study; E.M.K. carried out experiments; J.L.R. reconstructed the model, ran simulations and did the phenotyping analysis; M.J.H. did the statistical analysis of the gene expression data.

Author information

Correspondence to Bernhard O. Palsson.

Ethics declarations

Competing interests

UCSD has licensed patent applications to a spin-off company, Genomatica, that may relate to the present paper. UCSD and some of the authors hold shares in Genomatica.

Supplementary information

Supplementary Notes

Text describing the contents of all the supplementary Excel files in more detail, together with a case-by-case study of inconsistent environments and strains from Figure 1 in the main text, and a completed MIAME checklist. (DOC 118 kb)

Supplementary Data 1

Regulatory Model Rules (iMC1010v1). A list of the genes accounted for by the model, together with the regulatory rules, if any. (XLS 126 kb)

Supplementary Data 2

Simulation Parameters. A detailed list of all parameters used to run the simulations described in the manuscript. (XLS 47 kb)

Supplementary Data 3

Regulatory Model Abbreviations. Abbreviations used in the model to represent metabolites or metabolic reactions. (XLS 81 kb)

Supplementary Data 4

Phenotype-Model Comparison. A more detailed version of the phenotype model comparison shown in Figure 1 of the main text. (XLS 220 kb)

Supplementary Data 5

Phenotype sensitivity analysis. Sensitivity analysis of the phenotype cutoff parameter. (XLS 19 kb)

Supplementary Data 6

Anaerobic–aerobic culture data. Growth, substrate uptake and by-product secretion of wild-type and six knockout E. coli strains under aerobic and anaerobic conditions. (XLS 18 kb)

Supplementary Data 7

Normalized Array Data. A table with all the dChip-normalized array data from our experiments. (XLS 9871 kb)

Supplementary Data 8

Detailed hypothesis list. A detailed list of the regulatory interaction hypotheses generated by this study. Includes new regulatory rules implemented in iMC1010v2. (XLS 54 kb)

Supplementary Data 9

qPCR cross-validation. Results of qPCR validation of various changes in gene expression from the microarray data set. (XLS 17 kb)
