Nature Biotechnology
22, 1218–1219 (2004)
doi:10.1038/nbt1004-1218
Two-dimensional annotation of genomes
Bernhard Palsson
Bernhard Palsson is in the Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive #0412, La Jolla, California 92093-0412, USA. He serves on the scientific advisory board of Genomatica, Inc. palsson@ucsd.edu
Most sequence annotation efforts have focused on information about chromosomal location, open reading frames, putative function and associated regulatory sequences. Here, I argue that adopting an extended description of all of the chemical interactions caused by the gene products of a genome, in the form of a two-dimensional matrix, will be key to the analysis, interpretation and prediction of genotype-phenotype relationships.
Principles of systems analysis
The ability to generate detailed lists of biological components, determine their interactions and generate genome-wide data sets has driven the need for large-scale systems analysis in cell and molecular biology. Such systems analysis is often viewed as comprising four principal steps:
- Enumerate as complete a list as possible of the biochemical components that participate in the process of interest.
- Study the interactions between these components and define the 'wiring diagrams', a process referred to as network reconstruction1,2.
- Describe the properties of the reconstructed network mathematically, allowing computer representations of such networks.
- Interrogate the resulting 'in silico' models to analyze, interpret and predict the biological functions that can arise from reconstructed networks3.
Model predictions generate specific hypotheses that can then be tested experimentally. These in silico models of reconstructed networks are, in turn, improved iteratively4,5. This process has recently been implemented at the genome scale for microorganisms6.
With regard to these four steps, much creative work has led to the development of numerous high-throughput ('omic') technologies that are increasingly able to provide lists of components (step 1). Similarly, many different mathematical methods have been formulated for analyzing biochemical reaction networks (step 3), and the 'phenotypic space' explored by experimentation (step 4) is essentially infinite.
In contrast, the network reconstruction effort (step 2) should fundamentally lead to a unique result. Interactions between biological components are chemical reactions and associations. Thus, in principle, the reconstruction process should culminate in the identification of a finite set of chemical transformations that take place in a cell and underlie its function.
Moving up a dimension
Because it is a unique description of all of the interactions of a genome and its products, a network reconstruction can be thought of as a two-dimensional annotation of a genome (see Fig. 1). The classic component annotation of a genome identifies open reading frames and their locations, assigns putative functions and sometimes identifies the corresponding DNA regulatory sequences. The two-dimensional, systemic annotation, however, accounts not only for the components but also for each of their chemical states (represented as rows in the matrix in Fig. 1). The links between these components (represented as columns in the matrix) can be characterized by the stoichiometric coefficients of the chemical transformations that take place. Additional information about these chemical transformations (e.g., kinetics, thermodynamics or cellular location) can be associated with each column. The full genome-scale stoichiometric matrix thus constitutes, in essence, a two-dimensional annotation of a genome. Once represented in a computer, it forms the basis for a chemically and genetically consistent database structure that can be used to analyze large data sets, as well as for any subsequent mathematical representation and computer simulation. This matrix is thus a common denominator among analysis methods aimed at studying the system properties of a genome's wiring diagram.
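The row/column convention can be made concrete with a small sketch. The three-reaction network, the Reaction class and the "location" annotation key below are hypothetical, chosen only for illustration. The sketch also assumes the standard mass-balance relation dx/dt = S·v (x: compound concentrations, v: reaction fluxes), which is the usual way such a matrix enters subsequent simulation. Negative coefficients denote consumption and positive ones production.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Reaction:
    """One column of the stoichiometric matrix (hypothetical representation)."""
    name: str
    stoich: dict[str, int]  # compound -> coefficient; negative = consumed, positive = produced
    annotations: dict[str, str] = field(default_factory=dict)  # e.g. location, kinetics


def build_matrix(reactions):
    """Return (row labels, S): rows are compounds, columns follow the reaction order."""
    compounds = sorted({c for r in reactions for c in r.stoich})
    S = [[r.stoich.get(c, 0) for r in reactions] for c in compounds]
    return compounds, S


def net_rates(S, v):
    """Mass balance dx/dt = S v: one net production rate per compound (row)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


# Hypothetical toy network: A -> B, 2 B -> C, C -> 2 A
reactions = [
    Reaction("R1", {"A": -1, "B": 1}, {"location": "cytosol"}),
    Reaction("R2", {"B": -2, "C": 1}, {"location": "cytosol"}),
    Reaction("R3", {"C": -1, "A": 2}, {"location": "cytosol"}),
]
compounds, S = build_matrix(reactions)
print(compounds)                # ['A', 'B', 'C']
print(S)                        # [[-1, 0, 2], [1, -2, 0], [0, 1, -1]]
print(net_rates(S, [2, 1, 1]))  # [0, 0, 0]: this flux vector is a steady state
```

Because each reaction carries its own annotations, kinetic, thermodynamic or localization data stay attached to the column, while S itself remains purely stoichiometric.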
A call for the formulation of this matrix is perhaps as bold as the call, 15 years ago, for the full base-pair sequence of the human genome. But, as with the incipient human genome project in 1990, such a call is not unrealistic at this time, as demonstrated by notable, albeit preliminary, progress toward completing the matrix. Genome-scale metabolic networks have been reconstructed for several microorganisms3,7–10. These systemic, two-dimensional annotations contain up to 1,000 reactions and several hundred compounds.
The process of defining signaling networks and transcriptional regulatory networks has also begun11–13. The possible states of signaling molecules are now being made available through Molecular Web Pages (http://www.signaling-gateway.org/), which show all the chemical transformations in which a signaling molecule participates14. Transcriptional regulatory networks in microorganisms may have 200 to 300 transcription factors and on the order of 1,000 defined links between components. Although events in such networks are often known chemically, sometimes only causal relationships have been determined15. To complete the network reconstruction and the associated matrix, such causal relationships will ultimately have to be converted into chemical equations once the underlying chemical mechanisms are defined. A growing number of genome-scale matrices are being formulated, and I expect that work over the coming 5 to 10 years will determine how completely we can specify such matrices for model organisms.
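Two points in this paragraph map directly onto the matrix. First, the set of transformations a molecule takes part in is the set of columns with a nonzero entry in that molecule's rows, with one row per chemical state. Second, a causal link becomes usable only once it is written as explicit reactions. The sketch below assumes a hypothetical protein P with an unphosphorylated and a phosphorylated state. It restates the hypothetical causal statement "TF activates gene X" as a binding step plus a lumped transcription step. These species and reactions are illustrative, not curated data.

```python
# Hypothetical reactions: name -> {species: coefficient}; negative = consumed.
REACTIONS = {
    "phosphorylation":   {"P": -1, "ATP": -1, "P_p": 1, "ADP": 1},
    "dephosphorylation": {"P_p": -1, "H2O": -1, "P": 1, "Pi": 1},
    "adaptor_binding":   {"P_p": -1, "Adaptor": -1, "P_p_Adaptor": 1},
    # Causal link "TF activates gene X" restated as chemical steps (hypothetical):
    "TF_binds_promoter": {"TF": -1, "promoter_X": -1, "TF_promoter_X": 1},
    # Lumped step: the bound complex yields one transcript and releases TF and promoter
    "transcription_X":   {"TF_promoter_X": -1, "TF": 1, "promoter_X": 1, "mRNA_X": 1},
}


def transformations_of(states, reactions=REACTIONS):
    """Names of reactions with a nonzero coefficient in any of the given rows (states)."""
    return [name for name, stoich in reactions.items()
            if any(stoich.get(s, 0) != 0 for s in states)]


# All transformations involving either chemical state of protein P
print(transformations_of({"P", "P_p"}))
# ['phosphorylation', 'dephosphorylation', 'adaptor_binding']
print(transformations_of({"TF"}))
# ['TF_binds_promoter', 'transcription_X']
```

One consequence of this representation: a catalyst that appears unchanged on both sides of a reaction has a net coefficient of zero and leaves no trace in that column, which is why the kinase does not appear here. Recording it requires writing the mechanism as elementary steps.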
Making modeling manageable
Two-dimensional genome annotations enable unbiased delineation of functioning networks within cells. Metabolites are members of signaling and transcriptional regulatory networks, and enzymes are members of metabolic networks. Thus, our conceptual division of cellular functions into metabolism, regulation and signaling is biased, reflecting our respective education and research fields. In a living cell, however, these networks and other cellular functions16,17 are actually components of a genome-scale (or cell-scale) reaction network. Although not all the links in such large-scale networks are currently known, their size seems to be manageable by modern database methods. Defining and studying the functional states of these networks, on the other hand, may constitute a much greater mathematical and computational challenge.
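The observation that metabolites and enzymes cross conventional boundaries can be illustrated by merging two hypothetical sub-networks, one labeled "metabolic" and one "regulatory", into a single sparse matrix that stores only nonzero (compound, reaction) entries. The species are invented for illustration: a hexokinase-like enzyme HK, glucose (glc), glucose 6-phosphate (g6p) and a transcription factor TF. The enzyme-synthesis step is lumped. Sparse storage is shown as one plausible reason large matrices remain tractable, not as a description of any particular database.

```python
# Hypothetical sub-networks: reaction name -> {species: coefficient}
metabolic = {
    "HK_binds_glc": {"HK": -1, "glc": -1, "HK_glc": 1},
    "HK_catalysis": {"HK_glc": -1, "ATP": -1, "HK": 1, "g6p": 1, "ADP": 1},
}
regulatory = {
    "g6p_binds_TF": {"g6p": -1, "TF": -1, "TF_g6p": 1},  # metabolite acts on a regulator
    "HK_synthesis": {"HK": 1},                            # lumped expression of the enzyme
}


def species(net):
    """Row labels (species) used by a sub-network."""
    return {sp for stoich in net.values() for sp in stoich}


def to_sparse(*nets):
    """Merge sub-networks into one matrix stored as {(species, reaction): coefficient}."""
    S, seen = {}, set()
    for net in nets:
        for rxn, stoich in net.items():
            if rxn in seen:
                raise ValueError(f"duplicate reaction name: {rxn}")
            seen.add(rxn)
            for sp, coeff in stoich.items():
                if coeff != 0:
                    S[(sp, rxn)] = coeff
    return S


shared = species(metabolic) & species(regulatory)  # rows that tie the sub-networks together
S = to_sparse(metabolic, regulatory)
n_rows = len(species(metabolic) | species(regulatory))
n_cols = len(metabolic) + len(regulatory)
print(sorted(shared))                          # ['HK', 'g6p']
print(len(S), "nonzeros of", n_rows * n_cols)  # 12 nonzeros of 32
```

The shared rows (the enzyme and the metabolite) are exactly where the "metabolic" and "regulatory" labels stop being meaningful: in the merged matrix they are simply rows with entries in both sets of columns.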
Toward meeting this challenge, a hierarchical conceptualization of the two-dimensional annotation will prove useful. We are quite used to thinking about DNA hierarchically. We think of a base pair as the irreducible unit of DNA sequence. At progressively higher levels, we think about codons, introns, exons, alleles, chromosomes, genomes and other measures of DNA size. We will need to adopt similar hierarchical thinking about the genome-scale stoichiometric matrix. The irreducible elements in a biochemical reaction network are the elementary chemical reactions. These combine into enzymatic reaction mechanisms; many reactions can in turn combine into 'modules' or 'motifs'; pathways can form; sectors can be defined; and so on.
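This hierarchy has a simple algebraic reading. If the steps of a mechanism carry the same flux (an assumption, e.g. at steady state with no branching), adding their columns gives the column of the lumped reaction, and intermediates cancel. The sketch below uses a textbook two-step enzyme mechanism, E + S → ES followed by ES → E + P, and then lumps two such net reactions into a hypothetical pathway-level module. The species names are illustrative.

```python
from collections import Counter


def lump(*steps):
    """Add the columns of consecutive steps; species whose net coefficient is 0 cancel.

    This describes the overall conversion only if every step carries the
    same flux (assumed here).
    """
    total = Counter()
    for step in steps:
        total.update(step)  # Counter.update adds values, including negative ones
    return {sp: coeff for sp, coeff in total.items() if coeff != 0}


# Elementary steps of a simple enzyme mechanism
binding = {"E": -1, "S": -1, "ES": 1}    # E + S -> ES
conversion = {"ES": -1, "E": 1, "P": 1}  # ES -> E + P
mechanism = lump(binding, conversion)
print(mechanism)  # {'S': -1, 'P': 1}: the enzyme and the complex cancel out

# Next level up: two net enzymatic reactions lumped into a pathway "module"
second_enzyme = {"P": -1, "Q": 1}        # P -> Q (hypothetical)
print(lump(mechanism, second_enzyme))    # {'S': -1, 'Q': 1}
```

Lumping discards information: the enzyme no longer appears in the lumped column. This is the trade-off between levels of the hierarchy.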
Our understanding of how to decompose a network hierarchically is rudimentary at present, but it is likely to improve as we begin to build genome-scale networks and define their properties. Mathematical methods to block-diagonalize the stoichiometric matrix would represent one way of decomposing a large-scale network in a 'top-down' fashion. A number of 'bottom-up' methods also exist. Components that always function together in steady or dynamic states would naturally form modules. Correlated subsets of reactions do appear in the delineation of steady-state properties of networks18,19. Time-scale separation, used for the temporal decomposition of complex systems, can be used to define the formation of dynamic 'pools', which are groupings of compound concentrations that move together on given time scales20,21. These approaches are examples of how to coarse-grain a network in an unbiased fashion. They will be important in helping us elucidate the relationship between network components and network functional states, and thus in establishing a mechanistic interpretation of the genotype-phenotype relationship.
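A naive way to block-diagonalize is to find the independent connected pieces of the network. Reactions linked through any chain of shared compounds belong to the same block, and ordering rows and columns block by block makes S block-diagonal. The sketch below does this with a breadth-first search over a hypothetical five-reaction network. It is a deliberately simple stand-in, not one of the cited methods. In real networks, widely shared cofactors such as ATP or water tend to connect almost everything into a single block, which is one reason more refined decompositions are needed.

```python
from collections import defaultdict, deque


def blocks(reactions):
    """Partition reactions into blocks connected through shared compounds.

    reactions: name -> {compound: coefficient}. Listing each block's reactions
    (columns) and compounds (rows) together yields a block-diagonal S.
    """
    used_by = defaultdict(list)  # compound -> reactions with a nonzero entry in its row
    for rxn, stoich in reactions.items():
        for cpd, coeff in stoich.items():
            if coeff != 0:
                used_by[cpd].append(rxn)

    seen, result = set(), []
    for start in reactions:
        if start in seen:
            continue
        seen.add(start)
        queue, block = deque([start]), []
        while queue:  # breadth-first search over reaction-compound links
            rxn = queue.popleft()
            block.append(rxn)
            for cpd, coeff in reactions[rxn].items():
                if coeff == 0:
                    continue
                for other in used_by[cpd]:
                    if other not in seen:
                        seen.add(other)
                        queue.append(other)
        result.append(sorted(block))
    return result


# Hypothetical network with two independent parts
network = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"X": -1, "Y": 1},
    "R4": {"C": -1, "D": 1},
    "R5": {"Y": -1, "Z": 1},
}
print(blocks(network))  # [['R1', 'R2', 'R4'], ['R3', 'R5']]

# A shared cofactor merges the two parts into a single block
with_atp = {name: dict(stoich) for name, stoich in network.items()}
with_atp["R1"]["ATP"] = -1
with_atp["R3"]["ATP"] = -1
print(blocks(with_atp))  # [['R1', 'R2', 'R3', 'R4', 'R5']]
```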
Genome-scale analysis in cell and molecular biology is now upon us. Such analysis has thus far been dominated by component delineation and is now progressing toward studying functional states of networks that represent observable phenotypic states. A key and unifying step in this process is a comprehensive reconstruction of the links that interrelate biological components in the organism. All such interactions are ultimately represented by a genome-scale stoichiometric matrix: a two-dimensional genome annotation. Such a matrix represents a common denominator that we should now strive to define for model organisms like Escherichia coli and yeast, and ultimately for differentiated human cells.
REFERENCES
- Goryanin, I., Hodgman, T.C. & Selkov, E. Bioinformatics 15, 749–758 (1999).
- Covert, M.W. et al. Trends Biochem. Sci. 26, 179–186 (2001).
- Price, N.D., Papin, J.A., Schilling, C.H. & Palsson, B. Trends Biotechnol. 21, 162–169 (2003).
- Ideker, T. et al. Science 292, 929–934 (2001).
- Palsson, B.O. The challenges of in silico biology. Nat. Biotechnol. 18, 1147–1150 (2000).
- Covert, M.W., Knight, E.M., Reed, J.L., Herrgard, M.J. & Palsson, B.O. Nature 429, 92–96 (2004).
- Edwards, J.S. & Palsson, B.O. J. Biol. Chem. 274, 17410–17416 (1999).
- Forster, J., Famili, I., Fu, P.C., Palsson, B.O. & Nielsen, J. Genome Res. 13, 244–253 (2003).
- Reed, J.L. & Palsson, B.O. J. Bacteriol. 185, 2692–2699 (2003).
- Lovley, D.R. Nat. Rev. Microbiol. 1, 35–44 (2003).
- Gilman, A.G. et al. Nature 420, 703–706 (2002).
- Papin, J.A. & Subramaniam, S. Curr. Opin. Biotechnol. 15, 78–81 (2004).
- Herrgard, M.J., Covert, M.W. & Palsson, B.O. Curr. Opin. Biotechnol. 15, 70–77 (2004).
- Li, J. et al. Nature 420, 716–717 (2002).
- Ideker, T. & Lauffenburger, D. Trends Biotechnol. 21, 255–262 (2003).
- Fussenegger, M., Bailey, J.E. & Varner, J. Nat. Biotechnol. 18, 768–774 (2000).
- Novak, B. et al. Biophys. Chem. 72, 185–200 (1998).
- Schuster, S., Fell, D.A. & Dandekar, T. Nat. Biotechnol. 18, 326–332 (2000).
- Papin, J.A., Price, N.D. & Palsson, B.O. Genome Res. 12, 1889–1900 (2002).
- Reich, J.G. & Sel'kov, E.E. Energy Metabolism of the Cell: A Theoretical Treatise (Academic Press, London, 1981).
- Palsson, B.O., Joshi, A. & Ozturk, S.S. Fed. Proc. 46, 2485–2489 (1987).