Key Points
-
Many 'omics' data sets are becoming available for various model organisms that can be used to describe many aspects of the cell for a given time and/or condition. They can be broadly classified as components data, which describe the specific molecular contents of the cell; interactions data, which detail the connectivity between cellular components; or functional-states data, which reveal the overall behaviour, or phenotype, of the cell or system in response to genetic and/or environmental perturbations.
-
Even though each of these genome-scale data types can be powerful on its own, researchers are gaining valuable additional insights into cellular phenomena by integrating 'omics' data sets.
-
The computational tools that have been developed for integrating 'omics' data generally tackle three specific tasks: first, identifying the network scaffold by delineating the connections that exist between cellular components; second, decomposing the network scaffold into its constituent parts in an attempt to understand the overall network structure; and third, developing cellular or system models to simulate and predict the network behaviour that gives rise to particular cellular phenotypes.
-
In addition to the development of methods, many researchers are using 'omics' integration to drive studies that are aimed at delineating systems-wide behaviour. For example, many efforts have been devoted to using genome-scale data integration to completely map the cellular pathways that are responsible for the observed cellular responses to environmental perturbations or developmental events. In some cases, these studies have also led to the development of biomarkers, or patterns of cellular-component expression that are associated with medical disorders, such as various cancers.
-
Researchers are also using omics integration to address fundamental evolutionary questions that were previously beyond the scope and scale of standard techniques. Specifically, omics data-integration techniques have been used to examine cellular differences that are associated with speciation, and other studies have used them to study selective pressures that are likely to have arisen due to cellular-network structure.
-
The integration of omics data has so far primarily affected basic research. Increasingly, however, this strategy is taking on significant roles in clinically relevant areas, as shown by its stimulation of the fields of toxicogenomics and nutrigenomics, which apply genome-scale technologies and integrative analyses to problems in toxicology and nutrition, respectively. Although many challenges related to data quality and accessibility remain, researchers continue to work towards the ultimate goal of applying these strategies to drug development and personalized medicine.
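The first two tasks described above, identifying a network scaffold and decomposing it into parts, can be illustrated with a minimal Python sketch. It is not any published tool: it assumes that each data set is simply a list of gene pairs, accepts an interaction only when at least two independent data sets report it, and uses connected components as a crude stand-in for network modules. The gene names and data are hypothetical.

```python
from collections import defaultdict


def build_scaffold(evidence_sets, min_support=2):
    """Keep an edge only if at least `min_support` data sets report it."""
    support = defaultdict(int)
    for edges in evidence_sets:
        # Normalize undirected edges so (a, b) and (b, a) count as one link,
        # and count each link at most once per data set.
        for a, b in {tuple(sorted(e)) for e in edges}:
            support[(a, b)] += 1
    return {edge for edge, n in support.items() if n >= min_support}


def connected_components(edges):
    """Decompose the scaffold into connected components (crude 'modules')."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical evidence from three kinds of genome-scale data.
two_hybrid = [("geneA", "geneB"), ("geneC", "geneD"), ("geneE", "geneF")]
coexpression = [("geneB", "geneA"), ("geneD", "geneC"), ("geneA", "geneE")]
chip_chip = [("geneA", "geneC"), ("geneX", "geneY")]

scaffold = build_scaffold([two_hybrid, coexpression, chip_chip], min_support=2)
modules = connected_components(scaffold)
```

Here only the A–B and C–D links are supported by two sources, so the scaffold splits into two components; the third task, simulating network behaviour, would start from such a scaffold.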
Abstract
Various technologies can be used to produce genome-scale, or 'omics', data sets that provide systems-level measurements for virtually all types of cellular components in a model organism. These data yield unprecedented views of the inner workings of the cell. However, this abundance of information also presents many hurdles, the main one being the extraction of discernible biological meaning from multiple omics data sets. Nevertheless, researchers are rising to the challenge by using omics data integration to address fundamental biological questions that would increase our understanding of systems as a whole.
Ethics declarations
Competing interests
Bernhard Ø. Palsson serves on the scientific advisory board of Genomatica Inc.
Glossary
- Terabyte
-
A unit of computer-information-storage capacity that is equal to one trillion bytes or one thousand gigabytes.
- Data mining
-
An analytical discipline that is focused on finding unsuspected relationships and summarizing often large observational data sets in new ways that are both understandable and useful to the data owner.
- Omics data set
-
A generic term that describes the genome-scale data sets that are emerging from high-throughput technologies. Examples include whole-genome sequencing data (genomics) and microarray-based genome-wide expression profiles (transcriptomics).
- Serial analysis of gene expression
-
(SAGE). An experimental technique for transcriptome analysis through the massive sequential analysis of short cDNA sequence tags. The cDNA tags are derived from cellular or tissue mRNA for which the corresponding genes can be identified, and the total count of cDNA tags for each gene represents an accurate measurement of its expression level.
- Mass spectrometry
-
An analytical technique that identifies biochemical molecules (such as proteins, metabolites or fatty acids) on the basis of their mass-to-charge ratio.
- Vibrational spectroscopy
-
An analytical technique that can be used to investigate the composition of biological samples by the characteristic frequencies at which chemical bonds vibrate.
- Metabolic engineering
-
An applied discipline that is devoted to the targeted improvement in cellular properties or metabolite production by experimental manipulation of specific metabolic or signal-transduction pathways.
- In silico prediction
-
A general term that refers to a computational prediction that usually results from the analysis of a mathematical or computational model.
- Histocytomics
-
A developing field that is scaling up the traditional techniques of histochemistry and cytochemistry, such that many cellular species can be identified and localized in a cell or tissue sample in a high-throughput manner.
- Tiling array
-
A high-density microarray that contains evenly spaced, or 'tiled', sets of probes that span the genome or chromosome, and can be used in many experimental applications such as transcriptome characterization, gene discovery, alternative-splicing analysis, ChIP–chip, DNA-methylation analysis, DNA-polymorphism analysis, comparative genome analysis and genome resequencing.
- ChIP–chip
-
A high-throughput experimental technique that combines chromatin immunoprecipitation (ChIP) with microarray technology (chip) to identify protein–DNA interactions directly.
- Power-law distribution
-
Networks whose degree distribution (the distribution of the number of links per node) follows a power law, also known as scale-free networks, are highly non-uniform: most nodes have very few links, whereas a few so-called hub nodes have a very large number of links. Notably, many biological networks, like the internet, follow a power-law distribution.
- Network scaffold
-
Refers to the structure of a network that specifies the components of the network and the interactions between them, and represents the end product of the network-reconstruction process.
- Network module
-
A portion of a biological network that is composed of multiple molecular entities (such as genes, proteins or metabolites) that work together as a distinct unit within the cell, for example, in response to certain stimuli or as part of a developmental or differentiation programme.
- Bayesian model
-
A probabilistic model that generally specifies the likelihood of an observation occurring, on the basis of the presence of various characteristics that are known or assumed to be associated with the observation according to prior information.
- Synthetic lethal
-
This term refers to the lethal or significantly impaired phenotype that results from combining mutations in two genes that are each non-essential (that is, each single mutant is viable). Such an interaction suggests that the two genes act within the same essential pathway or in parallel pathways, neither of which is essential on its own.
- Bipartite graph
-
A graph whose vertices can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. For example, one set can represent genes, and the other set can represent characteristics that describe the function(s) of those genes.
- Simulated annealing-based search algorithm
-
A global optimization technique that traverses a search space by applying random changes (mutations) to a single candidate solution, always keeping better solutions and accepting worse ones with a probability that depends on how much worse they score and on a 'temperature' parameter that decreases over the course of the search.
- Training set
-
A collection of data that has known characteristics and is used to develop a predictive model in data-mining and machine-learning applications (for example, in Bayesian-model approaches). The characteristics learned from the training set are used to make subsequent predictions about new data.
- Log-odds scoring scheme
-
A statistical procedure that assesses the significance of an observation by calculating the logarithm of the ratio of its observed frequency to the frequency expected if the observation occurred at random.
- Constraint-based reconstruction and analysis
-
(COBRA). A genome-scale modelling approach that involves: first, the reconstruction of biochemical reaction networks; then, applying constraints to the network; and finally, analysing the characteristics and capabilities of the network using various computational techniques.
- Network reconstruction
-
The process of integrating different data sources to create a representation of the chemical events that underlie a biochemical reaction network.
- Governing constraints
-
Biochemical networks and cellular systems are constrained by natural law. These governing constraints include physico-chemical constraints (such as enzyme turnover), topobiological constraints (such as cellular crowding), environmental constraints (such as nutrient availability) and regulatory constraints (such as gene repression in response to external signals).
- Omics data integration
-
The simultaneous analysis of high-throughput genome-scale data that is aimed at developing models of biological systems to assess their properties and behaviour.
- Biomarker
-
A distinctive biochemical indicator that is associated with a biological process or event (for example, the presence of a protein, or set of proteins, that is characteristic of cancerous cells).
- Metabolic syndrome
-
An increasingly common, complex and multi-factorial disorder, characterized by glucose intolerance, abdominal obesity, hypertension and abnormal cholesterol levels, that increases an individual's risk of developing coronary heart disease and type 2 diabetes.
- Personalized genomic medicine
-
The idea that genome-scale technologies will allow clinicians to apply treatment regimens that are tailored specifically to an individual patient on the basis of their genetic makeup and associated predispositions.
- Gödel's incompleteness theorem
-
A prominent result from mathematical logic which states, roughly, that any consistent formal theory whose axioms can be effectively listed and in which basic arithmetical facts are provable contains an arithmetical statement that is true but can be neither proved nor refuted within the theory. Therefore, even with all axioms available, certain truths may not be provable or readily apparent.
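The tag-counting idea behind SAGE, described in the glossary, can be sketched in a few lines of Python. This is a minimal illustration rather than a SAGE analysis pipeline: the tag sequences and the `tag_to_gene` lookup are hypothetical, tags that match no gene are simply discarded, and counts are scaled to tags per million only so that libraries of different sizes can be compared.

```python
from collections import Counter


def sage_expression(tags, tag_to_gene):
    """Count sequenced tags per gene and scale the counts to tags per million.

    Tags with no entry in `tag_to_gene` are ignored (an assumption of this
    sketch; real analyses must also handle ambiguous and erroneous tags).
    """
    counts = Counter(tag_to_gene[t] for t in tags if t in tag_to_gene)
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical tag-to-gene lookup and sequenced tags.
tag_to_gene = {"GCTAGCTTAA": "geneA", "TTGACCGTAC": "geneB"}
tags = ["GCTAGCTTAA"] * 3 + ["TTGACCGTAC", "AAAAAAAAAA"]

expression = sage_expression(tags, tag_to_gene)
```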
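For the power-law entry, the distribution can be written as P(k) ∝ k^(−γ), where P(k) is the fraction of nodes with exactly k links and γ is the exponent. The sketch below tabulates P(k) from a list of node degrees and estimates γ from the slope of a straight line through log P(k) against log k. The degree list is hypothetical and was built to follow P(k) ∝ k^(−2) exactly; on real networks, log-log regression is a crude estimator and maximum-likelihood fitting is generally preferred.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: fraction of nodes with degree k}, sorted by k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_power_law_exponent(distribution):
    """Estimate gamma as minus the least-squares slope of log P(k) vs log k."""
    points = [(math.log(k), math.log(p)) for k, p in distribution.items() if k > 0 and p > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Hypothetical degrees: many poorly connected nodes and a single hub.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
distribution = degree_distribution(degrees)
gamma = fit_power_law_exponent(distribution)
```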
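The Bayesian-model and training-set entries can be made concrete with a naive Bayes classifier, one of the simplest Bayesian models for combining evidence. In this sketch, counts from a hypothetical gold-standard training set of known interacting and non-interacting protein pairs give a likelihood ratio for each evidence source; assuming the sources are conditionally independent, the prior odds of an interaction are multiplied by those ratios. All counts, feature names and the prior are illustrative assumptions.

```python
def likelihood_ratios(positive_counts, negative_counts):
    """L(f) = P(f | interacting) / P(f | non-interacting), estimated from
    training counts for each discrete feature value f."""
    n_pos = sum(positive_counts.values())
    n_neg = sum(negative_counts.values())
    return {
        f: (positive_counts[f] / n_pos) / (negative_counts[f] / n_neg)
        for f in positive_counts
    }


def posterior_odds(prior_odds, observed, tables):
    """Naive Bayes: multiply the prior odds by one likelihood ratio per
    evidence source, assuming the sources are conditionally independent."""
    odds = prior_odds
    for source, value in observed.items():
        odds *= tables[source][value]
    return odds


# Hypothetical training counts: positives first, negatives second.
tables = {
    "coexpression": likelihood_ratios({"high": 60, "low": 40}, {"high": 10, "low": 90}),
    "localization": likelihood_ratios({"same": 80, "different": 20}, {"same": 20, "different": 80}),
}
prior = 1 / 600  # hypothetical prior odds that a random protein pair interacts
odds = posterior_odds(prior, {"coexpression": "high", "localization": "same"}, tables)
probability = odds / (1 + odds)
```

With these numbers, high co-expression (likelihood ratio 6) and shared localization (ratio 4) raise the odds from 1/600 to 24/600 = 0.04, a posterior probability just under 4%, which shows how a low prior keeps a prediction uncertain even when several signals agree.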
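Screening for synthetic lethal pairs, as defined in the glossary, amounts to comparing single- and double-mutant phenotypes. The sketch assumes, hypothetically, that each phenotype is summarized as growth relative to wild type, that a single mutant counts as viable at a fitness of at least 0.5 and that a double mutant counts as lethal or severely impaired at 0.1 or below; these thresholds and the gene names are arbitrary choices for illustration.

```python
from itertools import combinations


def synthetic_lethal_pairs(single_fitness, double_fitness, viable=0.5, lethal=0.1):
    """Return gene pairs whose single mutants are both viable but whose
    double mutant is lethal or severely impaired.

    `double_fitness` is keyed by alphabetically sorted (gene, gene) tuples.
    """
    hits = []
    for a, b in combinations(sorted(single_fitness), 2):
        pair = (a, b)
        if pair not in double_fitness:
            continue  # double mutant not measured
        if (single_fitness[a] >= viable and single_fitness[b] >= viable
                and double_fitness[pair] <= lethal):
            hits.append(pair)
    return hits


# Hypothetical relative-growth measurements.
single = {"geneA": 0.95, "geneB": 0.90, "geneC": 0.85, "geneD": 0.05}
double = {("geneA", "geneB"): 0.02, ("geneA", "geneC"): 0.80, ("geneC", "geneD"): 0.01}

hits = synthetic_lethal_pairs(single, double)
```

The C–D double mutant is also inviable, but it is not flagged because the geneD single mutant is already nearly lethal, so the pair fails the definition.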
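Whether a graph is bipartite can be tested by trying to colour its vertices with two colours so that every edge joins vertices of different colours. The breadth-first-search sketch below does this; the gene-to-function edges are hypothetical examples of the gene/characteristic split mentioned in the glossary.

```python
from collections import defaultdict, deque


def is_bipartite(edges):
    """Two-colour the graph by breadth-first search; it is bipartite exactly
    when no edge joins two vertices of the same colour."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    colour = {}
    for start in adjacency:  # handle every connected component
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return False
    return True


# Hypothetical gene-to-function annotations: genes on one side, functions on the other.
gene_to_function = [("geneA", "DNA repair"), ("geneB", "DNA repair"), ("geneB", "cell cycle")]
triangle = [("x", "y"), ("y", "z"), ("z", "x")]  # odd cycle, so not bipartite
```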
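The simulated annealing entry describes a loop that can be sketched directly. This version uses the common Metropolis acceptance rule, in which a worse candidate is accepted with probability exp(−Δ/T) for a score increase Δ at temperature T, together with a geometric cooling schedule. The schedule parameters and the toy objective (finding the integer closest to 7) are hypothetical stand-ins for, say, a score over candidate network modules.

```python
import math
import random


def simulated_annealing(initial, score, neighbour, t_start=1.0, t_end=1e-3,
                        cooling=0.95, steps_per_temp=100, rng=None):
    """Minimise `score`, returning the best solution seen and its score."""
    rng = rng or random.Random()
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbour(current, rng)
            candidate_score = score(candidate)
            delta = candidate_score - current_score
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_score = candidate, candidate_score
                if current_score < best_score:
                    best, best_score = current, current_score
        t *= cooling  # geometric cooling schedule
    return best, best_score


# Hypothetical toy problem: find the integer x in [-50, 50] minimising (x - 7)**2.
def toy_score(x):
    return (x - 7) ** 2


def toy_neighbour(x, rng):
    return max(-50, min(50, x + rng.choice((-1, 1))))


best_x, best_value = simulated_annealing(-40, toy_score, toy_neighbour, rng=random.Random(0))
```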
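A log-odds score of this kind can be computed as log(observed frequency / expected frequency). The sketch uses base-2 logarithms, which is a convention rather than a requirement, and hypothetical counts: a score of +2 means an observation is four times more frequent than expected by chance, and −2 means it is four times less frequent.

```python
import math


def log_odds_score(observed_count, total, expected_frequency):
    """log2 of the observed frequency over the frequency expected by chance.

    Positive scores indicate over-representation, negative scores
    under-representation, and 0 means the observation matches chance.
    """
    observed_frequency = observed_count / total
    return math.log2(observed_frequency / expected_frequency)


# Hypothetical counts, e.g. interactions linking two functional categories.
score_enriched = log_odds_score(40, 1000, 0.01)  # 4% observed vs 1% expected
score_depleted = log_odds_score(5, 1000, 0.02)   # 0.5% observed vs 2% expected
```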
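The constraint step of COBRA can be illustrated on a toy network. A steady-state flux vector v must satisfy mass balance, S·v = 0, where S is the stoichiometric matrix (metabolites by reactions), and each flux must lie within its lower and upper bounds. The sketch only checks these constraints for given flux vectors; the network, bounds and flux values are hypothetical, and a full analysis (for example, flux balance analysis) would go on to optimize an objective over the feasible space with a linear-programming solver, which is not shown.

```python
def satisfies_constraints(stoichiometry, fluxes, bounds, tol=1e-9):
    """Check steady-state mass balance (S·v = 0) and flux bounds (lb <= v <= ub)."""
    # Mass balance: each metabolite row of S dotted with v must be zero.
    for row in stoichiometry:
        if abs(sum(s * v for s, v in zip(row, fluxes))) > tol:
            return False
    # Capacity constraints on each reaction flux.
    return all(lb - tol <= v <= ub + tol for v, (lb, ub) in zip(fluxes, bounds))


# Hypothetical toy network. Columns: R1 (uptake -> A), R2 (A -> B),
# R3 (B -> biomass drain), R4 (A -> byproduct drain). Rows: metabolites A, B.
S = [
    [1, -1, 0, -1],  # A: produced by R1, consumed by R2 and R4
    [0, 1, -1, 0],   # B: produced by R2, consumed by R3
]
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # uptake capped at 10 units

balanced = [10, 7, 7, 3]     # satisfies mass balance and bounds
unbalanced = [10, 7, 5, 3]   # metabolite B accumulates
over_uptake = [12, 12, 12, 0]  # balanced, but uptake exceeds its bound
```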
Cite this article
Joyce, A., Palsson, B. The model organism as a system: integrating 'omics' data sets. Nat Rev Mol Cell Biol 7, 198–210 (2006). https://doi.org/10.1038/nrm1857