  • Review Article
Standards for systems biology

Key Points

  • Several standards are contributing to the advancement of knowledge in biology; however, for biologists, most standardization initiatives are still in the investment stage.

  • Developing a complete and self-contained standard in biology involves four steps: conceptual model design, model formalization, development of a data exchange format and implementation of the supporting tools.

  • In the life sciences, standards development is typically done by grass-roots movements, and it is difficult to persuade funding agencies to fund such activities.

  • Although it might be faster for a single organization to develop its own standards, a bottom-up community consensus approach is key to the long-term acceptance and usefulness of standards.

  • Developing and deploying a standard creates an overhead, which can be expensive. Standards related to a particular technology have a life span that is no longer than the technology itself and there is only a limited period of time in which the overheads can be paid off.

  • The body of biological knowledge is incomplete and expanding rapidly; therefore, standards that describe biological knowledge have to be flexible, and a mechanism of change must be a part of the standard.

  • To avoid proliferation of standards, common features of existing standards should be re-used wherever possible. Simplicity, but not oversimplification, is the key to success.
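As a rough illustration of the four steps listed above, the following Python sketch takes a conceptual idea ("a sample has an identifier, an organism and a tissue"), formalizes it as a typed record, defines a versioned exchange format for it and provides a small tool that reads and validates that format. Everything here is hypothetical and chosen only for illustration: the `Sample` class, its fields, the `format_version` key and the use of JSON are assumptions, not part of any standard discussed in the article.

```python
import json
from dataclasses import dataclass, asdict

# Step 2 (model formalization): a hypothetical, strictly typed record that
# pins down the conceptual model "a sample has an identifier, organism, tissue".
@dataclass(frozen=True)
class Sample:
    identifier: str
    organism: str
    tissue: str

REQUIRED_FIELDS = ("identifier", "organism", "tissue")

# Step 3 (data exchange format): a JSON document with an explicit version,
# so that the format can change over time without breaking readers silently.
def to_exchange(sample: Sample) -> str:
    document = {"format_version": "0.1", "sample": asdict(sample)}
    return json.dumps(document, sort_keys=True)

# Step 4 (supporting tool): a reader that rejects incomplete documents.
def from_exchange(text: str) -> Sample:
    record = json.loads(text)["sample"]
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Sample(**{name: record[name] for name in REQUIRED_FIELDS})

if __name__ == "__main__":
    original = Sample("S1", "Mus musculus", "liver")
    print(from_exchange(to_exchange(original)) == original)
```

The version field is one simple way to make a mechanism of change part of the format itself; a real standard would also need a documented policy for how versions evolve.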

Abstract

High-throughput technologies are generating large amounts of complex data that have to be stored in databases, communicated to various data analysis tools and interpreted by scientists. Data representation and communication standards are needed to implement these steps efficiently. Here we give a classification of various standards related to systems biology and discuss various aspects of standardization in life sciences in general. Why are some standards more successful than others, what are the prerequisites for a standard to succeed and what are the possible pitfalls?

Figure 1: Development of components of data exchange standards: the MGED example.
Figure 2: Proteomics standards.

Acknowledgements

We would like to thank M. Ashburner, C. Brooksbank, H. Hermjakob and N. Le Novere for reading the manuscript and providing valuable comments. The work on this survey was partly funded by the MolPAGE grant from the European Commission and a grant from the US National Human Genome Research Institute and National Institute of Biomedical Imaging and Bioengineering.

Author information

Corresponding author

Correspondence to Alvis Brazma.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

ArrayExpress

BioModels Database

BioPAX

CellML

Extensible Markup Language

Gene Expression Omnibus

Gene Ontology

Human Genome Organisation

Human Proteome Organisation

Life Sciences Identifier

MIAME and MAGE-OM Terms Explained web site

MIAME

Microarray and Gene Expression

Microarray Gene Expression Data Society web site

MISFISHIE Standard Working Group web page

MolPAGE

mzData

Open Biological Ontology

Proteomics Standards Initiative — Molecular Interactions

Stanford Microarray Database

Summary Report — W3C Workshop on Semantic Web for Life Sciences

Systems Biology Markup Language

The Law of Standard

UniProtKB

W3C Semantic Web Health Care and Life Sciences Interest Group web site

Glossary

Domain

A field of study.

Conceptual model

In information engineering, a model that is meant to facilitate human communication; it does not need to be absolutely precise, as opposed to a formal model that has strictly defined semantics.

Data exchange format

A file or message format that is formally defined so that software can be built that 'knows' where to find various pieces of information.

Ontology

A model that describes a domain and can be used to reason about objects and relationships between them.

Tool

In software engineering, a program or set of programs that enables a certain task or tasks.

Directed acyclic graph

A graph consisting of nodes and edges, where edges have direction (that is, can be traversed only one way), and it is not possible to find a set of edges that form a closed loop.

Diagram

A visual representation of concepts and relationships, used in information engineering to facilitate human communication.

Semantics

The meaning of something; in computer science, it is usually used in opposition to syntax (that is, format).

Reporting requirements

An agreed set of information items that needs to be provided for meaningful information communication (reporting).

Metabolomics

The study of metabolite profiles in individual cells and cell types.

Metabonomics

The study of systemic response to the pathophysiological stimuli and regulation of function in the whole organism through analysis of biofluids and tissues.

Visual language

In computer science and computer engineering, an agreed set of conventions for drawing diagrams that formally describe a model or a program.

Class

A concept used in ontology engineering and model building for referring to a set of objects with similar properties.

Graph

A visual representation of information in the form of edges (lines) and nodes (connection points). In biology, graphs can be represented as boxes (nodes) and lines between boxes.
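The glossary definition of a directed acyclic graph can be made concrete with a short check. The Python sketch below, which assumes a graph given as a list of (source, target) edge pairs, uses Kahn's algorithm: it repeatedly removes nodes that have no incoming edges, and the graph is acyclic exactly when every node can be removed this way. The function name `is_acyclic` and the term names in the example are hypothetical.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Return True if the directed graph given as (source, target) pairs has no closed loop."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, target in edges:
        successors[source].append(target)
        in_degree[target] += 1
        nodes.update((source, target))

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    queue = deque(node for node in nodes if in_degree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for following in successors[node]:
            in_degree[following] -= 1
            if in_degree[following] == 0:
                queue.append(following)

    # Nodes that sit on a cycle never reach in-degree zero, so they are never removed.
    return removed == len(nodes)

if __name__ == "__main__":
    # Hypothetical term hierarchy: one term may have two parents, but no loops.
    hierarchy = [("process", "metabolism"), ("process", "signalling"),
                 ("metabolism", "kinase activity"), ("signalling", "kinase activity")]
    print(is_acyclic(hierarchy))
    print(is_acyclic(hierarchy + [("kinase activity", "process")]))
```

Note that a node may have several parents, as in the example, without the graph ceasing to be acyclic; only a closed loop of directed edges violates the definition.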

About this article

Cite this article

Brazma, A., Krestyaninova, M. & Sarkans, U. Standards for systems biology. Nat Rev Genet 7, 593–605 (2006). https://doi.org/10.1038/nrg1922

  • DOI: https://doi.org/10.1038/nrg1922
