Abstract
Progress in systems biology is seriously hindered by the slow production of suitable software infrastructures. Biologists need infrastructure that easily connects to work done in other laboratories, and standardization helps here. However, the infrastructure must also accommodate the specifics of each biological system, and appropriate mechanisms to support such variation are currently lacking. We argue that a minimal computer language, together with a software tool called a generator, can be used to quickly produce the customized software infrastructures that 'systems biologists really want to have'.
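As a rough illustration of this argument, here is a Python sketch that pairs a hypothetical minimal data-model language with a generator that turns a model into SQL table definitions. The syntax and the names `parse_model` and `generate_sql` are assumptions made for this example. It is not the authors' tool. The point is that variation lives in the short model text, while the generator stays the same.

```python
"""Sketch: a tiny data-model language and a generator that emits SQL.

Hypothetical illustration only: the syntax and the function names
(parse_model, generate_sql) are assumptions, not the authors' tool.
"""

# Hypothetical minimal language: an "entity" line starts a table, and the
# indented "name: type" lines below it are its fields.
MODEL = """
entity Sample
  name: string
  tissue: string
entity Hybridization
  sample: ref Sample
  array_id: string
"""

# Mapping from model types to SQL column types (assumed, not standard).
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER", "float": "DOUBLE PRECISION"}


def parse_model(text):
    """Parse the model text into {entity: [(field, type), ...]}, in order."""
    model = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("entity "):
            current = line.split()[1]
            model[current] = []
        else:
            if current is None:
                raise ValueError(f"field outside entity: {line!r}")
            name, ftype = (part.strip() for part in line.split(":", 1))
            model[current].append((name, ftype))
    return model


def generate_sql(model):
    """Emit one CREATE TABLE statement per entity, with an id primary key."""
    statements = []
    for entity, fields in model.items():
        columns = ["  id INTEGER PRIMARY KEY"]
        for name, ftype in fields:
            if ftype.startswith("ref "):
                # A "ref X" field becomes a foreign key to table X.
                target = ftype.split()[1]
                columns.append(f"  {name} INTEGER REFERENCES {target}(id)")
            else:
                columns.append(f"  {name} {SQL_TYPES[ftype]}")
        statements.append(f"CREATE TABLE {entity} (\n" + ",\n".join(columns) + "\n);")
    return "\n\n".join(statements)


if __name__ == "__main__":
    print(generate_sql(parse_model(MODEL)))
```

Suppose a laboratory also records the strain of each sample. It could add the line `strain: string` under `Sample` and regenerate, and `generate_sql` itself would not change. A realistic generator would typically produce more than database tables (for example, user and programming interfaces), which this sketch leaves out.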
Acknowledgements
The authors would like to thank the reviewers for their valuable suggestions, and R. W. Williams, F. C. P. Holstege, J. P. Nap, E. O. de Brock, and R. Breitling for their comments on an earlier version of this article. Furthermore, we would like to thank R. A. Scheltema, B. M. Tesson and D. I. Matthijssen for assistance with the development of the Showcase. This work was supported by grants from The Netherlands Organization for Scientific Research, Program Earth and Life Sciences and The Netherlands Ministry of Economic Affairs, Programs Biopartner and Biorange.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
FURTHER INFORMATION
Generic Model Organism Database
Groningen Bioinformatics Centre
Microarray Gene Expression Data Society
Glossary
- Analysis workflow: The transformation of raw data into biological evidence by applying algorithms, tools and services in a defined order.
- Code generator: A program that translates a domain-specific language description into source code in a general-purpose language (such as Java). That code is then compiled into a separate program, which is executed later.
- Design pattern: A general, repeatable solution to a commonly occurring problem; a description or template for how to solve that problem.
- Domain-specific language: A minimal computer language for describing the features of a particular domain compactly and simply.
- Genetical genomics: A strategy to map the genetic determinants that underlie variation in transcript, protein or metabolite abundance observed among genetically different individuals.
- Interpreter: A program that reads a domain-specific language description and executes it directly, without first generating a separate program.
- Module: A unit of functionality with a clear interface, so that it can be easily assembled and (re)used interchangeably. Good modules hide implementation details, so that a change inside one module does not require changes in other modules.
- Software architecture: The structure of a software system: its components, the externally visible properties of those components and the relationships among them.
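The glossary entries for code generator and interpreter can be made concrete with one small analysis workflow handled both ways. In this Python sketch, the workflow syntax, the step names `log2` and `center`, and the functions `interpret` and `generate` are hypothetical and were chosen only for the contrast. The interpreter executes each step as it reads it. The generator writes out the source of a standalone `run` function that can be compiled and executed later.

```python
"""Sketch: one hypothetical analysis-workflow description, run two ways.

The workflow syntax and step names (log2, center) are assumptions made
only to contrast an interpreter with a code generator.
"""
import math

# Hypothetical DSL: one step name per line, applied in order.
WORKFLOW = """
log2
center
"""


def _log2(values):
    return [math.log2(v) for v in values]


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Step library shared by both approaches.
STEPS = {"log2": _log2, "center": _center}


def parse(text):
    """Return the non-empty step names, in order."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def interpret(workflow, data):
    """Interpreter: look up and execute each step immediately."""
    for step in parse(workflow):
        data = STEPS[step](data)
    return data


def generate(workflow):
    """Generator: emit Python source for a standalone run() function."""
    lines = ["def run(data):"]
    for step in parse(workflow):
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        lines.append(f"    data = STEPS[{step!r}](data)")
    lines.append("    return data")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    measurements = [1.0, 2.0, 4.0, 8.0]
    print(interpret(WORKFLOW, measurements))

    source = generate(WORKFLOW)
    print(source)
    # Compile the generated source into a function. It could equally be
    # written to a file and run later, without the workflow text.
    namespace = {"STEPS": STEPS}
    exec(compile(source, "<generated>", "exec"), namespace)
    print(namespace["run"](measurements))
```

Both paths give the same result here. The usual trade-off is that an interpreter is quicker to change while experimenting, whereas generated code can be inspected, stored and rerun on its own.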
About this article
Cite this article
Swertz, M., Jansen, R. Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet 8, 235–243 (2007). https://doi.org/10.1038/nrg2048
DOI: https://doi.org/10.1038/nrg2048