Beyond standardization: dynamic software infrastructures for systems biology

Abstract

Progress in systems biology is seriously hindered by the slow production of suitable software infrastructures. Biologists need infrastructure that connects easily to work done in other laboratories, for which standardization is helpful. However, the infrastructure must also accommodate the specifics of their biological system, and appropriate mechanisms to support such variation are currently lacking. We argue that a minimal computer language, and a software tool called a generator, can be used to quickly produce customized software infrastructures that 'systems biologists really want to have'.

Figure 1: Software infrastructure for systems biology.
Figure 2: Cost effectiveness of development strategies.

Acknowledgements

The authors would like to thank the reviewers for their valuable suggestions, and R. W. Williams, F. C. P. Holstege, J. P. Nap, E. O. de Brock, and R. Breitling for their comments on an earlier version of this article. Furthermore, we would like to thank R. A. Scheltema, B. M. Tesson and D. I. Matthijssen for assistance with the development of the Showcase. This work was supported by grants from The Netherlands Organization for Scientific Research, Program Earth and Life Sciences and The Netherlands Ministry of Economic Affairs, Programs Biopartner and Biorange.

Author information

Correspondence to Ritsert C. Jansen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Apache Tomcat

BioArray Software Environment

Bioconductor

BioMOBY

Bio* toolkits

CaCORE

CCPN

Code Generation Network

Complex Trait Consortium

Cytoscape

Eclipse

EMBOSS

GeneNetwork

Generic Model Organism Database

Groningen Bioinformatics Centre

MIAME

Microarray Gene Expression Data Society

MOLGENIS

MySQL

Online Showcase

PHP

PISE

PostgreSQL

Software Engineering Institute

Taverna

The Apache Velocity Project

The R Project

UCSC Genome Browser

W3C Semantic Web

W3C Web Services Activity

Glossary

Analysis workflow

The transformation of raw data into biological evidence by applying algorithms, tools and services in a certain order.

Code generator

A code generator translates a domain-specific language into a general-purpose language (such as Java), which is then compiled (by the Java compiler) into a separate program for later execution.

Design pattern

A general, repeatable solution to a commonly occurring problem. It is a description or template for how to solve the problem.

Domain-specific language

A minimal language to describe features for a certain domain in a compact and easy way.

Genetical genomics

A strategy to map genetic determinants that underlie variations in transcript, protein or metabolite abundance that are observed in genetically different individuals.

Interpreter

An interpreter reads a domain-specific language and executes it directly, without first producing a separate program.

Module

A unit of functionality that has a clear interface so it can be easily assembled and (re)used interchangeably. Good modules hide implementation details so that a change inside one module does not require changes in other modules.

Software architecture

Software components, the externally visible properties of those components and the relationships among them.
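The glossary terms "domain-specific language", "code generator" and "interpreter" can be illustrated with a small sketch. It is not the authors' generator or language. The `entity`/`field` syntax, the entity names and the functions `parse`, `generate` and `interpret` are all hypothetical, chosen only for illustration, and the sketch emits Python rather than Java to stay self-contained.

```python
# Hypothetical mini-language (an assumption, not the article's language):
# an "entity" line followed by "field name: type" lines.
MODEL = """
entity Strain
  field name: str
  field generation: int
entity Probe
  field sequence: str
  field chromosome: int
"""

# Types the hypothetical language is allowed to mention.
TYPES = {"str": str, "int": int}


def parse(text):
    """Parse the mini-language into {entity: [(field_name, type_name), ...]}."""
    entities = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "entity":
            current = rest.strip()
            entities[current] = []
        elif keyword == "field" and current is not None:
            name, _, type_name = rest.partition(":")
            if type_name.strip() not in TYPES:
                raise ValueError(f"unknown type in line: {raw!r}")
            entities[current].append((name.strip(), type_name.strip()))
        else:
            raise ValueError(f"cannot parse line: {raw!r}")
    return entities


def generate(entities):
    """Code generator: emit general-purpose source text for later execution."""
    lines = ["from dataclasses import dataclass", ""]
    for entity, fields in entities.items():
        lines.append("@dataclass")
        lines.append(f"class {entity}:")
        if not fields:
            lines.append("    pass")
        for name, type_name in fields:
            lines.append(f"    {name}: {type_name}")
        lines.append("")
    return "\n".join(lines)


def interpret(entities, entity, record):
    """Interpreter: check a record against the model directly, with no generated code."""
    return all(
        isinstance(record.get(name), TYPES[type_name])
        for name, type_name in entities[entity]
    )


model = parse(MODEL)

# Generator route: produce source text, then load it as a separate program.
source = generate(model)
namespace = {"__name__": "generated_model"}
exec(source, namespace)  # stands in for compiling and loading the generated code
strain = namespace["Strain"](name="example-1", generation=20)

# Interpreter route: the model itself is consulted at run time.
probe_ok = interpret(model, "Probe", {"sequence": "ACGT", "chromosome": 3})
```

In this sketch a laboratory-specific variation, such as an extra field on `Probe`, is a one-line change to `MODEL`. Both routes pick it up without hand-editing any other code, which is the kind of variation support the abstract argues for.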

About this article

Cite this article

Swertz, M., Jansen, R. Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet 8, 235–243 (2007) doi:10.1038/nrg2048
