  • Review Article

Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges

Key Points

  • A biological cyberinfrastructure facilitates online collaboration, data sharing and algorithm sharing in an increasingly information-driven field.

  • A fully developed cyberinfrastructure system consists of network-accessible data repositories, computational and analysis services, and an online communication and collaboration system.

  • The explosive growth of genomic information is forcing the field to move from traditional centralized databases and static paper publications to distributed information resources and interactive online publications.

  • Integration of diverse resources remains a challenge at the technical level owing to the competing demands of interoperability and the need for flexibility and diversity.

  • The most promising cyberinfrastructure systems combine a flexible, semantically driven framework for sharing information with a strong social and community-building component.

  • The emerging biology cyberinfrastructure has the potential to be a great leveller, giving all researchers equal access to data and compute facilities regardless of their geographical location or data-handling abilities.

Abstract

Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.

Figure 1: The components of a cyberinfrastructure.
Figure 2: The process of bioinformatics research now and in the future.
Figure 3: The Taverna workflow manager.
Figure 4: A mash-up mock-up.

Acknowledgements

I wish to thank the staff of myGrid, BIRN, caBIG, iPlant, EcoliHub and nanoHub for their assistance during the research phase of this Review. I would also like to thank the three anonymous reviewers who took the time to review this article in manuscript stage and to make comments and suggestions. This work was supported in part by a grant from the National Science Foundation Division of Emerging Frontiers (0735191).

Ethics declarations

Competing interests

In the interests of full disclosure, the author has been directly or indirectly involved in the following projects discussed in this article: DAS, GMOD, BioMOBY, SSWAP, caBIG and iPC.

Related links

FURTHER INFORMATION

WIKI features and commenting

Lincoln Stein's homepage

BioMOBY

Biomedical Informatics Research Network (BIRN)

Cancer Biomedical Informatics Grid (caBIG)

EcoliWiki

Ensembl Genome Browser

EU Framework Programme 7 (FP7)

Generic Model Organism Database (GMOD) project

Globus Toolkit

iPlant Collaborative (iPC)

myExperiment

myGrid

NeuroNames

NCBI Taxonomy

Open Bioinformatics Ontologies (OBI)

Phenotype & Trait Ontology (PATO)

Simple Object Access Protocol (SOAP)

Simple Semantic Web Architecture and Protocol (SSWAP)

SSWAP protocol

UCSC Genome Browser

WikiPathways

Web Services Description Language (WSDL)

Virtual Plant Information Network (VPIN)

Glossary

WIKI

A popular web page authoring system that allows individuals to collaborate on large communal documents. Wikipedia is the best known example, but there are many tens of thousands of WIKIs in use. The name comes from the Hawaiian word for quick.

Ontology

An enumeration of the concepts used in a particular domain of knowledge, their definitions and the relationships between them.
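
As a rough illustration of this definition, the sketch below stores an ontology as a set of terms, each with a definition and typed relationships to other terms, and then walks one relationship type. The term names, the `Term` class and the `ancestors` helper are all hypothetical and are not drawn from any published ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    """One concept: an identifier, a definition and typed links to other terms."""
    term_id: str
    definition: str
    # Relationship name -> target term ids, e.g. {"is_a": ["organ"]}.
    relations: Dict[str, List[str]] = field(default_factory=dict)


def ancestors(ontology: Dict[str, Term], term_id: str, relation: str = "is_a") -> Set[str]:
    """Collect every term reachable from term_id by following one relationship type."""
    found: Set[str] = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for parent in ontology[current].relations.get(relation, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical four-term plant-anatomy ontology, for illustration only.
ontology = {
    "plant_structure": Term("plant_structure", "Any anatomical part of a plant."),
    "organ": Term("organ", "A multi-tissue plant structure.", {"is_a": ["plant_structure"]}),
    "shoot": Term("shoot", "The above-ground axis of a plant.", {"is_a": ["plant_structure"]}),
    "leaf": Term("leaf", "A lateral photosynthetic organ.",
                 {"is_a": ["organ"], "part_of": ["shoot"]}),
}

if __name__ == "__main__":
    print(sorted(ancestors(ontology, "leaf")))
    print(sorted(ancestors(ontology, "leaf", "part_of")))
```

Keeping the relationship type explicit is what lets a program tell "a leaf is a kind of organ" apart from "a leaf is part of a shoot", which is the kind of distinction that semantic integration relies on.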

Web service

A web-based resource that can be programmatically invoked to perform a database search or a computation, or to provide some other service.

Web Services Description Language

(WSDL). An XML-based language used to describe the nature of SOAP web services.

Simple Object Access Protocol

(SOAP). The dominant messaging protocol for defining and invoking web services.

OWL

A dyslexic acronym for Web Ontology Language. It is an XML-based language used to describe ontologies. A variant of OWL called OWL Description Logics (OWL DL) is particularly suited for creating semantic webs of ontologies that can be traversed by reasoning engines.

Representational State Transfer

(REST). An alternative web services protocol that is sometimes more suitable than SOAP for particular web services.
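
To make the contrast between the two styles concrete, the sketch below builds the same hypothetical gene lookup both ways without contacting any server. The base URL, the `gene` resource, the `getGene` operation and the `symbol` parameter are invented for illustration; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def rest_request_url(base_url, resource, **params):
    """REST style: the resource is named in the URL, parameters go in the query string."""
    query = urlencode(sorted(params.items()))
    url = f"{base_url}/{resource}"
    return f"{url}?{query}" if query else url


def soap_request_body(operation, **params):
    """SOAP style: the operation and its arguments are wrapped in an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)  # hypothetical operation element
    for name, value in sorted(params.items()):
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical endpoint and parameter names; nothing is sent over the network.
    print(rest_request_url("https://example.org/api", "gene", symbol="BRCA2"))
    print(soap_request_body("getGene", symbol="BRCA2"))
```

The REST request is a single readable URL, whereas the SOAP request carries the same information inside a structured message. For a SOAP service, the expected shape of that message is typically described in WSDL.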

Semantic web

An interrelated network of ontologies that together describe resources available on the web.

About this article

Cite this article

Stein, L. Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet 9, 678–688 (2008). https://doi.org/10.1038/nrg2414

  • DOI: https://doi.org/10.1038/nrg2414
