Key Points
- A biological cyberinfrastructure facilitates online collaboration, data sharing and algorithm sharing in an increasingly information-driven field.
- A fully developed cyberinfrastructure system consists of network-accessible data repositories, computational and analysis services, and an online communication and collaboration system.
- The explosive growth of genomic information is forcing the field to move from traditional centralized databases and static paper publications to distributed information resources and interactive online publications.
- Integration of diverse resources remains a challenge at the technical level owing to the competing demands of interoperability and the need for flexibility and diversity.
- The most promising cyberinfrastructure systems combine a flexible, semantically driven framework for sharing information with a strong social and community-building component.
- The emerging biology cyberinfrastructure has the potential to be a great leveller, giving all researchers equal access to data and compute facilities regardless of their geographical location or data-handling abilities.
Abstract
Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.
Acknowledgements
I wish to thank the staff of myGrid, BIRN, caBIG, iPlant, EcoliHub and nanoHub for their assistance during the research phase of this Review. I would also like to thank the three anonymous reviewers who took the time to review this article in manuscript stage and to make comments and suggestions. This work was supported in part by a grant from the National Science Foundation Division of Emerging Frontiers (0735191).
Ethics declarations
Competing interests
In the interests of full disclosure, the author has been directly or indirectly involved in the following projects discussed in this article: DAS, GMOD, BioMOBY, SSWAP, caBIG and iPC.
Related links
FURTHER INFORMATION
Biomedical Informatics Research Network (BIRN)
Cancer Biomedical Informatics Grid (caBIG)
EU Framework Programme 7 (FP7)
Generic Model Organism Database (GMOD) project
Open Bioinformatics Ontologies (OBI)
Phenotype & Trait Ontology (PATO)
Simple Object Access Protocol (SOAP)
Simple Semantic Web Architecture and Protocol (SSWAP)
Glossary
- WIKI: A popular web page authoring system that allows individuals to collaborate on large communal documents. Wikipedia is the best known example, but there are many tens of thousands of WIKIs in use. The name comes from the Hawaiian word for quick.
- Ontology: An enumeration of the concepts used in a particular domain of knowledge, their definitions and the relationships between them.
- Web service: A web-based resource that can be programmatically invoked to perform a database search or a computation, or to provide some other service.
- Web Services Description Language (WSDL): An XML-based language used to describe the nature of SOAP web services.
- Simple Object Access Protocol (SOAP): The dominant messaging protocol for defining and invoking web services.
- OWL: A dyslexic acronym for Web Ontology Language. It is an XML-based language used to describe ontologies. A variant of OWL called OWL Description Logics (OWL DL) is particularly suited for creating semantic webs of ontologies that can be traversed by reasoning engines.
- Representational State Transfer (REST): An alternative web services protocol that is sometimes more suitable than SOAP for particular web services.
- Semantic web: An interrelated network of ontologies that together describe resources available on the web.
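The glossary's definition of an ontology (concepts, their definitions and the relationships between them) can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not a representation of any published ontology or of OWL: the `Term` class, the `ancestors` helper and the three toy terms are all hypothetical names introduced here. It shows how a program might follow "is a" relationships to find every more general concept for a given term.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One concept: an identifier, a definition, and links to more general terms."""
    term_id: str
    definition: str
    is_a: list[str] = field(default_factory=list)  # IDs of parent terms


# Toy terms for illustration only (assumed, not copied from a real ontology).
ONTOLOGY = {
    t.term_id: t
    for t in [
        Term("function", "Any activity carried out by a gene product."),
        Term("enzyme_function", "Catalysis of a biochemical reaction.", ["function"]),
        Term("kinase_function", "Transfer of a phosphate group.", ["enzyme_function"]),
    ]
}


def ancestors(term_id: str, ontology: dict[str, Term]) -> set[str]:
    """Return every more general term reachable by following is_a links."""
    found: set[str] = set()
    stack = list(ontology[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(ontology[parent].is_a)
    return found


if __name__ == "__main__":
    # A record annotated with the specific term is also found by a search
    # on any of its more general ancestors.
    print(sorted(ancestors("kinase_function", ONTOLOGY)))
```

Traversing the relationships in this way is what lets two resources that annotate records at different levels of detail still be queried together, provided they share the same vocabulary.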
Cite this article
Stein, L. Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet 9, 678–688 (2008). https://doi.org/10.1038/nrg2414