Opinion

e-Science: relieving bottlenecks in large-scale genome analyses

Abstract

The development of affordable, high-throughput sequencing technology has led to a flood of publicly available bacterial genome-sequence data. The availability of multiple genome sequences presents both an opportunity and a challenge for microbiologists: new computational approaches are needed to analyse these genomic data and to extract the knowledge required to address specific biological problems. The field of e-Science is maturing, and Grid-based technologies can help to address this challenge.

Figure 1: Publicly available bacterial genomes.
Figure 2: Analysis of the Bacillus subtilis secretome as an in silico experiment.
Figure 3: A heat map that shows the prevalence of secreted protein families across 12 Bacillus species.

Acknowledgements

The authors acknowledge funding from the UK Engineering and Physical Sciences Research Council and Non-linear Dynamics for a CASE (collaborative awards in science and engineering) studentship to T.C., from Research Councils UK for a fellowship to J.H. and from the European Union (Bacell Health; grant number LSH-2002-1.1.0-1).

Author information

Correspondence to Colin R. Harwood.

Related links

DATABASES

Entrez Genome Project

Staphylococcus aureus

FURTHER INFORMATION

Anil Wipat's homepage

Colin R. Harwood's homepage

Jennifer Hallinan's homepage

BioCatalogue

BioMart

BioMoby

EMBL–EBI (Genome Pages — Bacteria)

Ensembl

Facebook

GenBank

Gene Ontology

KEGG bacterial genomes

Microbase

myExperiment

myGrid project

NCBI Entrez Genome

NCBI Entrez Utilities Web Service

OMIM

ONDEX

RefSeq

Taverna

Triana

WABI

About this article

Cite this article

Craddock, T., Harwood, C., Hallinan, J. et al. e-Science: relieving bottlenecks in large-scale genome analyses. Nat Rev Microbiol 6, 948–954 (2008). https://doi.org/10.1038/nrmicro2031
